Ollama Intern MCP — Handbook

Ollama Intern MCP gives Claude Code a local intern with rules, tiers, a desk, and a filing cabinet. Claude picks the tool; the tool picks the tier (Instant / Workhorse / Deep / Embed); the tier writes a file you can open next week.
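“The tool picks the tier” can be pictured as a fixed lookup that Claude never overrides. Below is a minimal TypeScript sketch that assumes a static table. The four tier names come from this page. The tool-to-tier assignments and the `tierFor` helper are hypothetical and do not reflect the server's actual routing.

```typescript
// The four model tiers named in this handbook.
type Tier = "Instant" | "Workhorse" | "Deep" | "Embed";

// HYPOTHETICAL assignments, for illustration only; the real routing
// table is not documented on this page.
const TOOL_TIER: Readonly<Partial<Record<string, Tier>>> = {
  classify: "Instant",
  draft: "Workhorse",
  research: "Deep",
  embed: "Embed",
};

// Claude chooses the tool; the tool, not the caller, determines the tier.
function tierFor(tool: string): Tier {
  const tier = TOOL_TIER[tool];
  if (tier === undefined) {
    throw new Error(`unknown tool: ${tool}`);
  }
  return tier;
}

console.log(tierFor("classify")); // prints "Instant"
```

In this sketch the caller names a job, never a model. Changing which model backs a tier would therefore not touch any tool call.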

Local-first — zero network egress until you opt in. No telemetry. No “autonomous” anything. Every call shows its work. Optional Ollama Cloud routing puts 600B-class models behind the same tools when local hardware is the bottleneck, with automatic fallback to local.
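Opt-in cloud routing with automatic local fallback can be sketched as follows. The cloud backend is attempted only when one is configured, and the same prompt is retried locally if that attempt fails. This is an assumed control flow, not the server's code. `Backend`, `generateWithFallback`, and the `via` field are hypothetical names.

```typescript
// A backend turns a prompt into text; it may reject (network error, quota, etc.).
type Backend = (prompt: string) => Promise<string>;

interface Routed {
  text: string;
  via: "cloud" | "local"; // hypothetical field recording which backend answered
}

// Cloud is attempted only when a cloud backend is passed in (i.e. opted in);
// otherwise the call never leaves the local path.
async function generateWithFallback(
  prompt: string,
  local: Backend,
  cloud?: Backend,
): Promise<Routed> {
  if (cloud !== undefined) {
    try {
      return { text: await cloud(prompt), via: "cloud" };
    } catch {
      // Any cloud failure falls through to the local backend.
    }
  }
  return { text: await local(prompt), via: "local" };
}
```

When no cloud backend is passed, the function stays on the local path, which mirrors the zero-egress default.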

The shape

Four tool tiers, 42 tools total.

Tier	Count	Purpose
Atoms	28	Job-shaped primitives (`classify`, `extract`, `triage_logs`, `summarize_`, `draft`, `research`, `corpus_`, `embed*`, `chat`, plus the 13 v2.1.0 ops/refactor/corpus/artifact additions — see Tool reference). Batch-capable atoms accept `items: [{id, text}]`.
Briefs	3	Evidence-backed structured operator briefs — `incident_brief`, `repo_brief`, `change_brief`.
Packs	3	Fixed-pipeline compound jobs that write durable markdown + JSON. `incident_pack`, `repo_pack`, `change_pack`.
Artifacts	7	Continuity surface — `list`, `read`, `diff`, `export_to_path`, plus three deterministic snippet helpers.

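Batch-capable atoms take `items: [{id, text}]`. The sketch below validates that shape and assumes ids must be non-empty and unique within a batch. That rule is an assumption; this page does not state it. The handwritten checks stand in for the tool's Zod schema, and `parseBatch` is a hypothetical name.

```typescript
interface BatchItem {
  id: string;
  text: string;
}

// Validate an untrusted `items` argument; throws on the first problem found.
function parseBatch(input: unknown): BatchItem[] {
  if (!Array.isArray(input)) throw new Error("items must be an array");
  const seen = new Set<string>();
  return input.map((raw: unknown, i: number) => {
    if (typeof raw !== "object" || raw === null) {
      throw new Error(`items[${i}] must be an object`);
    }
    const { id, text } = raw as Record<string, unknown>;
    if (typeof id !== "string" || id.length === 0) {
      throw new Error(`items[${i}].id must be a non-empty string`);
    }
    if (typeof text !== "string") {
      throw new Error(`items[${i}].text must be a string`);
    }
    // Assumed rule: unique ids so per-item results can be matched back.
    if (seen.has(id)) throw new Error(`duplicate id: ${id}`);
    seen.add(id);
    return { id, text };
  });
}
```

For example, `parseBatch([{ id: "a", text: "disk full" }, { id: "b", text: "OOM killed" }])` returns both items unchanged, while a repeated id is rejected.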
Freeze lines: atoms stand at 28. The freeze was lifted at v2.1.0, and a new atom now requires an audit-justified gap, tests, a handbook page, and a CHANGELOG entry. The pack and artifact tiers remain frozen at 3 and 7.

Why this project exists

Every local-LLM MCP server leads with token savings. Ours leads with what the intern produces:

a durable markdown file you can open tomorrow
an evidence block where every cited id was verified server-side
a `weak: true` flag when the evidence doesn’t support the claim, never a smoothed narrative
investigative `next_checks`, never “apply this fix”
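The evidence guarantees above can be sketched as a post-processing step. It keeps only the citations whose ids exist in the evidence the server actually supplied, and it sets `weak: true` when nothing verifiable remains. The `Claim` shape, `verifyCitations`, and the rule that no verified citation means weak are all assumptions for illustration. The real briefs may apply stricter criteria.

```typescript
interface Claim {
  text: string;
  cites: string[]; // evidence ids the model says support this claim
}

interface CheckedClaim extends Claim {
  dropped: string[]; // cited ids that matched no supplied evidence
  weak: boolean;
}

// Check each cited id against the evidence ids the server provided.
// Unknown ids are removed rather than trusted; a claim with no surviving
// citation is flagged weak instead of being reworded into a smoother story.
function verifyCitations(claims: Claim[], evidenceIds: ReadonlySet<string>): CheckedClaim[] {
  return claims.map((claim) => {
    const kept = claim.cites.filter((id) => evidenceIds.has(id));
    const dropped = claim.cites.filter((id) => !evidenceIds.has(id));
    return { text: claim.text, cites: kept, dropped, weak: kept.length === 0 };
  });
}
```

Because the claim text is kept as written while its support is reported separately, the reader sees exactly which assertions the evidence backs.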

Where to go next

Quickstart — your first 5 minutes, end-to-end, from install to artifact
Getting started — install, Claude Code config, model pulls
Tool reference — every tool grouped by tier (overview + per-tool deep-dives for the most-used tools)
Envelope & tiers — uniform envelope, hardware profiles, residency
Artifacts & continuity — how packs write to disk and how to use what they wrote
Laws & guardrails — evidence-first, no remediation drift, deterministic renderers
Security & threat model — what’s touched, what’s not, what’s in the log
Ollama Cloud (optional) — opt-in cloud-primary routing with local fallback; off by default, zero egress until you set a key
Corpora — build, refresh, search, answer over a living corpus; manifest v2 + :latest drift
Error codes — every structured error code, when you’ll see it, what to do
Use with Hermes — drive this MCP from Nous Research’s Hermes Agent on hermes3:8b (validated 2026-04-19)
Troubleshooting — Ollama not running, model pull failures, hardware insufficient, MCP server not appearing in Claude Code
Observability — read the NDJSON log, field semantics, jq recipes, degradation signatures, ollama_log_tail
Comparison — honest matrix vs other local-LLM MCPs, raw Ollama, and Claude-direct

Architecture at a glance

```mermaid
flowchart LR
  Claude["Claude Code<br/>(MCP client)"]
  MCP["ollama-intern-mcp<br/>server (stdio)"]
  Ollama["Ollama daemon<br/>(127.0.0.1:11434)"]
  Models[("Hermes 3 / Qwen 3<br/>nomic-embed-text")]
  Corpus[("~/.ollama-intern/<br/>corpora/")]
  Artifacts[("~/.ollama-intern/<br/>artifacts/")]
  NDJSON[("~/.ollama-intern/<br/>log.ndjson")]
  Guards{{"Guardrails<br/>citations · banned phrases<br/>protected paths · confidence"}}

  Claude -- "JSON-RPC over stdio" --> MCP
  MCP --> Guards
  MCP -- "/api/generate · /api/chat<br/>/api/embed · /api/ps · /api/tags" --> Ollama
  Ollama --> Models
  MCP --- Corpus
  MCP --- Artifacts
  MCP --> NDJSON
```
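The `/api/generate` edge in the diagram is Ollama's standard generation endpoint. The sketch below shows the request the server might send, assuming the non-streaming form. `buildGenerate` and `OLLAMA_BASE` are hypothetical names, and the real server may pass extra options.

```typescript
// Request body for Ollama's POST /api/generate, non-streaming form.
interface GenerateRequest {
  model: string;
  prompt: string;
  stream: false;
}

interface GenerateCall {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Default local daemon address, as shown in the diagram.
const OLLAMA_BASE = "http://127.0.0.1:11434";

// Build fetch() arguments. Only the model name and the prepared prompt
// go into the body; no user-supplied file paths are included.
function buildGenerate(model: string, prompt: string): GenerateCall {
  const body: GenerateRequest = { model, prompt, stream: false };
  return {
    url: `${OLLAMA_BASE}/api/generate`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}
```

Passing `url` and `init` to `fetch` performs the call. With `stream: false`, Ollama answers with a single JSON object whose `response` field carries the generated text.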

Every Claude tool call enters the MCP server as JSON-RPC over stdio. The server validates the arguments against the tool’s Zod schema. It then routes the call either to a deterministic renderer (artifact tier) or to an Ollama HTTP call (every other tier). The configured guardrails sit around that routing. Protected-path enforcement applies to the inputs, while citation validation, banned-phrase stripping, and confidence thresholds apply to the model’s output. The Ollama daemon never sees user-supplied paths, only the model chosen for the tier and the prepared prompt. Every call appends one structured event to the NDJSON log.
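The per-call flow above (validate, guard, route, log one event) can be sketched as a single handler. Everything here is hypothetical. `handleCall`, the `ToolSpec` shape, the prefix-based protected-path check, and the `LogEvent` fields stand in for the real Zod schemas, guardrails, and NDJSON fields. Output-side guardrails are omitted for brevity.

```typescript
type Route = "renderer" | "ollama";

interface ToolSpec {
  name: string;
  route: Route; // artifact-tier tools render deterministically; others call Ollama
  validate: (args: unknown) => Record<string, unknown>; // stands in for a Zod schema
}

interface LogEvent {
  ts: string;
  tool: string;
  route: Route;
  ok: boolean;
  error?: string;
}

// Hypothetical guard: reject any string argument under a protected path prefix.
function checkProtectedPaths(args: Record<string, unknown>, protectedPrefixes: string[]): void {
  for (const value of Object.values(args)) {
    if (typeof value === "string" && protectedPrefixes.some((p) => value.startsWith(p))) {
      throw new Error(`protected path: ${value}`);
    }
  }
}

// One call: validate, guard inputs, route, then append exactly one NDJSON line.
async function handleCall(
  spec: ToolSpec,
  rawArgs: unknown,
  run: (route: Route, args: Record<string, unknown>) => Promise<string>,
  protectedPrefixes: string[],
  log: string[], // stands in for appending to log.ndjson
): Promise<string> {
  const event: LogEvent = { ts: new Date().toISOString(), tool: spec.name, route: spec.route, ok: false };
  try {
    const args = spec.validate(rawArgs);
    checkProtectedPaths(args, protectedPrefixes);
    const result = await run(spec.route, args);
    event.ok = true;
    return result;
  } catch (err) {
    event.error = err instanceof Error ? err.message : String(err);
    throw err;
  } finally {
    log.push(JSON.stringify(event)); // one event per call, success or failure
  }
}
```

The key design choice is the `finally` block. The log line is written whether the call succeeds or throws, so every call leaves exactly one event.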