Ollama Intern MCP — Handbook
Ollama Intern MCP gives Claude Code a local intern with rules, tiers, a desk, and a filing cabinet. Claude picks the tool; the tool picks the tier (Instant / Workhorse / Deep / Embed); the tier writes a file you can open next week.
Local-first — zero network egress until you opt in. No telemetry. No “autonomous” anything. Every call shows its work. Optional Ollama Cloud routing puts 600B-class models behind the same tools when local hardware is the bottleneck, with automatic fallback to local.
The shape
Four tiers, 42 tools total.
| Tier | Count | Purpose |
|---|---|---|
| Atoms | 28 | Job-shaped primitives (classify, extract, triage_logs, summarize_*, draft, research, corpus_*, embed*, chat, plus the 13 v2.1.0 ops/refactor/corpus/artifact additions — see Tool reference). Batch-capable atoms accept items: [{id, text}]. |
| Briefs | 3 | Evidence-backed structured operator briefs — incident_brief, repo_brief, change_brief. |
| Packs | 3 | Fixed-pipeline compound jobs that write durable markdown + JSON. incident_pack, repo_pack, change_pack. |
| Artifacts | 7 | Continuity surface — list, read, diff, export_to_path, plus three deterministic snippet helpers. |
Freeze lines: atoms at 28 (freeze lifted at v2.1.0; new atoms require audit-justified gap + tests + handbook page + CHANGELOG entry); packs and artifact tiers remain frozen at 3 and 7.
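The Atoms row says batch-capable atoms accept `items: [{id, text}]`. As a minimal sketch of building such a payload: only the `items`, `id`, and `text` field names come from the table, while `BatchItem`, `toBatchItems`, and the id scheme are hypothetical names used for illustration.

```typescript
// Shape taken from the table above: items: [{id, text}].
interface BatchItem {
  id: string;
  text: string;
}

// Hypothetical helper: assigns a stable id to each input so results
// can be matched back to the text that produced them.
function toBatchItems(texts: string[], prefix = "item"): BatchItem[] {
  return texts.map((text, i) => ({ id: `${prefix}-${i + 1}`, text }));
}

const batchArgs = {
  items: toBatchItems(["disk full on node-3", "login latency spike"], "log"),
};

console.log(JSON.stringify(batchArgs, null, 2));
```

Giving every item its own id is what makes a batch result traceable: each output can be matched to exactly one input.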
Why this project exists
Every local-LLM MCP server leads with token-savings. Ours leads with what the intern produces:
- a durable markdown file you can open tomorrow
- an evidence block where every cited id was verified server-side
- a `weak: true` flag when the evidence doesn’t support the claim — never a smoothed narrative
- investigative `next_checks`, never “apply this fix”
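One way to picture the evidence rules in the list above is a server-side check that keeps only citations whose ids exist in the evidence block, and marks a claim as weak when none survive. This is a sketch under assumptions, not the server's actual code: only the `weak` flag comes from the handbook, and `EvidenceItem`, `Claim`, `checkClaim`, and the "weak when no verified citation remains" rule are hypothetical.

```typescript
// Hypothetical shapes; only the `weak` flag is named in the handbook.
interface EvidenceItem {
  id: string;
  excerpt: string;
}

interface Claim {
  text: string;
  citedIds: string[];
}

interface CheckedClaim extends Claim {
  weak: boolean;
}

// Assumed rule: drop cited ids that are not in the evidence block,
// and flag the claim as weak when no verified citation remains.
function checkClaim(claim: Claim, evidence: EvidenceItem[]): CheckedClaim {
  const known = new Set(evidence.map((e) => e.id));
  const verified = claim.citedIds.filter((id) => known.has(id));
  return { text: claim.text, citedIds: verified, weak: verified.length === 0 };
}

const evidence: EvidenceItem[] = [
  { id: "e1", excerpt: "disk usage at 98% on node-3" },
];

const supported = checkClaim(
  { text: "node-3 ran out of disk", citedIds: ["e1", "e9"] },
  evidence,
);
const unsupported = checkClaim(
  { text: "the deploy caused it", citedIds: ["e9"] },
  evidence,
);

console.log(supported, unsupported);
```

The point of the sketch is that an unsupported claim is reported as weak rather than rewritten to sound plausible.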
Where to go next
- Quickstart — your first 5 minutes, end-to-end, from install to artifact
- Getting started — install, Claude Code config, model pulls
- Tool reference — every tool grouped by tier (overview + per-tool deep-dives for the most-used tools)
- Envelope & tiers — uniform envelope, hardware profiles, residency
- Artifacts & continuity — how packs write to disk and how to use what they wrote
- Laws & guardrails — evidence-first, no remediation drift, deterministic renderers
- Security & threat model — what’s touched, what’s not, what’s in the log
- Ollama Cloud (optional) — opt-in cloud-primary routing with local fallback; off by default, zero egress until you set a key
- Corpora — build, refresh, search, answer over a living corpus; manifest v2 + `:latest` drift
- Error codes — every structured error code, when you’ll see it, what to do
- Use with Hermes — drive this MCP from Nous Research’s Hermes Agent on hermes3:8b (validated 2026-04-19)
- Troubleshooting — Ollama not running, model pull failures, hardware insufficient, MCP server not appearing in Claude Code
- Observability — read the NDJSON log, field semantics, jq recipes, degradation signatures, `ollama_log_tail`
- Comparison — honest matrix vs other local-LLM MCPs, raw Ollama, and Claude-direct
Architecture at a glance
    flowchart LR
      Claude["Claude Code<br/>(MCP client)"]
      MCP["ollama-intern-mcp<br/>server (stdio)"]
      Ollama["Ollama daemon<br/>(127.0.0.1:11434)"]
      Models[("Hermes 3 / Qwen 3<br/>nomic-embed-text")]
      Corpus[("~/.ollama-intern/<br/>corpora/")]
      Artifacts[("~/.ollama-intern/<br/>artifacts/")]
      NDJSON[("~/.ollama-intern/<br/>log.ndjson")]
      Guards{{"Guardrails<br/>citations · banned phrases<br/>protected paths · confidence"}}

      Claude -- "JSON-RPC over stdio" --> MCP
      MCP --> Guards
      MCP -- "/api/generate · /api/chat<br/>/api/embed · /api/ps · /api/tags" --> Ollama
      Ollama --> Models
      MCP --- Corpus
      MCP --- Artifacts
      MCP --> NDJSON

Every Claude tool call enters the MCP server over stdio JSON-RPC. The server validates the call against the tool’s zod schema, runs the configured guardrails (citation validation, banned-phrase strip, protected-path enforcement, confidence thresholds), then routes to either a deterministic renderer (artifact tier) or an Ollama HTTP call (every other tier). The Ollama daemon never sees user-supplied paths, only the model tier and the prepared prompt. Every call appends one structured event to the NDJSON log.
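The call path described above (validate, run guardrails, route, log one event) can be sketched as follows. This is an illustration under assumptions, not the server's implementation: a hand-rolled check stands in for the zod schema, an in-memory array stands in for `log.ndjson`, the model call is injected and synchronous so the sketch runs without a daemon, and `ToolCall`, `Pipeline`, `handleCall`, the fixed `"workhorse"` tier, and the tool names `artifact_read` and `summarize` are all hypothetical.

```typescript
// Tier names come from the handbook; every other name here is hypothetical.
type Tier = "instant" | "workhorse" | "deep" | "embed";

interface ToolCall {
  tool: string;
  args: { text?: string; path?: string };
}

// What the model backend receives: a tier and a prepared prompt, never a path.
interface OllamaRequest {
  tier: Tier;
  prompt: string;
}

interface Pipeline {
  artifactTools: Set<string>;
  guards: Array<(call: ToolCall) => void>; // each guard throws to reject a call
  callOllama: (req: OllamaRequest) => string; // injected stand-in for the HTTP call
}

const ndjsonLog: string[] = []; // in-memory stand-in for log.ndjson

function handleCall(call: ToolCall, p: Pipeline): string {
  let route: "renderer" | "ollama" | "rejected" = "rejected";
  try {
    // 1. Validate (stand-in for the tool's zod schema).
    if (call.tool.trim() === "") throw new Error("invalid tool call");
    // 2. Run the configured guardrails.
    for (const guard of p.guards) guard(call);
    // 3. Route: artifact-tier tools render deterministically, the rest go to the model.
    route = p.artifactTools.has(call.tool) ? "renderer" : "ollama";
    if (route === "renderer") return `rendered:${call.tool}`;
    // Placeholder tier; in the real server the tool picks the tier.
    return p.callOllama({
      tier: "workhorse",
      prompt: `${call.tool}: ${call.args.text ?? ""}`,
    });
  } finally {
    // 4. Append exactly one structured event per call, including rejected ones.
    ndjsonLog.push(JSON.stringify({ tool: call.tool, route }));
  }
}

const seen: OllamaRequest[] = [];
const pipeline: Pipeline = {
  artifactTools: new Set(["artifact_read"]),
  guards: [],
  callOllama: (req) => {
    seen.push(req);
    return "model output";
  },
};

const rendered = handleCall({ tool: "artifact_read", args: { path: "notes.md" } }, pipeline);
const generated = handleCall({ tool: "summarize", args: { text: "hello" } }, pipeline);
console.log(rendered, generated, ndjsonLog);
```

Typing `OllamaRequest` without a path field mirrors the separation in the paragraph above: whatever the caller supplies, only a tier and a prompt cross the boundary to the model.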