The local intern for Claude Code.
42 job-shaped tools across four tiers — atoms, briefs, packs, artifacts. Claude picks the tool, the tool picks the tier, the tier writes a file you can open next week. Local-first and zero-egress by default, with optional Ollama Cloud routing for 600B-class models when local hardware is the bottleneck. No telemetry. Every call shows its work.
Call
// Claude → ollama-intern-mcp
{
  "tool": "ollama_incident_pack",
  "arguments": {
    "title": "sprite pipeline 5 AM paging regression",
    "logs": "[2026-04-16 05:07] worker-3 OOM killed\n...",
    "source_paths": [
      "src/worker.ts",
      "memory/sprite-foundry-visual-mastery.md"
    ]
  }
}
Artifact
~/.ollama-intern/artifacts/incident/
  2026-04-16-sprite-pipeline-5-am-paging-regression.md
  2026-04-16-sprite-pipeline-5-am-paging-regression.json
# headings, evidence block with cited ids,
# investigative next_checks, weak: false.
# deterministic renderer — not a prompt.
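The filenames above look like a date prefix followed by a slugified title. A minimal sketch of that derivation follows. `artifactSlug` is a hypothetical helper, not taken from the ollama-intern-mcp source, and the real renderer may normalize titles or pick the date differently (this sketch assumes UTC).

```typescript
// Hypothetical helper: illustrates a date-plus-title slug, not the project's actual code.
function artifactSlug(date: Date, title: string): string {
  // Assumption: the date prefix is the UTC calendar day (YYYY-MM-DD).
  const day = date.toISOString().slice(0, 10);
  const words = title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse every non-alphanumeric run into one hyphen
    .replace(/^-+|-+$/g, ""); // trim leading and trailing hyphens
  return `${day}-${words}`;
}

const slug = artifactSlug(
  new Date("2026-04-16T05:07:00Z"),
  "sprite pipeline 5 AM paging regression",
);
console.log(`${slug}.md`);
console.log(`${slug}.json`);
```

Because the slug depends only on the date and the title, the same call made on the same day maps to the same pair of files, which is what makes the output easy to find again later.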
Envelope
{
  "tier_used": "deep",
  "model": "hermes3:8b",
  "hardware_profile": "dev-rtx5080",
  "tokens_in": 4180, "tokens_out": 612,
  "elapsed_ms": 8410,
  "residency": { "in_vram": true, "evicted": false }
}
The shape — four tiers, 42 tools
Job-shaped, not model-shaped. Pick the job; the tier follows.
Atoms · 28
Primitives. The original 15 — classify, extract, triage_logs, summarize_fast/deep, draft, research, corpus_* (search/answer/index/refresh/list), embed(_search), chat — plus 13 added in the v2.1.0 feature pass: doctor, log_tail, batch_proof_check, code_map, code_citation, multi_file_refactor_propose, refactor_plan, artifact_prune, hypothesis_drill, corpus_health/amend/amend_history/rerank.
Briefs · 3
Evidence-first operator briefs. Every claim cites an evidence id. Unknown ids are stripped server-side. Weak evidence is flagged weak: true rather than smoothed into a fake narrative.
Packs · 3
Fixed-pipeline compound jobs. incident_pack, repo_pack, change_pack run a deterministic sequence and write durable markdown + JSON to ~/.ollama-intern/artifacts/. Not a transcript — a filing cabinet.
Artifacts · 7
Continuity surface over pack outputs. list, read, diff, export_to_path, plus three deterministic snippet helpers for incident notes, onboarding sections, release notes. No model calls in this tier.
Laws, enforced server-side
Not prompt conventions. Code.
Evidence-first
Every brief claim cites an evidence id. Unknown ids are stripped server-side before the result returns.
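As a rough illustration of that rule, the sketch below filters each claim's cited ids against the set of evidence ids actually collected. The `Claim` and `Evidence` shapes and the `stripUnknownEvidence` name are assumptions made for this example; the real brief schema may differ.

```typescript
// Hypothetical shapes: the real brief schema may differ.
interface Evidence {
  id: string;
  source: string;
}

interface Claim {
  text: string;
  evidence_ids: string[];
}

// Remove every cited id that does not match a collected evidence item.
function stripUnknownEvidence(claims: Claim[], evidence: Evidence[]): Claim[] {
  const known = new Set(evidence.map((e) => e.id));
  return claims.map((claim) => ({
    ...claim,
    evidence_ids: claim.evidence_ids.filter((id) => known.has(id)),
  }));
}

const evidence: Evidence[] = [
  { id: "e1", source: "src/worker.ts:148-162" },
  { id: "e2", source: "log excerpt 05:07" },
];

// "e9" was never collected, so it is dropped from the claim.
const cleaned = stripUnknownEvidence(
  [{ text: "worker-3 was OOM killed", evidence_ids: ["e1", "e9"] }],
  evidence,
);
console.log(cleaned[0].evidence_ids);
```

Doing this after generation, in code, means a model that invents an id cannot get it into the returned brief, whatever the prompt said.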
Investigative, not prescriptive
next_checks, read_next, likely_breakpoints only. Prompts explicitly forbid "apply this fix." No remediation drift.
Weak is weak
Thin evidence flags weak: true with coverage notes. Never smoothed into fake narrative.
Every call shows its work
Uniform envelope: tier_used, model, hardware_profile, tokens_in/out, elapsed_ms, residency from /api/ps. NDJSON log at ~/.ollama-intern/log.ndjson.
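A sketch of how a reader might consume that log, assuming each line of log.ndjson is one JSON object and that some lines carry the envelope fields listed above. The `Envelope` interface mirrors the example envelope on this page; `parseNdjson` and `isEnvelope` are hypothetical helpers, and the actual event layout in the log may differ.

```typescript
// Field names follow the example envelope on this page; anything else is an assumption.
interface Envelope {
  tier_used: string;
  model: string;
  hardware_profile: string;
  tokens_in: number;
  tokens_out: number;
  elapsed_ms: number;
  residency: { in_vram: boolean; evicted: boolean };
}

// NDJSON: one JSON document per non-empty line.
function parseNdjson(text: string): unknown[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as unknown);
}

// Loose runtime check so unrelated log events are skipped rather than trusted.
function isEnvelope(value: unknown): value is Envelope {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const r = v.residency as Record<string, unknown> | undefined;
  return (
    typeof v.tier_used === "string" &&
    typeof v.model === "string" &&
    typeof v.elapsed_ms === "number" &&
    typeof r === "object" &&
    r !== null &&
    typeof r.evicted === "boolean"
  );
}

// Sample data only: the first line copies the envelope shown earlier on this page.
const sampleLog = [
  JSON.stringify({
    tier_used: "deep",
    model: "hermes3:8b",
    hardware_profile: "dev-rtx5080",
    tokens_in: 4180,
    tokens_out: 612,
    elapsed_ms: 8410,
    residency: { in_vram: true, evicted: false },
  }),
  JSON.stringify({ event: "unrelated" }),
  "",
].join("\n");

const envelopes = parseNdjson(sampleLog).filter(isEnvelope);
const totalMs = envelopes.reduce((sum, e) => sum + e.elapsed_ms, 0);
const evictions = envelopes.filter((e) => e.residency.evicted).length;
console.log({ calls: envelopes.length, totalMs, evictions });
```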
Scale up — optional Ollama Cloud
Off by default, zero egress until you opt in. Local-first stays the promise.
Cloud-primary, local-fallback
Set OLLAMA_CLOUD_PRIMARY=1 and OLLAMA_API_KEY, and the generative tiers route to a 600B-class model (default minimax-m3:cloud). A circuit breaker falls back to your local profile on any cloud failure; embeddings always stay local.
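The cloud-primary, local-fallback pattern can be sketched roughly as below. This is a generic circuit breaker, not the project's implementation: the `CircuitBreaker` class, the `routeGenerate` function, the failure threshold, and the cooldown are all assumptions. Only the result field names (backend, degraded, degrade_reason) come from this page.

```typescript
type Backend = "cloud" | "local";

interface Routed<T> {
  value: T;
  backend: Backend;
  degraded: boolean;
  degrade_reason?: string;
}

// Hypothetical breaker: opens after `threshold` consecutive failures, retries cloud after `cooldownMs`.
class CircuitBreaker {
  private failures = 0;
  private lastFailureAt = 0;
  private readonly threshold: number;
  private readonly cooldownMs: number;

  constructor(threshold = 1, cooldownMs = 30000) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
  }

  isOpen(now: number): boolean {
    return this.failures >= this.threshold && now - this.lastFailureAt < this.cooldownMs;
  }

  recordFailure(now: number): void {
    this.failures += 1;
    this.lastFailureAt = now;
  }

  recordSuccess(): void {
    this.failures = 0;
  }
}

// Try cloud first; on failure (or while the breaker is open) run the local call and say so.
async function routeGenerate<T>(
  cloud: () => Promise<T>,
  local: () => Promise<T>,
  breaker: CircuitBreaker,
  now: number = Date.now(),
): Promise<Routed<T>> {
  if (breaker.isOpen(now)) {
    return { value: await local(), backend: "local", degraded: true, degrade_reason: "circuit_open" };
  }
  try {
    const value = await cloud();
    breaker.recordSuccess();
    return { value, backend: "cloud", degraded: false };
  } catch (err) {
    breaker.recordFailure(now);
    const reason = err instanceof Error ? err.message : String(err);
    return { value: await local(), backend: "local", degraded: true, degrade_reason: reason };
  }
}
```

The point of returning `degraded` and `degrade_reason` alongside the value is that the caller never has to guess which backend answered.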
Never a silent downgrade
Every envelope reports backend (cloud|local), degraded, and degrade_reason, and a backend_fallback NDJSON event is logged, so you always know when you got the local model instead of the big one. A bad key surfaces loudly, never as a silent degrade.
Opt-in by design
Nothing leaves the box unless you set both vars — the local-first, zero-egress promise holds for everyone else. See the Ollama Cloud handbook page for setup, env vars, and the privacy posture.
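The both-variables rule can be pictured as a single gate, sketched below. `cloudEnabled` is a hypothetical function, and treating only the literal value "1" as on is an assumption based on the OLLAMA_CLOUD_PRIMARY=1 example; the server's actual parsing may accept other values.

```typescript
// Hypothetical gate: cloud routing is allowed only when BOTH variables are set.
function cloudEnabled(env: Record<string, string | undefined>): boolean {
  // Assumption: only the literal "1" switches cloud-primary on.
  const primary = env.OLLAMA_CLOUD_PRIMARY === "1";
  // An empty or whitespace-only key counts as unset.
  const hasKey = (env.OLLAMA_API_KEY ?? "").trim().length > 0;
  return primary && hasKey;
}

// "example-key" is a placeholder, not a real credential.
console.log(cloudEnabled({})); // neither set
console.log(cloudEnabled({ OLLAMA_CLOUD_PRIMARY: "1" })); // key missing
console.log(cloudEnabled({ OLLAMA_API_KEY: "example-key" })); // flag missing
console.log(cloudEnabled({ OLLAMA_CLOUD_PRIMARY: "1", OLLAMA_API_KEY: "example-key" })); // both set
```

In a Node process the same check would typically be called as `cloudEnabled(process.env)`.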
One call, one artifact
The call
// Claude → ollama-intern-mcp
{
  "tool": "ollama_incident_pack",
  "arguments": {
    "title": "5 AM paging regression",
    "logs": "[05:07] worker-3 OOM killed\n[05:07] /api/ps evicted=true size=8.1GB\n...",
    "source_paths": [
      "src/worker.ts",
      "memory/sprite-foundry-visual-mastery.md"
    ]
  }
}
The artifact (deterministic)
# Incident — 5 AM paging regression
slug: 2026-04-16-5-am-paging-regression
weak: false · evidence_count: 6
## Evidence
- e1: src/worker.ts:148–162 (OOM path)
- e2: log excerpt 05:07 (residency.evicted=true)
- ...
## Next checks
- residency.evicted across last 24h
- OLLAMA_MAX_LOADED_MODELS vs loaded size
## Read next
- src/worker.ts:worker_loop
- docs/ops/ollama-paging.md
Install
npm
npm install -g ollama-intern-mcp
Claude Code
{
  "mcpServers": {
    "ollama-intern": {
      "command": "npx",
      "args": ["-y", "ollama-intern-mcp"],
      "env": {
        "OLLAMA_HOST": "http://127.0.0.1:11434",
        "INTERN_PROFILE": "dev-rtx5080"
      }
    }
  }
}
Ollama Cloud (optional, off by default)
// Add to the env block to route the generative tiers
// to a 600B-class cloud model; local stays the fallback.
// Zero egress until BOTH of these are set.
"env": {
  "OLLAMA_CLOUD_PRIMARY": "1",
  "OLLAMA_API_KEY": "sk-...your-key...",
  "INTERN_PROFILE": "dev-rtx5080"
}
// Key from https://ollama.com/settings/keys (a runtime env
// var, not a CI secret). See handbook/ollama-cloud/.
Hermes Agent (validated path)
# Nous Research Hermes Agent + hermes3:8b on Ollama
# Validated end-to-end 2026-04-19 — full config in
# hermes.config.example.yaml in the repo root.
mcp_servers:
  ollama-intern:
    command: npx
    args: ["-y", "ollama-intern-mcp"]
    env:
      OLLAMA_HOST: http://localhost:11434
      INTERN_PROFILE: dev-rtx5080
# See: handbook/with-hermes/ for the full walkthrough.
Model pulls (dev-rtx5080, default hermes3:8b ladder)
ollama pull hermes3:8b
ollama pull nomic-embed-text
export OLLAMA_MAX_LOADED_MODELS=2
export OLLAMA_KEEP_ALIVE=-1