Local-first · cloud-optional · evidence-first · Hermes-ready · MIT

The local intern for Claude Code.

42 job-shaped tools across four tiers — atoms, briefs, packs, artifacts. Claude picks the tool, the tool picks the tier, the tier writes a file you can open next week. Local-first and zero-egress by default, with optional Ollama Cloud routing for 600B-class models when local hardware is the bottleneck. No telemetry. Every call shows its work.


Call

// Claude → ollama-intern-mcp
{
  "tool": "ollama_incident_pack",
  "arguments": {
    "title": "sprite pipeline 5 AM paging regression",
    "logs": "[2026-04-16 05:07] worker-3 OOM killed\n...",
    "source_paths": [
      "src/worker.ts",
      "memory/sprite-foundry-visual-mastery.md"
    ]
  }
}

Artifact

~/.ollama-intern/artifacts/incident/
  2026-04-16-sprite-pipeline-5-am-paging-regression.md
  2026-04-16-sprite-pipeline-5-am-paging-regression.json
# headings, evidence block with cited ids,
# investigative next_checks, weak: false.
# deterministic renderer, not a prompt.

Envelope

{
  "tier_used": "deep",
  "model": "hermes3:8b",
  "hardware_profile": "dev-rtx5080",
  "tokens_in": 4180,
  "tokens_out": 612,
  "elapsed_ms": 8410,
  "residency": { "in_vram": true, "evicted": false }
}

The shape — four tiers, 42 tools

Job-shaped, not model-shaped. Pick the job; the tier follows.

Atoms · 28

Primitives. The original 15 — classify, extract, triage_logs, summarize_fast/deep, draft, research, corpus_* (search/answer/index/refresh/list), embed(_search), chat — plus 13 added in the v2.1.0 feature pass: doctor, log_tail, batch_proof_check, code_map, code_citation, multi_file_refactor_propose, refactor_plan, artifact_prune, hypothesis_drill, corpus_health/amend/amend_history/rerank.

Briefs · 3

Evidence-first operator briefs. Every claim cites an evidence id. Unknown evidence ids are stripped server-side. Thin evidence sets weak: true instead of being smoothed into a fake narrative.

Packs · 3

Fixed-pipeline compound jobs. incident_pack, repo_pack, change_pack run a deterministic sequence and write durable markdown + JSON to ~/.ollama-intern/artifacts/. Not a transcript — a filing cabinet.
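Artifact filenames in this document pair the run date with a slugified title (for example 2026-04-16-sprite-pipeline-5-am-paging-regression). A minimal sketch of that naming, assuming lowercase ASCII slugs where any run of other characters collapses to one hyphen and the date is taken in UTC; artifactSlug and artifactPaths are hypothetical helpers, not the package's API.

```typescript
// Hypothetical helper: reproduces the date-prefixed slug pattern seen in
// artifact filenames. The package's exact slug rules are an assumption.
function artifactSlug(title: string, date: Date): string {
  // YYYY-MM-DD taken in UTC so the result does not depend on the local timezone.
  const day = date.toISOString().slice(0, 10);
  const slug = title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse spaces/punctuation into a single hyphen
    .replace(/^-+|-+$/g, ""); // trim leading/trailing hyphens
  return `${day}-${slug}`;
}

// Hypothetical helper: the markdown + JSON pair written under one pack directory.
function artifactPaths(root: string, pack: string, slug: string): { md: string; json: string } {
  const dir = `${root}/${pack}`;
  return { md: `${dir}/${slug}.md`, json: `${dir}/${slug}.json` };
}
```

Under these assumptions the name depends only on title and date; how the package handles a same-day collision is not covered here.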

Artifacts · 7

Continuity surface over pack outputs: list, read, diff, export_to_path, plus three deterministic snippet helpers for incident notes, onboarding sections, and release notes. No model calls in this tier.
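Because the artifact tier makes no model calls, an operation like diff can be a plain data comparison. A sketch under the assumption that an artifact is a map of section name to bullet items; the Sections type and diffArtifacts are hypothetical, and the real diff output format is not documented here.

```typescript
// Hypothetical artifact shape: section name -> bullet items.
type Sections = Record<string, string[]>;

interface SectionDiff {
  section: string;
  added: string[];
  removed: string[];
}

// Deterministic, model-free comparison of two pack runs, section by section.
function diffArtifacts(before: Sections, after: Sections): SectionDiff[] {
  // Union of section names, sorted so output order never depends on key order.
  const names = Array.from(new Set(Object.keys(before).concat(Object.keys(after)))).sort();
  const out: SectionDiff[] = [];
  for (const section of names) {
    const prev = before[section] ?? [];
    const next = after[section] ?? [];
    const prevSet = new Set(prev);
    const nextSet = new Set(next);
    const added = next.filter((item) => !prevSet.has(item));
    const removed = prev.filter((item) => !nextSet.has(item));
    // Only report sections that actually changed.
    if (added.length > 0 || removed.length > 0) out.push({ section, added, removed });
  }
  return out;
}
```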

Laws, enforced server-side

Not prompt conventions. Code.

Evidence-first

Every brief claim cites an evidence id. Unknown ids are stripped server-side before the result returns.
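The stripping step amounts to a set-membership check against the evidence gathered for the run. A sketch with hypothetical Claim and Evidence shapes; dropping a claim once it has no valid citation left is an assumption of this sketch, not documented behavior.

```typescript
// Hypothetical shapes; the real brief schema is not documented here.
interface Evidence { id: string; source: string }
interface Claim { text: string; evidence_ids: string[] }

interface StripResult {
  claims: Claim[];
  stripped_ids: string[]; // unknown ids that were removed, sorted
  dropped_claims: number; // claims left with no valid citation
}

function stripUnknownIds(claims: Claim[], evidence: Evidence[]): StripResult {
  const known = new Set(evidence.map((e) => e.id));
  const stripped = new Set<string>();
  const kept: Claim[] = [];
  for (const claim of claims) {
    // Keep only citations that point at evidence gathered for this run.
    const valid = claim.evidence_ids.filter((id) => {
      if (known.has(id)) return true;
      stripped.add(id);
      return false;
    });
    // Assumption: a claim with nothing left to cite is removed, not kept uncited.
    if (valid.length > 0) kept.push({ ...claim, evidence_ids: valid });
  }
  return {
    claims: kept,
    stripped_ids: Array.from(stripped).sort(),
    dropped_claims: claims.length - kept.length,
  };
}
```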

Investigative, not prescriptive

Output is limited to next_checks, read_next, and likely_breakpoints. Prompts explicitly forbid "apply this fix." No remediation drift.
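Enforcing this in code rather than in the prompt can be as simple as an allow-list over output sections. A sketch, assuming the model's raw output arrives as a JSON object; any other key (for instance a hypothetical fix or patch section) is discarded and reported. keepInvestigative is a hypothetical name.

```typescript
// The only sections allowed through, as listed in the text above.
const ALLOWED_SECTIONS = ["next_checks", "read_next", "likely_breakpoints"] as const;
type Section = (typeof ALLOWED_SECTIONS)[number];
type InvestigativeOutput = Partial<Record<Section, string[]>>;

function keepInvestigative(raw: Record<string, unknown>): { output: InvestigativeOutput; discarded: string[] } {
  const output: InvestigativeOutput = {};
  for (const key of ALLOWED_SECTIONS) {
    const value = raw[key];
    // Copy allowed sections, keeping only string items.
    if (Array.isArray(value)) {
      output[key] = value.filter((v): v is string => typeof v === "string");
    }
  }
  // Everything outside the allow-list is dropped; report which keys were removed.
  const allowed: readonly string[] = ALLOWED_SECTIONS;
  const discarded = Object.keys(raw).filter((k) => allowed.indexOf(k) === -1).sort();
  return { output, discarded };
}
```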

Weak is weak

Thin evidence flags weak: true with coverage notes. Never smoothed into fake narrative.
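One way to make "weak" a computed flag rather than a tone is to derive it from evidence counts and source coverage. A sketch; MIN_EVIDENCE = 3, the note wording, and assessCoverage are hypothetical, since the actual threshold is not documented here.

```typescript
// Hypothetical threshold: below this many evidence items a result is flagged weak.
const MIN_EVIDENCE = 3;

interface Coverage {
  weak: boolean;
  evidence_count: number;
  coverage_notes: string[];
}

function assessCoverage(evidenceIds: string[], requestedSources: string[], citedSources: string[]): Coverage {
  const notes: string[] = [];
  const cited = new Set(citedSources);
  // Coverage notes name every requested source that produced no evidence.
  for (const src of requestedSources) {
    if (!cited.has(src)) notes.push(`no evidence drawn from ${src}`);
  }
  const weak = evidenceIds.length < MIN_EVIDENCE;
  if (weak) notes.push(`only ${evidenceIds.length} evidence item(s); threshold is ${MIN_EVIDENCE}`);
  return { weak, evidence_count: evidenceIds.length, coverage_notes: notes };
}
```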

Every call shows its work

Uniform envelope: tier_used, model, hardware_profile, tokens_in/out, elapsed_ms, residency from /api/ps. NDJSON log at ~/.ollama-intern/log.ndjson.
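A TypeScript shape for the envelope, using the field names shown in this document, plus a one-line formatter. The field types, the optional cloud fields, and formatEnvelope are assumptions; the tok/s figure is a rough whole-call number that includes prompt processing.

```typescript
// Field names as shown in this document; types and optionality are assumed.
interface Envelope {
  tier_used: string; // e.g. "deep"
  model: string;
  hardware_profile: string;
  tokens_in: number;
  tokens_out: number;
  elapsed_ms: number;
  residency: { in_vram: boolean; evicted: boolean };
  backend?: "cloud" | "local"; // reported when cloud routing is in play
  degraded?: boolean;
  degrade_reason?: string;
}

function formatEnvelope(env: Envelope): string {
  // Output tokens divided by total wall time: a coarse throughput figure.
  const tps = env.elapsed_ms > 0 ? (env.tokens_out / (env.elapsed_ms / 1000)).toFixed(1) : "n/a";
  const where = env.residency.in_vram ? "in VRAM" : "not in VRAM";
  const evicted = env.residency.evicted ? ", evicted" : "";
  return `${env.tier_used} · ${env.model} @ ${env.hardware_profile} · ${env.tokens_in}→${env.tokens_out} tok · ${env.elapsed_ms} ms (${tps} tok/s) · ${where}${evicted}`;
}
```

For the sample envelope at the top of this page, the formatter reports about 72.8 tok/s (612 output tokens over 8.41 s).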

Scale up — optional Ollama Cloud

Off by default, zero egress until you opt in. Local-first stays the promise.

Cloud-primary, local-fallback

Set OLLAMA_CLOUD_PRIMARY=1 + OLLAMA_API_KEY and the generative tiers route to a 600B-class model (default minimax-m3:cloud). A circuit breaker falls back to your local profile on any cloud failure; embeddings always stay local.
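Cloud-primary with local fallback fits the classic circuit-breaker pattern. A sketch of that pattern, not the package's implementation: the failure threshold, cooldown, CloudBreaker class, and the circuit_open reason string are hypothetical. Only generative calls would go through it; embeddings stay local.

```typescript
type Backend = "cloud" | "local";

interface Routed<T> {
  value: T;
  backend: Backend;
  degraded: boolean;
  degrade_reason?: string;
}

// After `threshold` consecutive cloud failures the circuit opens and calls go
// straight to local until `cooldownMs` has elapsed. Values are hypothetical.
class CloudBreaker {
  private failures = 0;
  private openedAt: number | null = null;

  constructor(private threshold = 3, private cooldownMs = 30_000, private now: () => number = Date.now) {}

  private isOpen(): boolean {
    if (this.openedAt === null) return false;
    if (this.now() - this.openedAt >= this.cooldownMs) {
      // Cooldown expired: close the circuit and try the cloud again.
      this.openedAt = null;
      this.failures = 0;
      return false;
    }
    return true;
  }

  async call<T>(cloud: () => Promise<T>, local: () => Promise<T>): Promise<Routed<T>> {
    if (this.isOpen()) {
      return { value: await local(), backend: "local", degraded: true, degrade_reason: "circuit_open" };
    }
    try {
      const value = await cloud();
      this.failures = 0; // any success resets the consecutive-failure count
      return { value, backend: "cloud", degraded: false };
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.threshold) this.openedAt = this.now();
      // Fall back, but say so: the reason travels with the result.
      const reason = err instanceof Error ? err.message : String(err);
      return { value: await local(), backend: "local", degraded: true, degrade_reason: reason };
    }
  }
}
```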

Never a silent downgrade

Every envelope reports backend (cloud|local), degraded, and degrade_reason — plus a backend_fallback NDJSON event — so you always know when you got the local model instead of the big one. A bad key surfaces loudly, never a silent degrade.
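Since fallbacks are logged, a quick audit is a scan of the NDJSON log for downgrade signals. A sketch that assumes each log line is a JSON object carrying an event field (hypothetical name) alongside the documented degraded and degrade_reason fields; findDowngrades is also hypothetical.

```typescript
// Assumed log-line shape; only backend_fallback, degraded, and degrade_reason
// come from the text above.
interface LogLine {
  event?: string;
  backend?: "cloud" | "local";
  degraded?: boolean;
  degrade_reason?: string;
  [key: string]: unknown;
}

// NDJSON: one JSON object per line.
function findDowngrades(ndjson: string): LogLine[] {
  const hits: LogLine[] = [];
  for (const raw of ndjson.split("\n")) {
    const line = raw.trim();
    if (line === "") continue; // blank lines and the trailing newline
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // e.g. a partially written last line
    }
    if (typeof parsed !== "object" || parsed === null) continue;
    const entry = parsed as LogLine;
    if (entry.event === "backend_fallback" || entry.degraded === true) hits.push(entry);
  }
  return hits;
}
```

In Node, the input could come from readFileSync on ~/.ollama-intern/log.ndjson, with the home directory resolved via os.homedir().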

Opt-in by design

Nothing leaves the box unless you set both vars — the local-first, zero-egress promise holds for everyone else. See the Ollama Cloud handbook page for setup, env vars, and the privacy posture.
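The opt-in rule reduces to a two-variable check. A sketch; accepting only the literal "1" and treating a blank key as unset are assumptions, as are the helper names cloudRoutingEnabled and backendFor.

```typescript
type Env = Record<string, string | undefined>;

// Cloud routing only when BOTH variables are set.
function cloudRoutingEnabled(env: Env): boolean {
  const primary = env["OLLAMA_CLOUD_PRIMARY"] === "1";
  const key = (env["OLLAMA_API_KEY"] ?? "").trim(); // blank key counts as unset
  return primary && key.length > 0;
}

// Embeddings stay local regardless; only generative tiers can move to the cloud.
function backendFor(kind: "generative" | "embedding", env: Env): "cloud" | "local" {
  return kind === "generative" && cloudRoutingEnabled(env) ? "cloud" : "local";
}
```

In Node, pass process.env as the env argument.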

One call, one artifact

The call

// Claude → ollama-intern-mcp
{
  "tool": "ollama_incident_pack",
  "arguments": {
    "title": "5 AM paging regression",
    "logs": "[05:07] worker-3 OOM killed\n[05:07] /api/ps evicted=true size=8.1GB\n...",
    "source_paths": [
      "src/worker.ts",
      "memory/sprite-foundry-visual-mastery.md"
    ]
  }
}
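The call above is shorthand. On the wire, an MCP client such as Claude Code sends it as a JSON-RPC 2.0 tools/call request whose params carry the tool name and its arguments. A sketch that builds that request; the IncidentPackArgs type (including which fields are optional) is inferred from the example, not taken from the package's schema, and incidentPackRequest is hypothetical.

```typescript
// Inferred from the example call; required/optional split is an assumption.
interface IncidentPackArgs {
  title: string;
  logs?: string;
  source_paths?: string[];
}

// MCP tool invocations are JSON-RPC 2.0 requests with method "tools/call".
interface ToolsCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: IncidentPackArgs };
}

function incidentPackRequest(id: number, args: IncidentPackArgs): ToolsCallRequest {
  if (args.title.trim() === "") throw new Error("title is required");
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name: "ollama_incident_pack", arguments: args } };
}

const req = incidentPackRequest(1, {
  title: "5 AM paging regression",
  logs: "[05:07] worker-3 OOM killed\n[05:07] /api/ps evicted=true size=8.1GB",
  source_paths: ["src/worker.ts", "memory/sprite-foundry-visual-mastery.md"],
});
console.log(JSON.stringify(req, null, 2));
```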

The artifact (deterministic)

# Incident — 5 AM paging regression
slug: 2026-04-16-5-am-paging-regression
weak: false · evidence_count: 6

## Evidence
- e1: src/worker.ts:148–162 (OOM path)
- e2: log excerpt 05:07 (residency.evicted=true)
- ...

## Next checks
- residency.evicted across last 24h
- OLLAMA_MAX_LOADED_MODELS vs loaded size

## Read next
- src/worker.ts:worker_loop
- docs/ops/ollama-paging.md
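"Deterministic" here means the markdown is rendered from structured data rather than generated by a prompt. A sketch of such a renderer that reproduces the layout above, assuming a hypothetical IncidentArtifact JSON shape; the package's actual schema and renderer may differ.

```typescript
// Hypothetical JSON shape for an incident artifact.
interface IncidentArtifact {
  title: string;
  slug: string;
  weak: boolean;
  evidence: { id: string; ref: string }[];
  next_checks: string[];
  read_next: string[];
}

// Pure function of its input: the same JSON always yields the same markdown.
function renderIncident(a: IncidentArtifact): string {
  const bullets = (items: string[]): string[] => items.map((s) => `- ${s}`);
  const lines = [
    `# Incident \u2014 ${a.title}`, // em dash, matching the artifact heading
    `slug: ${a.slug}`,
    `weak: ${a.weak} · evidence_count: ${a.evidence.length}`,
    "",
    "## Evidence",
    ...bullets(a.evidence.map((e) => `${e.id}: ${e.ref}`)),
    "",
    "## Next checks",
    ...bullets(a.next_checks),
    "",
    "## Read next",
    ...bullets(a.read_next),
  ];
  return lines.join("\n") + "\n";
}
```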

Install

npm

npm install -g ollama-intern-mcp

Claude Code

{
  "mcpServers": {
    "ollama-intern": {
      "command": "npx",
      "args": ["-y", "ollama-intern-mcp"],
      "env": {
        "OLLAMA_HOST": "http://127.0.0.1:11434",
        "INTERN_PROFILE": "dev-rtx5080"
      }
    }
  }
}

Ollama Cloud (optional — off by default)

// Add to the env block to route the generative tiers
// to a 600B-class cloud model; local stays the fallback.
// Zero egress until BOTH of these are set.
"env": {
  "OLLAMA_CLOUD_PRIMARY": "1",
  "OLLAMA_API_KEY": "sk-...your-key...",
  "INTERN_PROFILE": "dev-rtx5080"
}
// Key from https://ollama.com/settings/keys (a runtime env
// var, not a CI secret). See handbook/ollama-cloud/.

Hermes Agent (validated path)

# Nous Research Hermes Agent + hermes3:8b on Ollama
# Validated end-to-end 2026-04-19 — full config in
# hermes.config.example.yaml in the repo root.
mcp_servers:
  ollama-intern:
    command: npx
    args: ["-y", "ollama-intern-mcp"]
    env:
      OLLAMA_HOST: http://localhost:11434
      INTERN_PROFILE: dev-rtx5080
# See: handbook/with-hermes/ for the full walkthrough.

Model pulls (dev-rtx5080 — default hermes3:8b ladder)

ollama pull hermes3:8b
ollama pull nomic-embed-text
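# Read by the Ollama server: set these where `ollama serve` runs,
# then restart it. -1 keeps loaded models resident indefinitely.
export OLLAMA_MAX_LOADED_MODELS=2
export OLLAMA_KEEP_ALIVE=-1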
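To confirm both pulled models actually stay resident, Ollama's GET /api/ps lists loaded models with size and size_vram in bytes. A sketch that interprets that response; reading size_vram < size as partial offload to system RAM, and treating a full slot count as "the next load likely evicts something", are this sketch's interpretations, and checkResidency/isResident are hypothetical helpers.

```typescript
// Subset of the /api/ps response used here.
interface PsModel { name: string; size: number; size_vram: number; expires_at?: string }
interface PsResponse { models: PsModel[] }

interface ResidencyReport {
  loaded: string[];
  atCapacity: boolean; // every slot allowed by OLLAMA_MAX_LOADED_MODELS is taken
  partiallyOffloaded: string[]; // size_vram < size: part of the model is outside VRAM
}

function checkResidency(ps: PsResponse, maxLoaded: number): ResidencyReport {
  return {
    loaded: ps.models.map((m) => m.name).sort(),
    atCapacity: ps.models.length >= maxLoaded,
    partiallyOffloaded: ps.models.filter((m) => m.size_vram < m.size).map((m) => m.name).sort(),
  };
}

// True only if the model is loaded and fully in VRAM.
function isResident(ps: PsResponse, model: string): boolean {
  return ps.models.some((m) => m.name === model && m.size_vram >= m.size);
}
```

Feed it the parsed JSON from http://127.0.0.1:11434/api/ps, with maxLoaded set to your OLLAMA_MAX_LOADED_MODELS value. /api/ps reports full tags, so the embedding model typically appears as nomic-embed-text:latest.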