OI ollama-intern-mcp
Local · evidence-first · Hermes-ready · MIT

The local intern for Claude Code.

28 job-shaped tools across four tiers — atoms, briefs, packs, artifacts. Claude picks the tool, the tool picks the tier, the tier writes a file you can open next week. v2.0.0 ships a validated Hermes Agent integration path on hermes3:8b. No cloud. No telemetry. Every call shows its work.

Call

// Claude → ollama-intern-mcp
{
  "tool": "ollama_incident_pack",
  "arguments": {
    "title": "sprite pipeline 5 AM paging regression",
    "logs": "[2026-04-16 05:07] worker-3 OOM killed\n...",
    "source_paths": [
      "src/worker.ts",
      "memory/sprite-foundry-visual-mastery.md"
    ]
  }
}

Artifact

~/.ollama-intern/artifacts/incident/
  2026-04-16-sprite-pipeline-5-am-paging-regression.md
  2026-04-16-sprite-pipeline-5-am-paging-regression.json
# headings, evidence block with cited ids,
# investigative next_checks, weak: false.
# deterministic renderer — not a prompt.

Envelope

{
  "tier_used": "deep",
  "model": "hermes3:8b",
  "hardware_profile": "dev-rtx5080",
  "tokens_in": 4180,
  "tokens_out": 612,
  "elapsed_ms": 8410,
  "residency": { "in_vram": true, "evicted": false }
}

The shape — four tiers, 28 tools

Job-shaped, not model-shaped. Pick the job; the tier follows.

Atoms · 18

Fifteen primitives: classify, extract, triage_logs, summarize_fast/deep, draft, research, corpus_* (search/answer/index/refresh/list), embed(_search), chat. The 3 evidence-backed briefs (incident, repo, change) round the tier out to 18.

Briefs · 3

Evidence-first operator briefs. Every claim cites an evidence id; unknown ids are stripped server-side. Thin evidence is flagged weak: true rather than smoothed into a fake narrative.

Packs · 3

Fixed-pipeline compound jobs. incident_pack, repo_pack, change_pack run a deterministic sequence and write durable markdown + JSON to ~/.ollama-intern/artifacts/. Not a transcript — a filing cabinet.
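A minimal sketch of the artifact naming the packs appear to use, inferred from the example paths above (date prefix plus kebab-cased title, one .md and one .json). The real renderer is server-side; `artifactSlug` and `artifactPaths` are illustrative names, not the server's API.

```typescript
// Sketch: derive the artifact slug and file pair from a pack title.
// Rule inferred from "2026-04-16-sprite-pipeline-5-am-paging-regression".
function artifactSlug(title: string, date: string): string {
  const kebab = title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse spaces/punctuation to hyphens
    .replace(/^-+|-+$/g, "");    // trim leading/trailing hyphens
  return `${date}-${kebab}`;
}

function artifactPaths(kind: string, title: string, date: string) {
  const base = `~/.ollama-intern/artifacts/${kind}/${artifactSlug(title, date)}`;
  return { markdown: `${base}.md`, json: `${base}.json` };
}
```

Running `artifactSlug("sprite pipeline 5 AM paging regression", "2026-04-16")` reproduces the slug shown in the artifact listing above.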

Artifacts · 7

Continuity surface over pack outputs. list, read, diff, export_to_path, plus three deterministic snippet helpers for incident notes, onboarding sections, release notes. No model calls in this tier.
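"No model calls" means the snippet helpers are pure string formatting over the pack's JSON. A hedged sketch of what an incident-note helper could look like; the `IncidentArtifact` field names are assumptions drawn from the artifact example below, not the server's actual schema.

```typescript
// Hypothetical JSON-artifact shape; field names are illustrative.
interface IncidentArtifact {
  title: string;
  slug: string;
  weak: boolean;
  next_checks: string[];
}

// Deterministic snippet helper: same input always yields the same note.
function incidentNoteSnippet(a: IncidentArtifact): string {
  const flag = a.weak ? " (weak evidence)" : "";
  const checks = a.next_checks.map((c) => `- [ ] ${c}`).join("\n");
  return `## ${a.title}${flag}\n${checks}`;
}
```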

Laws, enforced server-side

Not prompt conventions. Code.

Evidence-first

Every brief claim cites an evidence id. Unknown ids are stripped server-side before the result returns.
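The enforcement can be pictured as a pure filter over the brief before it leaves the server. A minimal sketch, assuming hypothetical field names (`evidence`, `claims`, `cites`); unknown ids are dropped from a claim's citations, and a claim left with none is dropped entirely.

```typescript
interface Brief {
  evidence: { id: string; text: string }[];
  claims: { text: string; cites: string[] }[];
}

// Sketch of server-side citation enforcement: keep only citations that
// resolve to a real evidence id, then drop claims with no support left.
function stripUnknownCitations(brief: Brief): Brief {
  const known = new Set(brief.evidence.map((e) => e.id));
  return {
    ...brief,
    claims: brief.claims
      .map((c) => ({ ...c, cites: c.cites.filter((id) => known.has(id)) }))
      .filter((c) => c.cites.length > 0),
  };
}
```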

Investigative, not prescriptive

next_checks, read_next, likely_breakpoints only. Prompts explicitly forbid "apply this fix." No remediation drift.
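One way to enforce "investigative only" in code rather than in the prompt is a key allowlist applied to the model's output. A sketch under that assumption; the allowed keys come from the list above, while the rejected `apply_fix` key is a made-up example of remediation drift.

```typescript
// Only investigative keys survive; anything prescriptive is dropped.
const INVESTIGATIVE_KEYS = ["next_checks", "read_next", "likely_breakpoints"] as const;

function keepInvestigative(output: Record<string, unknown>): Record<string, unknown> {
  return Object.fromEntries(
    Object.entries(output).filter(([k]) =>
      (INVESTIGATIVE_KEYS as readonly string[]).includes(k)
    )
  );
}
```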

Weak is weak

Thin evidence is flagged weak: true with coverage notes. It is never smoothed into a fake narrative.
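The flag itself is just honest bookkeeping. A hypothetical sketch: the real coverage heuristic is server-side and unknown here, so the evidence-count threshold below is purely an illustrative assumption.

```typescript
// Sketch: report weakness instead of hiding it. Threshold is invented.
function weakFlag(
  evidenceCount: number,
  minEvidence = 3
): { weak: boolean; coverage_note?: string } {
  if (evidenceCount < minEvidence) {
    return { weak: true, coverage_note: `only ${evidenceCount} evidence item(s) found` };
  }
  return { weak: false };
}
```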

Every call shows its work

Uniform envelope: tier_used, model, hardware_profile, tokens_in/out, elapsed_ms, residency from /api/ps. NDJSON log at ~/.ollama-intern/log.ndjson.
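Because the log is NDJSON (one envelope object per line), it is trivially consumable with stock tooling. A sketch of a consumer that totals token usage; the `Envelope` fields are taken from the example envelope above, and the aggregation is illustrative, not part of the server.

```typescript
interface Envelope {
  tier_used: string;
  model: string;
  tokens_in: number;
  tokens_out: number;
  elapsed_ms: number;
  residency: { in_vram: boolean; evicted: boolean };
}

// Sum tokens across every call recorded in an NDJSON log string
// (e.g. the contents of ~/.ollama-intern/log.ndjson).
function totalTokens(ndjson: string): number {
  return ndjson
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as Envelope)
    .reduce((sum, e) => sum + e.tokens_in + e.tokens_out, 0);
}
```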

One call, one artifact

The call

// Claude → ollama-intern-mcp
{
  "tool": "ollama_incident_pack",
  "arguments": {
    "title": "5 AM paging regression",
    "logs": "[05:07] worker-3 OOM killed\n[05:07] /api/ps evicted=true size=8.1GB\n...",
    "source_paths": [
      "src/worker.ts",
      "memory/sprite-foundry-visual-mastery.md"
    ]
  }
}

The artifact (deterministic)

# Incident — 5 AM paging regression
slug: 2026-04-16-5-am-paging-regression
weak: false · evidence_count: 6

## Evidence
- e1: src/worker.ts:148–162 (OOM path)
- e2: log excerpt 05:07 (residency.evicted=true)
- ...

## Next checks
- residency.evicted across last 24h
- OLLAMA_MAX_LOADED_MODELS vs loaded size

## Read next
- src/worker.ts:worker_loop
- docs/ops/ollama-paging.md

Install

npm

npm install -g ollama-intern-mcp

Claude Code

{
  "mcpServers": {
    "ollama-intern": {
      "command": "npx",
      "args": ["-y", "ollama-intern-mcp"],
      "env": {
        "OLLAMA_HOST": "http://127.0.0.1:11434",
        "INTERN_PROFILE": "dev-rtx5080"
      }
    }
  }
}

Hermes Agent (v2.0.0+ validated path)

# Nous Research Hermes Agent + hermes3:8b on Ollama
# Validated end-to-end 2026-04-19 — full config in
# hermes.config.example.yaml in the repo root.
mcp_servers:
  ollama-intern:
    command: npx
    args: ["-y", "ollama-intern-mcp"]
    env:
      OLLAMA_HOST: http://localhost:11434
      INTERN_PROFILE: dev-rtx5080
# See: handbook/with-hermes/ for the full walkthrough.

Model pulls (dev-rtx5080 — default hermes3:8b ladder)

ollama pull hermes3:8b
ollama pull nomic-embed-text
export OLLAMA_MAX_LOADED_MODELS=2
export OLLAMA_KEEP_ALIVE=-1