Quickstart — Your first 5 minutes

This is the shortest path from “nothing installed” to “an artifact on disk you can read.” End-to-end target: five minutes, assuming you already have Ollama running.

If you want the full prerequisite walkthrough (hardware sizing, profile picks, Claude Desktop config), see Getting Started — this page is the compressed version for someone who just wants to see it work.


Step 1 — Make sure Ollama is alive (30 seconds)

# Is the daemon listening?
curl -sS http://127.0.0.1:11434/api/tags | head -c 80

Expected: a JSON response. If you get Connection refused, start the daemon:

ollama serve # foreground; or:
brew services start ollama # macOS

Pull the two models the default profile uses:

ollama pull hermes3:8b
ollama pull nomic-embed-text

These two models make up the dev-rtx5080 ladder. If you’re on a Mac with the M5 Max, set INTERN_PROFILE=m5-max in the Step 2 config instead; that profile swaps the ladder to heavier Qwen 3 tiers.
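If you would rather script the check than eyeball the curl output, the sketch below compares the default profile's two models against the `models[].name` entries that Ollama's `/api/tags` endpoint returns. The helper names (`normalize`, `missing_models`, `fetch_tags`) are illustrative and not part of ollama-intern-mcp; the only assumption about Ollama is that an untagged pull is listed with a `:latest` tag.

```python
import json
import urllib.request

# The two models the default profile uses, as listed above.
REQUIRED = ["hermes3:8b", "nomic-embed-text"]


def normalize(name: str) -> str:
    """Treat a bare model name as its ':latest' tag."""
    return name if ":" in name else f"{name}:latest"


def missing_models(tags_payload: dict, required: list[str]) -> list[str]:
    """Return the required models that are absent from an /api/tags payload."""
    pulled = {normalize(m["name"]) for m in tags_payload.get("models", [])}
    return [name for name in required if normalize(name) not in pulled]


def fetch_tags(host: str = "http://127.0.0.1:11434") -> dict:
    """GET /api/tags from a running Ollama daemon (not called on import)."""
    with urllib.request.urlopen(f"{host}/api/tags", timeout=5) as resp:
        return json.load(resp)
```

Calling `missing_models(fetch_tags(), REQUIRED)` against a live daemon returns the names you still need to `ollama pull`; an empty list means Step 1 is done.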


Step 2 — Wire ollama-intern-mcp into Claude Code (1 minute)

Add this block to your Claude Code MCP server config (no global install needed — npx fetches and runs the server on demand):

{
  "mcpServers": {
    "ollama-intern": {
      "command": "npx",
      "args": ["-y", "ollama-intern-mcp"],
      "env": {
        "OLLAMA_HOST": "http://127.0.0.1:11434",
        "INTERN_PROFILE": "dev-rtx5080"
      }
    }
  }
}

Restart Claude Code so it re-reads the config.
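If your MCP config already lists other servers, you want to merge the block in rather than overwrite the file. The sketch below shows one way to do that on an in-memory config. `add_intern_server` is a hypothetical helper, and the `other-server` entry is made up for the example. Where the config file lives depends on your Claude Code setup, so the sketch deliberately does not touch disk.

```python
import copy
import json


def add_intern_server(
    config: dict,
    host: str = "http://127.0.0.1:11434",
    profile: str = "dev-rtx5080",
) -> dict:
    """Return a copy of `config` with the ollama-intern entry added or replaced."""
    merged = copy.deepcopy(config)  # leave the caller's dict untouched
    servers = merged.setdefault("mcpServers", {})
    servers["ollama-intern"] = {
        "command": "npx",
        "args": ["-y", "ollama-intern-mcp"],
        "env": {"OLLAMA_HOST": host, "INTERN_PROFILE": profile},
    }
    return merged


# Hypothetical existing config with one unrelated server.
existing = {"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}
print(json.dumps(add_intern_server(existing, profile="m5-max"), indent=2))
```

The existing server entry is preserved, and passing `profile="m5-max"` is where the profile swap mentioned in Step 1 would happen.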


Step 3 — Run ollama_doctor (the smallest possible call) (30 seconds)

In Claude Code, ask:

Use the ollama-intern ollama_doctor tool. Show me the envelope.

You should see:

{
  "result": {
    "ollama": { "reachable": true, "host": "http://127.0.0.1:11434" },
    "models": {
      "required": ["hermes3:8b", "nomic-embed-text"],
      "pulled": ["hermes3:8b", "nomic-embed-text", "..."],
      "loaded": [],
      "missing": []
    },
    "profile": { "name": "dev-rtx5080", "tiers": { "instant": "hermes3:8b", "...": "..." } },
    "healthy": true
  },
  "tier_used": "instant",
  "model": "hermes3:8b",
  "elapsed_ms": 12,
  "...": "..."
}

healthy: true means the box is wired correctly. If anything is wrong, doctor tells you exactly what’s missing — go fix it before continuing. This is the call you make on every session start.

If you see healthy: false: check models.missing first. Most “doctor says unhealthy” reports are an ollama pull you forgot.
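The triage described above can be sketched as a small function over the doctor envelope. It reads only fields shown in the sample envelope (`healthy`, `models.missing`, `ollama.reachable`). `doctor_next_steps` is an illustrative name, and the suggested commands are simply the ones this page already uses; the real doctor output may carry more detail than this handles.

```python
def doctor_next_steps(envelope: dict) -> list[str]:
    """Suggest shell commands for an unhealthy doctor envelope."""
    result = envelope.get("result", {})
    if result.get("healthy"):
        return []  # nothing to fix

    steps = []
    # Check models.missing first: a forgotten pull is the usual cause.
    for name in result.get("models", {}).get("missing", []):
        steps.append(f"ollama pull {name}")
    # Only suggest starting the daemon when the envelope says it is unreachable.
    if result.get("ollama", {}).get("reachable") is False:
        steps.append("ollama serve")
    return steps
```

An empty list for an unhealthy envelope means the cause is something this sketch does not cover, so read the full doctor output in that case.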


Step 4 — Build a corpus from a directory (1 minute)

Pick a small docs directory you have on disk — even a folder of three markdown files works. Then ask Claude Code:

Use ollama_corpus_index on /path/to/your/docs. Name the corpus myfirst.

You’ll get back an envelope like:

{
  "result": {
    "corpus_id": "myfirst",
    "chunks_indexed": 47,
    "embed_model": "nomic-embed-text",
    "duration_ms": 8120
  },
  "tier_used": "embed",
  "...": "..."
}

The corpus and its chunk store now live under ~/.ollama-intern/corpora/myfirst/. They persist across reboots, so you only re-index when the underlying files change (and there’s corpus_refresh for that).


Step 5 — Ask the corpus a question (1 minute)

Ask Claude Code:

Use ollama_corpus_answer on the myfirst corpus. Question: “What does this project do?”

The envelope comes back with a synthesized answer plus per-claim citations that point at the chunk ids the model actually grounded on:

{
  "result": {
    "answer": "...",
    "citations": [
      { "chunk_id": "abc123", "source_path": "/path/to/docs/intro.md", "score": 0.81 },
      { "chunk_id": "def456", "source_path": "/path/to/docs/usage.md", "score": 0.74 }
    ],
    "weak": false,
    "abstained": false
  },
  "tier_used": "deep",
  "...": "..."
}

If the corpus does not address your question, the tool returns abstained: true with an empty answer rather than inventing a plausible-sounding narrative. That is the abstention contract at work.
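A caller that consumes this envelope should branch on `abstained` before touching `answer`. The following is a minimal sketch of that pattern using only the fields in the sample above. `summarize_answer` is a hypothetical helper, and reading `weak` as "grounding was thin, double-check the sources" is an assumption, since this page does not define the flag precisely.

```python
def summarize_answer(envelope: dict) -> str:
    """Render a corpus-answer envelope as plain text, honoring abstention."""
    result = envelope["result"]
    if result.get("abstained"):
        # Abstention contract: do not present the empty answer as a result.
        return "No grounded answer: the corpus did not address the question."

    lines = [result["answer"]]
    if result.get("weak"):
        # Assumed meaning of the flag; see the lead-in.
        lines.append("(weak grounding: verify against the sources below)")
    # List citations from highest to lowest score.
    citations = sorted(result.get("citations", []), key=lambda c: c["score"], reverse=True)
    for c in citations:
        lines.append(f"- {c['source_path']} [{c['chunk_id']}] score={c['score']:.2f}")
    return "\n".join(lines)
```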


Step 6 — Run an incident_pack and see the artifact on disk (1 minute)

Briefs and atoms return JSON to Claude. Packs go further: they write a durable artifact to disk you can open later. Try:

Use ollama_incident_pack. Title: “first run”. Logs: “[05:07] worker-3 OOM killed”. source_paths: [].

After the pack runs, look on disk:

ls ~/.ollama-intern/artifacts/incident/
# 2026-05-15-first-run.md
# 2026-05-15-first-run.json

Open the .md file. That’s your artifact: a deterministically rendered markdown report with a citations block, a weak flag, and the next_checks the model wants you to investigate. Code rendered it, not a prompt, so it has the same shape every time.
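To work with the artifacts from a script, you can pair each markdown report with the JSON file that shares its name. This sketch assumes only the directory layout shown in the listing above (one `.md` and one `.json` per pack run); `paired_artifacts` is an illustrative helper, not a tool the server ships.

```python
from pathlib import Path

# Directory from the listing above.
ARTIFACT_DIR = Path.home() / ".ollama-intern" / "artifacts" / "incident"


def paired_artifacts(directory: Path) -> list[tuple[Path, Path]]:
    """Return (markdown, json) pairs that share a file stem, sorted by name."""
    pairs = []
    for md in sorted(directory.glob("*.md")):
        sidecar = md.with_suffix(".json")
        if sidecar.exists():  # skip reports that have no JSON twin
            pairs.append((md, sidecar))
    return pairs
```

Because the file names in the listing start with an ISO date, sorting by name also sorts by date, so `paired_artifacts(ARTIFACT_DIR)[-1]` would be the most recent pack run.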


In about five minutes you ran three atoms and one pack:

| Tier | Tool | What it produced |
| --- | --- | --- |
| Atom | ollama_doctor | Status snapshot (healthy: true) |
| Atom | ollama_corpus_index | A persistent corpus on disk |
| Atom | ollama_corpus_answer | A grounded answer with citations |
| Pack | ollama_incident_pack | A durable markdown artifact |

You can now:

  • List artifacts: ollama_artifact_list
  • Re-read one without re-running the pack: ollama_artifact_read
  • Diff two artifacts of the same pack: ollama_artifact_diff
  • Tail the NDJSON log: ollama_log_tail (or tail -f ~/.ollama-intern/log.ndjson)
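If you read the NDJSON log from a script instead of `tail -f`, parse it one line at a time and tolerate a half-written final line. The sketch below does only that. It makes no assumption about which fields the log records carry, and `parse_ndjson` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def parse_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per well-formed NDJSON line, skipping everything else."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank line
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # for example, a partially written last line
        if isinstance(record, dict):
            yield record
```

Passing an open file handle works because iterating a text file yields lines, for example `parse_ndjson(open(path, encoding="utf-8"))` with `path` pointing at the log file named above.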

If something goes wrong:

  • healthy: false from doctor with models.missing populated. Run ollama pull <model> for each missing model. The default dev-rtx5080 profile needs hermes3:8b + nomic-embed-text.
  • SCHEMA_INVALID on a tool call. You passed an arg the schema didn’t accept. The error details field tells you the field name and the expected shape. Fix it and retry.
  • PATH_NOT_ALLOWED. A tool tried to read or write outside INTERN_ALLOWED_ROOTS. Either the path is wrong, or you need to extend the allow-list. See Security.
  • Empty citations[] with abstained: true. Not a bug: the corpus or the source files didn’t address your question well enough. Either expand the corpus, supply better source_paths, or accept the abstention.
  • Long elapsed_ms on the first call. Cold-start prewarm cost. Subsequent calls in the same tier reuse the resident model.
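To see why a path can trip PATH_NOT_ALLOWED even when it looks like it sits inside an allowed directory, here is a generic sketch of an allow-list containment check. This is not the server's implementation, and it does not parse INTERN_ALLOWED_ROOTS (this page does not document that variable's format). `is_under_roots` and the `/srv/docs` paths are hypothetical.

```python
from pathlib import Path


def is_under_roots(candidate: str, roots: list[str]) -> bool:
    """True if `candidate` resolves to one of `roots` or to a path inside one."""
    # resolve() collapses ".." segments and follows symlinks before comparing.
    target = Path(candidate).expanduser().resolve()
    for root in roots:
        base = Path(root).expanduser().resolve()
        # Compare whole path components, not string prefixes.
        if target == base or base in target.parents:
            return True
    return False
```

Two cases tend to surprise people: a path with `..` that climbs out of the root after resolution, and a sibling directory whose name merely starts with the root's name (`/srv/docs-private` is not inside `/srv/docs`).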