Troubleshooting
Things that go wrong the first time you install, in the order they tend to bite. Work top to bottom — most of these are one command away from fixed.
Ollama isn’t running
Symptom — every tool call returns a connection error, or the server fails to start with `ECONNREFUSED 127.0.0.1:11434`.
Check:

```sh
curl http://127.0.0.1:11434/api/tags
```

If that doesn't return JSON, Ollama isn't up.
Fix — start it:
- macOS / Linux — launch the Ollama app, or run `ollama serve` in a terminal.
- Windows — launch the Ollama desktop app; it lives in the system tray.
Then retry the tool call.
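The check and fix above can be folded into a single probe. A minimal sketch, assuming Ollama's default port 11434 and a POSIX shell:

```sh
# Probe the Ollama API; prints "up" or "down" without hanging.
# -f: treat HTTP errors as failure, -sS: quiet but show real errors,
# --max-time 2: give up quickly if nothing is listening.
if curl -fsS --max-time 2 http://127.0.0.1:11434/api/tags >/dev/null 2>&1; then
  echo "ollama: up"
else
  echo "ollama: down"
fi
```

If it prints `down`, start Ollama and run it again before retrying the tool call.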
Model pull failed
Symptom — `ollama pull hermes3:8b` stops with a network error, a disk-space error, or hangs at a fixed percentage.
- Network flake — re-run the pull. Ollama resumes from the last completed layer.
- Disk full — check free space; `hermes3:8b` is ~4.7 GB pulled, more when unpacked.
- Stuck forever — `Ctrl-C`, then `ollama rm hermes3:8b` and pull again. Rare, but sometimes a layer corrupts.
Confirm what's installed:

```sh
ollama list
```

You should see `hermes3:8b` and `nomic-embed-text` for the default `dev-rtx5080` profile.
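That confirmation can be scripted. A sketch that checks both default models against the first column of `ollama list` output (the model names are the defaults for the `dev-rtx5080` profile described above):

```sh
# Verify the default profile's models are pulled, matching on the tag prefix.
required="hermes3:8b nomic-embed-text"
tags=$(ollama list 2>/dev/null | awk 'NR>1 {print $1}')  # skip the header row
for m in $required; do
  if printf '%s\n' "$tags" | grep -q "^$m"; then
    echo "ok:      $m"
  else
    echo "missing: $m  (run: ollama pull $m)"
  fi
done
```

Swap the `required` list if you are running a different profile.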
Hardware insufficient for hermes3:8b
Symptom — the model loads but inference is very slow (tens of seconds for short prompts), `residency.evicted: true` appears in envelopes, or Ollama reports the model paged to disk.
`hermes3:8b` wants roughly 6 GB of VRAM, or 16 GB of RAM for CPU inference (see Getting Started → Hardware minimums).
Fallbacks, cheapest to most work:

- Lower concurrent models — `export OLLAMA_MAX_LOADED_MODELS=1` so the embed model isn't pinned alongside the workhorse.
- Switch profile — if you're on an RTX 5080 and want the Qwen rail, `export INTERN_PROFILE=dev-rtx5080-qwen3` and pull the Qwen models listed in the README. If you're on Apple Silicon with enough memory, try `m5-max`.
- Pin a smaller model per tier — override individual tiers with env vars:

  ```sh
  export INTERN_TIER_INSTANT=llama3.2:3b
  export INTERN_TIER_WORKHORSE=llama3.2:3b
  ```

  Smaller model, lower quality, but it will run.
The Deep tier is the hungriest — if only deep calls stall, try a lighter model there first.
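If you're unsure whether the machine clears those minimums, a quick capacity sketch (commands vary by platform, and the thresholds are the minimums stated above, not hard limits):

```sh
# NVIDIA GPUs: total VRAM in MiB (want ~6144+ for hermes3:8b).
nvidia-smi --query-gpu=memory.total --format=csv,noheader 2>/dev/null || true

# Linux: total system RAM in GB (want 16+ for CPU inference).
free -g 2>/dev/null | awk '/^Mem:/ {print $2 " GB RAM"}' || true

# macOS: total system RAM, converted from bytes to GB.
sysctl -n hw.memsize 2>/dev/null | awk '{printf "%.0f GB RAM\n", $1/1073741824}' || true
```

Each command silently prints nothing on platforms where it doesn't apply, so the whole block is safe to paste anywhere.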
OLLAMA_MODEL_MISSING error
Symptom — a tool call returns an envelope with an error pointing at a missing model, e.g. `OLLAMA_MODEL_MISSING: hermes3:8b`.
The tier picked a model that isn't pulled on this machine. Either:

- Pull it — `ollama pull hermes3:8b` (or whichever model the error names).
- Or override the tier — export the matching `INTERN_TIER_*` env var to a model you do have.
Confirm with `ollama list` that the model name matches exactly, including the tag (`hermes3:8b`, not `hermes3`).
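A sketch of that exact-tag check, using `grep -x` so a bare `hermes3` or a longer tag won't false-match:

```sh
# Exact-tag match against the first column of `ollama list` output.
if ollama list 2>/dev/null | awk 'NR>1 {print $1}' | grep -qx 'hermes3:8b'; then
  echo "hermes3:8b present"
else
  echo "hermes3:8b missing — pull it, or point the INTERN_TIER_* var at a model you have"
fi
```

Substitute whichever model name the error envelope reported.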
MCP server not appearing in Claude Code
Symptom — you added the config block, restarted Claude Code, and the `ollama-intern` tools don't show up.
Sanity checks, in order:
- Config file path — Claude Code reads `~/.config/claude-code/mcp.json` on macOS/Linux or `%APPDATA%\Claude\claude_code_mcp.json` on Windows. Make sure you edited the right one.
- JSON is valid — run the file through a JSON linter. A trailing comma will silently drop the whole `mcpServers` block.
- `npx` resolves — open a terminal and run `npx -y ollama-intern-mcp --version`. If that fails, `npm` isn't on your PATH (or Node isn't installed).
- Full restart — quit Claude Code completely and reopen. Some versions don't pick up MCP config changes on a reload.
- Check logs — Claude Code's MCP log shows startup errors. On macOS: `~/Library/Logs/Claude/mcp*.log`. Search for `ollama-intern`.
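The trailing-comma failure mode in particular is easy to catch from the terminal. A sketch using Python's strict JSON parser on the macOS/Linux config path (swap in the Windows path as needed):

```sh
# Strict parse: any syntax error (e.g. a trailing comma) fails loudly.
config="$HOME/.config/claude-code/mcp.json"
if python3 -m json.tool "$config" >/dev/null 2>&1; then
  echo "mcp.json: valid JSON"
else
  echo "mcp.json: missing or invalid — fix it before restarting Claude Code"
fi
```

`json.tool` rejects anything outside strict JSON, which is exactly the standard Claude Code holds the file to.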
If the server starts but tools still don't appear, run it manually to see its stderr:

```sh
npx -y ollama-intern-mcp
```

It should print a startup banner and block waiting for stdin. If it exits with an error, that error is your real problem.
Still stuck
- Error code reference — every structured error the server returns is documented on the Error codes page, with cause and next action.
- Every call is logged to `~/.ollama-intern/log.ndjson` — `tail` it while retrying to see what the server actually saw.
- Open a GitHub discussion with the envelope (minus any sensitive content), the profile you're running, and your `ollama list` output.
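For the log, a non-blocking peek looks like this (the path is the one above; swap `tail -n 20` for `tail -f` to watch live while you retry a call):

```sh
# Show the newest envelopes from the call log without blocking.
log="$HOME/.ollama-intern/log.ndjson"
if [ -f "$log" ]; then
  tail -n 20 "$log"
else
  echo "no log yet at $log — make a tool call first"
fi
```

Each line is one JSON envelope, so anything that filters lines (`grep '"error"'`, `jq`, etc.) composes naturally on top.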