# Getting Started
Ollama Intern MCP is a local-only MCP server. Claude Code calls it; it routes work onto a local Ollama model; the result lands in a durable artifact on disk.
This page takes you from zero to one real tool call in about five minutes.
## 1. Prerequisites

- Node.js 18 or newer (20 LTS recommended — matches CI).
- Ollama installed and running at `http://127.0.0.1:11434`.
- Claude Code (or any MCP-capable client).
### Hardware minimums

The models this server drives run on your machine — you need enough VRAM (or system RAM for CPU inference) to load them. Ballpark figures, honest:
| Model | Role | VRAM (GPU) | RAM (CPU fallback) |
|---|---|---|---|
| hermes3:8b | Default workhorse — Instant / Workhorse / Deep | ~6 GB | ~16 GB |
| nomic-embed-text | Embed tier | ~500 MB | ~1 GB |
See Ollama’s hardware notes for the full picture — quantization, context length, and concurrent loaded models all move the number.
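To make the "quantization, context length, and concurrent loaded models all move the number" point concrete, here is a back-of-envelope sketch. This is not Ollama's real memory accounting; the KV-cache and overhead constants are rough assumptions for illustration only.

```typescript
// Back-of-envelope VRAM estimate for a quantized model plus KV cache.
// All constants here are assumptions, not Ollama's actual accounting.
function estimateVramGB(
  paramsB: number,       // parameters in billions (8 for hermes3:8b)
  bitsPerWeight: number, // e.g. 4 for a Q4 quantization
  contextTokens: number  // loaded context length
): number {
  // Weights: params * bits per weight, converted to GB.
  const weightsGB = (paramsB * 1e9 * bitsPerWeight) / 8 / 1e9;
  // KV cache: assume ~0.25 MB per token for an 8B-class model (rough).
  const kvGB = (contextTokens * 0.25) / 1024;
  // Runtime buffers and fragmentation (assumed multiplier).
  const overhead = 1.2;
  return (weightsGB + kvGB) * overhead;
}

// hermes3:8b at Q4 with a 4k context lands near the ~6 GB figure above.
console.log(estimateVramGB(8, 4, 4096).toFixed(1)); // ≈ 6.0
```

Doubling the context roughly doubles the KV term, which is why long-context runs can evict a model that "should" fit.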
Profile hints:
- `dev-rtx5080` — tested on RTX 5080 (16 GB VRAM). Default.
- `dev-rtx5080-qwen3` — same hardware, Qwen 3 alternate rail.
- `m5-max` — tuned for M5 Max MacBook Pro (128 GB unified memory); swaps the ladder to heavier Qwen 3 tiers.
Alternate tiers live in each profile file under `src/profiles/`. If `hermes3:8b` can't fit, drop to a smaller Ollama model or use a lighter profile — `ollama ps` will show what's actually resident.
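The profile files under `src/profiles/` hold the real ladder definitions; the sketch below only illustrates the tier-to-model mapping idea. The type and field names are my assumptions, not the server's actual schema.

```typescript
// Illustrative shape of a hardware profile's tier ladder.
// Field names are assumptions for illustration, not the real schema.
type Tier = "instant" | "workhorse" | "deep" | "embed";

interface Profile {
  name: string;
  ladder: Record<Tier, string>; // tier -> Ollama model tag
}

// The default profile collapses the three work tiers onto hermes3:8b.
const devRtx5080: Profile = {
  name: "dev-rtx5080",
  ladder: {
    instant: "hermes3:8b",
    workhorse: "hermes3:8b",
    deep: "hermes3:8b",
    embed: "nomic-embed-text",
  },
};

function modelFor(profile: Profile, tier: Tier): string {
  return profile.ladder[tier];
}

console.log(modelFor(devRtx5080, "deep")); // hermes3:8b
```

A heavier profile like `m5-max` would keep the same shape but point the upper tiers at larger model tags.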
## 2. Install

Most users do not install globally. The recommended path is the Claude Code MCP config block below, which runs the server on demand via `npx`.
If you want the binary on your PATH for ad-hoc use, you can still install it globally:

```sh
npm install -g ollama-intern-mcp
```

## 3. Wire into Claude Code

Add this block to your Claude Code MCP server config:

```json
{
  "mcpServers": {
    "ollama-intern": {
      "command": "npx",
      "args": ["-y", "ollama-intern-mcp"],
      "env": {
        "OLLAMA_HOST": "http://127.0.0.1:11434",
        "INTERN_PROFILE": "dev-rtx5080"
      }
    }
  }
}
```

`INTERN_PROFILE` picks a hardware profile — see Envelope & tiers for the full table. `dev-rtx5080` is the default developer profile and runs the validated `hermes3:8b` ladder.
### Claude Desktop

Same block, written to:

- macOS — `~/Library/Application Support/Claude/claude_desktop_config.json`
- Windows — `%APPDATA%\Claude\claude_desktop_config.json`
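The config block and the per-OS paths above can be sketched in code. The helper names here are mine, not part of the server; actually writing the file is left to you.

```typescript
// Build the MCP config block from step 3. Helper names are illustrative.
function mcpConfig(profile: string) {
  return {
    mcpServers: {
      "ollama-intern": {
        command: "npx",
        args: ["-y", "ollama-intern-mcp"],
        env: {
          OLLAMA_HOST: "http://127.0.0.1:11434",
          INTERN_PROFILE: profile,
        },
      },
    },
  };
}

// Pick the Claude Desktop config path per platform, as listed above.
// On Windows, pass the expanded %APPDATA% directory as `appData`.
function desktopConfigPath(platform: string, home: string, appData = ""): string {
  return platform === "win32"
    ? `${appData}\\Claude\\claude_desktop_config.json`
    : `${home}/Library/Application Support/Claude/claude_desktop_config.json`;
}

console.log(desktopConfigPath("darwin", "/Users/me"));
console.log(JSON.stringify(mcpConfig("dev-rtx5080"), null, 2));
```

Merging into an existing config (rather than overwriting it) matters if you already have other MCP servers wired in.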
## 4. Pull the tier models

The default `dev-rtx5080` profile collapses all three work tiers (Instant / Workhorse / Deep) onto `hermes3:8b`, plus `nomic-embed-text` for the Embed tier. One pull covers everything:

```sh
ollama pull hermes3:8b
ollama pull nomic-embed-text
export OLLAMA_MAX_LOADED_MODELS=2
export OLLAMA_KEEP_ALIVE=-1
```

Four tiers, top to bottom:
| Tier | Default model | Used by |
|---|---|---|
| Instant | hermes3:8b | classify, extract, triage_logs |
| Workhorse | hermes3:8b | summarize_fast, draft, briefs |
| Deep | hermes3:8b | summarize_deep, research, packs |
| Embed | nomic-embed-text | embed, embed_search, corpus tools |
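The table above implies a tool-to-tier routing map. The sketch below mirrors the clearly named tools from the "Used by" column; the lookup function itself is illustrative, not the server's actual router.

```typescript
// Tool -> tier, per the table above. The map mirrors the docs;
// the lookup logic is an illustration, not the server's router.
const toolTier: Record<string, string> = {
  classify: "instant",
  extract: "instant",
  triage_logs: "instant",
  summarize_fast: "workhorse",
  draft: "workhorse",
  summarize_deep: "deep",
  research: "deep",
  embed: "embed",
  embed_search: "embed",
};

function tierFor(tool: string): string {
  const tier = toolTier[tool];
  if (!tier) throw new Error(`unknown tool: ${tool}`);
  return tier;
}

console.log(tierFor("classify")); // instant
```

Because the default profile maps all three work tiers to `hermes3:8b`, routing only changes the model tag under alternate profiles.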
Other profiles (`dev-rtx5080-qwen3`, `m5-max`) swap the ladder — see Envelope & tiers.
## 5. Hello, intern

Restart Claude Code so it re-reads the MCP config, then ask it to run the smallest possible call:

```
Use the ollama-intern ollama_classify tool to classify
"the build failed because the API returned 502" into one of
["infra", "code", "user-error"]. Show me the envelope.
```

You should get back a uniform envelope like this:
```json
{
  "result": { "label": "infra", "confidence": 0.9 },
  "tier_used": "instant",
  "model": "hermes3:8b",
  "hardware_profile": "dev-rtx5080",
  "tokens_in": 42,
  "tokens_out": 8,
  "elapsed_ms": 380,
  "residency": { "in_vram": true, "evicted": false }
}
```

That is the intern working. Every tool in the server returns this same envelope shape. Every call is appended as one NDJSON line to `~/.ollama-intern/log.ndjson`.
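Since every call lands as one NDJSON line, the envelope is easy to consume programmatically. Here is a minimal sketch; the `Envelope` type is my assumption of the shape, reduced to the fields used below, and the throughput helper is mine.

```typescript
// Minimal view of the example envelope above (assumed shape, trimmed
// to the fields this sketch uses).
interface Envelope {
  result: unknown;
  tier_used: string;
  model: string;
  tokens_in: number;
  tokens_out: number;
  elapsed_ms: number;
}

// One envelope becomes one line, as in ~/.ollama-intern/log.ndjson.
function toNdjsonLine(e: Envelope): string {
  return JSON.stringify(e) + "\n";
}

// Derived throughput: output tokens per second for a call.
function tokensPerSecond(e: Envelope): number {
  return e.tokens_out / (e.elapsed_ms / 1000);
}

const sample: Envelope = {
  result: { label: "infra", confidence: 0.9 },
  tier_used: "instant",
  model: "hermes3:8b",
  tokens_in: 42,
  tokens_out: 8,
  elapsed_ms: 380,
};

console.log(tokensPerSecond(sample).toFixed(1)); // 21.1
```

Tailing the log file with a parser like this is a cheap way to watch tier usage and latency across a session.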
## Next steps

- Tool reference — all 28 tools grouped by tier
- Artifacts & continuity — how packs write durable markdown to `~/.ollama-intern/artifacts/`
- Laws & guardrails — evidence-first, weak-is-weak, deterministic renderers
- Use with Hermes — drive the MCP from Hermes Agent on `hermes3:8b`