
ollama_chat


ollama_chat is the catch-all for ad-hoc model interaction. It is visibly last-resort by design. If you find yourself reaching for chat more than once a session, a specialty tool is missing: file a feature request rather than relying on chat as your normal entrypoint.

Tier: Workhorse, which resolves to hermes3:8b on the default dev-rtx5080 profile. The Workhorse budget is roughly 4–8k tokens per turn, set by TierConfig.num_ctx for your profile.

Use it for:

  • One-off prose generation that doesn’t fit any specialty tool
  • Light back-and-forth where structured output would be overkill
  • Bridge calls where a future feature will replace this with a job-shaped tool

Use a specialty tool instead for:

  • Classification → ollama_classify (gives you confidence + threshold + frame)
  • Extraction → ollama_extract (schema-constrained JSON, no parse errors)
  • Reading a corpus → ollama_corpus_answer (chunk-grounded citations)
  • Reading specific files → ollama_research (path-grounded citations)
  • Producing an artifact → packs (incident_pack / repo_pack / change_pack)
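The routing above can be sketched as a lookup table. This is only an illustration: the `Job` labels and the `pickTool` helper are hypothetical names, not part of the tool surface, and the packs are left out because they cover three separate tools.

```typescript
// Hypothetical job labels for illustration; the tool names come from the list above.
type Job =
  | "classification"
  | "extraction"
  | "corpus_reading"
  | "file_reading"
  | "ad_hoc";

const SPECIALTY_TOOL: Record<Job, string> = {
  classification: "ollama_classify",
  extraction: "ollama_extract",
  corpus_reading: "ollama_corpus_answer",
  file_reading: "ollama_research",
  ad_hoc: "ollama_chat", // last resort: only when nothing above fits
};

// Returns the tool name to call for a given job label.
function pickTool(job: Job): string {
  return SPECIALTY_TOOL[job];
}
```

Keeping chat as the final entry mirrors the intent of the page: it is the fallback, not the default.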
{
  messages: Array<{
    role: "system" | "user" | "assistant";
    content: string; // min 1 char
  }>; // min 1 message
  system?: string; // optional preface, merged with any system messages
  model?: string; // per-call model override (advanced; use sparingly)
}

Full source: src/tools/chat.ts.
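To make the schema's constraints concrete, here is a minimal hand-rolled check of the same shape. It is a sketch, not the contents of src/tools/chat.ts (which uses zod): `validateChatArgs`, `ChatArgs`, and the error strings are hypothetical names chosen for this example.

```typescript
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

interface ChatArgs {
  messages: ChatMessage[];
  system?: string;
  model?: string;
}

const ROLES: string[] = ["system", "user", "assistant"];

// Returns a list of problems; an empty list means the input matches the documented shape.
function validateChatArgs(input: unknown): string[] {
  if (typeof input !== "object" || input === null) {
    return ["arguments must be an object"];
  }
  const errors: string[] = [];
  const args = input as Record<string, unknown>;
  const messages = args["messages"];

  if (!Array.isArray(messages) || messages.length < 1) {
    errors.push("messages must be an array with at least one entry");
  } else {
    messages.forEach((m: unknown, i: number) => {
      const msg = (typeof m === "object" && m !== null ? m : {}) as Record<string, unknown>;
      const role = msg["role"];
      const content = msg["content"];
      if (typeof role !== "string" || ROLES.indexOf(role) === -1) {
        errors.push("messages[" + i + "].role must be system, user, or assistant");
      }
      if (typeof content !== "string" || content.length < 1) {
        errors.push("messages[" + i + "].content must be a non-empty string");
      }
    });
  }

  // system and model are optional, but must be strings when present.
  for (const key of ["system", "model"]) {
    const value = args[key];
    if (value !== undefined && typeof value !== "string") {
      errors.push(key + " must be a string when present");
    }
  }
  return errors;
}

const example: ChatArgs = { messages: [{ role: "user", content: "hello" }] };
const problems = validateChatArgs(example); // empty for a well-formed call
```

Collecting every problem instead of failing on the first makes it easier to fix a malformed call in one pass.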

{
  "tool": "ollama_chat",
  "arguments": {
    "messages": [
      { "role": "user", "content": "Summarize this commit message in one sentence: 'feat(corpus): surface :latest drift on refresh'" }
    ]
  }
}

Returns:

{
  "result": {
    "reply": "Adds drift detection to corpus refresh when the source path uses ':latest' tag resolution."
  },
  "tier_used": "workhorse",
  "model": "hermes3:8b",
  "hardware_profile": "dev-rtx5080",
  "tokens_in": 38,
  "tokens_out": 24,
  "elapsed_ms": 920,
  "residency": { "in_vram": true, "evicted": false }
}
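For callers that want to read the envelope in typed code, the following sketch may help. The `ChatEnvelope` interface is inferred only from the example above, so optional or additional fields may exist; `tokensPerSecond` and `describe` are hypothetical helpers. The throughput figure is rough, since `elapsed_ms` presumably covers the whole call rather than generation alone.

```typescript
// Shape inferred from the example envelope above; not an authoritative type.
interface ChatEnvelope {
  result: { reply: string };
  tier_used: string;
  model: string;
  hardware_profile: string;
  tokens_in: number;
  tokens_out: number;
  elapsed_ms: number;
  residency: { in_vram: boolean; evicted: boolean };
}

// Output tokens divided by wall-clock seconds; returns 0 if no time was recorded.
function tokensPerSecond(env: ChatEnvelope): number {
  if (env.elapsed_ms <= 0) return 0;
  return env.tokens_out / (env.elapsed_ms / 1000);
}

// One-line summary suitable for a log entry.
function describe(env: ChatEnvelope): string {
  const rate = tokensPerSecond(env).toFixed(1);
  const where = env.residency.in_vram ? "in VRAM" : "not in VRAM";
  return env.model + " (" + env.tier_used + ", " + where + "): " +
    env.tokens_out + " tokens out at " + rate + " tok/s";
}

const sample: ChatEnvelope = {
  result: { reply: "Adds drift detection to corpus refresh when the source path uses ':latest' tag resolution." },
  tier_used: "workhorse",
  model: "hermes3:8b",
  hardware_profile: "dev-rtx5080",
  tokens_in: 38,
  tokens_out: 24,
  elapsed_ms: 920,
  residency: { in_vram: true, evicted: false },
};

const summary = describe(sample);
// summary === "hermes3:8b (workhorse, in VRAM): 24 tokens out at 26.1 tok/s"
```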

You’re using chat to extract structured data. Switch to ollama_extract with a schema. The chat reply is a free-form string, so you have to parse it yourself, and that parsing is brittle: nothing constrains the model to return valid JSON.
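A small sketch of why parsing a free-form reply is fragile. `parseReplyAsJson` is a hypothetical helper, and the two reply strings are made up for illustration rather than captured from a real call.

```typescript
type ParseResult =
  | { ok: true; value: unknown }
  | { ok: false; reason: string };

// Attempts to treat a chat reply as JSON and reports failure instead of throwing.
function parseReplyAsJson(reply: string): ParseResult {
  try {
    return { ok: true, value: JSON.parse(reply) };
  } catch (err) {
    return { ok: false, reason: err instanceof Error ? err.message : String(err) };
  }
}

// Illustrative replies: the same payload, with and without conversational framing.
const cleanReply = '{"severity": "high"}';
const chattyReply = 'Sure! Here is the JSON:\n{"severity": "high"}';

const cleanResult = parseReplyAsJson(cleanReply);   // ok: true
const chattyResult = parseReplyAsJson(chattyReply); // ok: false, leading prose breaks JSON.parse
```

Any extra sentence the model adds around the payload is enough to break the parse, which is the failure mode a schema-constrained tool avoids.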

You’re using chat to classify. Switch to ollama_classify. It gives you a confidence score, an allow_none escape, and a threshold floor. chat does not.

You’re chaining 5+ chat turns. That’s a sign the job is bigger than chat should handle — consider splitting into a corpus_index + corpus_answer flow, or a repo_brief if you’re trying to characterize something.

You’re hitting context limits. chat runs at the Workhorse tier’s num_ctx. For longer-form prose, reach for ollama_summarize_deep (Deep tier, larger window).
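One way to catch this before the call is a rough size check. Everything numeric here is an assumption: the four-characters-per-token ratio is a common rule of thumb rather than the model's real tokenizer, and the 4000-token budget is simply the low end of the range quoted above, not a value read from TierConfig.

```typescript
// Assumptions for illustration only; neither value is read from TierConfig.
const ASSUMED_CHARS_PER_TOKEN = 4;
const ASSUMED_BUDGET_TOKENS = 4000;

// Rough token estimate from character counts of all messages plus the optional system preface.
function estimateTokens(messages: { content: string }[], system?: string): number {
  const start = system ? system.length : 0;
  const chars = messages.reduce((sum, m) => sum + m.content.length, start);
  return Math.ceil(chars / ASSUMED_CHARS_PER_TOKEN);
}

// True when the estimate stays within the assumed Workhorse budget.
function likelyFitsWorkhorse(messages: { content: string }[], system?: string): boolean {
  return estimateTokens(messages, system) <= ASSUMED_BUDGET_TOKENS;
}
```

If the check fails, that is a hint to reach for ollama_summarize_deep instead of trimming the prompt by hand.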