Architecture
Design decision: Comedy Sidekick (Mode B)
Section titled “Design decision: Comedy Sidekick (Mode B)”Three options were evaluated:
- A) Personality injection — reshape every LLM output to be funnier. Hardest, highest drift risk.
- B) Comedy sidekick — distinct voice that returns structured comedy on demand. Clean separation.
- C) Comedy toolkit — stateless material generator. Easiest but flattest.
sensor-humor implements Mode B. The host LLM stays unchanged and calls tools when it wants humor. The sidekick returns JSON; the host decides what to do with it.
System diagram
Section titled “System diagram”Host LLM (Claude, Cursor, etc.) | calls MCP tool vsensor-humor MCP server (TypeScript, stdio) | builds prompt from mood + session state + user input vOllama (qwen2.5:7b-instruct, local) | returns JSON (schema-enforced via format param) vsensor-humor validates -> updates session -> returns to host | host optionally calls vmcp-voice-soundboard (Piper backend) | maps mood -> prosody preset -> SynthesisConfig vPiper TTS (local ONNX) -> audio outputKey components
Section titled “Key components”MCP server (src/index.ts)
Section titled “MCP server (src/index.ts)”Registers 8 tools with @modelcontextprotocol/sdk. Runs on stdio transport. Each tool call flows through the same pipeline: resolve mood, build prompt, call Ollama, validate, update session, return.
Session (src/session.ts)
Section titled “Session (src/session.ts)”In-memory singleton. Tracks mood, running gags (tagged setups with usage counts), recent bits (ring buffer, max 20), catchphrases (Map), and turn counter. Dies when server stops — no persistence by design.
Prompt loader (src/prompts/)
Section titled “Prompt loader (src/prompts/)”Versioned system prompts per mood. base.ts provides safety rules, banned patterns, and length constraints. Mood files add voice flavor. Loader resolves SENSOR_HUMOR_PROMPT_VERSION env var with v1 fallback.
Ollama client (src/ollama.ts)
Section titled “Ollama client (src/ollama.ts)”Wraps the Ollama chat API with JSON schema enforcement (format parameter), Zod validation, and 1-retry logic. Inference settings: temperature 0.55, top_p 0.85, top_k 40, mirostat 2, tau 5.0, num_predict varies by tool (30-80 tokens). Default model is qwen2.5:7b, configurable via SENSOR_HUMOR_MODEL. Ollama host defaults to http://127.0.0.1:11434, configurable via OLLAMA_HOST.
Tools (src/tools/)
Section titled “Tools (src/tools/)”Each tool assembles its prompt (base + mood + state summary + user input + technique guidance), calls Ollama, post-validates (banned patterns, length, label presence for roast), and updates session state.
Inference tuning
Section titled “Inference tuning”The comedy quality depends heavily on inference settings, not just prompts:
| Setting | Value | Why |
|---|---|---|
| temperature | 0.55 | Low enough for consistency, high enough to avoid rote repetition |
| mirostat | 2 | Keeps perplexity stable, reduces creativity bursts that lead to metaphor leaks |
| mirostat_tau | 5.0 | Target perplexity — balanced between flat and wild |
| top_p | 0.85 | Nucleus sampling — moderate to complement mirostat |
| top_k | 40 | Limits token candidates per step |
| num_predict | 30-80 | Hard token cap by tool type (catchphrase 30, heckle 40, comic_timing 60, roast 80) |
Why qwen2.5:7b-instruct
Section titled “Why qwen2.5:7b-instruct”- Strong JSON schema adherence (critical for structured comedy output)
- Concise output tendency (comedy needs punch, not prose)
- Good instruction following on ban lists (metaphors, hedging, banned starters)
- Fast inference on consumer hardware (~1-2s per call after warm-up)
- 7B fits comfortably in 16GB VRAM alongside other tools
Session state design
Section titled “Session state design”Comedy has a shelf life. Inside jokes from a 2-hour session shouldn’t persist to tomorrow. The in-memory session is deliberate:
- Ring buffer (max 20) for recent bits prevents unbounded growth
- Turn counter on gags lets callbacks check freshness
- Tag-based matching for callbacks is deterministic (keyword scan, not vibes)
- Catchphrase Map tracks usage count so the most-used phrase wins on callback