Architecture

Three options were evaluated:

  • A) Personality injection — reshape every LLM output to be funnier. Hardest, highest drift risk.
  • B) Comedy sidekick — distinct voice that returns structured comedy on demand. Clean separation.
  • C) Comedy toolkit — stateless material generator. Easiest but flattest.

sensor-humor implements Mode B. The host LLM stays unchanged and calls tools when it wants humor. The sidekick returns JSON; the host decides what to do with it.

Host LLM (Claude, Cursor, etc.)
| calls MCP tool
v
sensor-humor MCP server (TypeScript, stdio)
| builds prompt from mood + session state + user input
v
Ollama (qwen2.5:7b-instruct, local)
| returns JSON (schema-enforced via format param)
v
sensor-humor validates -> updates session -> returns to host
| host optionally calls
v
mcp-voice-soundboard (Piper backend)
| maps mood -> prosody preset -> SynthesisConfig
v
Piper TTS (local ONNX) -> audio output
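The last hop of the diagram, mood to prosody preset to SynthesisConfig, can be sketched as a plain lookup. The mood names and config fields below are illustrative assumptions, not mcp-voice-soundboard's actual API; Piper itself exposes knobs along these lines (speaking-rate and noise scales):

```typescript
// Hypothetical mood -> prosody mapping. Field and mood names are
// illustrative stand-ins for mcp-voice-soundboard's real SynthesisConfig.
type Mood = "deadpan" | "manic" | "dry";

interface SynthesisConfig {
  lengthScale: number; // speaking-rate multiplier (>1 = slower delivery)
  noiseScale: number;  // synthesis variability
}

const PROSODY_PRESETS: Record<Mood, SynthesisConfig> = {
  deadpan: { lengthScale: 1.15, noiseScale: 0.4 }, // slower, flatter
  manic:   { lengthScale: 0.85, noiseScale: 0.8 }, // faster, livelier
  dry:     { lengthScale: 1.0,  noiseScale: 0.5 },
};

function toSynthesisConfig(mood: Mood): SynthesisConfig {
  return PROSODY_PRESETS[mood];
}
```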

The MCP server registers 8 tools with @modelcontextprotocol/sdk and runs on the stdio transport. Every tool call flows through the same pipeline: resolve mood, build prompt, call Ollama, validate, update session state, return.

Session state is an in-memory singleton. It tracks the current mood, running gags (tagged setups with usage counts), recent bits (a ring buffer capped at 20), catchphrases (a Map keyed by phrase), and a turn counter. It dies when the server stops; no persistence, by design.
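The in-memory singleton described above can be sketched in a few lines. Field names are assumptions; the shape follows the doc: mood, gags with usage counts, a 20-entry ring buffer, a catchphrase Map, and a turn counter:

```typescript
// Minimal sketch of the session singleton. Names are illustrative.
interface Gag { setup: string; tags: string[]; uses: number; lastTurn: number }

class Session {
  mood = "deadpan"; // hypothetical default mood
  turn = 0;
  gags: Gag[] = [];
  recentBits: string[] = [];               // ring buffer, max 20
  catchphrases = new Map<string, number>(); // phrase -> usage count

  recordBit(bit: string): void {
    this.recentBits.push(bit);
    if (this.recentBits.length > 20) this.recentBits.shift(); // drop oldest
  }

  useCatchphrase(phrase: string): void {
    this.catchphrases.set(phrase, (this.catchphrases.get(phrase) ?? 0) + 1);
  }

  // Most-used catchphrase wins on callback.
  topCatchphrase(): string | undefined {
    let best: string | undefined;
    let bestCount = -1;
    for (const [phrase, count] of this.catchphrases) {
      if (count > bestCount) { best = phrase; bestCount = count; }
    }
    return best;
  }
}

// Module-level singleton: dies with the server process, no persistence.
const session = new Session();
```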

Prompts are versioned system prompts, one per mood. base.ts provides the safety rules, banned patterns, and length constraints; mood files layer voice flavor on top. The loader resolves the SENSOR_HUMOR_PROMPT_VERSION env var, falling back to v1.
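A sketch of that loader logic, with the prompt table as a stand-in for the real base.ts and per-mood files (the text and mood name are placeholders):

```typescript
// Stand-in for base.ts + mood files; real content lives in separate modules.
const PROMPTS: Record<string, Record<string, string>> = {
  v1: {
    base: "Safety rules, banned patterns, length constraints.",
    deadpan: "Flat delivery, no exclamation marks.", // illustrative mood flavor
  },
};

// Resolve SENSOR_HUMOR_PROMPT_VERSION with a v1 fallback. Unknown or
// missing versions fall back rather than crash.
function resolveVersion(env: Record<string, string | undefined> = {}): string {
  const v = env.SENSOR_HUMOR_PROMPT_VERSION;
  return v && PROMPTS[v] ? v : "v1";
}

// Compose base rules + mood flavor into one system prompt.
function buildSystemPrompt(
  mood: string,
  env: Record<string, string | undefined> = {},
): string {
  const set = PROMPTS[resolveVersion(env)];
  return [set.base, set[mood] ?? ""].join("\n");
}
```

In the real server the env argument would be process.env; it is a parameter here so the fallback behavior is easy to exercise.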

The Ollama client wraps the chat API with JSON schema enforcement (the format parameter), Zod validation, and single-retry logic. Inference settings: temperature 0.55, top_p 0.85, top_k 40, mirostat 2, mirostat_tau 5.0; num_predict varies by tool (30-80 tokens). The default model is qwen2.5:7b-instruct, configurable via SENSOR_HUMOR_MODEL. The Ollama host defaults to http://127.0.0.1:11434, configurable via OLLAMA_HOST.
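The request body sent to Ollama's /api/chat endpoint might look like the builder below. The options mirror the settings listed in the doc; the per-tool num_predict caps are from the doc, while the function name and the simple `format: "json"` (the real client passes a full JSON schema) are assumptions:

```typescript
// Per-tool hard token caps, as documented.
const NUM_PREDICT: Record<string, number> = {
  catchphrase: 30,
  heckle: 40,
  comic_timing: 60,
  roast: 80,
};

// Build the /api/chat request body. In the server, `model` would come
// from the SENSOR_HUMOR_MODEL env var instead of a parameter default.
function buildChatRequest(
  tool: string,
  system: string,
  user: string,
  model = "qwen2.5:7b-instruct",
) {
  return {
    model,
    messages: [
      { role: "system", content: system },
      { role: "user", content: user },
    ],
    format: "json", // real code supplies a JSON schema here for enforcement
    stream: false,
    options: {
      temperature: 0.55,
      top_p: 0.85,
      top_k: 40,
      mirostat: 2,
      mirostat_tau: 5.0,
      num_predict: NUM_PREDICT[tool] ?? 60, // assumed default for unknown tools
    },
  };
}
```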

Each tool assembles its prompt (base + mood + state summary + user input + technique guidance), calls Ollama, post-validates (banned patterns, length, label presence for roast), and updates session state.
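The post-validation step can be sketched as a small checker. The specific banned patterns and the 280-character cap below are illustrative, not the server's real ban list; the roast label requirement is from the doc:

```typescript
// Illustrative stand-ins for the real banned-pattern list.
const BANNED = [/as an ai/i, /imagine if/i];

interface Bit {
  text: string;
  label?: string; // roast output must carry a label
}

// Returns a list of validation errors; an empty array means the bit passes.
function validateBit(bit: Bit, tool: string, maxChars = 280): string[] {
  const errors: string[] = [];
  if (BANNED.some((re) => re.test(bit.text))) errors.push("banned pattern");
  if (bit.text.length > maxChars) errors.push("too long");
  if (tool === "roast" && !bit.label) errors.push("missing label");
  return errors;
}
```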

The comedy quality depends heavily on inference settings, not just prompts:

Setting        Value   Why
temperature    0.55    Low enough for consistency, high enough to avoid rote repetition
mirostat       2       Keeps perplexity stable, reduces creativity bursts that lead to metaphor leaks
mirostat_tau   5.0     Target perplexity, balanced between flat and wild
top_p          0.85    Nucleus sampling, moderate to complement mirostat
top_k          40      Limits token candidates per step
num_predict    30-80   Hard token cap by tool type (catchphrase 30, heckle 40, comic_timing 60, roast 80)
qwen2.5:7b-instruct was chosen as the default model for:

  • Strong JSON schema adherence (critical for structured comedy output)
  • Concise output tendency (comedy needs punch, not prose)
  • Good instruction following on ban lists (metaphors, hedging, banned starters)
  • Fast inference on consumer hardware (~1-2s per call after warm-up)
  • 7B fits comfortably in 16GB VRAM alongside other tools

Comedy has a shelf life. Inside jokes from a 2-hour session shouldn’t persist to tomorrow. The in-memory session is deliberate:

  • Ring buffer (max 20) for recent bits prevents unbounded growth
  • Turn counter on gags lets callbacks check freshness
  • Tag-based matching for callbacks is deterministic (keyword scan, not vibes)
  • Catchphrase Map tracks usage count so the most-used phrase wins on callback
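The deterministic callback selection the list describes, keyword scan plus freshness check, could look like this. The maxAge threshold and field names are assumptions; tags are assumed to be lowercase keywords:

```typescript
interface Gag { setup: string; tags: string[]; uses: number; lastTurn: number }

// Pick a callback deterministically: drop stale gags via the turn counter,
// keep gags whose tags appear in the input (keyword scan, not vibes),
// and let the most-used gag win.
function findCallback(
  input: string,
  gags: Gag[],
  turn: number,
  maxAge = 30, // assumed freshness window, in turns
): Gag | undefined {
  const words = new Set(input.toLowerCase().split(/\W+/));
  return gags
    .filter((g) => turn - g.lastTurn <= maxAge)      // still fresh
    .filter((g) => g.tags.some((t) => words.has(t))) // tag matches a word
    .sort((a, b) => b.uses - a.uses)[0];             // most-used wins
}
```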