Architecture
Soundboard Plugin sits between Claude Code and the voice-soundboard TTS engine. It receives text via stdio JSON-RPC, processes it through a multi-stage speech pipeline, and hands synthesised audio to a playback worker.
Pipeline overview
    Claude Code
        |  stdio (JSON-RPC)
        v
    stdio_bridge ---- security/guardrails
        |                 concurrency gate | rate limiter | structured errors
        v
    speech pipeline
        |-- chunking       smart sentence splitting
        |-- ssml_lite      safe SSML subset parser
        |-- emotion/       8 emotions + voice routing
        |-- dialogue/      multi-speaker parser + casting
        |-- sfx_parser     <ding>/<chime> WAV generation
        |-- orchestrator   multi-chunk synthesis loop
        |-- concat         WAV concatenation
        v
    voice-soundboard engine
        |-- Kokoro (local, default)
        |-- Piper / OpenAI / Azure / ElevenLabs
        v
    playback/worker ---- single-thread queue
        |-- 30s watchdog timer
        |-- interrupt / enqueue / drop policies
        |-- retention (auto-cleanup)
        v
    PCM audio -> speakers

Components
stdio bridge
The entry point. Receives JSON-RPC messages from Claude Code over stdin, dispatches to the appropriate tool handler, and returns results over stdout. Includes health checks and graceful shutdown.
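A minimal sketch of the read, dispatch, respond loop described above may help make the flow concrete. It assumes newline-delimited JSON-RPC 2.0 messages; the method names `health` and `speak` and both handler functions are hypothetical placeholders, not the plugin's actual tool names.

```python
import json
import sys


def handle_health(params):
    """Hypothetical health-check handler."""
    return {"status": "ok"}


def handle_speak(params):
    """Hypothetical tool handler; a real one would call the speech pipeline."""
    return {"queued": True, "chars": len(params.get("text", ""))}


# Illustrative method table; the real bridge's tool names may differ.
HANDLERS = {"health": handle_health, "speak": handle_speak}


def error_response(request_id, code, message):
    """Build a JSON-RPC 2.0 error response."""
    return {"jsonrpc": "2.0", "id": request_id,
            "error": {"code": code, "message": message}}


def dispatch(line):
    """Turn one JSON-RPC request line into one response dict."""
    try:
        request = json.loads(line)
    except json.JSONDecodeError:
        return error_response(None, -32700, "Parse error")
    if not isinstance(request, dict):
        return error_response(None, -32600, "Invalid Request")
    handler = HANDLERS.get(request.get("method"))
    if handler is None:
        return error_response(request.get("id"), -32601, "Method not found")
    result = handler(request.get("params") or {})
    return {"jsonrpc": "2.0", "id": request.get("id"), "result": result}


def serve(stdin=None, stdout=None):
    """Read requests line by line until EOF, writing one response per line."""
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    for line in stdin:
        if not line.strip():
            continue
        stdout.write(json.dumps(dispatch(line)) + "\n")
        stdout.flush()
```

Reaching end of input on stdin ends the loop, which is one simple way to get a clean shutdown when the client closes the pipe.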
Security and guardrails
All requests pass through the security layer before reaching the speech pipeline:
- Concurrency gate — semaphore limits synthesis to one request at a time. Prevents resource exhaustion.
- Rate limiter — configurable cooldown between requests. Disabled by default.
- Input validation — 10,000 character maximum, speed clamped to 0.5-2.0x, chunk and line limits enforced.
- Structured errors — every error returns a JSON object with code, message, hint, and trace ID. No stack traces leak to the client.
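The guardrails above can be sketched roughly as follows. This is an illustration under stated assumptions, not the plugin's code: the error codes, the `trace_id` field name, the `guarded_speak` function, and the choice to reject (rather than wait) when the gate is busy are all hypothetical.

```python
import threading
import uuid

MAX_CHARS = 10_000               # character limit stated above
MIN_SPEED, MAX_SPEED = 0.5, 2.0  # speed clamp stated above

_gate = threading.BoundedSemaphore(1)  # one synthesis at a time


def structured_error(code, message, hint):
    """Build an error payload; the field names are assumed from the description."""
    return {"code": code, "message": message, "hint": hint,
            "trace_id": uuid.uuid4().hex}


def guarded_speak(text, speed=1.0, synthesize=lambda text, speed: "ok"):
    """Validate input, clamp speed, and run `synthesize` under the gate."""
    if len(text) > MAX_CHARS:
        return structured_error("text_too_long",
                                f"Text exceeds {MAX_CHARS} characters",
                                "Split the text into smaller requests")
    speed = min(MAX_SPEED, max(MIN_SPEED, speed))
    # Assumption: a busy gate rejects immediately instead of blocking.
    if not _gate.acquire(blocking=False):
        return structured_error("busy", "A synthesis is already running",
                                "Retry after the current request finishes")
    try:
        return {"result": synthesize(text, speed), "speed": speed}
    except Exception:
        # The traceback stays server-side; the client sees only the payload.
        return structured_error("synthesis_failed", "Synthesis failed",
                                "Check the server log using the trace ID")
    finally:
        _gate.release()
```

For example, `guarded_speak("hi", speed=9.0)` runs with the speed clamped to 2.0, while a text longer than 10,000 characters returns a structured error without ever touching the gate.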
Speech pipeline
The pipeline processes text through multiple stages:
- Chunking (`chunking.py`): splits long text at sentence boundaries. Respects chunk size limits while keeping sentences intact.
- SSML-lite parser (`ssml_lite.py`): parses a safe subset of SSML: `<break>`, `<emphasis>`, `<prosody>` (pitch and rate). Rejects unknown tags.
- Emotion routing (`emotion/`): detects emotion from text context and maps to voice parameters. Supports 8 emotions with per-voice tuning.
- Dialogue parser (`dialogue/`): identifies speakers in dialogue markup, auto-casts characters to voices, and applies stage directions and speed modifiers.
- SFX parser (`sfx_parser.py`): recognises inline tags like `<ding>` and `<chime>`, generates pure-Python WAV tones at the correct position.
- Orchestrator (`orchestrator.py`): coordinates multi-chunk synthesis. Sends chunks to the voice engine sequentially, collects results.
- Concatenation (`concat.py`): joins multiple WAV segments into a single output file with correct headers.
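To illustrate how chunking, sequential synthesis, and concatenation fit together, here is a small standard-library sketch. It is not the plugin's implementation: the function names, the 200-character chunk size, and the 24 kHz sample rate are assumptions, and a pure-Python sine tone stands in for the voice engine.

```python
import io
import math
import re
import struct
import wave

SAMPLE_RATE = 24_000  # assumed; the engine's real rate may differ


def chunk_sentences(text, max_chars=200):
    """Group whole sentences into chunks of at most max_chars where possible.

    A single sentence longer than max_chars is kept intact as its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks


def tone_wav(freq_hz, seconds=0.1):
    """Stand-in for the voice engine: a mono 16-bit sine tone as WAV bytes."""
    n = int(SAMPLE_RATE * seconds)
    frames = b"".join(
        struct.pack("<h", int(12000 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)))
        for i in range(n)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)
        writer.setframerate(SAMPLE_RATE)
        writer.writeframes(frames)
    return buf.getvalue()


def concat_wavs(segments):
    """Join WAV byte strings that share one format into a single WAV."""
    if not segments:
        raise ValueError("need at least one segment")
    out = io.BytesIO()
    with wave.open(out, "wb") as writer:
        for index, segment in enumerate(segments):
            with wave.open(io.BytesIO(segment), "rb") as reader:
                if index == 0:
                    writer.setparams(reader.getparams())
                writer.writeframes(reader.readframes(reader.getnframes()))
    return out.getvalue()


def synthesize_text(text, synth=lambda chunk: tone_wav(440), max_chars=200):
    """Orchestrate: chunk, synthesize each chunk in order, concatenate."""
    chunks = chunk_sentences(text, max_chars)
    return concat_wavs([synth(chunk) for chunk in chunks])
```

Letting the `wave` module write the output means the RIFF header lengths are recomputed for the joined file instead of being copied from the first segment.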
Voice-soundboard engine
The synthesis backend. Default is Kokoro for fully local inference. The plugin supports multiple backends (Piper, OpenAI, Azure, ElevenLabs) but only Kokoro runs without network access.
Playback worker
Single-threaded audio worker with queue management:
- Queue policies — replace (stop current, play latest), queue (play sequentially), drop (ignore new requests while busy).
- Watchdog — 30-second timer kills stuck playback.
- Retention — auto-deletes WAV files older than the configured retention window (default 240 minutes).
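The three queue policies and the retention sweep can be modelled in a few lines. The sketch below is a toy: `PlaybackQueue` and `prune_old_wavs` are hypothetical names, no audio is played, and the real worker's threading and watchdog are left out.

```python
import time
from collections import deque
from pathlib import Path


class PlaybackQueue:
    """Toy model of the queue policies; tracks state but plays no audio."""

    def __init__(self, policy="queue"):
        if policy not in {"replace", "queue", "drop"}:
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.current = None
        self.pending = deque()

    def submit(self, clip):
        """Apply the policy to a new clip and report what happened to it."""
        if self.current is None:
            self.current = clip
            return "playing"
        if self.policy == "replace":
            self.pending.clear()
            self.current = clip  # a real worker would stop playback here
            return "replaced"
        if self.policy == "queue":
            self.pending.append(clip)
            return "queued"
        return "dropped"

    def finish_current(self):
        """Call when playback ends or is killed; starts the next clip, if any."""
        self.current = self.pending.popleft() if self.pending else None
        return self.current


def prune_old_wavs(directory, max_age_minutes=240, now=None):
    """Delete *.wav files whose modification time is outside the window."""
    now = time.time() if now is None else now
    cutoff = now - max_age_minutes * 60
    removed = []
    for path in Path(directory).glob("*.wav"):
        if path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

Under the `drop` policy a second `submit` while a clip is playing returns `"dropped"` and leaves `current` unchanged; under `replace` it becomes the new `current`.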
Security properties
The plugin runs entirely on your machine. No network calls, no telemetry, no cloud APIs unless you explicitly configure a remote voice backend.
| Property | Implementation |
|---|---|
| Input bounds | 10,000 char max, clamped speed (0.5-2.0x), chunk and line limits |
| Voice allowlist | 12 pre-approved voices, unknown IDs rejected with structured error |
| Path sandboxing | WAV output confined to {tempdir}/voice-soundboard/ |
| Concurrency | Single synthesis at a time (semaphore gate) |
| Error safety | Structured JSON errors with trace IDs, no stack traces to client |
| Secret redaction | Paths, tokens, IPs, base64, key=value stripped from logs |
| WAV validation | RIFF/WAVE magic bytes + minimum size check on every output file |
For the full security policy and STRIDE-lite threat model, see the repository’s SECURITY.md and docs/SECURITY_THREAT_MODEL.md.
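Two of the properties in the table, path sandboxing and WAV validation, are easy to sketch with the standard library. The helper names and the 44-byte minimum (the size of a canonical PCM WAV header) are assumptions for illustration; the plugin's actual checks may differ.

```python
import tempfile
from pathlib import Path

SANDBOX = Path(tempfile.gettempdir()) / "voice-soundboard"
MIN_WAV_BYTES = 44  # assumed minimum: one canonical PCM WAV header


def sandboxed_path(filename):
    """Resolve filename inside SANDBOX, rejecting anything that escapes it."""
    root = SANDBOX.resolve()
    candidate = (root / filename).resolve()
    # After resolving "..", symlinks, and absolute names, the sandbox root
    # must still be an ancestor of the target.
    if root not in candidate.parents:
        raise ValueError(f"path escapes sandbox: {filename}")
    return candidate


def looks_like_wav(data):
    """Check the RIFF/WAVE magic bytes and a minimum size."""
    return (len(data) >= MIN_WAV_BYTES
            and data[0:4] == b"RIFF"
            and data[8:12] == b"WAVE")
```

Here `sandboxed_path("out.wav")` returns a path under the sandbox directory, while `sandboxed_path("../escape.wav")` raises `ValueError`.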
Project structure
    soundboard-plugin/
      voice_soundboard_plugin/
        bridge/      MCP stdio server + health checks
        speech/      TTS pipeline (chunking, SSML, orchestrator, concat)
        dialogue/    Multi-speaker parser + auto-casting
        emotion/     Emotion detection + voice routing
        playback/    Single-thread worker + retention
        ambient/     Inner monologue subsystem
        security/    Guardrails, fs sandbox, redaction, WAV validation
        audio/       Audio utilities
      tests/         326 tests (unit + integration + security battery)
      scripts/       ship_gate.py pre-release script
      docs/          Threat model, privacy policy, release checklist