Voice DSP that behaves like software, not folklore.
Streaming TTS inference backend with Kokoro/Piper synthesis, expressive prosody controls, and deterministic output. Powers mcp-voice-soundboard.
Install
npm i && npm run build
Test
npm test
npm run test:meaning
npm run test:determinism
Bench
npm run bench:rtf
npm run smoke
Core Capabilities
Built for stability and reproducibility — the two places most voice DSP systems fail.
Deterministic Output
Same input + config + chunking policy produces identical output every time. Regression-protected via hash-based tests — not "it sounds about right".
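A hash-based determinism check can be sketched as below. This is a minimal illustration, not the engine's test code: `renderSamples` is a hypothetical stand-in for a synthesis call, and `hashSamples` is an assumed helper name. The only real API used is `createHash` from Node's `crypto` module.

```typescript
import { createHash } from "node:crypto";

// Hypothetical stand-in for a synthesis call: a pure function of its inputs.
// Integer arithmetic keeps the result bit-identical on every run.
function renderSamples(text: string, gain: number): Float32Array {
  const out = new Float32Array(text.length * 4);
  for (let i = 0; i < out.length; i++) {
    const code = text.charCodeAt(i >> 2);
    out[i] = gain * (((code * (i + 1)) % 256) / 256 - 0.5);
  }
  return out;
}

// Hash the raw bytes of the sample buffer so any bit-level change is detected.
function hashSamples(samples: Float32Array): string {
  const bytes = Buffer.from(samples.buffer, samples.byteOffset, samples.byteLength);
  return createHash("sha256").update(bytes).digest("hex");
}

const first = hashSamples(renderSamples("hello", 1));
const second = hashSamples(renderSamples("hello", 1));
const other = hashSamples(renderSamples("hellp", 1));

console.log(first === second); // same input and config give the same hash
console.log(first === other);  // a one-character change gives a different hash
```

A regression test in this style would compare `first` against a hash committed to the repository rather than against a second run.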
Streaming-First Runtime
Stateful, causal processing designed for low latency. No retroactive edits. Snapshot/restore for persistence and resumability across connections.
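The idea of causal, stateful processing with snapshot/restore can be illustrated with a toy processor. Everything here is an assumption made for the example: `OnePoleSmoother`, `SmootherState`, `snapshot`, and `restore` are hypothetical names, not the engine's API. The sketch shows that a stream interrupted and resumed from a snapshot yields the same samples as one uninterrupted pass.

```typescript
// Hypothetical state object: everything needed to resume the stream.
interface SmootherState {
  y: number;
}

// Toy causal processor: each output depends only on current and past input.
class OnePoleSmoother {
  private y = 0;
  private readonly alpha: number;

  constructor(alpha: number) {
    this.alpha = alpha;
  }

  process(chunk: Float32Array): Float32Array {
    const out = new Float32Array(chunk.length);
    chunk.forEach((x, i) => {
      this.y = this.y + this.alpha * (x - this.y);
      out[i] = this.y;
    });
    return out;
  }

  snapshot(): SmootherState {
    return { y: this.y };
  }

  restore(state: SmootherState): void {
    this.y = state.y;
  }
}

const input = Float32Array.from([1, 0, 0, 1, 1, 0, 1, 0]);

// Reference: one uninterrupted pass over the whole input.
const whole = new OnePoleSmoother(0.25).process(input);

// Interrupted run: process three samples, snapshot, resume in a new instance.
const before = new OnePoleSmoother(0.25);
const head = before.process(input.subarray(0, 3));
const saved = before.snapshot();

const after = new OnePoleSmoother(0.25);
after.restore(saved);
const tail = after.process(input.subarray(3));

const resumed = new Float32Array(input.length);
resumed.set(head, 0);
resumed.set(tail, head.length);
```

Because the processor never revisits earlier output, `resumed` matches `whole` sample for sample regardless of where the stream was cut.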
Expressive Prosody
Event-driven accents and boundary tones shape cadence and intonation intentionally. Meaning tests enforce accent locality, question vs statement boundaries, and post-focus compression.
Meaning Tests
The test suite enforces communicative behavior — not just "does it run".
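To make "meaning test" concrete, here is a rough sketch of two such checks on a toy pitch contour. It is illustrative only: `renderContour`, `AccentEvent`, and `BoundaryTone` are hypothetical names, the contour model is invented for the example, and the actual contract is the one described in MEANING_CONTRACT.md. Post-focus compression is not covered here.

```typescript
// All names and the contour model below are hypothetical, not the engine's API.
type BoundaryTone = "rise" | "fall";

interface AccentEvent {
  frame: number;  // frame index where the accent peaks
  radius: number; // frames on each side the accent may affect
  peakHz: number; // pitch excursion at the peak
}

function renderContour(
  frames: number,
  baseHz: number,
  accents: AccentEvent[],
  boundary: BoundaryTone,
): number[] {
  const tailStart = frames - 5; // boundary tone shapes the last five frames
  return Array.from({ length: frames }, (_, i) => {
    let hz = baseHz;
    for (const a of accents) {
      const d = Math.abs(i - a.frame);
      if (d <= a.radius) {
        // Raised-cosine bump: full height at the peak, tapering toward the edge.
        hz += a.peakHz * 0.5 * (1 + Math.cos((Math.PI * d) / (a.radius + 1)));
      }
    }
    if (i >= tailStart) {
      hz += (boundary === "rise" ? 4 : -4) * (i - tailStart + 1);
    }
    return hz;
  });
}

const plain = renderContour(40, 120, [], "fall");
const accented = renderContour(40, 120, [{ frame: 10, radius: 3, peakHz: 30 }], "fall");
const question = renderContour(40, 120, [], "rise");

// Accent locality: frames outside the accent window must match the plain contour.
const leaked = plain.filter((hz, i) => Math.abs(i - 10) > 3 && hz !== accented[i]).length;

// Boundary tones: a question ends higher than it was before the tail; a statement ends lower.
const questionRises = question[39] > question[34];
const statementFalls = plain[39] < plain[34];
```

The point of the pattern is that each assertion names a communicative property (locality, rise, fall) instead of comparing against a golden waveform.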
Packages
Monorepo — one package today, clean separation for future synthesis backends.
Quick Start
Build & test
git clone https://github.com/mcp-tool-shop-org/mcp-voice-engine
cd mcp-voice-engine
npm i
npm run build
# Full test suite
npm test
# Specific suites
npm run test:meaning # communicative behavior
npm run test:determinism # hash regression tests
npm run bench:rtf # real-time factor benchmark
npm run smoke # end-to-end smoke test
Key docs
packages/voice-engine-dsp/docs/
├── STREAMING_ARCHITECTURE.md # causal processing model
├── MEANING_CONTRACT.md # prosody behavior spec
└── DEBUGGING.md # debugging guide
Reference_Handbook.md # full API + concepts reference
PERF_CONTRACT.md # RTF and latency guarantees
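For readers new to the term, real-time factor (RTF) is conventionally the time spent producing audio divided by the duration of that audio, so values below 1 mean faster than real time. The sketch below uses that conventional definition; `realTimeFactor` and `measureRtf` are hypothetical helpers, and the exact methodology of `npm run bench:rtf` may differ.

```typescript
// Conventional RTF: processing time divided by the duration of the audio produced.
function realTimeFactor(processingMs: number, sampleCount: number, sampleRate: number): number {
  const audioMs = (sampleCount / sampleRate) * 1000;
  return processingMs / audioMs;
}

// Hypothetical helper: times a render callback and reports its RTF.
function measureRtf(render: () => Float32Array, sampleRate: number): number {
  const start = performance.now();
  const samples = render();
  const elapsedMs = performance.now() - start;
  return realTimeFactor(elapsedMs, samples.length, sampleRate);
}

// One second of audio at 24 kHz produced in 500 ms gives an RTF of 0.5.
const halfRealTime = realTimeFactor(500, 24000, 24000);

// Timing a stand-in workload; the value depends on the machine it runs on.
const measured = measureRtf(() => new Float32Array(24000), 24000);
```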