Voice Soundboard
TTS for AI agents.
A text-to-speech engine that separates what is said from how it’s rendered. Compiler → Graph → Engine architecture with swappable backends.
Install
pip install voice-soundboard
Speak
engine.speak('Hello world!')
Emotion
engine.speak('Great news!', emotion='excited')
Features
Everything you need for production TTS.
Compiler / Graph / Engine
Intent compiles to a ControlGraph, and the engine renders that graph to audio. Emotions, styles, and other features are resolved at compile time, so they add no cost at render time.
Swappable Backends
Kokoro (GPU), Piper (CPU), OpenAI, ElevenLabs, Azure — switch without changing code.
Emotions & Styles
Add emotion="happy" or style="warmly and cheerfully" — the compiler bakes it into the graph.
Streaming Synthesis
Sentence-level streaming for LLM output. compile_stream() yields graphs as text arrives.
CLI Included
voice-soundboard speak "Hello!" — with presets, voice selection, and speed control.
MCP Server Ready
Built-in MCP adapter so AI agents can synthesize speech through the standard tool protocol.
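To make the sentence-level streaming idea concrete, here is a minimal sketch of how text arriving in arbitrary chunks could be cut into complete sentences before compilation. It is an illustration only: iter_sentences and its punctuation rule are hypothetical and are not part of voice-soundboard, whose compile_stream() may segment text differently.

```python
import re
from typing import Iterable, Iterator

# Hypothetical rule (an assumption, not the library's): a sentence ends at
# '.', '!' or '?' followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def iter_sentences(chunks: Iterable[str]) -> Iterator[str]:
    """Yield each complete sentence as soon as enough text has arrived."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        # The last part may still be incomplete, so keep buffering it.
        buffer = parts[-1]
    # Flush whatever remains once the stream ends.
    if buffer.strip():
        yield buffer.strip()


if __name__ == "__main__":
    llm_output = ["Hel", "lo wor", "ld! How a", "re you? Fi", "ne"]
    for sentence in iter_sentences(llm_output):
        print(sentence)
```

Buffering until a sentence boundary lets synthesis start before the LLM has finished, while still giving the synthesizer whole sentences to work with.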
Usage
Install
# Core library
pip install voice-soundboard
# With Kokoro backend (GPU)
pip install voice-soundboard[kokoro]
# With Piper backend (CPU)
pip install voice-soundboard[piper]
Basic
from voice_soundboard import VoiceEngine
engine = VoiceEngine()
result = engine.speak('Hello world!')
print(f'Saved to: {result.audio_path}')
With voice & emotion
result = engine.speak(
    'Breaking news!',
    voice='bm_george',
    preset='announcer',
    emotion='excited'
)
Streaming (LLM output)
from voice_soundboard.compiler import compile_stream
from voice_soundboard.runtime import StreamingSynthesizer
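# Assumes `streamer` is a StreamingSynthesizer instance, `llm_chunks()` yields
# text as the LLM produces it, and `play()` sends audio to your output device.
# None of these are defined in this snippet.
for graph in compile_stream(llm_chunks()):
    for chunk in streamer.stream(graph):
        play(chunk)
Architecture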
Compiler → Graph → Engine. Clean separation, zero-cost features.
Compiler
Transforms text + emotion + style into a pure-data ControlGraph. All feature logic lives here.
ControlGraph
Immutable data structure with TokenEvents, SpeakerRefs, and prosody. Versioned for compatibility.
Engine
Transforms graphs into PCM audio. Knows nothing about emotions or styles — only synthesis.
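The separation described above can be sketched in a few lines of plain Python. Every name here (TokenEvent, compile_text, SilenceBackend, Engine, the prosody numbers) is a hypothetical stand-in chosen for illustration, not voice-soundboard's actual classes or values. The point is only the shape: emotion is translated into prosody numbers at compile time, the graph is immutable data, and the engine passes it to whichever backend it was given.

```python
from dataclasses import dataclass
from typing import Optional, Protocol, Tuple


@dataclass(frozen=True)
class TokenEvent:
    text: str
    speed: float = 1.0  # prosody: relative speaking rate
    pitch: float = 1.0  # prosody: relative pitch


@dataclass(frozen=True)
class ControlGraph:
    version: int
    speaker: str
    tokens: Tuple[TokenEvent, ...]


# Compiler side: the only place that knows what an emotion means.
# The numbers are made up for this sketch.
EMOTION_PROSODY = {"excited": (1.15, 1.1), "calm": (0.9, 0.95)}


def compile_text(text: str, voice: str = "default",
                 emotion: Optional[str] = None) -> ControlGraph:
    speed, pitch = EMOTION_PROSODY.get(emotion, (1.0, 1.0))
    tokens = tuple(TokenEvent(word, speed, pitch) for word in text.split())
    return ControlGraph(version=1, speaker=voice, tokens=tokens)


class Backend(Protocol):
    def synthesize(self, graph: ControlGraph) -> bytes: ...


class SilenceBackend:
    """Toy backend: emits 16-bit silence, shorter when tokens are faster."""
    sample_rate = 16000

    def synthesize(self, graph: ControlGraph) -> bytes:
        seconds = sum(0.3 / token.speed for token in graph.tokens)
        return b"\x00\x00" * int(self.sample_rate * seconds)


class Engine:
    """Engine side: sees only the graph, never the emotion label."""

    def __init__(self, backend: Backend) -> None:
        self.backend = backend

    def render(self, graph: ControlGraph) -> bytes:
        return self.backend.synthesize(graph)


if __name__ == "__main__":
    engine = Engine(SilenceBackend())
    graph = compile_text("Great news!", emotion="excited")
    print(len(engine.render(graph)), "bytes of PCM")
```

Swapping SilenceBackend for another object with a synthesize method changes the audio source without touching the compiler or the calling code, which is the property the architecture is after.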
Backends
Choose the right backend for your use case.