Engines
Original Voice Soundboard supports three TTS engines. Kokoro ships by default. The others are optional installs that unlock additional capabilities.
Engine overview
| Engine | Install command | What it adds |
|---|---|---|
| Kokoro (default) | `pip install voice-soundboard` | 54+ voices, 19 emotions, voice presets |
| Chatterbox | `pip install voice-soundboard[chatterbox]` | Paralinguistic tags, 23 languages, emotion exaggeration |
| F5-TTS | `pip install voice-soundboard[f5tts]` | Zero-shot voice cloning from 3-10 second audio samples |
Kokoro (default)
Kokoro is the primary engine. It provides the full voice library, emotion system, and preset infrastructure.
```python
from voice_soundboard import VoiceEngine

engine = VoiceEngine()  # Uses Kokoro by default
result = engine.speak("Hello!", voice="af_bella", emotion="happy")
```

Kokoro runs via ONNX Runtime, so it works on CPU out of the box. A CUDA GPU speeds up synthesis but is not required.
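If you want to know whether GPU acceleration is even possible on a given machine, one option is to ask ONNX Runtime directly. The sketch below uses the real `onnxruntime.get_available_providers()` call; the helper name `cuda_available_for_onnx` is hypothetical. It only tells you that the installed ONNX Runtime build includes the CUDA execution provider. Whether Original Voice Soundboard selects that provider automatically is not described here, so treat this as a diagnostic rather than a switch.

```python
import importlib.util


def cuda_available_for_onnx() -> bool:
    """Hypothetical helper: True if ONNX Runtime is installed and lists a CUDA provider."""
    # If onnxruntime is not importable at all, there is nothing to accelerate.
    if importlib.util.find_spec("onnxruntime") is None:
        return False
    import onnxruntime as ort

    # get_available_providers() lists the execution providers built into this
    # installation, e.g. "CUDAExecutionProvider" for a GPU-enabled build.
    # It does not prove that a working GPU and driver are present at runtime.
    return "CUDAExecutionProvider" in ort.get_available_providers()


if __name__ == "__main__":
    print("CUDA execution provider available:", cuda_available_for_onnx())
```

A `False` result simply means synthesis will run on the CPU, which the engine supports out of the box.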
Chatterbox
Chatterbox adds paralinguistic tags and multilingual support.
```bash
pip install voice-soundboard[chatterbox]
```

Paralinguistic tags let you embed non-verbal sounds directly in the text:

```python
from voice_soundboard import VoiceEngine

engine = VoiceEngine(engine="chatterbox")
result = engine.speak("That's hilarious [laugh] I can't stop [laugh]")
result = engine.speak("I don't know [sigh] it's been a long day")
```

Chatterbox supports 23 languages and provides an emotion exaggeration parameter for more dramatic delivery.
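Conceptually, a tagged string interleaves spoken text with non-verbal cues. The following is a minimal, hypothetical sketch of how such a string could be split into ordered text and tag segments. It is not Chatterbox's actual parser: the function name `split_tags` is made up, it assumes tags are lowercase words (letters and underscores) inside square brackets, and the full set of tags Chatterbox accepts is not listed here.

```python
import re
from typing import List, Tuple

# Assumed tag syntax: a lowercase name in square brackets, e.g. [laugh] or [sigh].
TAG_PATTERN = re.compile(r"\[([a-z_]+)\]")


def split_tags(text: str) -> List[Tuple[str, str]]:
    """Hypothetical helper: split text into ("text", chunk) and ("tag", name) segments."""
    segments: List[Tuple[str, str]] = []
    pos = 0
    for match in TAG_PATTERN.finditer(text):
        # Spoken text between the previous tag (or the start) and this tag.
        chunk = text[pos:match.start()].strip()
        if chunk:
            segments.append(("text", chunk))
        segments.append(("tag", match.group(1)))
        pos = match.end()
    # Any spoken text after the last tag.
    tail = text[pos:].strip()
    if tail:
        segments.append(("text", tail))
    return segments
```

For example, `split_tags("I don't know [sigh] it's been a long day")` returns `[("text", "I don't know"), ("tag", "sigh"), ("text", "it's been a long day")]`.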
F5-TTS
F5-TTS is a Diffusion Transformer model that enables zero-shot voice cloning. Give it a 3-10 second audio sample of any voice, and it can synthesize new speech in that voice.
```bash
pip install voice-soundboard[f5tts]
```

```python
from voice_soundboard import VoiceEngine

engine = VoiceEngine(engine="f5tts")
result = engine.speak(
    "This is my cloned voice speaking.",
    reference_audio="path/to/sample.wav",
)
```

Voice cloning requires explicit consent acknowledgment. The library enforces this to prevent misuse.
All installation extras
```bash
pip install voice-soundboard               # Core (Kokoro engine)
pip install voice-soundboard[mcp]          # + MCP server for AI agents
pip install voice-soundboard[chatterbox]   # + Paralinguistic tags & 23 languages
pip install voice-soundboard[f5tts]        # + F5-TTS voice cloning
pip install voice-soundboard[websocket]    # + WebSocket server
pip install voice-soundboard[web]          # + Mobile web UI
pip install voice-soundboard[all]          # Everything
```

In zsh, square brackets are glob characters, so quote the package spec there, for example `pip install "voice-soundboard[all]"`.
Advanced topics
Voice cloning
F5-TTS can clone a voice from a short audio sample (3-10 seconds). The sample should be clean speech with minimal background noise. See the voice cloning example for a working script.
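A quick pre-flight check on the sample length can catch an unsuitable reference file before you start a cloning run. This sketch uses only Python's standard `wave` module, so it assumes an uncompressed PCM WAV file. The helper names and the hard 3 to 10 second bounds are illustrative assumptions based on the guidance above, and the check says nothing about background noise.

```python
import wave

# Assumed bounds, taken from the 3-10 second guidance for reference samples.
MIN_SECONDS = 3.0
MAX_SECONDS = 10.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds (frames / sample rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference_sample(path: str) -> float:
    """Hypothetical helper: raise ValueError if the sample is out of range, else return its duration."""
    duration = wav_duration_seconds(path)
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        raise ValueError(
            f"{path}: {duration:.2f}s is outside the recommended "
            f"{MIN_SECONDS:.0f}-{MAX_SECONDS:.0f} second range"
        )
    return duration
```

Running `check_reference_sample("path/to/sample.wav")` before passing the same path as `reference_audio` keeps the length requirement from surfacing as a harder-to-read synthesis error.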
Multi-speaker dialogue
Generate conversations with multiple voices, each with their own emotion and style. Script a dialogue and assign per-character voice settings:
```python
dialogue = [
    {"speaker": "af_bella", "emotion": "happy", "text": "Good morning!"},
    {"speaker": "bm_george", "emotion": "calm", "text": "Morning. Coffee?"},
    {"speaker": "af_bella", "emotion": "excited", "text": "Yes please!"},
]
```

See the multi-speaker example for the full implementation.
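One way to turn such a script into audio is to walk it in order and synthesize one clip per line. This sketch keeps the engine out of the loop by accepting any `speak(text, voice, emotion)` callable, so `render_dialogue` and `SpeakFn` are hypothetical names, not library API. It only returns the per-line results in order and does not join them into a single track, because the format of the value returned by `engine.speak` is not described here.

```python
from typing import Any, Callable, Dict, List

# Hypothetical type: any callable taking (text, voice, emotion) and returning one clip.
SpeakFn = Callable[[str, str, str], Any]

REQUIRED_KEYS = ("speaker", "emotion", "text")


def render_dialogue(dialogue: List[Dict[str, str]], speak: SpeakFn) -> List[Any]:
    """Render each dialogue line in order, using that line's voice and emotion."""
    clips = []
    for index, line in enumerate(dialogue):
        # Fail early with a clear message instead of a bare KeyError mid-render.
        missing = [key for key in REQUIRED_KEYS if key not in line]
        if missing:
            raise KeyError(f"dialogue line {index} is missing {missing}")
        clips.append(speak(line["text"], line["speaker"], line["emotion"]))
    return clips


if __name__ == "__main__":
    demo = [{"speaker": "af_bella", "emotion": "happy", "text": "Good morning!"}]
    # A stand-in speak function that just describes each call.
    print(render_dialogue(demo, lambda text, voice, emotion: f"{voice}/{emotion}: {text}"))
```

With the default engine, you could pass `lambda text, voice, emotion: engine.speak(text, voice=voice, emotion=emotion)`, mirroring the `speak` call shown in the Kokoro section.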
SSML support
Speech Synthesis Markup Language (SSML) gives you fine-grained control over pauses, emphasis, and prosody. The SSML parser uses defusedxml to protect against XML-based attacks such as entity expansion ("billion laughs") and external entity (XXE) injection.
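To illustrate what parsing SSML with defusedxml looks like, here is a minimal sketch that extracts every `<break time="..."/>` pause in milliseconds. It is not the library's own parser, and the function name `break_durations_ms` is hypothetical. It uses the real `defusedxml.ElementTree.fromstring`; the stdlib fallback exists only so the sketch runs without defusedxml installed and is not hardened against the attacks described above. Breaks that use `strength` instead of `time` are skipped.

```python
import re
from typing import List

try:
    # defusedxml rejects entity-expansion and external-entity payloads by default.
    from defusedxml.ElementTree import fromstring
except ImportError:
    # Illustration-only fallback: the stdlib parser is NOT hardened.
    from xml.etree.ElementTree import fromstring

# Matches SSML time values such as "500ms" or "1.5s".
_TIME = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(ms|s)\s*$")


def break_durations_ms(ssml: str) -> List[float]:
    """Hypothetical helper: durations of all <break time="..."/> elements, in document order."""
    root = fromstring(ssml)
    durations: List[float] = []
    for brk in root.iter("break"):
        match = _TIME.match(brk.get("time", ""))
        if match is None:
            continue  # no parsable time attribute (e.g. strength-only break)
        value, unit = float(match.group(1)), match.group(2)
        durations.append(value * 1000 if unit == "s" else value)
    return durations
```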
```xml
<speak>
  <s>Welcome to <emphasis level="strong">Original Voice Soundboard</emphasis>.</s>
  <break time="500ms"/>
  <s>Let's get started.</s>
</speak>
```

WebSocket server
For real-time bidirectional communication, install the WebSocket extra:
```bash
pip install voice-soundboard[websocket]
```

This enables streaming audio generation over WebSocket connections, suitable for interactive applications with low-latency requirements.