
Engines

Original Voice Soundboard supports three TTS engines. Kokoro ships by default. The others are optional installs that unlock additional capabilities.

Engine           | Install command                           | What it adds
Kokoro (default) | pip install voice-soundboard              | 54+ voices, 19 emotions, voice presets
Chatterbox       | pip install voice-soundboard[chatterbox]  | Paralinguistic tags, 23 languages, emotion exaggeration
F5-TTS           | pip install voice-soundboard[f5tts]       | Zero-shot voice cloning from 3-10 second audio samples

Kokoro is the primary engine. It provides the full voice library, emotion system, and preset infrastructure.

from voice_soundboard import VoiceEngine
engine = VoiceEngine() # Uses Kokoro by default
result = engine.speak("Hello!", voice="af_bella", emotion="happy")

Kokoro runs via ONNX Runtime, so it works on CPU out of the box. A CUDA GPU speeds things up but is not required.

Chatterbox adds paralinguistic tags and multilingual support.

pip install voice-soundboard[chatterbox]

Paralinguistic tags let you embed non-verbal sounds directly in the text:

engine = VoiceEngine(engine="chatterbox")
result = engine.speak("That's hilarious [laugh] I can't stop [laugh]")
result = engine.speak("I don't know [sigh] it's been a long day")
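The tags are plain bracketed tokens inside the input string, so they can be pre-processed with ordinary string tools. A stdlib sketch (the tag set below is illustrative, not the engine's full list):

```python
import re

# Illustrative subset of paralinguistic tags, not the engine's full list.
TAG_RE = re.compile(r"\[(laugh|sigh|cough|gasp)\]")

def split_tags(text):
    """Return (clean_text, tags) for a string containing paralinguistic tags."""
    tags = TAG_RE.findall(text)
    clean = TAG_RE.sub("", text)
    return " ".join(clean.split()), tags

print(split_tags("That's hilarious [laugh] I can't stop [laugh]"))
# → ("That's hilarious I can't stop", ['laugh', 'laugh'])
```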

Chatterbox supports 23 languages and provides an emotion exaggeration parameter for more dramatic delivery.

F5-TTS is a Diffusion Transformer model that enables zero-shot voice cloning. Give it a 3-10 second audio sample of any voice, and it can synthesize new speech in that voice.

pip install voice-soundboard[f5tts]

from voice_soundboard import VoiceEngine

engine = VoiceEngine(engine="f5tts")
result = engine.speak(
    "This is my cloned voice speaking.",
    reference_audio="path/to/sample.wav"
)

Voice cloning requires explicit consent acknowledgment. The library enforces this to prevent misuse.

pip install voice-soundboard # Core (Kokoro engine)
pip install voice-soundboard[mcp] # + MCP server for AI agents
pip install voice-soundboard[chatterbox] # + Paralinguistic tags & 23 languages
pip install voice-soundboard[f5tts] # + F5-TTS voice cloning
pip install voice-soundboard[websocket] # + WebSocket server
pip install voice-soundboard[web] # + Mobile web UI
pip install voice-soundboard[all] # Everything

F5-TTS can clone a voice from a short audio sample (3-10 seconds). The sample should be clean speech with minimal background noise. See the voice cloning example for a working script.
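Before handing a sample to the engine, it is worth validating its length up front. A stdlib sketch using the wave module (the helper name is ours, not part of the library):

```python
import wave

def check_reference_sample(path, min_s=3.0, max_s=10.0):
    """Reject reference audio outside the 3-10 second window F5-TTS expects."""
    with wave.open(path, "rb") as wf:
        duration = wf.getnframes() / float(wf.getframerate())
    if not min_s <= duration <= max_s:
        raise ValueError(f"sample is {duration:.1f}s; need {min_s}-{max_s}s of clean speech")
    return duration
```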

Generate conversations with multiple voices, each with their own emotion and style. Script a dialogue and assign per-character voice settings:

dialogue = [
    {"speaker": "af_bella", "emotion": "happy", "text": "Good morning!"},
    {"speaker": "bm_george", "emotion": "calm", "text": "Morning. Coffee?"},
    {"speaker": "af_bella", "emotion": "excited", "text": "Yes please!"},
]
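A script like this can be driven with a simple loop over speak(). This is a sketch of the idea; the bundled multi-speaker example is the library's own implementation:

```python
def render_dialogue(engine, dialogue):
    """Synthesize each scripted line with its character's voice and emotion."""
    results = []
    for line in dialogue:
        results.append(
            engine.speak(line["text"], voice=line["speaker"], emotion=line["emotion"])
        )
    return results
```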

See the multi-speaker example for the full implementation.

Speech Synthesis Markup Language (SSML) gives you fine-grained control over pauses, emphasis, and prosody. The SSML parser uses defusedxml to guard against XML-based attacks such as entity expansion.

<speak>
  <s>Welcome to <emphasis level="strong">Original Voice Soundboard</emphasis>.</s>
  <break time="500ms"/>
  <s>Let's get started.</s>
</speak>
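The parsing side can be sketched with defusedxml, which the library uses; this sketch falls back to the stdlib parser only so it runs anywhere:

```python
try:
    # defusedxml rejects entity-expansion and similar XML attacks.
    from defusedxml.ElementTree import fromstring
except ImportError:
    from xml.etree.ElementTree import fromstring  # stdlib fallback for this sketch

ssml = (
    "<speak>"
    '<s>Welcome to <emphasis level="strong">Original Voice Soundboard</emphasis>.</s>'
    '<break time="500ms"/>'
    "<s>Let's get started.</s>"
    "</speak>"
)

root = fromstring(ssml)
print(root.tag)                        # → speak
print([child.tag for child in root])   # → ['s', 'break', 's']
print(root.find("break").get("time"))  # → 500ms
```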

For real-time bidirectional communication, install the WebSocket extra:

pip install voice-soundboard[websocket]

This enables streaming audio generation over WebSocket connections, suitable for interactive applications with low-latency requirements.
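A client for such a server might look like the sketch below, built on the third-party websockets package. The URI and message schema here are assumptions for illustration; consult the WebSocket server docs for the actual protocol.

```python
import asyncio
import json

async def stream_tts(text, uri="ws://localhost:8765"):
    """Send a synthesis request and collect streamed audio chunks.

    The endpoint URI and JSON schema are hypothetical placeholders.
    """
    import websockets  # provided by the [websocket] extra

    chunks = []
    async with websockets.connect(uri) as ws:
        await ws.send(json.dumps({"text": text}))
        async for message in ws:
            chunks.append(message)  # binary audio frames from the server
    return b"".join(chunks)

# usage (with a running server):
# audio = asyncio.run(stream_tts("Hello over WebSocket!"))
```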