Backends

Voice Soundboard ships with multiple TTS backends. Each backend implements the same interface, so you can switch between them without changing application code.

| Backend | Quality | Speed | Sample Rate | Install |
|---|---|---|---|---|
| Kokoro | Excellent | Fast (GPU) | 24000 Hz | pip install voice-soundboard[kokoro] |
| Piper | Great | Fast (CPU) | 22050 Hz | pip install voice-soundboard[piper] |
| OpenAI | Excellent | Cloud | 24000 Hz | pip install voice-soundboard[openai] |
| Coqui | Great | Moderate (GPU) | 22050 Hz | pip install voice-soundboard[coqui] |
| ElevenLabs | Premium | Cloud | 44100 Hz | pip install voice-soundboard[elevenlabs] |
| Azure | Excellent | Cloud | 24000 Hz | pip install voice-soundboard[azure] |
| Mock | N/A | Instant | 24000 Hz | Built-in (testing) |
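
Because the sample rate differs between backends, code that handles raw audio is safer looking the rate up than hard-coding 24000. A minimal sketch using the values from the table above; `SAMPLE_RATES` and `duration_seconds` are illustrative names, not part of the Voice Soundboard API, and the helper assumes mono audio:

```python
# Sample rates in Hz, copied from the backend table above.
SAMPLE_RATES = {
    "kokoro": 24000,
    "piper": 22050,
    "openai": 24000,
    "coqui": 22050,
    "elevenlabs": 44100,
    "azure": 24000,
    "mock": 24000,
}


def duration_seconds(num_samples: int, backend: str) -> float:
    """Return the playback length of mono audio with num_samples samples."""
    return num_samples / SAMPLE_RATES[backend]


print(duration_seconds(22050, "piper"))   # 1.0
print(duration_seconds(22050, "kokoro"))  # 0.91875
```

The same 22050 samples last one second at Piper's rate but slightly less at Kokoro's, which is why a mismatched rate makes audio play back too fast or too slow.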

Kokoro is the recommended backend for production use when a GPU is available. It produces excellent-quality audio at a 24 kHz sample rate.

pip install voice-soundboard[kokoro]
# Download models
mkdir models && cd models
curl -LO https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/kokoro-v1.0.onnx
curl -LO https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/voices-v1.0.bin

from voice_soundboard import VoiceEngine, Config
engine = VoiceEngine(Config(backend="kokoro"))
result = engine.speak("Hello from Kokoro!", voice="af_bella")

Piper is ideal when no GPU is available. It runs entirely on CPU and supports 30+ voices across multiple languages (English, German, French, Spanish).

pip install voice-soundboard[piper]
# Download a voice (example: en_US-lessac-medium)
python -m piper.download_voices en_US-lessac-medium

  • 30+ voices across multiple languages
  • Pure CPU: no GPU required
  • Speed control via length_scale (inverted: 0.8 = faster, 1.2 = slower)
  • Sample rate: 22050 Hz (backend-specific)

from voice_soundboard import VoiceEngine, Config
engine = VoiceEngine(Config(backend="piper"))
result = engine.speak("Hello from Piper!")
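
The inverted length_scale convention is easy to get backwards. One way to keep application code in terms of a speed multiplier is a small conversion helper. This sketch assumes a simple reciprocal relationship (a speed of 1.25 gives a length_scale of 0.8); `speed_to_length_scale` is a hypothetical helper, not a Voice Soundboard function, and how length_scale is passed to the backend is not shown here:

```python
def speed_to_length_scale(speed: float) -> float:
    """Convert a speed multiplier (>1 = faster) to a length_scale (<1 = faster).

    Assumes length_scale is the reciprocal of speed; this is an illustration,
    not a documented Voice Soundboard conversion.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    return 1.0 / speed


print(speed_to_length_scale(1.25))  # 0.8 (faster speech)
print(speed_to_length_scale(1.0))   # 1.0 (unchanged)
```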

Kokoro voice names are automatically mapped to Piper equivalents when using the Piper backend. This means you can write code with Kokoro voice names and it will work on CPU machines with Piper installed.

engine = VoiceEngine(Config(backend="piper"))
result = engine.speak("Hello!", voice="af_bella") # Uses en_US_lessac_medium
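
Conceptually, this mapping behaves like a lookup table keyed by Kokoro voice name. The sketch below is an illustration only: the one pair documented above is af_bella to en_US_lessac_medium, and the table name, the function name, and the pass-through behavior for unknown names are assumptions, not the library's implementation:

```python
# Hypothetical lookup table; only this one pair is documented above.
KOKORO_TO_PIPER = {
    "af_bella": "en_US_lessac_medium",
}


def resolve_piper_voice(voice: str) -> str:
    """Map a Kokoro voice name to a Piper voice, passing other names through."""
    return KOKORO_TO_PIPER.get(voice, voice)


print(resolve_piper_voice("af_bella"))  # en_US_lessac_medium
```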

The OpenAI backend uses the OpenAI TTS API. It requires an API key and internet connectivity.

pip install voice-soundboard[openai]

from voice_soundboard import VoiceEngine, Config
engine = VoiceEngine(Config(backend="openai"))
result = engine.speak("Hello from OpenAI TTS!")

Set your API key via the OPENAI_API_KEY environment variable.

Coqui TTS provides open-source models with GPU acceleration.

pip install voice-soundboard[coqui]

from voice_soundboard import VoiceEngine, Config
engine = VoiceEngine(Config(backend="coqui"))
result = engine.speak("Hello from Coqui!")

The ElevenLabs backend provides premium, high-fidelity cloud voices at a 44100 Hz sample rate. It requires an API key.

pip install voice-soundboard[elevenlabs]

from voice_soundboard import VoiceEngine, Config
engine = VoiceEngine(Config(backend="elevenlabs"))
result = engine.speak("Hello from ElevenLabs!")

Set your API key via the ELEVENLABS_API_KEY environment variable.

The Azure backend uses Microsoft Azure Cognitive Services Speech. It requires an Azure subscription.

pip install voice-soundboard[azure]

from voice_soundboard import VoiceEngine, Config
engine = VoiceEngine(Config(backend="azure"))
result = engine.speak("Hello from Azure!")

Configure it via the AZURE_SPEECH_KEY and AZURE_SPEECH_REGION environment variables.
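
Since the cloud backends read their credentials from environment variables, it can help to check for them before constructing an engine. A minimal sketch using the variable names given in this document; `REQUIRED_ENV` and `missing_env_vars` are hypothetical helpers, not part of Voice Soundboard:

```python
import os

# Environment variables named in this document, per cloud backend.
REQUIRED_ENV = {
    "openai": ["OPENAI_API_KEY"],
    "elevenlabs": ["ELEVENLABS_API_KEY"],
    "azure": ["AZURE_SPEECH_KEY", "AZURE_SPEECH_REGION"],
}


def missing_env_vars(backend, env=None):
    """Return the required variables that are unset or empty for a backend."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV.get(backend, []) if not env.get(name)]


# Example with an explicit mapping instead of the real environment:
print(missing_env_vars("azure", {"AZURE_SPEECH_KEY": "example"}))
# ['AZURE_SPEECH_REGION']
```

Local backends have no entry in the table, so the helper returns an empty list for them.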

The mock backend produces instant silence. It is built-in and requires no installation. Use it for testing and CI pipelines.

from voice_soundboard import VoiceEngine, Config
engine = VoiceEngine(Config(backend="mock"))
result = engine.speak("This produces silence instantly.")
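
Because every backend shares one interface, a test setup can swap in the mock backend without touching application code. The sketch below assumes the common convention that CI systems set a `CI` environment variable to a truthy value; `choose_backend` is a hypothetical helper, not part of Voice Soundboard:

```python
import os


def choose_backend(preferred="kokoro", env=None):
    """Return "mock" when the CI environment variable is truthy, else preferred."""
    env = os.environ if env is None else env
    ci = env.get("CI", "").strip().lower()
    return "mock" if ci in {"1", "true", "yes"} else preferred


# With voice-soundboard installed, the result plugs straight into Config:
# from voice_soundboard import VoiceEngine, Config
# engine = VoiceEngine(Config(backend=choose_backend()))
```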