
Getting Started

Get from zero to generated speech in about 60 seconds. You'll need:

  • Python 3.10 or newer
  • Approximately 350 MB of disk space for Kokoro models
  • Optional: CUDA-compatible GPU for faster generation
Install the package with pip:
pip install voice-soundboard

This installs the core library with the Kokoro TTS engine. For optional engines and extras, see the Engines page.

Kokoro needs two model files. Create a models/ directory and download them:

mkdir models && cd models
curl -LO https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/kokoro-v1.0.onnx
curl -LO https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/voices-v1.0.bin
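Before moving on, you can sanity-check the download with a few lines of Python. This is a minimal sketch (the helper `models_present` is not part of the library); the file names come from the curl commands above:

```python
from pathlib import Path

# The two files downloaded above.
REQUIRED = ("kokoro-v1.0.onnx", "voices-v1.0.bin")

def models_present(models_dir="models"):
    """Return True if both Kokoro model files exist in models_dir."""
    d = Path(models_dir)
    return all((d / name).is_file() for name in REQUIRED)
```

If `models_present()` returns False, re-run the curl commands and check that you are inside the models/ directory.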
Now generate speech from Python:

from voice_soundboard import VoiceEngine

engine = VoiceEngine()
result = engine.speak("Hello world!")
print(result.audio_path)  # e.g. output/af_bella_<hash>.wav

That produces a .wav file you can play with any audio player.
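If you want to inspect the generated file without playing it, Python's standard-library wave module can read the header. A small sketch (the `wav_info` helper is illustrative, not part of the library):

```python
import wave

def wav_info(path):
    """Return (sample_rate_hz, duration_seconds) for a .wav file."""
    with wave.open(str(path), "rb") as w:
        rate = w.getframerate()
        return rate, w.getnframes() / rate
```

For example, `wav_info(result.audio_path)` reports the sample rate and length of the clip you just generated.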

If you prefer the command line:

voice-soundboard speak "Hello world!"
# -> output/hello_world.wav
A few more options:
# Pick a specific voice
voice-soundboard speak "Cheerio!" --voice bm_george
# Use a preset
voice-soundboard speak "Breaking news!" --preset announcer
# Set an emotion
voice-soundboard speak "I'm so happy!" --emotion excited
Here's what happens under the hood:

  1. Text goes into the Kokoro ONNX model, running locally on your machine
  2. The model generates raw audio samples
  3. Voice Soundboard writes a .wav file to the output/ directory
  4. You get back a result object with the file path and metadata
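The four steps above can be sketched end to end. In this self-contained toy version the ONNX inference is replaced with a second of silence, and `SpeakResult` and the file-naming scheme are illustrative, not the library's real types:

```python
from dataclasses import dataclass
from pathlib import Path
import hashlib
import wave

@dataclass
class SpeakResult:
    audio_path: Path
    voice: str
    sample_rate: int

def speak_sketch(text, voice="af_bella", out_dir="output", rate=24000):
    """Toy pipeline: steps 1-2 are faked with silence instead of the model."""
    samples = b"\x00\x00" * rate  # placeholder for generated audio samples
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    digest = hashlib.sha1(text.encode()).hexdigest()[:8]
    path = out / f"{voice}_{digest}.wav"
    with wave.open(str(path), "wb") as w:  # step 3: write a .wav file
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(samples)
    return SpeakResult(path, voice, rate)  # step 4: path plus metadata
```

The real library does the same dance, just with actual model inference in the middle.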

No network calls. No API keys. No cloud. Everything runs on your hardware.