
Getting Started

Get from zero to generated speech in about 60 seconds. You'll need:

  • Python 3.10 or newer
  • Approximately 350 MB of disk space for Kokoro models
  • Optional: CUDA-compatible GPU for faster generation
Install the package with pip:
pip install voice-soundboard

This installs the core library with the Kokoro TTS engine. For optional engines and extras, see the Engines page.

Kokoro needs two model files. Create a models/ directory and download them:

mkdir models && cd models
curl -LO https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/kokoro-v1.0.onnx
curl -LO https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/voices-v1.0.bin
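Before moving on, you can sanity-check the download with a few lines of Python. This is a minimal sketch (the helper `models_present` is not part of the library); the file names come from the curl commands above:

```python
from pathlib import Path

# The two files downloaded above.
REQUIRED = ("kokoro-v1.0.onnx", "voices-v1.0.bin")

def models_present(models_dir="models"):
    """Return True if both Kokoro model files exist in models_dir."""
    d = Path(models_dir)
    return all((d / name).is_file() for name in REQUIRED)
```

If `models_present()` returns False, re-run the curl commands and check that you are inside the models/ directory.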
Now generate speech from Python:

from voice_soundboard import VoiceEngine

engine = VoiceEngine()
result = engine.speak("Hello world!")
print(result.audio_path)  # e.g. output/af_bella_<hash>.wav

That produces a .wav file you can play with any audio player.
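If you want to inspect the generated file without playing it, Python's standard-library wave module can read the header. A small sketch (the `wav_info` helper is illustrative, not part of the library):

```python
import wave

def wav_info(path):
    """Return (sample_rate_hz, duration_seconds) for a .wav file."""
    with wave.open(str(path), "rb") as w:
        rate = w.getframerate()
        return rate, w.getnframes() / rate
```

For example, `wav_info(result.audio_path)` reports the sample rate and length of the clip you just generated.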

If you prefer the command line:

voice-soundboard speak "Hello world!"
# -> output/hello_world.wav
A few more options:
# Pick a specific voice
voice-soundboard speak "Cheerio!" --voice bm_george
# Use a preset
voice-soundboard speak "Breaking news!" --preset announcer
# Set an emotion
voice-soundboard speak "I'm so happy!" --emotion excited
Here's what happens under the hood:

  1. Text goes into the Kokoro ONNX model, running locally on your machine
  2. The model generates raw audio samples
  3. Voice Soundboard writes a .wav file to the output/ directory
  4. You get back a result object with the file path and metadata
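The four steps above can be sketched end to end. In this self-contained toy version the ONNX inference is replaced with a second of silence, and `SpeakResult` and the file-naming scheme are illustrative, not the library's real types:

```python
from dataclasses import dataclass
from pathlib import Path
import hashlib
import wave

@dataclass
class SpeakResult:
    audio_path: Path
    voice: str
    sample_rate: int

def speak_sketch(text, voice="af_bella", out_dir="output", rate=24000):
    """Toy pipeline: steps 1-2 are faked with silence instead of the model."""
    samples = b"\x00\x00" * rate  # placeholder for generated audio samples
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    digest = hashlib.sha1(text.encode()).hexdigest()[:8]
    path = out / f"{voice}_{digest}.wav"
    with wave.open(str(path), "wb") as w:  # step 3: write a .wav file
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(samples)
    return SpeakResult(path, voice, rate)  # step 4: path plus metadata
```

The real library does the same dance, just with actual model inference in the middle.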

No network calls. No API keys. No cloud. Everything runs on your hardware.