# Getting Started
Get from zero to generated speech in about 60 seconds.
## Requirements

- Python 3.10 or newer
- Approximately 350 MB of disk space for Kokoro models
- Optional: CUDA-compatible GPU for faster generation
## Install

```shell
pip install voice-soundboard
```

This installs the core library with the Kokoro TTS engine. For optional engines and extras, see the Engines page.
## Download models

Kokoro needs two model files. Create a models/ directory and download them:

```shell
mkdir models && cd models
curl -LO https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/kokoro-v1.0.onnx
curl -LO https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/voices-v1.0.bin
```
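After downloading, it's worth confirming that both files landed where you expect before moving on. A minimal check, assuming the models/ layout created by the commands above (the helper itself is illustrative, not part of the library):

```python
from pathlib import Path

EXPECTED_MODELS = ["kokoro-v1.0.onnx", "voices-v1.0.bin"]

def missing_models(models_dir="models"):
    """Return the expected model filenames absent from models_dir."""
    root = Path(models_dir)
    return [name for name in EXPECTED_MODELS if not (root / name).exists()]

if missing_models():
    print("Missing model files:", ", ".join(missing_models()))
```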
## Generate your first speech

```python
from voice_soundboard import VoiceEngine

engine = VoiceEngine()
result = engine.speak("Hello world!")
print(result.audio_path)  # output/af_bella_<hash>.wav
```

That produces a .wav file you can play with any audio player.
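Judging by the `af_bella_<hash>.wav` pattern, output filenames combine the voice name with a hash of the request. One way such a name could be derived (a hypothetical sketch, not the library's actual scheme):

```python
import hashlib

def output_filename(text, voice="af_bella", digest_len=8):
    """Build a deterministic filename from the voice name and a text hash."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:digest_len]
    return f"output/{voice}_{digest}.wav"

print(output_filename("Hello world!"))
```

Hashing the input text makes repeated calls with the same text and voice reuse the same path.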
## CLI usage

If you prefer the command line:

```shell
voice-soundboard speak "Hello world!"
# -> output/hello_world.wav
```

## More CLI examples
```shell
# Pick a specific voice
voice-soundboard speak "Cheerio!" --voice bm_george

# Use a preset
voice-soundboard speak "Breaking news!" --preset announcer

# Set an emotion
voice-soundboard speak "I'm so happy!" --emotion excited
```

## What happens under the hood
- Text goes into the Kokoro ONNX model running locally on your machine
- The model generates raw audio samples
- Voice Soundboard writes a .wav file to the output/ directory
- You get back a result object with the file path and metadata
No network calls. No API keys. No cloud. Everything runs on your hardware.
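The tail of that pipeline — raw samples becoming a .wav on disk — can be sketched with the standard library alone. The 24 kHz sample rate and a 440 Hz test tone standing in for model output are assumptions for the demo:

```python
import math
import struct
import wave

def write_wav(path, samples, sample_rate=24000):
    """Write mono float samples in [-1, 1] as 16-bit PCM."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate)
        frames = b"".join(struct.pack("<h", int(s * 32767)) for s in samples)
        wav.writeframes(frames)

# Stand-in for model output: one second of a 440 Hz tone.
tone = [0.5 * math.sin(2 * math.pi * 440 * t / 24000) for t in range(24000)]
write_wav("output_demo.wav", tone)
```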
## Next steps

- Explore the 54+ voices and 19 emotions
- Set up the MCP server so AI agents can speak
- Learn about additional engines for voice cloning and paralinguistic tags