Open source

Voice Soundboard: TTS for AI agents.

A text-to-speech engine that separates what is said from how it’s rendered. Compiler → Graph → Engine architecture with swappable backends.


Install

pip install voice-soundboard

Speak

engine.speak('Hello world!')

Emotion

engine.speak('Great news!', emotion='excited')

Features

Everything you need for production TTS.

Compiler / Graph / Engine

Intent compiles to a ControlGraph, and the engine renders that graph to audio. Because all feature logic runs in the compiler, features add no cost at render time.

Swappable Backends

Kokoro (GPU), Piper (CPU), OpenAI, ElevenLabs, and Azure. Switch backends without changing your application code.
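Switching backends without touching calling code usually relies on every backend implementing the same small interface. The sketch below assumes a hypothetical `Backend` protocol with a `synthesize(text) -> bytes` method and two stand-in backends that emit silence; it is an illustration of the pattern, not the library's actual backend API.

```python
from typing import Protocol


class Backend(Protocol):
    """Hypothetical interface every backend implements (not the library's real API)."""

    name: str
    sample_rate: int

    def synthesize(self, text: str) -> bytes:
        """Return 16-bit mono PCM for `text`."""
        ...


class SilentGpuBackend:
    """Stand-in for a GPU backend: emits 10 ms of silence per character."""

    name = "kokoro-like"
    sample_rate = 24_000

    def synthesize(self, text: str) -> bytes:
        samples = len(text) * self.sample_rate // 100
        return b"\x00\x00" * samples  # 2 bytes per 16-bit sample


class SilentCpuBackend:
    """Stand-in for a CPU backend with a different sample rate."""

    name = "piper-like"
    sample_rate = 22_050

    def synthesize(self, text: str) -> bytes:
        samples = len(text) * self.sample_rate // 100
        return b"\x00\x00" * samples


def speak(backend: Backend, text: str) -> float:
    """Caller code: identical for every backend. Returns duration in seconds."""
    pcm = backend.synthesize(text)
    return len(pcm) / 2 / backend.sample_rate


for backend in (SilentGpuBackend(), SilentCpuBackend()):
    print(backend.name, speak(backend, "Hello world!"))
```

Only the object passed to `speak()` changes; the function body never branches on which backend it received.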

Emotions & Styles

Add emotion="happy" or style="warmly and cheerfully" — the compiler bakes it into the graph.
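A rough picture of "the compiler bakes it into the graph": the emotion name is resolved once, at compile time, into plain numeric prosody values, so whatever renders the result never sees the word "excited". The emotion table, its values, and the `ProsodyEvent` / `compile_text` names are hypothetical, not the library's actual mapping.

```python
from dataclasses import dataclass

# Hypothetical emotion -> (speaking-rate multiplier, pitch shift in semitones).
EMOTION_PROSODY = {
    "neutral": (1.0, 0.0),
    "happy": (1.05, 1.0),
    "excited": (1.15, 2.0),
    "sad": (0.9, -1.5),
}


@dataclass(frozen=True)
class ProsodyEvent:
    word: str
    rate: float
    pitch_semitones: float


def compile_text(text: str, emotion: str = "neutral") -> tuple[ProsodyEvent, ...]:
    """Resolve the emotion once and emit pure data; unknown emotions raise KeyError."""
    rate, pitch = EMOTION_PROSODY[emotion]
    return tuple(ProsodyEvent(word, rate, pitch) for word in text.split())


graph = compile_text("Great news!", emotion="excited")
print(graph[0])
```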

Streaming Synthesis

Sentence-level streaming for LLM output. compile_stream() yields graphs as text arrives.
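Sentence-level streaming needs a buffer that holds partial LLM output until a sentence boundary arrives. The generator below is a minimal sketch of that buffering step, assuming a naive boundary rule (`.`, `!` or `?` followed by whitespace); it is not the library's `compile_stream()`, which would additionally hand each finished sentence to the compiler.

```python
import re
from collections.abc import Iterable, Iterator

# Naive boundary: sentence-ending punctuation followed by whitespace.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def sentences_from_chunks(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a stream of text chunks."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = _BOUNDARY.split(buffer)
        # Every part except the last is a finished sentence; the last may be partial.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]
    if buffer.strip():
        yield buffer.strip()  # flush whatever remains when the stream ends


llm_output = ["Hello the", "re! How are", " you? I am", " fine."]
print(list(sentences_from_chunks(llm_output)))
```

Because this is a generator, "Hello there!" is available as soon as the second chunk arrives, before the rest of the LLM output exists.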

CLI Included

voice-soundboard speak "Hello!" — with presets, voice selection, and speed control.

MCP Server Ready

Built-in MCP adapter so AI agents can synthesize speech through the standard tool protocol.
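Under MCP, an agent invokes a server tool with a JSON-RPC 2.0 `tools/call` request. The sketch below only builds such a request; the tool name `speak` and its `text` / `emotion` arguments are hypothetical placeholders, so check the server's real tool list (MCP `tools/list`) before relying on them.

```python
import json


def make_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool name and arguments; confirm them via the server's tools/list.
request = make_tool_call(1, "speak", {"text": "Hello!", "emotion": "excited"})
print(request)
```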

Usage

Install

# Core library
pip install voice-soundboard

# With Kokoro backend (GPU); quote extras so shells like zsh don't glob the brackets
pip install "voice-soundboard[kokoro]"

# With Piper backend (CPU)
pip install "voice-soundboard[piper]"

Basic

from voice_soundboard import VoiceEngine

engine = VoiceEngine()
result = engine.speak('Hello world!')
print(f'Saved to: {result.audio_path}')

With voice & emotion

result = engine.speak(
    'Breaking news!',
    voice='bm_george',
    preset='announcer',
    emotion='excited'
)

Streaming (LLM output)

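from voice_soundboard import VoiceEngine
from voice_soundboard.compiler import compile_stream
from voice_soundboard.runtime import StreamingSynthesizer

engine = VoiceEngine()
# Assumed constructor argument; check the Handbook for the exact signature.
streamer = StreamingSynthesizer(engine)

# llm_chunks() and play() are placeholders for your LLM text stream and audio sink.
for graph in compile_stream(llm_chunks()):
    for chunk in streamer.stream(graph):
        play(chunk)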

Architecture

Compiler → Graph → Engine. Clean separation, zero-cost features.

Compiler

Transforms text + emotion + style into a pure-data ControlGraph. All feature logic lives here.

ControlGraph

Immutable data structure with TokenEvents, SpeakerRefs, and prosody. Versioned for compatibility.
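Immutability and versioning can be sketched with frozen dataclasses plus a schema version the engine checks before rendering. The field names, the `(major, minor)` version tuple, and the "same major, no newer minor" compatibility rule are assumptions for illustration, not the library's actual ControlGraph schema.

```python
from dataclasses import FrozenInstanceError, dataclass

ENGINE_SCHEMA = (1, 2)  # hypothetical (major, minor) the engine was built for


@dataclass(frozen=True)
class SpeakerRef:
    voice_id: str


@dataclass(frozen=True)
class TokenEvent:
    text: str
    rate: float = 1.0


@dataclass(frozen=True)
class ControlGraph:
    version: tuple[int, int]
    speaker: SpeakerRef
    tokens: tuple[TokenEvent, ...]  # a tuple, not a list, so the sequence is immutable too


def is_compatible(graph: ControlGraph, engine_schema=ENGINE_SCHEMA) -> bool:
    """Assumed rule: same major version, and no newer minor than the engine knows."""
    return graph.version[0] == engine_schema[0] and graph.version[1] <= engine_schema[1]


graph = ControlGraph((1, 0), SpeakerRef("bm_george"), (TokenEvent("Hello"),))
print(is_compatible(graph))
try:
    graph.version = (2, 0)  # frozen: any attribute assignment raises
except FrozenInstanceError:
    print("ControlGraph is immutable")
```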

Engine

Transforms graphs into PCM audio. Knows nothing about emotions or styles — only synthesis.
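To make "knows nothing about emotions or styles" concrete, here is a toy engine that turns already-compiled events into 16-bit PCM. It reads only numbers (duration and pitch), so any emotion or style must have been resolved by the compiler beforehand. The `ToneEvent` type and the sine-tone "synthesis" are hypothetical stand-ins for a real acoustic model, not the library's engine.

```python
import math
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class ToneEvent:
    duration_s: float  # already scaled by the compiler for speaking rate
    pitch_hz: float    # already shifted by the compiler for emotion


def render(events: tuple[ToneEvent, ...], sample_rate: int = 24_000) -> bytes:
    """Render events to little-endian 16-bit mono PCM. No emotion or style logic here."""
    frames = bytearray()
    for event in events:
        n = round(event.duration_s * sample_rate)
        for i in range(n):
            value = 0.3 * math.sin(2 * math.pi * event.pitch_hz * i / sample_rate)
            frames += struct.pack("<h", int(value * 32767))
    return bytes(frames)


pcm = render((ToneEvent(0.1, 220.0), ToneEvent(0.05, 330.0)))
print(len(pcm) // 2, "samples")
```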

Backends

Choose the right backend for your use case.

Backend    | Quality   | Speed       | Sample rate | Install
---------- | --------- | ----------- | ----------- | -----------------------------------------
Kokoro     | Excellent | Fast (GPU)  | 24 kHz      | pip install "voice-soundboard[kokoro]"
Piper      | Great     | Fast (CPU)  | 22 kHz      | pip install "voice-soundboard[piper]"
OpenAI     | Excellent | API latency | Varies      | pip install "voice-soundboard[openai]"
ElevenLabs | Excellent | API latency | Varies      | pip install "voice-soundboard[elevenlabs]"
Azure      | Excellent | API latency | Varies      | pip install "voice-soundboard[azure]"
Mock       | N/A       | Instant     | 24 kHz      | Built-in (testing)
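Because backends differ in sample rate (24 kHz for Kokoro, 22 kHz for Piper), code that mixes or concatenates their output has to resample to a common rate. The function below is a minimal linear-interpolation resampler, assuming "22 kHz" means 22,050 Hz; it applies no anti-aliasing filter, so it is lower quality than polyphase or windowed-sinc resamplers such as SciPy's `resample_poly`.

```python
def resample_linear(samples: list[float], src_rate: int, dst_rate: int) -> list[float]:
    """Resample mono audio by linear interpolation (no anti-aliasing filter)."""
    if src_rate == dst_rate:
        return list(samples)
    n_out = round(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate  # input samples advanced per output sample
    out = []
    for j in range(n_out):
        pos = j * step
        i = int(pos)
        if i >= len(samples) - 1:
            out.append(samples[-1])  # hold the last sample at the tail
            continue
        frac = pos - i
        out.append(samples[i] * (1 - frac) + samples[i + 1] * frac)
    return out


# One second of a 22,050 Hz ramp becomes 24,000 samples at Kokoro's rate.
piper_like = [i / 22_050 for i in range(22_050)]
kokoro_rate = resample_linear(piper_like, 22_050, 24_000)
print(len(kokoro_rate))
```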