Voice Soundboard
Open source

Voice Soundboard: TTS for AI agents.

A text-to-speech engine that separates what is said from how it is rendered, built on a Compiler → Graph → Engine architecture with swappable backends.

Install

pip install voice-soundboard

Speak

engine.speak('Hello world!')

Emotion

engine.speak('Great news!', emotion='excited')

Features

Everything you need for production TTS.

Compiler / Graph / Engine

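Your intent (text, emotion, style) compiles to a ControlGraph, and the engine renders that graph to audio. Because feature logic is resolved at compile time, features add no cost at synthesis time.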

Swappable Backends

Switch between Kokoro (GPU), Piper (CPU), OpenAI, ElevenLabs, and Azure without changing your code.
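The idea behind swappable backends can be sketched with a structural interface: calling code depends only on a small protocol, so replacing one backend with another changes a single constructor argument. This is a minimal illustration, not Voice Soundboard's actual backend API; `Backend`, `SilentBackend`, `App`, and `synthesize` are hypothetical names, and the stand-in backend emits silence instead of speech.

```python
from typing import Protocol


class Backend(Protocol):
    """Hypothetical interface: anything that turns text into 16-bit mono PCM."""

    sample_rate: int

    def synthesize(self, text: str) -> bytes: ...


class SilentBackend:
    """Stand-in backend that returns 100 ms of 16-bit mono silence per word."""

    def __init__(self, sample_rate: int) -> None:
        self.sample_rate = sample_rate

    def synthesize(self, text: str) -> bytes:
        samples = len(text.split()) * self.sample_rate // 10
        return b"\x00\x00" * samples  # 2 bytes per sample


class App:
    """Application code that depends only on the Backend protocol."""

    def __init__(self, backend: Backend) -> None:
        self.backend = backend

    def say(self, text: str) -> float:
        pcm = self.backend.synthesize(text)
        # Convert byte count to seconds: 2 bytes per 16-bit sample.
        return len(pcm) / 2 / self.backend.sample_rate


# Swapping backends (here, a 24 kHz and a 22.05 kHz stand-in) leaves App untouched.
for backend in (SilentBackend(24_000), SilentBackend(22_050)):
    print(App(backend).say("hello there world"))
```

Using `typing.Protocol` keeps the coupling structural: a backend does not need to inherit from anything, it only needs a matching `sample_rate` and `synthesize`.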

Emotions & Styles

Pass emotion="happy" or style="warmly and cheerfully", and the compiler bakes it into the graph.
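"Baking" an emotion into the graph means resolving the label to concrete numbers once, at compile time, so nothing downstream ever sees the word "excited". The sketch below assumes a simple preset table of prosody multipliers; `EMOTION_PROSODY`, `Graph`, `compile_text`, and the multiplier values are all hypothetical and only illustrate the pattern. Free-form styles such as "warmly and cheerfully" would need a richer mapping and are not covered here.

```python
from dataclasses import dataclass

# Hypothetical presets: multipliers applied to neutral prosody.
EMOTION_PROSODY = {
    "neutral": {"rate": 1.0, "pitch": 1.0, "energy": 1.0},
    "happy": {"rate": 1.05, "pitch": 1.10, "energy": 1.15},
    "excited": {"rate": 1.15, "pitch": 1.20, "energy": 1.30},
}


@dataclass(frozen=True)
class Graph:
    """Pure-data result of compilation: text plus numeric controls only."""

    text: str
    rate: float
    pitch: float
    energy: float


def compile_text(text: str, emotion: str = "neutral") -> Graph:
    """Resolve the emotion label to numbers once, before any synthesis."""
    try:
        p = EMOTION_PROSODY[emotion]
    except KeyError:
        raise ValueError(f"unknown emotion: {emotion!r}") from None
    return Graph(text, p["rate"], p["pitch"], p["energy"])


graph = compile_text("Great news!", emotion="excited")
print(graph)
```

A renderer that consumes `Graph` only needs to understand `rate`, `pitch`, and `energy`; adding a new emotion is a change to the preset table, not to the renderer.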

Streaming Synthesis

Sentence-level streaming for LLM output. compile_stream() yields graphs as text arrives.
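Sentence-level streaming boils down to buffering incoming text fragments and releasing each sentence as soon as its terminator arrives. The generator below is a simplified, library-independent sketch of that buffering step; `sentences_from_stream` is a hypothetical name, and its splitting rule (`.`, `!`, or `?` followed by whitespace) will also split on abbreviations such as "Dr.", which a production splitter would need to handle.

```python
import re
from typing import Iterable, Iterator

# A sentence boundary: whitespace preceded by '.', '!' or '?'.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a chunked text stream."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        # The last part may still be growing; keep it for the next chunk.
        buffer = parts[-1]
    # Flush whatever remains when the stream ends.
    if buffer.strip():
        yield buffer.strip()


llm_output = ["Hel", "lo there. How a", "re you? I am", " fine"]
for sentence in sentences_from_stream(llm_output):
    print(sentence)
```

Each yielded sentence is the unit that would then be compiled into a graph, which is why audio for the first sentence can start before the LLM has finished generating.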

CLI Included

Run voice-soundboard speak "Hello!" from the shell, with presets, voice selection, and speed control.

MCP Server Ready

A built-in MCP (Model Context Protocol) adapter lets AI agents synthesize speech through MCP's standard tool interface.
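On the wire, an MCP client invokes a server tool with a JSON-RPC 2.0 request whose method is `tools/call` and whose params carry the tool `name` and its `arguments`. The sketch below builds such a request; the tool name `speak` and the argument keys `text` and `voice` are assumptions, since the adapter's actual tool schema is not shown here (a client can discover it with `tools/list`).

```python
import json
from typing import Optional


def make_speak_call(request_id: int, text: str, voice: Optional[str] = None) -> str:
    """Build an MCP "tools/call" JSON-RPC 2.0 request for a speech tool.

    The tool name "speak" and its argument names are hypothetical; query the
    server with "tools/list" to get the real schema.
    """
    arguments = {"text": text}
    if voice is not None:
        arguments["voice"] = voice
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "speak", "arguments": arguments},
    }
    return json.dumps(request)


print(make_speak_call(1, "Hello from an agent!", voice="bm_george"))
```

In practice an MCP client library would construct and send this message for you; the point is that the agent only supplies a tool name and JSON arguments.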

Usage

Install

# Core library
pip install voice-soundboard

# With Kokoro backend (GPU)
pip install voice-soundboard[kokoro]

# With Piper backend (CPU)
pip install voice-soundboard[piper]

Basic

from voice_soundboard import VoiceEngine

engine = VoiceEngine()
result = engine.speak('Hello world!')
print(f'Saved to: {result.audio_path}')

With voice & emotion

result = engine.speak(
    'Breaking news!',
    voice='bm_george',
    preset='announcer',
    emotion='excited'
)

Streaming (LLM output)

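from voice_soundboard.compiler import compile_stream
from voice_soundboard.runtime import StreamingSynthesizer

# Placeholders: llm_chunks() yields text fragments from your LLM, play()
# sends audio chunks to your output device, and streamer is a
# StreamingSynthesizer instance (its constructor arguments are not shown here).
for graph in compile_stream(llm_chunks()):
    # Stream each sentence's audio chunk by chunk as soon as its graph is ready.
    for chunk in streamer.stream(graph):
        play(chunk)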

Architecture

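Compiler → Graph → Engine. Each stage has a single responsibility, and features cost nothing at synthesis time because the compiler resolves them before the engine runs.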

Compiler

Transforms text + emotion + style into a pure-data ControlGraph. All feature logic lives here.

ControlGraph

An immutable data structure holding TokenEvents, SpeakerRefs, and prosody data. It carries a version number so producers and consumers can check compatibility.
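An immutable, versioned graph can be modeled in plain Python with frozen dataclasses and tuples. The type names below echo the ones mentioned above, but every field, `GRAPH_VERSION`, and `check_compatible` are hypothetical; this is a sketch of the pattern, not Voice Soundboard's actual schema. Tuples are used instead of lists because a frozen dataclass only blocks attribute reassignment, not mutation of a list it holds.

```python
from dataclasses import dataclass, replace
from typing import Tuple

GRAPH_VERSION = 1  # hypothetical schema version


@dataclass(frozen=True)
class SpeakerRef:
    voice_id: str


@dataclass(frozen=True)
class TokenEvent:
    text: str
    pitch_scale: float = 1.0
    duration_scale: float = 1.0


@dataclass(frozen=True)
class ControlGraph:
    tokens: Tuple[TokenEvent, ...]  # tuple keeps the sequence immutable too
    speaker: SpeakerRef
    version: int = GRAPH_VERSION


def check_compatible(graph: ControlGraph, supported: int = GRAPH_VERSION) -> None:
    """Refuse graphs produced by a newer schema than this consumer understands."""
    if graph.version > supported:
        raise ValueError(f"graph version {graph.version} > supported {supported}")


graph = ControlGraph(
    tokens=(TokenEvent("Hello"), TokenEvent("world", pitch_scale=1.1)),
    speaker=SpeakerRef("bm_george"),
)
check_compatible(graph)

# Frozen instances reject assignment; derive a modified copy instead.
# "another_voice" is a hypothetical voice id.
other = replace(graph, speaker=SpeakerRef("another_voice"))
```

Immutability makes graphs safe to cache, share between threads, or hand to a different backend without defensive copies.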

Engine

Renders graphs into PCM audio. It knows nothing about emotions or styles, only synthesis.
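To show what "knows nothing about emotions" means in practice, the toy renderer below consumes only numeric controls (pitch and duration) and writes 16-bit little-endian mono PCM. It is a hypothetical stand-in: `Token` and `render` are invented for illustration, and it produces sine tones rather than speech, where a real engine would run an acoustic model and vocoder.

```python
import math
import struct
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Token:
    """Hypothetical pre-resolved token: numbers only, no emotion labels."""

    pitch_hz: float
    duration_s: float


def render(tokens: Tuple[Token, ...], sample_rate: int = 24_000) -> bytes:
    """Toy engine: render each token as a sine tone in 16-bit mono PCM."""
    frames = bytearray()
    for tok in tokens:
        n = int(tok.duration_s * sample_rate)
        for i in range(n):
            # Amplitude 0.3 of full scale leaves headroom below clipping.
            value = 0.3 * math.sin(2 * math.pi * tok.pitch_hz * i / sample_rate)
            frames += struct.pack("<h", int(value * 32767))
    return bytes(frames)


pcm = render((Token(220.0, 0.25), Token(247.0, 0.25)))
print(len(pcm) // 2, "samples")
```

Because the renderer's input is already numeric, the same function would serve any emotion or style the compiler learns to produce.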

Backends

Choose the right backend for your use case.

| Backend    | Quality   | Speed       | Sample Rate | Install                                  |
|------------|-----------|-------------|-------------|------------------------------------------|
| Kokoro     | Excellent | Fast (GPU)  | 24 kHz      | pip install voice-soundboard[kokoro]     |
| Piper      | Great     | Fast (CPU)  | 22 kHz      | pip install voice-soundboard[piper]      |
| OpenAI     | Excellent | API latency | Varies      | pip install voice-soundboard[openai]     |
| ElevenLabs | Excellent | API latency | Varies      | pip install voice-soundboard[elevenlabs] |
| Azure      | Excellent | API latency | Varies      | pip install voice-soundboard[azure]      |
| Mock       | N/A       | Instant     | 24 kHz      | Built-in (testing)                       |
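One way to turn the table above into a default choice is a small rule of thumb: use the mock backend in tests, Kokoro when a GPU is available, Piper for local CPU-only synthesis, and a cloud API otherwise. The helper below is a hypothetical sketch of that rule, not part of the library; `pick_backend`, `install_command`, and the `needs_local` flag are invented names, and the assumption that Kokoro and Piper run locally while the others call remote APIs is inferred from the "API latency" column.

```python
from typing import Optional


def pick_backend(testing: bool = False, has_gpu: bool = False,
                 needs_local: bool = True) -> str:
    """Hypothetical rule of thumb derived from the backend table."""
    if testing:
        return "mock"    # built in, instant, no model or network needed
    if has_gpu:
        return "kokoro"  # excellent quality, fast on GPU
    if needs_local:
        return "piper"   # great quality, fast on CPU
    return "openai"      # or elevenlabs / azure: cloud APIs with network latency


def install_command(backend: str) -> Optional[str]:
    """The mock backend is built in; the others install as a pip extra."""
    if backend == "mock":
        return None
    return f"pip install voice-soundboard[{backend}]"


choice = pick_backend(has_gpu=False, needs_local=True)
print(choice, install_command(choice))
```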