MCP Server

Original Voice Soundboard includes an MCP (Model Context Protocol) server with 40+ tools. This lets AI agents like Claude generate speech, manage voices, control emotions, and more.

Install the MCP extra

The MCP server requires an additional dependency:

pip install voice-soundboard[mcp]

Configure Claude Desktop

Add the following to your Claude Desktop MCP configuration file:

{
  "mcpServers": {
    "voice-soundboard": {
      "command": "python",
      "args": ["-m", "voice_soundboard.server"]
    }
  }
}

Depending on your operating system, this file is typically at:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
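If the configuration file already lists other MCP servers, the new entry should be merged in rather than pasted over them. The following is a minimal sketch of that merge, not part of the package: the helper names `add_voice_soundboard` and `update_config_file` are hypothetical, and the caller supplies the path for their platform.

```python
import json
from pathlib import Path

# The server entry from the configuration shown above, as a Python dict.
SERVER_ENTRY = {
    "command": "python",
    "args": ["-m", "voice_soundboard.server"],
}


def add_voice_soundboard(config: dict) -> dict:
    """Return a copy of config with the voice-soundboard entry added.

    Hypothetical helper for illustration; existing servers are left untouched.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["voice-soundboard"] = dict(SERVER_ENTRY)
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Read the config at path (if present), merge the entry, write it back."""
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(
        json.dumps(add_voice_soundboard(config), indent=2),
        encoding="utf-8",
    )
```

Calling `update_config_file(Path(...))` with one of the paths listed above would apply the change. Restart Claude Desktop afterwards so it rereads the file.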

Using it with Claude

Once configured, just ask Claude to speak. For example:

“Say hello in an excited voice”
“Read this paragraph aloud using the narrator preset”
“Generate speech for this text with a sad emotion”
“List all available voices”
“Speak this dialogue with different characters”

Claude will use the MCP tools to generate audio files and play them.

What the 40+ tools cover

The MCP server exposes tools for:

Speech generation — speak text with any voice, emotion, or style
Voice management — list voices, get voice details, search by accent or gender
Emotion control — set emotions, list available emotions, blend emotions
Preset management — use and list voice presets
Engine management — switch between Kokoro, Chatterbox, and F5-TTS
Audio output — control output format, sample rate, file paths
Multi-speaker — generate dialogue with per-character voices
SSML — send Speech Synthesis Markup Language for fine-grained control
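An MCP client invokes any of these tools with a JSON-RPC `tools/call` request that carries the tool name and its arguments. The sketch below shows the shape of such a request. The tool name `speak` and the argument names `text` and `emotion` are assumptions made for illustration; the real names come from the server's own tool listing.

```python
import json


def tool_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 `tools/call` request as a JSON string."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# "speak", "text", and "emotion" are assumed names, not confirmed by this page.
request = tool_call_request(1, "speak", {"text": "Hello!", "emotion": "excited"})
```

In practice Claude builds these requests itself; the sketch is only meant to show what travels between client and server.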

Running the server standalone

You can also run the MCP server directly for use with other MCP-compatible clients:

python -m voice_soundboard.server

The server communicates over stdio, one of the standard MCP transports, and exchanges JSON-RPC 2.0 messages with the client.
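For a custom client, the opening exchange over stdio can be sketched as follows. MCP's stdio transport sends each JSON-RPC message as one line of JSON terminated by a newline. The client name, client version, and protocol version string here are placeholder assumptions; the server may negotiate a different protocol version.

```python
import json


def encode_message(message: dict) -> bytes:
    """Serialize one JSON-RPC message as a single newline-terminated line."""
    # json.dumps escapes newlines inside strings, so the payload stays on one line.
    return (json.dumps(message) + "\n").encode("utf-8")


def decode_message(line: bytes) -> dict:
    """Parse one line read from the server's stdout."""
    return json.loads(line.decode("utf-8"))


def handshake_messages(client_name: str = "example-client") -> list[dict]:
    """Messages a client sends to initialize a session and list the tools."""
    return [
        {
            "jsonrpc": "2.0",
            "id": 1,
            "method": "initialize",
            "params": {
                "protocolVersion": "2024-11-05",  # assumed; subject to negotiation
                "capabilities": {},
                "clientInfo": {"name": client_name, "version": "0.1.0"},
            },
        },
        # Notifications carry no "id" and receive no response.
        {"jsonrpc": "2.0", "method": "notifications/initialized"},
        {"jsonrpc": "2.0", "id": 2, "method": "tools/list"},
    ]
```

A real client would start the server as a subprocess with piped stdin and stdout, write the `initialize` request, and wait for its response before sending the remaining two messages.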

Security notes

The MCP server runs locally on your machine. It does not make network calls, send telemetry, or store data beyond transient audio buffers. All processing happens on your hardware. See SECURITY.md in the repository for the full security model.