
Voice Integration

sensor-humor pairs with mcp-voice-soundboard to deliver spoken comedy with mood-appropriate prosody.

Each mood maps to a Piper TTS voice and 4 prosody knobs:

| Knob | What it controls | Effect on comedy |
|---|---|---|
| length_scale | Speech speed (>1 = slower) | Dry/cynic drag; zoomer rushes |
| noise_scale | Expressiveness (0 = flat, 1 = animated) | Dry is monotone; chaotic is erratic |
| noise_w_scale | Phoneme timing variance (0 = metronomic, 1 = erratic) | Controls whether words land at even intervals |
| volume | Loudness | Cynic is quiet; zoomer is loud |
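
To make two of these knobs concrete: in VITS-style models such as Piper, length_scale multiplies each phoneme's predicted duration, so total speaking time scales roughly linearly with it. How volume is applied is an assumption here: the sketch treats it as a linear gain on float samples in [-1, 1], clipped afterward, which is why moods above 1.0 (chaotic, zoomer) could clip on loud passages. Both helper names are hypothetical and not part of sensor-humor or mcp-voice-soundboard.

```typescript
// Hypothetical helpers illustrating length_scale and volume.

// length_scale scales predicted phoneme durations, so an utterance lasting
// baseSeconds at length_scale 1.0 lasts roughly baseSeconds * lengthScale.
function estimateDuration(baseSeconds: number, lengthScale: number): number {
  return baseSeconds * lengthScale;
}

// ASSUMPTION: volume is a linear gain on float PCM samples, clipped to
// [-1, 1] so that gains above 1.0 cannot overflow the valid range.
function applyVolume(samples: Float32Array, volume: number): Float32Array {
  const out = new Float32Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    out[i] = Math.max(-1, Math.min(1, samples[i] * volume));
  }
  return out;
}

// A 2-second line spoken as cynic (1.25) vs zoomer (0.90):
console.log(estimateDuration(2, 1.25)); // 2.5
console.log(estimateDuration(2, 0.9));  // 1.8
```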
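
| Mood | Piper voice | length_scale | noise_scale | noise_w_scale | volume |
|---|---|---|---|---|---|
| dry | en_GB-alan-medium | 1.15 | 0.3 | 0.3 | 0.9 |
| roast | en_US-ryan-high | 0.95 | 0.667 | 0.8 | 1.0 |
| chaotic | en_US-lessac-high | 0.88 | 0.8 | 0.9 | 1.1 |
| cheeky | en_GB-cori-high | 1.05 | 0.5 | 0.6 | 0.95 |
| cynic | en_GB-alan-medium | 1.25 | 0.2 | 0.2 | 0.8 |
| zoomer | en_US-lessac-high | 0.90 | 0.85 | 0.9 | 1.15 |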
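
The mood table can be held as plain data on the client side. The sketch below is illustrative only: the `Mood`, `MoodVoice`, and `MOOD_VOICES` names are hypothetical, while the voices and numbers are copied from the table. Deriving the set of distinct voices shows that six moods need just four Piper models, because dry/cynic share alan and chaotic/zoomer share lessac.

```typescript
// Hypothetical data shape for the mood table; keys use Piper's knob names.
type Mood = "dry" | "roast" | "chaotic" | "cheeky" | "cynic" | "zoomer";

interface MoodVoice {
  voice: string;         // Piper voice model name
  length_scale: number;  // speech speed (>1 = slower)
  noise_scale: number;   // expressiveness
  noise_w_scale: number; // phoneme timing variance
  volume: number;        // loudness
}

const MOOD_VOICES: Record<Mood, MoodVoice> = {
  dry:     { voice: "en_GB-alan-medium", length_scale: 1.15, noise_scale: 0.3,   noise_w_scale: 0.3, volume: 0.9 },
  roast:   { voice: "en_US-ryan-high",   length_scale: 0.95, noise_scale: 0.667, noise_w_scale: 0.8, volume: 1.0 },
  chaotic: { voice: "en_US-lessac-high", length_scale: 0.88, noise_scale: 0.8,   noise_w_scale: 0.9, volume: 1.1 },
  cheeky:  { voice: "en_GB-cori-high",   length_scale: 1.05, noise_scale: 0.5,   noise_w_scale: 0.6, volume: 0.95 },
  cynic:   { voice: "en_GB-alan-medium", length_scale: 1.25, noise_scale: 0.2,   noise_w_scale: 0.2, volume: 0.8 },
  zoomer:  { voice: "en_US-lessac-high", length_scale: 0.90, noise_scale: 0.85,  noise_w_scale: 0.9, volume: 1.15 },
};

// Distinct voice models needed to cover every mood, sorted for stable output.
function requiredVoices(): string[] {
  return Array.from(new Set(Object.values(MOOD_VOICES).map((m) => m.voice))).sort();
}

console.log(requiredVoices().join(", "));
// en_GB-alan-medium, en_GB-cori-high, en_US-lessac-high, en_US-ryan-high
```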
  1. Install Piper voice models:
```shell
# Download each voice's .onnx model and its .onnx.json config to your model directory
# Required: en_GB-alan-medium, en_US-ryan-high, en_US-lessac-high, en_GB-cori-high
```
  2. Start voice-soundboard with Piper:
```shell
VOICE_SOUNDBOARD_ENGINE=piper \
VOICE_SOUNDBOARD_PIPER_MODEL_DIR=/path/to/piper/models \
npm start
```
  3. After any sensor-humor tool call, speak the result:
```typescript
voice_speak({ text: result.roast, mood: "roast" })
```
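
Step 1 leaves the actual download to you. Piper voices are commonly fetched from the rhasspy/piper-voices repository on Hugging Face. The helper below is a hypothetical sketch that builds the .onnx and .onnx.json URLs for a voice name, assuming that repository's `<family>/<locale>/<speaker>/<quality>/` layout; check the URLs before scripting downloads into VOICE_SOUNDBOARD_PIPER_MODEL_DIR.

```typescript
// Hypothetical helper; ASSUMES the rhasspy/piper-voices layout:
//   <family>/<locale>/<speaker>/<quality>/<locale>-<speaker>-<quality>.onnx(.json)
const BASE = "https://huggingface.co/rhasspy/piper-voices/resolve/main";

function piperVoiceUrls(voice: string): { model: string; config: string } {
  // "en_GB-alan-medium" -> family "en", region "GB", speaker "alan", quality "medium"
  const match = /^([a-z]{2,3})_([A-Z]{2})-(.+)-(x_low|low|medium|high)$/.exec(voice);
  if (!match) throw new Error(`Unrecognised Piper voice name: ${voice}`);
  const [, family, region, speaker, quality] = match;
  const dir = `${BASE}/${family}/${family}_${region}/${speaker}/${quality}`;
  return { model: `${dir}/${voice}.onnx`, config: `${dir}/${voice}.onnx.json` };
}

// Print both URLs for each voice the mood table needs.
for (const v of ["en_GB-alan-medium", "en_US-ryan-high", "en_US-lessac-high", "en_GB-cori-high"]) {
  const { model, config } = piperVoiceUrls(v);
  console.log(model);
  console.log(config);
}
```

Each printed URL can then be passed to any HTTP client that saves into the model directory.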

If VOICE_SOUNDBOARD_ENGINE is not set, voice-soundboard uses its default Kokoro backend. Kokoro supports voice selection and speed control but not full prosody: noise_scale, noise_w_scale, and volume are ignored. The comedy still works, just without the prosodic separation between moods.
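
The fallback can be sketched as follows. This is not voice-soundboard's source: the function names are hypothetical, treating any value other than `piper` as Kokoro is an assumption, and so is mapping Piper's length_scale to a Kokoro speed by taking its inverse (length_scale above 1 means slower, whereas a speed above 1 means faster).

```typescript
// Hypothetical sketch of engine selection and what survives on Kokoro.
type Engine = "piper" | "kokoro";

interface PiperProsody {
  length_scale: number;
  noise_scale: number;
  noise_w_scale: number;
  volume: number;
}

// Unset -> Kokoro (per the docs); other non-"piper" values -> Kokoro (assumption).
function selectEngine(env: Record<string, string | undefined>): Engine {
  return env.VOICE_SOUNDBOARD_ENGINE === "piper" ? "piper" : "kokoro";
}

// ASSUMPTION: Kokoro speed is the inverse of Piper's length_scale.
function kokoroSpeed(p: PiperProsody): number {
  return 1 / p.length_scale;
}

// Knobs with no Kokoro equivalent; only the speed-related one is kept.
function ignoredOnKokoro(p: PiperProsody): (keyof PiperProsody)[] {
  return (Object.keys(p) as (keyof PiperProsody)[]).filter((k) => k !== "length_scale");
}

const cynic: PiperProsody = { length_scale: 1.25, noise_scale: 0.2, noise_w_scale: 0.2, volume: 0.8 };
console.log(selectEngine({}));                                   // kokoro
console.log(selectEngine({ VOICE_SOUNDBOARD_ENGINE: "piper" })); // piper
console.log(kokoroSpeed(cynic));                                 // 0.8
console.log(ignoredOnKokoro(cynic).join(", "));                  // noise_scale, noise_w_scale, volume
```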

Kokoro strips SSML and respects only speed and voice selection. Piper’s ONNX inference exposes all four prosody knobs natively via SynthesisConfig, giving real vocal differentiation:

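  • Dry and cynic use the same voice (alan), but cynic is slower, flatter, and quieter (length_scale 1.25 vs 1.15, noise_scale 0.2 vs 0.3, volume 0.8 vs 0.9)
  • Chaotic has high timing variance (noise_w_scale 0.9), so words don’t land at even intervals, which reads as erratic
  • Roast keeps Piper’s stock noise settings (noise_scale 0.667, noise_w_scale 0.8), and that natural expressiveness gives verdict labels a confident lift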
  • Zoomer is fast and loud — streamer energy with brisk pacing
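
A quick worked comparison of the dry and cynic rows, under two simplifying assumptions: speaking time scales linearly with length_scale, and volume acts as a linear amplitude gain.

```latex
\frac{t_{\text{cynic}}}{t_{\text{dry}}} \approx \frac{1.25}{1.15} \approx 1.087
\qquad
20\log_{10}\frac{0.8}{0.9} \approx -1.02\ \text{dB}
\qquad
\frac{0.2}{0.3} \approx 0.67
```

On the same alan voice, cynic lines therefore run about 9% longer and sit roughly 1 dB lower, with one third less expressiveness noise and timing variance than dry.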