Skip to content

Prosody Controls

MCP Voice Engine provides event-driven prosody controls that shape cadence and intonation intentionally — without destabilizing pitch targets.

Accents and boundary tones are applied as discrete events, not global style knobs. This lets you shape specific words or phrases while keeping the rest of the utterance stable.

Key properties enforced by the test suite:

BehaviorWhat it means
Accent localityAccents affect only the target pitch region — no smear into adjacent syllables
Question vs statementRising boundary tone for questions, falling for statements — reliably distinct
Post-focus compressionFocus events have consequences — compressed pitch range after the focal word
Deterministic event orderingSame prosody event sequence always produces the same pitch curve
Style monotonicityExpressivity increases (expressive > neutral > flat) without increasing instability

The test suite goes beyond “does it run” to enforce communicative behavior. These are semantic guardrails that verify the engine produces musically and linguistically meaningful output.

Each meaning test checks a specific communicative property:

  • Accent locality — an accent on word 3 should not affect the pitch of word 1 or word 5
  • Question rise — a question boundary tone should produce measurably higher terminal pitch than a statement
  • Post-focus compression — words after a focus event should have a narrower pitch range
  • Event ordering — given the same event sequence, the output must be bit-identical
  • Style monotonicity — increasing expressivity should increase pitch variation without increasing jitter

The engine supports multiple style levels with a strict monotonicity guarantee:

flat < neutral < expressive

Higher expressivity means more pitch variation and wider dynamic range, but never more instability. This is verified by comparing jitter metrics across style levels.