Voice Presets
Vocal Synth Engine ships with 15 voice presets. Each preset is a frozen analysis artifact containing the spectral characteristics of a singing voice.
Preset catalog
| Preset | Voice | Timbres |
|---|---|---|
| default-voice | Baseline female | Default timbre |
| bright-lab | Lab/experimental | Bright formant |
| kokoro-af-aoede | Aoede (female) | Multiple timbres |
| kokoro-af-heart | Heart (female) | Multiple timbres |
| kokoro-af-jessica | Jessica (female) | Multiple timbres |
| kokoro-af-sky | Sky (female) | Multiple timbres |
| kokoro-am-eric | Eric (male) | Multiple timbres |
| kokoro-am-fenrir | Fenrir (male) | Multiple timbres |
| kokoro-am-liam | Liam (male) | Multiple timbres |
| kokoro-am-onyx | Onyx (male) | Multiple timbres |
| kokoro-bf-alice | Alice (British female) | Multiple timbres |
| kokoro-bf-emma | Emma (British female) | Multiple timbres |
| kokoro-bf-isabella | Isabella (British female) | Multiple timbres |
| kokoro-bm-george | George (British male) | Multiple timbres |
| kokoro-bm-lewis | Lewis (British male) | Multiple timbres |
Binary timbre format
Each timbre is stored as a set of .f32 binary files containing 32-bit floating-point arrays:
- Harmonic magnitudes — Amplitude values for each harmonic partial across the pitch range
- Spectral envelope — Formant shape curve used to weight harmonic amplitudes
- Noise floor — Broadband noise spectrum for breathiness and consonant texture
These files are loaded at startup and held in memory, so synthesis never waits on disk reads.
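As a rough illustration of what loading one of these files involves, the sketch below decodes raw bytes into a `Float32Array`. It assumes a .f32 file is a headerless run of little-endian 32-bit floats; the byte order and the absence of a header are assumptions, and `decodeF32` is a hypothetical helper, not part of the engine.

```typescript
// Hypothetical helper: decode the raw bytes of a .f32 file.
// Assumption: headerless, little-endian IEEE 754 32-bit floats.
function decodeF32(bytes: Uint8Array): Float32Array {
  if (bytes.byteLength % 4 !== 0) {
    throw new Error(`byte length ${bytes.byteLength} is not a multiple of 4`);
  }
  // Respect byteOffset so views into a larger buffer decode correctly.
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const out = new Float32Array(bytes.byteLength / 4);
  for (let i = 0; i < out.length; i++) {
    out[i] = view.getFloat32(i * 4, true); // true = little-endian
  }
  return out;
}
```

In Node.js, the `Buffer` returned by `readFile` is a `Uint8Array`, so it can be passed to a function like this directly.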
Manifest schema
Each preset directory contains a manifest.json file validated against a Zod schema. Top-level fields:
| Field | Type | Description |
|---|---|---|
| schema | "mcp-voice-engine.voicepreset" | Fixed schema identifier |
| version | string | Schema version |
| id | string | Unique preset identifier (used in API calls) |
| sampleRateHz | number | Sample rate the preset was analyzed at |
| analysis | object | Analysis parameters: frameMs, hopMs, f0Method, maxHarmonics, envelope, noise |
| timbres | array | List of timbres, each with name, kind, assets (paths to .f32 files), and defaults |
| integrity | object | Optional hash fields: assetsHash, analysisHash |
Each timbre entry in the timbres array has this shape:
| Field | Type | Description |
|---|---|---|
| name | string | Timbre identifier (e.g. "AH", "EE", "OO") |
| kind | string | Timbre category |
| assets.harmonicsMag | string | Relative path to harmonic magnitudes .f32 file |
| assets.envelopeDb | string | Relative path to spectral envelope .f32 file |
| assets.noiseDb | string | Relative path to noise floor .f32 file |
| assets.freqHz | string | Relative path to frequency axis .f32 file |
| defaults.hnrDb | number | Default harmonics-to-noise ratio in dB |
| defaults.breathiness | number | Default breathiness value (0 to 1) |
| defaults.vibrato | object | Default vibrato: rateHz, depthCents, onsetMs |
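The two tables above can be read as plain TypeScript types. This is only a sketch of the documented fields, not the engine's actual Zod schema. The `analysis` sub-fields are left untyped because their types are not documented here, the string type of the hash fields is an assumption, and `listAssetPaths` is a hypothetical helper.

```typescript
// Sketch of the documented manifest shape; the real Zod schema may differ.
interface TimbreEntry {
  name: string; // e.g. "AH", "EE", "OO"
  kind: string;
  assets: {
    harmonicsMag: string;
    envelopeDb: string;
    noiseDb: string;
    freqHz: string;
  };
  defaults: {
    hnrDb: number;
    breathiness: number; // 0 to 1
    vibrato: { rateHz: number; depthCents: number; onsetMs: number };
  };
}

interface PresetManifest {
  schema: "mcp-voice-engine.voicepreset";
  version: string;
  id: string;
  sampleRateHz: number;
  // frameMs, hopMs, f0Method, maxHarmonics, envelope, noise (types not documented)
  analysis: Record<string, unknown>;
  timbres: TimbreEntry[];
  // Assumption: both hashes are strings.
  integrity?: { assetsHash?: string; analysisHash?: string };
}

// Hypothetical helper: collect every relative .f32 path a manifest references.
function listAssetPaths(manifest: PresetManifest): string[] {
  const paths: string[] = [];
  for (const timbre of manifest.timbres) {
    const a = timbre.assets;
    paths.push(a.harmonicsMag, a.envelopeDb, a.noiseDb, a.freqHz);
  }
  return paths;
}
```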
Inspecting presets
Use the built-in CLI inspector to examine preset data:

```sh
npm run inspect
```

This prints a table of all loaded presets with their timbre counts, pitch ranges, and file sizes.
You can also query presets through the REST API:

```sh
curl http://localhost:4321/api/presets
```

The response includes full metadata for every preset, including timbre names and parameter ranges.
Using presets in renders
When rendering a score via the REST API, specify the preset path in the config object:

```json
{
  "score": { "bpm": 120, "notes": [...] },
  "config": {
    "presetPath": "presets/kokoro-af-heart",
    "sampleRateHz": 48000,
    "blockSize": 2048,
    "maxPolyphony": 8,
    "rngSeed": 42,
    "defaultTimbre": "AH",
    "deterministic": "exact"
  }
}
```

The engine resolves the preset, loads its binary timbre data, and uses it for the entire render. Changing the preset path produces a different voice while keeping the same score and timing. In the cockpit UI, the preset dropdown handles this automatically.
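To make the "same score, different voice" point concrete, here is a small sketch that swaps only the preset path in a render config. `RenderConfig` mirrors the fields of the example request, `withPreset` is a hypothetical helper, and the `presets/<id>` path layout is inferred from that single example rather than documented.

```typescript
// Mirrors the config fields shown in the example request.
interface RenderConfig {
  presetPath: string;
  sampleRateHz: number;
  blockSize: number;
  maxPolyphony: number;
  rngSeed: number;
  defaultTimbre: string;
  deterministic: string;
}

// Hypothetical helper: return a copy of the config pointing at another preset.
// Assumption: presets live under "presets/<id>", as in the example request.
function withPreset(config: RenderConfig, presetId: string): RenderConfig {
  return { ...config, presetPath: `presets/${presetId}` };
}
```

Because every other field is copied unchanged, two renders built this way differ only in voice.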
Multi-timbre usage
Presets with multiple timbres support real-time timbre morphing via the XY pad in live mode. The X axis interpolates between timbres, blending their spectral characteristics smoothly. In score mode, per-note timbre values select or blend between available timbres.
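One simple way to picture the blend is a per-bin linear interpolation between two timbres' spectral arrays. The engine's actual interpolation method is not described here, so the sketch below is an assumption: `blendTimbres` is a hypothetical function, and it treats the X position as a value clamped to the range 0 to 1.

```typescript
// Hypothetical sketch: linear per-bin blend of two equal-length spectral arrays.
// x = 0 returns timbre a, x = 1 returns timbre b. Not the engine's documented method.
function blendTimbres(a: Float32Array, b: Float32Array, x: number): Float32Array {
  if (a.length !== b.length) {
    throw new Error("timbre arrays must have equal length");
  }
  const t = Math.min(1, Math.max(0, x)); // clamp to [0, 1]
  const out = new Float32Array(a.length);
  for (let i = 0; i < a.length; i++) {
    out[i] = (1 - t) * a[i] + t * b[i];
  }
  return out;
}
```

Whether blending happens on dB values or on linear magnitudes changes the perceived result, and the source does not say which the engine uses.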