
Voice Presets

Vocal Synth Engine ships with 15 voice presets. Each preset is a frozen analysis artifact containing the spectral characteristics of a singing voice.

| Preset | Voice | Timbres |
|---|---|---|
| default-voice | Baseline female | Default timbre |
| bright-lab | Lab/experimental | Bright formant |
| kokoro-af-aoede | Aoede (female) | Multiple timbres |
| kokoro-af-heart | Heart (female) | Multiple timbres |
| kokoro-af-jessica | Jessica (female) | Multiple timbres |
| kokoro-af-sky | Sky (female) | Multiple timbres |
| kokoro-am-eric | Eric (male) | Multiple timbres |
| kokoro-am-fenrir | Fenrir (male) | Multiple timbres |
| kokoro-am-liam | Liam (male) | Multiple timbres |
| kokoro-am-onyx | Onyx (male) | Multiple timbres |
| kokoro-bf-alice | Alice (British female) | Multiple timbres |
| kokoro-bf-emma | Emma (British female) | Multiple timbres |
| kokoro-bf-isabella | Isabella (British female) | Multiple timbres |
| kokoro-bm-george | George (British male) | Multiple timbres |
| kokoro-bm-lewis | Lewis (British male) | Multiple timbres |

Each timbre is stored as a set of .f32 binary files containing 32-bit floating-point arrays:

  • Harmonic magnitudes — Amplitude values for each harmonic partial across the pitch range
  • Spectral envelope — Formant shape curve used to weight harmonic amplitudes
  • Noise floor — Broadband noise spectrum for breathiness and consonant texture

These files are loaded at startup and held in memory, so synthesis never waits on disk reads.
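As an illustration of what reading a .f32 asset might involve, here is a minimal sketch that decodes raw bytes into a `Float32Array`. It assumes the files are headerless, little-endian float32 arrays, which this page does not state explicitly; `decodeF32` is a hypothetical helper name, not part of the engine's API.

```typescript
// Decode raw bytes into 32-bit floats.
// Assumption: the asset is a headerless little-endian float32 array.
function decodeF32(bytes: Uint8Array): Float32Array {
  if (bytes.byteLength % 4 !== 0) {
    throw new Error(`byte length ${bytes.byteLength} is not a multiple of 4`);
  }
  // Respect byteOffset so views into a larger buffer decode correctly.
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const out = new Float32Array(bytes.byteLength / 4);
  for (let i = 0; i < out.length; i++) {
    out[i] = view.getFloat32(i * 4, true); // true = little-endian
  }
  return out;
}
```

In Node, the bytes could come from `readFileSync(path)`, since a `Buffer` is a `Uint8Array`. Reading through a `DataView` keeps the byte order explicit instead of relying on the host platform.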

Each preset directory contains a manifest.json file validated against a Zod schema. Top-level fields:

| Field | Type | Description |
|---|---|---|
| schema | "mcp-voice-engine.voicepreset" | Fixed schema identifier |
| version | string | Schema version |
| id | string | Unique preset identifier (used in API calls) |
| sampleRateHz | number | Sample rate the preset was analyzed at |
| analysis | object | Analysis parameters: frameMs, hopMs, f0Method, maxHarmonics, envelope, noise |
| timbres | array | List of timbres, each with name, kind, assets (paths to .f32 files), and defaults |
| integrity | object | Optional hash fields: assetsHash, analysisHash |

Each timbre entry in the timbres array has this shape:

| Field | Type | Description |
|---|---|---|
| name | string | Timbre identifier (e.g. "AH", "EE", "OO") |
| kind | string | Timbre category |
| assets.harmonicsMag | string | Relative path to harmonic magnitudes .f32 file |
| assets.envelopeDb | string | Relative path to spectral envelope .f32 file |
| assets.noiseDb | string | Relative path to noise floor .f32 file |
| assets.freqHz | string | Relative path to frequency axis .f32 file |
| defaults.hnrDb | number | Default harmonics-to-noise ratio in dB |
| defaults.breathiness | number | Default breathiness value (0 to 1) |
| defaults.vibrato | object | Default vibrato: rateHz, depthCents, onsetMs |
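The manifest fields above can be written down as TypeScript types with a small hand-rolled check. This is a dependency-free stand-in for illustration only: the engine validates with a Zod schema, which is not reproduced here. The names `TimbreEntry`, `VoicePresetManifest`, `checkManifest`, and `parseManifest` are hypothetical, and the sketch assumes the hash fields are strings and that `integrity` may be absent entirely.

```typescript
// One entry in the manifest's `timbres` array, per the field table.
interface TimbreEntry {
  name: string;
  kind: string;
  assets: { harmonicsMag: string; envelopeDb: string; noiseDb: string; freqHz: string };
  defaults: {
    hnrDb: number;
    breathiness: number; // documented range: 0 to 1
    vibrato: { rateHz: number; depthCents: number; onsetMs: number };
  };
}

// Top-level manifest. Assumption: hash fields are strings and optional.
interface VoicePresetManifest {
  schema: "mcp-voice-engine.voicepreset";
  version: string;
  id: string;
  sampleRateHz: number;
  analysis: Record<string, unknown>;
  timbres: TimbreEntry[];
  integrity?: { assetsHash?: string; analysisHash?: string };
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Returns a list of problems; an empty list means the checked fields look valid.
function checkManifest(raw: unknown): string[] {
  if (!isRecord(raw)) return ["manifest: expected object"];
  const problems: string[] = [];
  if (raw["schema"] !== "mcp-voice-engine.voicepreset") problems.push("schema: unexpected identifier");
  for (const key of ["version", "id"]) {
    if (typeof raw[key] !== "string") problems.push(`${key}: expected string`);
  }
  if (typeof raw["sampleRateHz"] !== "number") problems.push("sampleRateHz: expected number");
  if (!isRecord(raw["analysis"])) problems.push("analysis: expected object");
  const timbres = raw["timbres"];
  if (!Array.isArray(timbres)) return [...problems, "timbres: expected array"];
  timbres.forEach((t: unknown, i: number) => {
    const at = `timbres[${i}]`;
    if (!isRecord(t)) { problems.push(`${at}: expected object`); return; }
    for (const key of ["name", "kind"]) {
      if (typeof t[key] !== "string") problems.push(`${at}.${key}: expected string`);
    }
    const assetsRaw = t["assets"];
    const assets: Record<string, unknown> = isRecord(assetsRaw) ? assetsRaw : {};
    for (const key of ["harmonicsMag", "envelopeDb", "noiseDb", "freqHz"]) {
      if (typeof assets[key] !== "string") problems.push(`${at}.assets.${key}: expected string`);
    }
    const defaultsRaw = t["defaults"];
    const defaults: Record<string, unknown> = isRecord(defaultsRaw) ? defaultsRaw : {};
    if (typeof defaults["hnrDb"] !== "number") problems.push(`${at}.defaults.hnrDb: expected number`);
    const b = defaults["breathiness"];
    if (typeof b !== "number" || b < 0 || b > 1) {
      problems.push(`${at}.defaults.breathiness: expected number in [0, 1]`);
    }
    if (!isRecord(defaults["vibrato"])) problems.push(`${at}.defaults.vibrato: expected object`);
  });
  return problems;
}

// Throws on the first failed check list, otherwise returns the typed manifest.
function parseManifest(raw: unknown): VoicePresetManifest {
  const problems = checkManifest(raw);
  if (problems.length > 0) throw new Error(problems.join("; "));
  return raw as VoicePresetManifest;
}
```

Collecting every problem before throwing makes a broken manifest easier to fix in one pass than failing on the first bad field.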

Use the built-in CLI inspector to examine preset data:

    npm run inspect

This prints a table of all loaded presets with their timbre counts, pitch ranges, and file sizes.

You can also query presets through the REST API:

    curl http://localhost:4321/api/presets

The response includes full metadata for every preset, including timbre names and parameter ranges.

When rendering a score via the REST API, specify the preset path in the config object:

{
  "score": { "bpm": 120, "notes": [...] },
  "config": {
    "presetPath": "presets/kokoro-af-heart",
    "sampleRateHz": 48000,
    "blockSize": 2048,
    "maxPolyphony": 8,
    "rngSeed": 42,
    "defaultTimbre": "AH",
    "deterministic": "exact"
  }
}

The engine resolves the preset, loads its binary timbre data, and uses it for the entire render. Changing the preset path produces a different voice while keeping the same score and timing. In the cockpit UI, the preset dropdown handles this automatically.
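Because only `presetPath` changes between voices, a client can derive one request body per preset from a single base request. The sketch below shows one way to do that. `RenderConfig`, `RenderRequest`, and `withPreset` are hypothetical names, the config fields are taken from the example above, and the score type is left generic because its full shape is not documented here.

```typescript
// Config fields as they appear in the example request body.
interface RenderConfig {
  presetPath: string;
  sampleRateHz: number;
  blockSize: number;
  maxPolyphony: number;
  rngSeed: number;
  defaultTimbre: string;
  deterministic: string;
}

// The score type is generic: this sketch never inspects it.
interface RenderRequest<S> {
  score: S;
  config: RenderConfig;
}

// Return a copy of the request that targets a different preset.
// The score and every other config field are carried over unchanged.
function withPreset<S>(request: RenderRequest<S>, presetPath: string): RenderRequest<S> {
  return { ...request, config: { ...request.config, presetPath } };
}
```

For example, `withPreset(base, "presets/kokoro-bm-george")` yields a body that differs from `base` only in the voice, which makes A/B comparisons of the same score straightforward.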

Presets with multiple timbres support real-time timbre morphing via the XY pad in live mode. The X axis interpolates between timbres, blending their spectral characteristics smoothly. In score mode, per-note timbre values select or blend between available timbres.
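To make the idea of blending concrete, the sketch below linearly interpolates two harmonic-magnitude arrays by a morph position in the range 0 to 1. This is an assumption for illustration: the page does not say which interpolation the engine uses, and `blendTimbres` is a hypothetical function, not the engine's API.

```typescript
// Linear blend of two timbre arrays.
// x = 0 returns a copy of `a`, x = 1 returns a copy of `b`.
function blendTimbres(a: Float32Array, b: Float32Array, x: number): Float32Array {
  if (a.length !== b.length) {
    throw new Error(`length mismatch: ${a.length} vs ${b.length}`);
  }
  // Clamp the morph position so out-of-range pad values stay valid.
  const t = Math.min(1, Math.max(0, x));
  const out = new Float32Array(a.length);
  for (let i = 0; i < a.length; i++) {
    out[i] = (1 - t) * a[i] + t * b[i];
  }
  return out;
}
```

A real implementation might interpolate in the dB domain or per formant instead; the linear form is simply the easiest one to reason about.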