Voice Presets

Vocal Synth Engine ships with 15 voice presets. Each preset is a frozen analysis artifact containing the spectral characteristics of a singing voice.

| Preset | Voice | Timbres |
| --- | --- | --- |
| default-voice | Baseline female | Default timbre |
| bright-lab | Lab/experimental | Bright formant |
| kokoro-af-aoede | Aoede (female) | Multiple timbres |
| kokoro-af-heart | Heart (female) | Multiple timbres |
| kokoro-af-jessica | Jessica (female) | Multiple timbres |
| kokoro-af-sky | Sky (female) | Multiple timbres |
| kokoro-am-eric | Eric (male) | Multiple timbres |
| kokoro-am-fenrir | Fenrir (male) | Multiple timbres |
| kokoro-am-liam | Liam (male) | Multiple timbres |
| kokoro-am-onyx | Onyx (male) | Multiple timbres |
| kokoro-bf-alice | Alice (British female) | Multiple timbres |
| kokoro-bf-emma | Emma (British female) | Multiple timbres |
| kokoro-bf-isabella | Isabella (British female) | Multiple timbres |
| kokoro-bm-george | George (British male) | Multiple timbres |
| kokoro-bm-lewis | Lewis (British male) | Multiple timbres |

Each timbre is stored as a set of .f32 binary files containing 32-bit floating-point arrays:

  • Harmonic magnitudes — Amplitude values for each harmonic partial across the pitch range
  • Spectral envelope — Formant shape curve used to weight harmonic amplitudes
  • Noise floor — Broadband noise spectrum for breathiness and consonant texture

These files are loaded at startup and held in memory, so synthesis never waits on disk reads for timbre data.
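The layout of a .f32 file is not described beyond "32-bit floating-point arrays". As a minimal sketch, assuming a headerless file of little-endian IEEE 754 floats, a payload could be decoded as follows. `decodeF32` is a hypothetical helper for illustration, not part of the engine.

```typescript
// Hypothetical helper: decode a raw .f32 payload into a Float32Array.
// Assumes no header and little-endian byte order (neither is confirmed by the docs).
function decodeF32(bytes: Uint8Array): Float32Array {
  if (bytes.byteLength % 4 !== 0) {
    throw new Error(`expected a multiple of 4 bytes, got ${bytes.byteLength}`);
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const out = new Float32Array(bytes.byteLength / 4);
  for (let i = 0; i < out.length; i++) {
    out[i] = view.getFloat32(i * 4, true); // true = little-endian
  }
  return out;
}

// Round-trip two sample values through raw bytes to show the decoding.
const f32Sample = new Uint8Array(8);
const f32Writer = new DataView(f32Sample.buffer);
f32Writer.setFloat32(0, 0.5, true);
f32Writer.setFloat32(4, -1.25, true);
const f32Decoded = decodeF32(f32Sample);
console.log(Array.from(f32Decoded)); // [ 0.5, -1.25 ]
```

In Node, the bytes could come from `readFileSync` in `node:fs`, which returns a `Buffer` (a `Uint8Array` subclass) and can be passed to this function directly.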

Each preset directory contains a JSON manifest file describing the voice characteristics:

| Field | Type | Description |
| --- | --- | --- |
| name | string | Human-readable preset name |
| id | string | Unique preset identifier (used in API calls) |
| pitch_range | [min, max] | Supported pitch range in MIDI note numbers |
| resonance | object | Formant resonance parameters |
| vibrato_defaults | object | Default vibrato rate, depth, and onset delay |
| timbres | array | List of available timbres with their binary asset paths |
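The manifest fields above can be expressed as a type with a small validator. This is a sketch only: the field names come from the table, but the nested shapes of `resonance`, `vibrato_defaults`, and `timbres` are not documented here, so they are kept loosely typed. `parseManifest` is a hypothetical helper, and the sample values are illustrative.

```typescript
// Field names come from the manifest table; nested shapes are assumptions,
// so they stay loosely typed.
interface PresetManifest {
  name: string;
  id: string;
  pitch_range: [number, number]; // [min, max] in MIDI note numbers
  resonance: Record<string, unknown>;
  vibrato_defaults: Record<string, unknown>;
  timbres: unknown[];
}

function isObject(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Hypothetical validator: throws if a documented field is missing or mistyped.
function parseManifest(json: string): PresetManifest {
  const m: unknown = JSON.parse(json);
  if (!isObject(m)) throw new Error("manifest must be a JSON object");
  const { name, id, pitch_range, resonance, vibrato_defaults, timbres } = m;
  if (typeof name !== "string" || typeof id !== "string") {
    throw new Error("name and id must be strings");
  }
  if (
    !Array.isArray(pitch_range) ||
    pitch_range.length !== 2 ||
    typeof pitch_range[0] !== "number" ||
    typeof pitch_range[1] !== "number" ||
    pitch_range[0] > pitch_range[1]
  ) {
    throw new Error("pitch_range must be [min, max] with min <= max");
  }
  if (!isObject(resonance) || !isObject(vibrato_defaults)) {
    throw new Error("resonance and vibrato_defaults must be objects");
  }
  if (!Array.isArray(timbres)) throw new Error("timbres must be an array");
  return {
    name,
    id,
    pitch_range: [pitch_range[0], pitch_range[1]],
    resonance,
    vibrato_defaults,
    timbres,
  };
}

// Illustrative manifest; the pitch range and empty objects are made-up values.
const exampleManifest = parseManifest(
  JSON.stringify({
    name: "Heart (female)",
    id: "kokoro-af-heart",
    pitch_range: [48, 84],
    resonance: {},
    vibrato_defaults: {},
    timbres: [],
  })
);
console.log(exampleManifest.id, exampleManifest.pitch_range);
```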

Use the built-in CLI inspector to examine preset data:

npm run inspect

This prints a table of all loaded presets with their timbre counts, pitch ranges, and file sizes.

You can also query presets through the REST API:

curl http://localhost:4321/api/presets

The response includes full metadata for every preset, including timbre names and parameter ranges.
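The same query can be made from code. The sketch below is an assumption-laden illustration: `fetchPresets` and `FetchLike` are hypothetical names, the response schema is not documented here so the body is returned as `unknown`, and a stub transport with a made-up payload stands in for a running server.

```typescript
// Minimal structural type so the sketch does not depend on a specific fetch implementation.
type FetchLike = (url: string) => Promise<{
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}>;

// Hypothetical helper: GET /api/presets and return the parsed body as-is.
async function fetchPresets(baseUrl: string, fetchImpl: FetchLike): Promise<unknown> {
  const res = await fetchImpl(`${baseUrl}/api/presets`);
  if (!res.ok) {
    throw new Error(`GET /api/presets failed with HTTP ${res.status}`);
  }
  return res.json();
}

// Stub transport so the example runs without a server; the payload is made up.
const stubFetch: FetchLike = async (url) => ({
  ok: url.endsWith("/api/presets"),
  status: url.endsWith("/api/presets") ? 200 : 404,
  json: async () => [{ id: "default-voice" }],
});

fetchPresets("http://localhost:4321", stubFetch).then((body) => {
  console.log(JSON.stringify(body));
});
```

With the server running, passing the global `fetch` (available in Node 18 and later) in place of `stubFetch` should issue the real request.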

When rendering a score (via the cockpit UI or the REST API), specify the preset by its id:

{
  "preset": "kokoro-af-heart",
  "score": { ... },
  "polyphony": 8,
  "seed": 42
}

The engine resolves the preset, loads its binary timbre data, and uses it for the entire render. Changing the preset produces a different voice while keeping the same score and timing.
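To illustrate swapping the voice while keeping the score fixed, here is a small sketch that builds two request bodies differing only in `preset`. The id list is copied from the preset table on this page; `buildRenderRequest` is a hypothetical client-side helper, the score is an empty placeholder, and the `polyphony` and `seed` values are simply those from the example above.

```typescript
// The 15 preset ids listed in the preset table on this page.
const PRESET_IDS = [
  "default-voice", "bright-lab",
  "kokoro-af-aoede", "kokoro-af-heart", "kokoro-af-jessica", "kokoro-af-sky",
  "kokoro-am-eric", "kokoro-am-fenrir", "kokoro-am-liam", "kokoro-am-onyx",
  "kokoro-bf-alice", "kokoro-bf-emma", "kokoro-bf-isabella",
  "kokoro-bm-george", "kokoro-bm-lewis",
] as const;
type PresetId = (typeof PRESET_IDS)[number];

interface RenderRequest {
  preset: PresetId;
  score: unknown; // the score format is not covered on this page
  polyphony: number;
  seed: number;
}

function isPresetId(id: string): id is PresetId {
  return (PRESET_IDS as readonly string[]).indexOf(id) !== -1;
}

// Hypothetical helper: build a request body, rejecting unknown preset ids early.
function buildRenderRequest(
  preset: string,
  score: unknown,
  polyphony: number,
  seed: number
): RenderRequest {
  if (!isPresetId(preset)) throw new Error(`unknown preset id: ${preset}`);
  return { preset, score, polyphony, seed };
}

// Same score, polyphony, and seed; only the voice changes.
const placeholderScore = {}; // placeholder, not a real score
const heartRequest = buildRenderRequest("kokoro-af-heart", placeholderScore, 8, 42);
const georgeRequest = buildRenderRequest("kokoro-bm-george", placeholderScore, 8, 42);
console.log(heartRequest.preset, georgeRequest.preset);
```

Validating the id on the client is a design choice for faster feedback; the engine's own handling of unknown ids is not specified here.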

Presets with multiple timbres support real-time timbre morphing via the XY pad in live mode. The X axis interpolates between timbres, blending their spectral characteristics smoothly. In score mode, per-note timbre values select or blend between available timbres.
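The engine's actual interpolation method is not specified here. As one plausible reading, blending between two timbres could be a per-bin linear crossfade of their spectra, sketched below. `blendTimbres` is a hypothetical function, and the two-element arrays are toy data rather than real timbre assets.

```typescript
// Linear crossfade between two equal-length spectra (an assumed blending rule).
// x = 0 returns a copy of `a`, x = 1 returns a copy of `b`.
function blendTimbres(a: Float32Array, b: Float32Array, x: number): Float32Array {
  if (a.length !== b.length) throw new Error("spectra must have equal length");
  const t = Math.min(1, Math.max(0, x)); // clamp the pad position to [0, 1]
  const out = new Float32Array(a.length);
  for (let i = 0; i < a.length; i++) {
    out[i] = (1 - t) * a[i] + t * b[i];
  }
  return out;
}

// Toy spectra: halfway between [0, 2] and [2, 4].
const timbreA = new Float32Array([0, 2]);
const timbreB = new Float32Array([2, 4]);
const blended = blendTimbres(timbreA, timbreB, 0.5);
console.log(Array.from(blended)); // [ 1, 3 ]
```

A preset with more than two timbres would need a rule for mapping the X position onto adjacent pairs; that mapping is not described on this page.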