Score Format

A VocalScore is the canonical input to the render engine. It is plain JSON, validated at every trust boundary by src/types/scoreSchema.ts (Zod). The same shape is accepted by the REST API (POST /api/render), by the CLI tools (play-score, compare), and by the cockpit’s piano roll.

Top-level shape

{
  "formatVersion": "1.0.0",
  "bpm": 120,
  "notes": [ ... ],
  "lyrics": { ... },
  "phonemes": [ ... ],
  "lanes": { ... }
}

Field	Type	Required	Notes
`formatVersion`	semver string	optional	Defaults to `"1.0.0"`. The engine rejects scores whose version is not in `SUPPORTED_SCORE_VERSIONS`.
`bpm`	finite number > 0	required	Beats per minute. Used by quantization helpers and by phonemizer alignment; the engine does not warp time.
`notes`	array of `VocalNote`	required	The pitched notes to render. An empty array is legal (renders silence + tail).
`lyrics`	`{text, language?}`	optional	Source text for the phonemize endpoint. Stored but not synthesized directly; phonemes drive timbre.
`phonemes`	array of events	optional	Pre-computed phoneme timeline. When absent, the engine falls back to the `timbre` field on each note.
`lanes`	`LanesObject`	optional	Time-varying automation curves applied across the score (`dynamics`, `breathiness`, `timbreMorph`).
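
Because note and phoneme times are stored in seconds, `bpm` only matters when converting between beats and seconds. The arithmetic can be sketched as follows; `beatsToSec` and `quantizeSec` are hypothetical names used for illustration, not the project's actual quantization helpers.

```typescript
// Hypothetical helpers; the project's real quantization helpers may differ.

/** Convert a beat count to seconds at a fixed tempo. */
function beatsToSec(beats: number, bpm: number): number {
  if (!Number.isFinite(bpm) || bpm <= 0) {
    throw new RangeError("bpm must be a finite number > 0");
  }
  return (beats * 60) / bpm;
}

/**
 * Snap a time in seconds to the nearest grid line.
 * divisionsPerBeat = 4 gives a 16th-note grid when the beat is a quarter note.
 */
function quantizeSec(tSec: number, bpm: number, divisionsPerBeat: number): number {
  const gridSec = beatsToSec(1 / divisionsPerBeat, bpm);
  return Math.round(tSec / gridSec) * gridSec;
}
```

At 120 bpm one beat lasts 0.5 s, so a note with `durationSec` 0.5 is exactly one beat long.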

`VocalNote`

{
  "id": "n1",
  "startSec": 0.0,
  "durationSec": 0.5,
  "midi": 60,
  "velocity": 0.8,
  "timbre": "ah",
  "vibrato": { "rateHz": 5.5, "depthCents": 50, "onsetSec": 0.2 },
  "portamentoSec": 0.05,
  "pan": 0.0
}

Field	Type	Required	Constraints
`id`	non-empty string	required	Stable identifier. Used by the cockpit to track edits.
`startSec`	finite number ≥ 0	required	Seconds from start of score.
`durationSec`	finite number > 0	required	Length of the note.
`midi`	finite number, 0..127	required	MIDI note number (60 = middle C).
`velocity`	finite number, 0..1	optional	Default 0.8.
`timbre`	non-empty string	optional	Preset-specific timbre id (e.g. `"ah"`, `"oo"`, `"ee"`).
`vibrato`	object	optional	`rateHz`, `depthCents`, `onsetSec` — all finite, ≥ 0.
`portamentoSec`	finite number ≥ 0	optional	Pitch-glide duration into this note.
`pan`	finite number, -1..1	optional	Stereo pan when rendering to 2 channels. Ignored on mono renders.
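
The constraints in the table can be read as a checklist. The sketch below spells them out in plain TypeScript, purely as an illustration: the project validates with the Zod schema in src/types/scoreSchema.ts, and `checkVocalNote` is a hypothetical stand-in, not that schema.

```typescript
// Hand-rolled illustration of the table above. NOT the project's Zod schema;
// all function names here are hypothetical.

interface Vibrato { rateHz: number; depthCents: number; onsetSec: number; }

interface VocalNote {
  id: string;
  startSec: number;
  durationSec: number;
  midi: number;
  velocity?: number;
  timbre?: string;
  vibrato?: Vibrato;
  portamentoSec?: number;
  pan?: number;
}

function isNonNegative(v: number): boolean {
  return Number.isFinite(v) && v >= 0;
}

function inRange(v: number, lo: number, hi: number): boolean {
  return Number.isFinite(v) && v >= lo && v <= hi;
}

/** Returns human-readable problems; an empty list means the note passed. */
function checkVocalNote(n: VocalNote): string[] {
  const problems: string[] = [];
  if (n.id.length === 0) problems.push("id must be a non-empty string");
  if (!isNonNegative(n.startSec)) problems.push("startSec must be finite and >= 0");
  if (!(Number.isFinite(n.durationSec) && n.durationSec > 0)) {
    problems.push("durationSec must be finite and > 0");
  }
  if (!inRange(n.midi, 0, 127)) problems.push("midi must be in 0..127");
  if (n.velocity !== undefined && !inRange(n.velocity, 0, 1)) {
    problems.push("velocity must be in 0..1");
  }
  if (n.timbre !== undefined && n.timbre.length === 0) {
    problems.push("timbre must be non-empty when present");
  }
  if (n.vibrato !== undefined) {
    const { rateHz, depthCents, onsetSec } = n.vibrato;
    if (![rateHz, depthCents, onsetSec].every(isNonNegative)) {
      problems.push("vibrato fields must be finite and >= 0");
    }
  }
  if (n.portamentoSec !== undefined && !isNonNegative(n.portamentoSec)) {
    problems.push("portamentoSec must be finite and >= 0");
  }
  if (n.pan !== undefined && !inRange(n.pan, -1, 1)) {
    problems.push("pan must be in -1..1");
  }
  return problems;
}
```

Collecting every problem instead of stopping at the first one is a design choice made for this sketch; it suits editors such as the piano roll that want to report all issues at once.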

`lyrics`

{ "text": "hello world", "language": "en" }

Field	Type	Required	Notes
`text`	string	required	Raw lyric text.
`language`	string	optional	ISO language tag. Used by the G2P backend; default `"en"`.

`phonemes`

Each entry is a time-aligned phoneme event. The timeline is generated by POST /api/phonemize and persisted into the score so that re-rendering is reproducible.

{
  "tSec": 0.0,
  "durSec": 0.18,
  "phoneme": "AH",
  "kind": "vowel",
  "timbreHint": "ah",
  "strength": 0.85
}

Field	Type	Required	Notes
`tSec`	finite number ≥ 0	required	Start time of this phoneme.
`durSec`	finite number > 0	required	Phoneme duration.
`phoneme`	non-empty string	required	Phoneme label (engine vocabulary).
`kind`	`"vowel"` | `"consonant"`	required	Routing hint.
`timbreHint`	non-empty string	optional	Preferred timbre for vowels; engine may override.
`strength`	finite number, 0..1	optional	Vowel strength for consonant-to-vowel transitions.
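
To illustrate how a phoneme timeline and the per-note `timbre` fallback could interact, here is a minimal sketch. The half-open interval test and the fallback order are assumptions made for the example, and `phonemeAt` and `resolveTimbre` are hypothetical names; the engine's actual routing may differ, since it is free to override hints.

```typescript
interface PhonemeEvent {
  tSec: number;
  durSec: number;
  phoneme: string;
  kind: "vowel" | "consonant";
  timbreHint?: string;
  strength?: number;
}

/** Return the event whose [tSec, tSec + durSec) interval contains atSec, if any. */
function phonemeAt(events: PhonemeEvent[], atSec: number): PhonemeEvent | undefined {
  return events.find((e) => atSec >= e.tSec && atSec < e.tSec + e.durSec);
}

/**
 * Pick a timbre id for a moment in time (assumed order):
 * 1. the timbreHint of the active vowel, if there is one;
 * 2. otherwise the note's own timbre, which is also used when the score has no phonemes.
 */
function resolveTimbre(
  events: PhonemeEvent[] | undefined,
  atSec: number,
  noteTimbre?: string,
): string | undefined {
  const active = events ? phonemeAt(events, atSec) : undefined;
  if (active && active.kind === "vowel" && active.timbreHint) {
    return active.timbreHint;
  }
  return noteTimbre;
}
```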

`lanes` — automation

Three automation lanes layer over the rendered audio:

{
  "dynamics":   [ { "tSec": 0.0, "value": 0.6 }, { "tSec": 1.2, "value": 1.0 } ],
  "breathiness":[ { "tSec": 0.0, "value": 0.0 }, { "tSec": 0.5, "value": 0.4 } ],
  "timbreMorph": {
    "ah": [ { "tSec": 0.0, "value": 1.0 }, { "tSec": 0.5, "value": 0.0 } ],
    "oo": [ { "tSec": 0.0, "value": 0.0 }, { "tSec": 0.5, "value": 1.0 } ]
  }
}

Lane	Value range	Effect
`dynamics`	any finite number	Multiplier on output gain — linear, not dB.
`breathiness`	0..1	Mix-in noise residual amplitude. 0 = pure tonal, 1 = breathy.
`timbreMorph`	per-timbre 0..1	Cross-fade weights across the preset’s timbres. Weights are normalised internally before mixing.

Each lane, and each per-timbre entry under timbreMorph, is an array of { tSec, value } breakpoints sorted by tSec; the engine linearly interpolates between adjacent breakpoints.
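
The breakpoint interpolation and the timbreMorph normalisation can be sketched as below. Two details are assumptions, since this page does not specify them: values are held constant before the first and after the last breakpoint, and weights are normalised by dividing by their sum. `sampleLane` and `morphWeights` are hypothetical names.

```typescript
interface Breakpoint { tSec: number; value: number; }

/** Linearly interpolate a lane at tSec. Assumes `lane` is sorted by tSec. */
function sampleLane(lane: Breakpoint[], tSec: number, fallback: number): number {
  if (lane.length === 0) return fallback;
  const first = lane[0];
  const last = lane[lane.length - 1];
  // ASSUMPTION: hold the edge values outside the covered time range.
  if (tSec <= first.tSec) return first.value;
  if (tSec >= last.tSec) return last.value;
  for (let i = 1; i < lane.length; i++) {
    const a = lane[i - 1];
    const b = lane[i];
    if (tSec <= b.tSec) {
      const span = b.tSec - a.tSec;
      if (span === 0) return b.value; // coincident breakpoints: take the later one
      return a.value + (b.value - a.value) * ((tSec - a.tSec) / span);
    }
  }
  return last.value;
}

/** Sample every timbre lane and scale the weights so they sum to 1. */
function morphWeights(
  morph: Record<string, Breakpoint[]>,
  tSec: number,
): Record<string, number> {
  const raw: Record<string, number> = {};
  let sum = 0;
  for (const id of Object.keys(morph)) {
    const w = Math.max(0, sampleLane(morph[id], tSec, 0));
    raw[id] = w;
    sum += w;
  }
  const out: Record<string, number> = {};
  for (const id of Object.keys(raw)) {
    // ASSUMPTION: sum-to-one normalisation; all-zero weights stay zero.
    out[id] = sum > 0 ? raw[id] / sum : 0;
  }
  return out;
}
```

With the timbreMorph example above, sampling at 0.25 s gives raw weights of 0.5 for both "ah" and "oo", which are already normalised.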

Complete example

A two-second score (four beats at 120 bpm) with lyrics, phonemes, and a breathiness automation lane:

{
  "formatVersion": "1.0.0",
  "bpm": 120,
  "notes": [
    { "id": "n1", "startSec": 0.0, "durationSec": 0.5, "midi": 60, "velocity": 0.8, "timbre": "ah" },
    { "id": "n2", "startSec": 0.5, "durationSec": 0.5, "midi": 64, "velocity": 0.8, "timbre": "ee" },
    { "id": "n3", "startSec": 1.0, "durationSec": 1.0, "midi": 67, "velocity": 0.9, "timbre": "oo",
      "vibrato": { "rateHz": 5.5, "depthCents": 50, "onsetSec": 0.2 }, "portamentoSec": 0.04 }
  ],
  "lyrics": { "text": "la la la", "language": "en" },
  "phonemes": [
    { "tSec": 0.00, "durSec": 0.05, "phoneme": "L", "kind": "consonant", "strength": 0.7 },
    { "tSec": 0.05, "durSec": 0.45, "phoneme": "AH", "kind": "vowel", "timbreHint": "ah" },
    { "tSec": 0.50, "durSec": 0.05, "phoneme": "L", "kind": "consonant", "strength": 0.7 },
    { "tSec": 0.55, "durSec": 0.45, "phoneme": "AH", "kind": "vowel", "timbreHint": "ee" },
    { "tSec": 1.00, "durSec": 0.05, "phoneme": "L", "kind": "consonant", "strength": 0.7 },
    { "tSec": 1.05, "durSec": 0.95, "phoneme": "AH", "kind": "vowel", "timbreHint": "oo" }
  ],
  "lanes": {
    "breathiness": [ { "tSec": 0.0, "value": 0.0 }, { "tSec": 1.5, "value": 0.3 } ]
  }
}

Save this file as score.json and render it from the CLI:

npm run play-score -- --score score.json --preset kokoro-am-michael --out song.wav

…or POST it to /api/render:

jq -n --slurpfile s score.json '{score: $s[0], config: {presetId: "kokoro-am-michael", maxPolyphony: 4, deterministic: "exact", rngSeed: 123}}' \
  | curl -X POST http://localhost:3000/api/render \
      -H 'Content-Type: application/json' \
      -d @-
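
The same request body can be assembled in TypeScript. This sketch only builds the payload; the `config` field names are copied from the curl example above, `buildRenderBody` is a hypothetical helper, and the response format is not covered because this page does not document it.

```typescript
// Field names mirror the curl example; nothing else about the API is assumed.
interface RenderConfig {
  presetId: string;
  maxPolyphony: number;
  deterministic: string;
  rngSeed: number;
}

/** Wrap a score and a render config in the { score, config } envelope. */
function buildRenderBody(score: unknown, config: RenderConfig): string {
  return JSON.stringify({ score, config });
}

/** Options object suitable for passing as the second argument to fetch(). */
function buildRenderRequest(score: unknown, config: RenderConfig) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: buildRenderBody(score, config),
  };
}
```

In an environment with a global `fetch`, the result of `buildRenderRequest` can be passed directly as the options for a request to http://localhost:3000/api/render.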

Versioning and migrations

formatVersion is checked against SUPPORTED_SCORE_VERSIONS (see src/types/scoreSchema.ts). A score whose version is not in that list, for example one written by a newer release, fails loudly with UNSUPPORTED_SCORE_VERSION instead of being rendered with unknown fields silently dropped. When the schema gains new required fields, the version is bumped and a migration note is added to the CHANGELOG.
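
A version gate of this kind might look like the sketch below. The set contents are a placeholder, because the real SUPPORTED_SCORE_VERSIONS lives in src/types/scoreSchema.ts and is not reproduced on this page; `ScoreVersionError` and `resolveFormatVersion` are hypothetical names.

```typescript
// ASSUMPTION: placeholder contents; the real list is defined in scoreSchema.ts.
const SUPPORTED_VERSIONS_SKETCH: ReadonlySet<string> = new Set(["1.0.0"]);
const DEFAULT_FORMAT_VERSION = "1.0.0"; // documented default for formatVersion

class ScoreVersionError extends Error {
  readonly code = "UNSUPPORTED_SCORE_VERSION";
  constructor(version: string) {
    super(`Unsupported score formatVersion: ${version}`);
    this.name = "ScoreVersionError";
  }
}

/** Apply the default, then reject anything outside the supported set. */
function resolveFormatVersion(score: { formatVersion?: string }): string {
  const version = score.formatVersion ?? DEFAULT_FORMAT_VERSION;
  if (!SUPPORTED_VERSIONS_SKETCH.has(version)) {
    throw new ScoreVersionError(version);
  }
  return version;
}
```

Checking the version before any other processing means a score from an unknown version is rejected outright and never partially interpreted.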
