CLI Reference
Vocal Synth Engine ships eight command-line tools in src/cli/. All of them honour --json (a single machine-readable object on stdout) and -h / --help; run any tool with --help for the full usage banner. Two further tools (vse-score-from-midi, vse-phonemize) are planned and noted at the end of this page.
| Tool | Entry | One-line purpose |
|---|---|---|
| analyze | src/cli/analyze.ts | Build a single-timbre voice preset from one calibration WAV. |
| build-preset | src/cli/build-preset.ts | Build a multi-timbre voice preset from N calibration WAVs. |
| compare | src/cli/compare.ts | A/B compare two WAVs by peak / RMS / max abs delta. |
| gen-vowel-wav | src/cli/gen-vowel-wav.ts | Generate AH/EE/OO calibration WAVs (A3 = 220 Hz, 48 kHz mono, peak −3 dBFS). |
| inspect | src/cli/inspect.ts | Print a voice preset’s manifest + timbre list. |
| play-score | src/cli/play-score.ts | Render a VocalScore JSON to WAV using a preset. |
| realtime-demo | src/cli/realtime-demo.ts | Architecture demo: worker_threads + streaming engine + telemetry. |
| resynth | src/cli/resynth.ts | Standalone single-tone renderer for spot-checking a timbre. |
Invoke via npx tsx src/cli/<tool>.ts ... or via the npm scripts (only inspect is wired in package.json today; add others on demand).
analyze
    npx tsx src/cli/analyze.ts <input.wav> <out-dir> <timbre-name> [--json]

Analyses a single calibration WAV (48 kHz mono) and writes a one-timbre preset to <out-dir>:

- <out-dir>/voicepreset.json
- <out-dir>/assets/<timbre>_harmonics_mag.f32
- <out-dir>/assets/<timbre>_envelope_db.f32
- <out-dir>/assets/<timbre>_noise_db.f32
- <out-dir>/assets/freq_axis_hz.f32

Example

    npx tsx src/cli/analyze.ts calib/AH.wav presets/my-voice AH

JSON output (--json): { ok: true, presetDir, manifestPath, timbres: [name] } or { ok: false, code, message }.
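After running analyze, a script may want to confirm that every documented file exists. The sketch below only builds the list of expected paths from the layout described above. The helper name expectedPresetFiles is hypothetical and not part of the repo, and it assumes forward-slash paths.

```typescript
// Hypothetical helper (not part of src/cli/): lists the files that
// analyze is documented to write for one timbre. Forward slashes assumed.
function expectedPresetFiles(outDir: string, timbre: string): string[] {
  const assets = `${outDir}/assets`;
  return [
    `${outDir}/voicepreset.json`,
    `${assets}/${timbre}_harmonics_mag.f32`,
    `${assets}/${timbre}_envelope_db.f32`,
    `${assets}/${timbre}_noise_db.f32`,
    `${assets}/freq_axis_hz.f32`, // shared axis file, not timbre-specific
  ];
}
```

A caller could pass each returned path to fs.existsSync to check a freshly written preset directory.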
build-preset
    npx tsx src/cli/build-preset.ts --out <preset-dir> <wav:timbre> [<wav:timbre> ...] [--json]

Same shape as analyze, but accepts multiple <wav>:<timbre> positional pairs to build a multi-timbre preset in a single pass. Each pair is split on the LAST colon, so paths with Windows drive letters (F:/calib/AH.wav:AH) work.
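The last-colon rule can be illustrated with a few lines of TypeScript. This is a minimal sketch of the splitting behaviour described above, not the code in build-preset.ts; the names splitWavTimbre and WavTimbrePair are hypothetical.

```typescript
// Hypothetical illustration of "split on the LAST colon".
interface WavTimbrePair {
  wavPath: string;
  timbre: string;
}

function splitWavTimbre(arg: string): WavTimbrePair {
  const idx = arg.lastIndexOf(":");
  // Reject a missing colon, an empty path, or an empty timbre name.
  if (idx <= 0 || idx === arg.length - 1) {
    throw new Error(`expected <wav>:<timbre>, got "${arg}"`);
  }
  return { wavPath: arg.slice(0, idx), timbre: arg.slice(idx + 1) };
}
```

With this rule, F:/calib/AH.wav:AH yields the path F:/calib/AH.wav and the timbre AH, because only the final colon acts as the separator.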
Example
    npx tsx src/cli/build-preset.ts --out presets/my-voice \
      calib/AH.wav:AH calib/EE.wav:EE calib/OO.wav:OO

compare
    npx tsx src/cli/compare.ts <ref.wav> <test.wav> [--json]

Compares two WAVs sample-by-sample. Returns the peak dBFS and RMS of each file, the maximum absolute delta, and a verdict (identical / close / different). Used in regression tests to detect synth drift.

Example

    npx tsx src/cli/compare.ts golden/song.wav out.wav --json

JSON output: { ok, peakA, peakB, rmsA, rmsB, maxAbsDelta, verdict }.
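To make the reported metrics concrete, here is a rough sketch of how they could be computed on decoded float samples. It is not the implementation in compare.ts. The closeThreshold value, the choice of linear RMS, and the treatment of mismatched lengths are all assumptions made for illustration.

```typescript
type Verdict = "identical" | "close" | "different";

interface CompareSketch {
  peakA: number; // dBFS (assumed unit)
  peakB: number;
  rmsA: number; // linear amplitude (assumed unit)
  rmsB: number;
  maxAbsDelta: number;
  verdict: Verdict;
}

function peakDbfs(x: Float32Array): number {
  let peak = 0;
  for (let i = 0; i < x.length; i++) peak = Math.max(peak, Math.abs(x[i]));
  return 20 * Math.log10(peak); // -Infinity for pure silence
}

function rms(x: Float32Array): number {
  if (x.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < x.length; i++) sum += x[i] * x[i];
  return Math.sqrt(sum / x.length);
}

// closeThreshold is a made-up tolerance; the real tool's value is not documented here.
function compareSamples(a: Float32Array, b: Float32Array, closeThreshold = 1e-4): CompareSketch {
  // Assumption: files of different length can never be "identical" or "close".
  let maxAbsDelta = a.length === b.length ? 0 : Infinity;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    maxAbsDelta = Math.max(maxAbsDelta, Math.abs(a[i] - b[i]));
  }
  const verdict: Verdict =
    maxAbsDelta === 0 ? "identical" : maxAbsDelta <= closeThreshold ? "close" : "different";
  return {
    peakA: peakDbfs(a), peakB: peakDbfs(b),
    rmsA: rms(a), rmsB: rms(b),
    maxAbsDelta, verdict,
  };
}
```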
gen-vowel-wav
    npx tsx src/cli/gen-vowel-wav.ts [<out-dir>]

Generates the AH.wav, EE.wav and OO.wav calibration files (3 s each, 48 kHz mono, A3 = 220 Hz, peak −3 dBFS, 10 ms fade in/out). Defaults to writing into calib/.
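The numbers above translate as follows: 3 s at 48 kHz is 144000 samples, a 10 ms fade is 480 samples, and −3 dBFS is a linear peak of 10^(−3/20) ≈ 0.708. The sketch below applies those parameters to a plain sine wave. It is an assumption-laden illustration only: how gen-vowel-wav shapes each vowel is not described on this page, and calibrationSine and dbfsToLinear are hypothetical names.

```typescript
const SAMPLE_RATE = 48000;

function dbfsToLinear(db: number): number {
  return Math.pow(10, db / 20);
}

// Plain sine with linear fades; NOT a vowel, just the documented parameters.
function calibrationSine(
  f0Hz = 220,
  durationSec = 3,
  peakDb = -3,
  fadeSec = 0.01,
): Float32Array {
  const n = Math.round(durationSec * SAMPLE_RATE);
  const fade = Math.round(fadeSec * SAMPLE_RATE);
  const amp = dbfsToLinear(peakDb);
  const out = new Float32Array(n);
  for (let i = 0; i < n; i++) {
    // Linear ramp up over the first `fade` samples and down over the last.
    const fadeIn = fade > 0 ? Math.min(1, i / fade) : 1;
    const fadeOut = fade > 0 ? Math.min(1, (n - 1 - i) / fade) : 1;
    const gain = amp * Math.min(fadeIn, fadeOut);
    out[i] = gain * Math.sin((2 * Math.PI * f0Hz * i) / SAMPLE_RATE);
  }
  return out;
}
```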
Example
    npx tsx src/cli/gen-vowel-wav.ts calib/my-voice

inspect
    npx tsx src/cli/inspect.ts <path-to-voicepreset.json> [--json]

Prints the manifest header of a voice preset (name, formatVersion, default timbre, pitch range, list of timbres). Useful for verifying a freshly built preset.

npm script

    npm run inspect -- presets/default-voice/voicepreset.json

JSON output: { ok, manifest: { id, name, formatVersion, defaultTimbre, pitchRangeMidi, timbres: [...] } }.
play-score
    npx tsx src/cli/play-score.ts <preset.json> <score.json> <out.wav> [--json]

The canonical “offline render” tool. Loads a preset and a VocalScore (see Score Format) and writes a WAV. Output is deterministic given the same inputs and seed.

Example

    npx tsx src/cli/play-score.ts \
      presets/kokoro-am-michael/voicepreset.json \
      examples/lullaby.json \
      out/lullaby.wav

JSON output: { ok, wavPath, durationSec, telemetry: { peakDbfs, voicesMax, rtf, ... }, provenance: { ... } }.
realtime-demo
    npx tsx src/cli/realtime-demo.ts <preset.json> <score.json>

Architecture demo only. Spawns a worker_threads audio thread, renders blocks (blockSize=512) with deterministic="fast" and a Date.now() seed, and prints RTF telemetry every 10 blocks. After 1 s the control thread posts an update_score message to demonstrate cross-thread live edits. There is no real audio device; for deterministic offline renders use play-score.
resynth
    npx tsx src/cli/resynth.ts <preset.json> <out.wav> <f0> <duration_sec> [timbre]

Standalone single-tone renderer. Renders one steady additive-synthesis tone at <f0> Hz for <duration_sec> seconds to <out.wav>, using the named timbre (or the preset’s default). It bypasses the streaming engine, which makes it useful for spot-checking a single timbre after build-preset.
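For readers unfamiliar with the term, a steady additive-synthesis tone is a sum of sinusoids at integer multiples of f0. The sketch below shows the general technique under stated assumptions: the harmonic magnitudes are passed in as a plain array, whereas how resynth actually reads them from the preset assets is not covered here, and additiveTone is a hypothetical name.

```typescript
// Generic additive synthesis: sum of harmonics k * f0 with fixed magnitudes.
function additiveTone(
  f0Hz: number,
  durationSec: number,
  harmonicMags: number[], // harmonicMags[0] is the fundamental (assumed layout)
  sampleRate = 48000,
): Float32Array {
  const n = Math.round(durationSec * sampleRate);
  const out = new Float32Array(n);
  const nyquist = sampleRate / 2;
  for (let h = 0; h < harmonicMags.length; h++) {
    const freq = f0Hz * (h + 1);
    if (freq >= nyquist) break; // skip harmonics that would alias
    const w = (2 * Math.PI * freq) / sampleRate;
    for (let i = 0; i < n; i++) {
      out[i] += harmonicMags[h] * Math.sin(w * i);
    }
  }
  return out;
}
```

Calling additiveTone(220, 2.0, [1, 0.5, 0.25]) mirrors the 220 Hz, 2.0 s arguments used in the example, with made-up magnitudes.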
Example
    npx tsx src/cli/resynth.ts presets/default-voice/voicepreset.json out.wav 220 2.0 AH

Common conventions
- --json: every CLI emits exactly one line of JSON to stdout. Errors emit { ok: false, code, message, hint? }. Use this in scripts; do not screen-scrape the human banner.
- -h, --help: full usage block to stderr; exit code 0.
- Exit code: 0 on success, non-zero on any error. The non-zero code is the same value across all CLIs, so shells can rely on if ! npx tsx … ; then chains.
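A script that consumes the --json convention might look roughly like the sketch below. The runCli and parseCliJson names are hypothetical, the type of code is an assumption (this page does not say whether it is a string or a number), and on Windows spawnSync may need the shell option to resolve npx.

```typescript
import { spawnSync } from "node:child_process";

// Error shape as documented above; the type of `code` is an assumption.
interface CliError {
  ok: false;
  code: string | number;
  message: string;
  hint?: string;
}
type CliResult<T> = ({ ok: true } & T) | CliError;

// Parses the single JSON line a tool writes to stdout under --json.
function parseCliJson<T>(stdout: string): CliResult<T> {
  const parsed: unknown = JSON.parse(stdout.trim());
  if (
    typeof parsed !== "object" ||
    parsed === null ||
    typeof (parsed as { ok?: unknown }).ok !== "boolean"
  ) {
    throw new Error("CLI output is not a { ok, ... } object");
  }
  return parsed as CliResult<T>;
}

// Hypothetical wrapper: runs src/cli/<tool>.ts with --json appended.
function runCli<T>(tool: string, args: string[]): CliResult<T> {
  const proc = spawnSync("npx", ["tsx", `src/cli/${tool}.ts`, ...args, "--json"], {
    encoding: "utf8",
  });
  if (proc.error) throw proc.error;
  return parseCliJson<T>(proc.stdout);
}
```

A caller would branch on result.ok and read message (and hint, when present) on failure instead of parsing the human-readable banner.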
Planned
The following CLIs are scoped for the next release:

- vse-score-from-midi: convert a .mid file to a VocalScore JSON.
- vse-phonemize: wrap the /api/phonemize G2P pipeline in a CLI for offline workflows.
Both will follow the conventions above (--json, -h, --help, deterministic exit codes). Watch the CHANGELOG.