API Reference

Vocal Synth Engine exposes a REST API and two WebSocket endpoints. All endpoints are served from the same Express server.

Authentication is optional. When the AUTH_TOKEN environment variable is set, protected endpoints require a bearer token:

Authorization: Bearer <your-token>

Endpoints marked “Auth: Yes” in the table below are protected when AUTH_TOKEN is configured. When unset, all endpoints are open.
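
A minimal client-side sketch in Python (standard library only) that attaches the header only when a token is available. The base URL `http://localhost:3000` and the client-side variable `VOCAL_SYNTH_TOKEN` are assumptions for illustration, not part of the engine's documented configuration, and `build_request` is a hypothetical helper.

```python
import os
import urllib.request

# Assumption: illustrative host and port; use wherever your server listens.
BASE_URL = "http://localhost:3000"


def build_request(path, token=None, data=None, method="GET"):
    """Build a urllib Request, adding a bearer token only when one is supplied.

    Against a server started without AUTH_TOKEN, pass token=None and no
    Authorization header is sent.
    """
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    return req


# Hypothetical variable holding the same value as the server's AUTH_TOKEN.
token = os.environ.get("VOCAL_SYNTH_TOKEN")

health_req = build_request("/api/health")                 # never needs a token
renders_req = build_request("/api/renders", token=token)  # protected when AUTH_TOKEN is set
# with urllib.request.urlopen(renders_req) as resp:
#     print(resp.status, resp.read()[:200])
```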

Path: /api/health
Method: GET
Auth: No
Description: Server health, version string, and uptime in seconds.
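
Because /api/health needs no token, it can serve as a readiness probe, for example before a script starts submitting renders. The sketch below polls it until it answers with HTTP 200 or a deadline passes. The base URL and the timeout and interval defaults are assumptions, `wait_for_health` is a hypothetical helper, and the response body is not inspected because its field names are not listed on this page.

```python
import time
import urllib.request


def wait_for_health(base_url="http://localhost:3000", timeout_s=30.0, interval_s=0.5):
    """Poll GET /api/health until it returns HTTP 200 or timeout_s elapses.

    Returns True once the server answers with 200, False if it never does.
    No Authorization header is sent because /api/health is unauthenticated.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(base_url + "/api/health", timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError, and socket timeouts all subclass OSError:
            # the server is not reachable or not healthy yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```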

Path: /api/presets
Method: GET
Auth: No
Description: Returns all voice presets with timbres, pitch ranges, and metadata.

Path: /api/phonemize
Method: POST
Auth: Yes
Description: Convert lyrics text to a sequence of phoneme events.

Request body:

{
  "text": "hello world"
}
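
A sketch of building this request with Python's standard library. It assumes the server parses JSON bodies sent with `Content-Type: application/json` (the usual setup for an Express app, though not stated here); the base URL is illustrative and `phonemize_request` is a hypothetical helper. The shape of the returned phoneme events is not documented on this page, so the usage comment only decodes the JSON.

```python
import json
import urllib.request


def phonemize_request(text, token=None, base_url="http://localhost:3000"):
    """Build (but do not send) a POST /api/phonemize request."""
    body = json.dumps({"text": text}).encode("utf-8")
    req = urllib.request.Request(
        base_url + "/api/phonemize",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    if token:  # only needed when the server has AUTH_TOKEN set
        req.add_header("Authorization", f"Bearer {token}")
    return req


# Usage (requires a running server):
# with urllib.request.urlopen(phonemize_request("hello world", token="...")) as resp:
#     events = json.loads(resp.read())
```

Note that urllib normalizes header names with `str.capitalize()`, so the stored key is `Content-type`; servers treat header names case-insensitively, so this does not affect the request.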

Path: /api/render
Method: POST
Auth: Yes
Description: Render a VocalScore to WAV. Returns a render ID for retrieving the result.

Request body:

{
  "preset": "kokoro-af-heart",
  "score": { ... },
  "polyphony": 8,
  "seed": 42,
  "bpm": 120
}
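
A sketch of assembling and submitting this body. `build_render_payload` and `submit_render` are hypothetical helpers and the base URL is an assumption. `score` is passed through unchanged because the VocalScore structure is not described on this page, and the name of the response field carrying the render ID is likewise not given here, so `submit_render` returns the decoded JSON as-is. The values in the usage comment simply mirror the example above; they are not documented server defaults.

```python
import json
import urllib.request


def build_render_payload(preset, score, polyphony, seed, bpm):
    """Assemble the POST /api/render body from the documented fields."""
    return {
        "preset": preset,
        "score": score,  # VocalScore object, passed through untouched
        "polyphony": polyphony,
        "seed": seed,
        "bpm": bpm,
    }


def submit_render(payload, token=None, base_url="http://localhost:3000"):
    """POST the payload and return the decoded JSON response."""
    req = urllib.request.Request(
        base_url + "/api/render",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    if token:  # only needed when the server has AUTH_TOKEN set
        req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (needs a running server and a real VocalScore in place of my_score):
# payload = build_render_payload("kokoro-af-heart", my_score, polyphony=8, seed=42, bpm=120)
# result = submit_render(payload, token="...")
```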

Path: /api/renders
Method: GET
Auth: Yes
Description: List all saved renders with metadata.

Path: /api/renders/:id/audio.wav
Method: GET
Auth: Yes
Description: Download the rendered WAV file.

Path: /api/renders/:id/score
Method: GET
Auth: Yes
Description: Retrieve the original score JSON used for this render.

Path: /api/renders/:id/meta
Method: GET
Auth: Yes
Description: Render metadata including preset, polyphony, seed, and timing.

Path: /api/renders/:id/telemetry
Method: GET
Auth: Yes
Description: Performance telemetry: peak dBFS, real-time factor, click count.

Path: /api/renders/:id/provenance
Method: GET
Auth: Yes
Description: Provenance data: commit SHA, score hash, WAV hash, engine config.
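
Every per-render endpoint shares the `/api/renders/:id/` prefix, so a client can derive all five URLs from a render ID. The sketch below does that and streams the WAV to disk. `render_urls` and `download_wav` are hypothetical helpers, the base URL is an assumption, and percent-encoding the ID is a defensive choice in case an ID ever contains reserved characters.

```python
import shutil
import urllib.request
from urllib.parse import quote

# Per-render resources listed above, keyed by a short local name.
RENDER_ARTIFACTS = {
    "audio": "audio.wav",
    "score": "score",
    "meta": "meta",
    "telemetry": "telemetry",
    "provenance": "provenance",
}


def render_urls(render_id, base_url="http://localhost:3000"):
    """Return the URL of each per-render endpoint for one render ID."""
    rid = quote(str(render_id), safe="")  # keep the ID a single path segment
    return {
        name: f"{base_url}/api/renders/{rid}/{suffix}"
        for name, suffix in RENDER_ARTIFACTS.items()
    }


def download_wav(render_id, dest_path, token=None, base_url="http://localhost:3000"):
    """Stream /api/renders/:id/audio.wav into dest_path."""
    req = urllib.request.Request(render_urls(render_id, base_url)["audio"])
    if token:  # required only when the server sets AUTH_TOKEN
        req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req) as resp, open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)
```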

Path: /ws
Purpose: Single-user note playback with real-time audio streaming.

The live WebSocket accepts note-on and note-off messages and streams PCM audio blocks back to the client. The cockpit UI’s Live tab uses this endpoint.

Path: /ws/jam
Purpose: Multi-user collaborative sessions with recording.

The jam WebSocket uses a structured JSON protocol. See the Cockpit and Jams page for the full protocol table and session lifecycle.

All API errors return a JSON object:

{
  "code": "RENDER_FAILED",
  "message": "Polyphony limit exceeded",
  "hint": "Reduce the number of simultaneous notes or increase the polyphony setting"
}

Field    Type    Description
code     string  Machine-readable error code
message  string  Human-readable description
hint     string  Suggested fix or next step

HTTP status codes follow standard conventions: 400 for bad requests, 401 for missing/invalid auth, 404 for not found, 500 for server errors.
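
A sketch of turning an error response into a Python exception that keeps the three documented fields. `ApiError` and `parse_api_error` are hypothetical names; the fallback code `UNKNOWN` and the per-status messages used when a body is not valid JSON (for instance, an HTML page from a proxy) are assumptions of this sketch, not values the server is documented to send.

```python
import json

# Generic meanings from the status-code conventions above, used only as a fallback.
STATUS_MEANINGS = {
    400: "Bad request",
    401: "Missing or invalid auth",
    404: "Not found",
    500: "Server error",
}


class ApiError(Exception):
    """An API error carrying the documented code/message/hint fields."""

    def __init__(self, status, code, message, hint=""):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message
        self.hint = hint


def parse_api_error(status, body):
    """Build an ApiError from an HTTP status and the raw response body (bytes)."""
    try:
        data = json.loads(body.decode("utf-8"))
    except ValueError:  # covers both UnicodeDecodeError and JSONDecodeError
        data = {}
    if not isinstance(data, dict):
        data = {}
    return ApiError(
        status,
        data.get("code", "UNKNOWN"),  # hypothetical fallback code
        data.get("message", STATUS_MEANINGS.get(status, "Unexpected error")),
        data.get("hint", ""),
    )


# Usage with urllib (not run here):
# try:
#     urllib.request.urlopen(req)
# except urllib.error.HTTPError as exc:
#     raise parse_api_error(exc.code, exc.read()) from exc
```

Keeping `hint` on the exception lets a CLI or UI show the suggested fix next to the message instead of discarding it.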