
API Reference

Vocal Synth Engine exposes a REST API and two WebSocket endpoints. All endpoints are served from the same Express server.

Authentication is optional. When the AUTH_TOKEN environment variable is set, protected endpoints require a bearer token:

Authorization: Bearer <your-token>

Endpoints marked “Auth: Yes” in the endpoint tables below require the token when AUTH_TOKEN is configured. When it is unset, all endpoints are open.

Tokens can be supplied two ways:

  • Header: Authorization: Bearer <your-token>
  • Query parameter: ?token=<your-token> (useful for <audio> src URLs where headers cannot be set)
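As a minimal client-side sketch of the two options above, the helpers below build either the bearer header or a tokenized URL. The function names auth_headers and with_token_query are illustrative and not part of the API.

```python
from typing import Dict, Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def auth_headers(token: Optional[str]) -> Dict[str, str]:
    """Return the bearer header, or no headers when no token is configured."""
    return {"Authorization": f"Bearer {token}"} if token else {}


def with_token_query(url: str, token: Optional[str]) -> str:
    """Append token=<value> to a URL, keeping any query parameters already present."""
    if not token:
        return url
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("token", token))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

The query-parameter form suits cases such as an <audio> element's src, where the URL is the only thing the client controls. Tokens in URLs tend to end up in logs and browser history, so the header form is usually preferable wherever headers can be set.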

The render endpoint (/api/render) enforces a per-IP rate limit of 20 requests per minute by default. Override it with the RATE_LIMIT_RPM environment variable. Requests over the limit receive HTTP 429.
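A client can handle the 429 response by retrying with a growing delay. The sketch below is one possible approach, not something the server prescribes. The send callable is a hypothetical stand-in for a single POST to /api/render, and the 3-second base delay is an assumption derived from the default of 20 requests per minute. Whether the server sends a Retry-After header is not documented here, so the sketch does not rely on one.

```python
import time
from typing import Callable, Tuple

# Hypothetical transport: send() performs one POST to /api/render and
# returns (http_status, response_body).
Send = Callable[[], Tuple[int, bytes]]


def render_with_backoff(
    send: Send,
    max_attempts: int = 4,
    base_delay_sec: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call send() until it returns a status other than 429 or attempts run out."""
    status, body = send()
    for attempt in range(1, max_attempts):
        if status != 429:
            break
        # Exponential backoff: 3 s, 6 s, 12 s, ... with the default base delay.
        sleep(base_delay_sec * 2 ** (attempt - 1))
        status, body = send()
    return status, body
```

The sleep parameter is injectable so the retry logic can be exercised in tests without real waiting.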

Path: /api/health
Method: GET
Auth: No
Description: Server health, version string, and uptime in seconds.

Path: /api/presets
Method: GET
Auth: No
Description: Returns all voice presets with timbres, pitch ranges, and metadata.

Path: /api/phonemize
Method: POST
Auth: Yes
Description: Convert lyrics text to a sequence of phoneme events.

Request body:

{
  "text": "hello world"
}
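For illustration, the request above could be built with Python's standard library as follows. The base URL, the phonemize_request name, and the Content-Type header are assumptions; the response shape is not documented in this section, so the sketch stops at constructing the request.

```python
import json
import urllib.request
from typing import Optional


def phonemize_request(
    base_url: str, text: str, token: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/phonemize with a JSON body."""
    body = json.dumps({"text": text}).encode("utf-8")
    # Assumption: the server expects a JSON content type for POST bodies.
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        f"{base_url}/api/phonemize", data=body, headers=headers, method="POST"
    )
```

Passing the returned object to urllib.request.urlopen sends the request.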
Path: /api/render
Method: POST
Auth: Yes
Description: Render a VocalScore to WAV. Returns a URL for retrieving the audio.

Request body:

{
  "score": {
    "bpm": 120,
    "notes": [
      { "id": "n1", "startSec": 0, "durationSec": 1, "midi": 60, "velocity": 0.8 }
    ]
  },
  "config": {
    "presetPath": "presets/kokoro-af-heart",
    "sampleRateHz": 48000,
    "blockSize": 2048,
    "maxPolyphony": 8,
    "rngSeed": 42,
    "defaultTimbre": "AH",
    "deterministic": "exact"
  }
}

Response:

{
  "ok": true,
  "durationSec": 1.1,
  "telemetry": { ... },
  "provenance": { ... },
  "audioUrl": "/api/renders/last/audio.wav"
}

Score duration is capped at 60 seconds by default (override with the MAX_RENDER_DURATION_SEC environment variable).
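A client may want to check a score against the cap before submitting it. The sketch below assumes that score duration means the end time of the latest-ending note. That is a guess: the example response reports 1.1 seconds for a single 1-second note, so the server may also count a release tail. Treat the check as a coarse pre-flight filter rather than a mirror of the server's validation. The function names are illustrative.

```python
from typing import Any, Dict, List


def score_duration_sec(notes: List[Dict[str, Any]]) -> float:
    """End time of the latest-ending note, or 0.0 for an empty score."""
    return max((n["startSec"] + n["durationSec"] for n in notes), default=0.0)


def check_score(score: Dict[str, Any], max_duration_sec: float = 60.0) -> None:
    """Raise ValueError if the score appears to exceed the render duration cap."""
    duration = score_duration_sec(score.get("notes", []))
    if duration > max_duration_sec:
        raise ValueError(
            f"Score lasts {duration:.2f}s, above the {max_duration_sec:.0f}s cap"
        )
```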

Path: /api/renders
Method: GET
Auth: Yes
Description: List all saved renders with metadata.

Path: /api/renders/:id/audio.wav
Method: GET
Auth: Yes
Description: Download the rendered WAV file.

Path: /api/renders/:id/score
Method: GET
Auth: Yes
Description: Retrieve the original score JSON used for this render.

Path: /api/renders/:id/meta
Method: GET
Auth: Yes
Description: Render metadata including preset, polyphony, seed, and timing.

Path: /api/renders/:id/telemetry
Method: GET
Auth: Yes
Description: Performance telemetry: peak dBFS, real-time factor, click count.

Path: /api/renders/:id/provenance
Method: GET
Auth: Yes
Description: Provenance data: commit SHA, score hash, WAV hash, engine config.
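The per-render endpoints share one URL pattern, which a small helper can capture. This is a sketch with illustrative names. The only render id that appears in this reference is last (in the audioUrl of the render response), so other id formats are not assumed.

```python
from urllib.parse import quote

# The five per-render artifacts listed above.
ARTIFACTS = ("audio.wav", "score", "meta", "telemetry", "provenance")


def render_artifact_url(base_url: str, render_id: str, artifact: str) -> str:
    """Build the URL for one artifact of a saved render."""
    if artifact not in ARTIFACTS:
        raise ValueError(f"Unknown artifact {artifact!r}; expected one of {ARTIFACTS}")
    # Percent-encode the id so unusual characters cannot alter the path.
    return f"{base_url.rstrip('/')}/api/renders/{quote(render_id, safe='')}/{artifact}"
```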
Path: /ws
Purpose: Single-user note playback with real-time audio streaming.

The live WebSocket accepts note-on and note-off messages and streams PCM audio blocks back to the client. The cockpit UI’s Live tab uses this endpoint.

Path: /ws/jam
Purpose: Multi-user collaborative sessions with recording.

The jam WebSocket uses a structured JSON protocol. See the Cockpit and Jams page for the full protocol table and session lifecycle.

API errors return a JSON object with an ok: false field and error details. Structured errors include a machine-readable code and a message:

{
  "ok": false,
  "code": "PRESET_NOT_FOUND",
  "message": "Preset 'presets/unknown' not found",
  "available": ["default-voice", "bright-lab", "kokoro-af-heart"]
}

General validation errors return a simpler shape:

{
  "ok": false,
  "error": "Missing score"
}

Field | Type | Description
ok | boolean | Always false on errors
code | string | Machine-readable error code (present on structured errors)
message | string | Human-readable description (present on structured errors)
error | string | Error message (present on general validation errors)

HTTP status codes follow standard conventions: 400 for bad requests, 401 for missing/invalid auth, 404 for not found, 429 for rate limited, 500 for server errors.
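Because errors arrive in two shapes, client code needs to normalize them. The sketch below reduces either shape to a (code, message) pair. The describe_error name is illustrative, and the sketch assumes the error body is always valid JSON.

```python
import json
from typing import Optional, Tuple


def describe_error(body: bytes) -> Tuple[Optional[str], str]:
    """Reduce either error shape to (code, message); code is None for simple errors."""
    payload = json.loads(body)
    if payload.get("ok") is not False:
        raise ValueError("Payload is not an error response")
    if "code" in payload:
        # Structured error: machine-readable code plus human-readable message.
        return payload["code"], payload.get("message", "")
    # General validation error: only an "error" string.
    return None, payload.get("error", "")
```

Extra fields on structured errors, such as available in the PRESET_NOT_FOUND example, are ignored here. A fuller client could surface them alongside the message.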

Variable | Default | Description
AUTH_TOKEN | (unset) | Optional bearer token to protect API endpoints
PORT | 4321 | Server port
RATE_LIMIT_RPM | 20 | Max render requests per minute per IP
MAX_RENDER_DURATION_SEC | 60 | Maximum allowed score duration in seconds
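Client tooling or test harnesses that need to mirror these defaults could read the same variables. The sketch below is not the server's own configuration code (the server is an Express application). The ServerSettings and load_settings names are hypothetical, and treating an empty AUTH_TOKEN as unset is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServerSettings:
    auth_token: Optional[str]
    port: int
    rate_limit_rpm: int
    max_render_duration_sec: float


def load_settings(env: Mapping[str, str] = os.environ) -> ServerSettings:
    """Read the documented variables, falling back to the documented defaults."""
    return ServerSettings(
        # Assumption: an empty AUTH_TOKEN is treated the same as an unset one.
        auth_token=env.get("AUTH_TOKEN") or None,
        port=int(env.get("PORT", "4321")),
        rate_limit_rpm=int(env.get("RATE_LIMIT_RPM", "20")),
        max_render_duration_sec=float(env.get("MAX_RENDER_DURATION_SEC", "60")),
    )
```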