Protocol Reference
Wire format
sonic-runtime communicates over ndjson-stdio-v1: newline-delimited JSON, with commands on stdin and responses and events on stdout.
Request
    {"id": 1, "method": "version", "params": {}}

- id: integer, monotonically increasing, echoed in the response
- method: command name
- params: method-specific parameters (optional)
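As a minimal sketch of the request shape above, the following Python helper assigns monotonically increasing ids and serializes each request as a single line. The class name `RequestEncoder` and its `encode` method are hypothetical and not part of sonic-runtime.

```python
import itertools
import json


class RequestEncoder:
    """Builds request lines with monotonically increasing ids (hypothetical helper)."""

    def __init__(self, start=1):
        self._ids = itertools.count(start)

    def encode(self, method, params=None):
        request = {"id": next(self._ids), "method": method}
        # params is optional in the protocol, so it is omitted when not given
        if params is not None:
            request["params"] = params
        # json.dumps escapes newlines inside strings, so the object stays on one line
        line = json.dumps(request, separators=(",", ":")) + "\n"
        return request["id"], line


encoder = RequestEncoder()
request_id, line = encoder.encode("version", {})
```

The returned id can be kept by the caller to match the response that echoes it.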
Response
    {"id": 1, "result": {"name": "sonic-runtime", "version": "1.0.1", "protocol": "ndjson-stdio-v1"}}

Error response
    {"id": 2, "error": {"code": "invalid_source", "message": "Asset file not found: /missing.wav", "retryable": false}}

The retryable field tells the caller whether retrying the same request might succeed (e.g., a temporarily unavailable device vs. a permanently invalid parameter).
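One way a caller might act on the retryable field is sketched below. `CommandError`, `parse_response`, and `call_with_retry` are hypothetical names, and `send` stands in for whatever function writes a request and returns the matching response line; none of these come from sonic-runtime itself.

```python
import json


class CommandError(Exception):
    """Raised for an error response (hypothetical wrapper type)."""

    def __init__(self, code, message, retryable):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.retryable = retryable


def parse_response(line):
    """Return (id, result), or raise CommandError for an error response."""
    msg = json.loads(line)
    if "error" in msg:
        err = msg["error"]
        # Assumption: a missing retryable field is treated as "do not retry"
        raise CommandError(err["code"], err.get("message", ""), bool(err.get("retryable", False)))
    return msg["id"], msg.get("result")


def call_with_retry(send, max_attempts=3):
    """Call send() until it succeeds, the error is not retryable, or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        try:
            return parse_response(send())
        except CommandError as exc:
            if not exc.retryable or attempt == max_attempts:
                raise
```

A real caller would probably add a delay between attempts; that is left out here to keep the sketch short.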
Event (unsolicited)
    {"event": "playback_ended", "data": {"handle": "h_000000000001", "reason": "completed"}}

Events have no id; the runtime pushes them without a prior request.
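Because responses carry an id and events do not, a reader of stdout can route each line by checking which key is present. The sketch below assumes a hypothetical `Router` class with a table of pending callbacks; it illustrates the idea and is not taken from sonic-core.

```python
import json


class Router:
    """Routes stdout lines to pending requests or to an event list (hypothetical)."""

    def __init__(self):
        self.pending = {}  # request id -> callback taking the decoded message
        self.events = []   # (event name, data) in arrival order

    def expect(self, request_id, callback):
        self.pending[request_id] = callback

    def feed(self, line):
        msg = json.loads(line)
        if "id" in msg:
            # Response or error response: hand it to whoever sent that id
            callback = self.pending.pop(msg["id"], None)
            if callback is not None:
                callback(msg)
        elif "event" in msg:
            # Unsolicited event: no id, so nothing to match against
            self.events.append((msg["event"], msg.get("data", {})))


router = Router()
received = []
router.expect(1, received.append)
router.feed('{"id": 1, "result": {"name": "sonic-runtime"}}')
router.feed('{"event": "playback_ended", "data": {"handle": "h_000000000001", "reason": "completed"}}')
```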
Commands
version
Returns runtime identity and protocol version. Used as the handshake: sonic-core hard-fails on protocol mismatch.
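A caller-side handshake check could look roughly like the following. The function and exception names are hypothetical, and the result shape is taken from the Response example in this reference; how sonic-core actually performs the check is not documented here.

```python
EXPECTED_PROTOCOL = "ndjson-stdio-v1"


class ProtocolMismatch(Exception):
    """Raised when the runtime speaks a different protocol (hypothetical type)."""


def check_handshake(result, expected=EXPECTED_PROTOCOL):
    """Validate the result of a version command and return (name, version)."""
    protocol = result.get("protocol")
    if protocol != expected:
        # Hard-fail: continuing with a mismatched protocol is not safe
        raise ProtocolMismatch(f"expected {expected!r}, runtime reported {protocol!r}")
    return result.get("name"), result.get("version")


identity = check_handshake(
    {"name": "sonic-runtime", "version": "1.0.1", "protocol": "ndjson-stdio-v1"}
)
```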
load_asset
    {"id": 2, "method": "load_asset", "params": {"asset_ref": "file:///path/to/sound.wav"}}

Loads a WAV file into an OpenAL buffer. Returns a handle for subsequent commands.
play

    {"id": 3, "method": "play", "params": {"handle": "h_...", "volume": 0.8, "loop": true, "output_device_id": "..."}}

Starts playback. output_device_id is optional; omit it to use the default device.
stop / pause / resume
    {"id": 4, "method": "stop", "params": {"handle": "h_..."}}

seek

    {"id": 5, "method": "seek", "params": {"handle": "h_...", "position_ms": 5000}}

set_volume / set_pan
    {"id": 6, "method": "set_volume", "params": {"handle": "h_...", "level": 0.5, "fade_ms": 200}}
    {"id": 7, "method": "set_pan", "params": {"handle": "h_...", "value": -0.3, "ramp_ms": 100}}

Volume (level): 0.0 to 1.0. Pan (value): -1.0 (left) to 1.0 (right). Fade/ramp durations (fade_ms, ramp_ms) are optional.
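The documented ranges can be checked on the caller side before a request is sent. The helpers below are a hypothetical sketch: the function names are invented, and rejecting out-of-range values locally (rather than letting the runtime answer with invalid_params) is an assumption about what a caller would want.

```python
def set_volume_params(handle, level, fade_ms=None):
    """Build params for set_volume; level must be within 0.0 to 1.0."""
    if not 0.0 <= level <= 1.0:
        raise ValueError(f"level out of range: {level}")
    params = {"handle": handle, "level": level}
    if fade_ms is not None:  # fade duration is optional
        params["fade_ms"] = fade_ms
    return params


def set_pan_params(handle, value, ramp_ms=None):
    """Build params for set_pan; value runs from -1.0 (left) to 1.0 (right)."""
    if not -1.0 <= value <= 1.0:
        raise ValueError(f"pan value out of range: {value}")
    params = {"handle": handle, "value": value}
    if ramp_ms is not None:  # ramp duration is optional
        params["ramp_ms"] = ramp_ms
    return params


volume = set_volume_params("h_000000000001", 0.5, fade_ms=200)
pan = set_pan_params("h_000000000001", -0.3)
```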
get_position / get_duration
    {"id": 8, "method": "get_position", "params": {"handle": "h_..."}}
    {"id": 9, "method": "get_duration", "params": {"handle": "h_..."}}

Return position_ms and duration_ms, respectively. duration_ms may be null for streams.
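Since duration_ms can be null, any code that derives a progress value needs to handle the missing case. A small sketch, with the hypothetical function name `progress_fraction` and JSON null mapped to Python `None`:

```python
def progress_fraction(position_ms, duration_ms):
    """Return playback progress in [0.0, 1.0], or None when duration is unknown."""
    if duration_ms is None or duration_ms <= 0:
        # Streams may report a null duration, so no fraction can be computed
        return None
    return min(max(position_ms / duration_ms, 0.0), 1.0)


known = progress_fraction(5000, 20000)
unknown = progress_fraction(5000, None)
```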
list_devices
Returns all available audio output devices with their IDs, names, and default status. Device IDs are opaque strings (e.g., openal_0_a1b2c3d4) used for per-playback routing.
set_device
    {"id": 10, "method": "set_device", "params": {"device_id": "openal_0_a1b2c3d4"}}

synthesize
    {"id": 11, "method": "synthesize", "params": {"engine": "kokoro", "voice": "af_heart", "text": "Hello world", "speed": 1.0}}

Runs TTS synthesis and returns a playable handle. engine must be "kokoro". speed range: 0.5 to 2.0 (default 1.0). The result includes handle, duration_ms, sample_rate, and channels.
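The engine and speed constraints above can be enforced before the request leaves the caller. This is a hypothetical helper, not part of the runtime; the runtime presumably reports the same problems itself with synthesis_validation_failed.

```python
SUPPORTED_ENGINES = ("kokoro",)
SPEED_MIN, SPEED_MAX = 0.5, 2.0


def synthesize_params(text, voice, engine="kokoro", speed=1.0):
    """Build params for synthesize after checking engine and speed."""
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(f"unsupported engine: {engine!r}")
    if not SPEED_MIN <= speed <= SPEED_MAX:
        raise ValueError(f"speed out of range: {speed}")
    # Voice ids are not checked here; the caller cannot know which are loaded
    return {"engine": engine, "voice": voice, "text": text, "speed": speed}


params = synthesize_params("Hello world", "af_heart")
```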
Introspection commands
| Method | Description |
|---|---|
| get_health | Uptime, active handles, model loaded status, voices count, eSpeak availability |
| get_capabilities | Supported engines, features, protocol version, synthesis audio format |
| list_voices | All loaded voice IDs with language and gender metadata |
| preload_model | Force-load the ONNX model (normally lazy-loaded on first synthesis) |
| get_model_status | Whether model is loaded, path, load time, inference count |
| validate_assets | Check all synthesis assets (model, voices, eSpeak, ONNX Runtime) with actionable hints |
| shutdown | Graceful exit |
Events
| Event | Data | When |
|---|---|---|
| playback_ended | handle, reason | Playback completed naturally ("completed") or was stopped ("stopped") |
| synthesis_started | handle, engine, voice | TTS pipeline began |
| synthesis_completed | handle, duration_ms, inference_ms | TTS inference finished successfully |
| synthesis_failed | handle, code, message | TTS inference failed |
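The events in this table lend themselves to a small state tracker on the caller side. The sketch below uses a hypothetical `HandleTracker` class; ignoring unknown event names is an assumption made so that newer runtimes would not break older callers.

```python
class HandleTracker:
    """Tracks handle state from unsolicited events (hypothetical helper)."""

    def __init__(self):
        self.playing = set()   # handles the caller believes are playing
        self.end_reason = {}   # handle -> "completed" or "stopped"
        self.synthesis = {}    # handle -> "running", "completed", or "failed:<code>"

    def mark_playing(self, handle):
        self.playing.add(handle)

    def on_event(self, name, data):
        handle = data.get("handle")
        if name == "playback_ended":
            self.playing.discard(handle)
            self.end_reason[handle] = data.get("reason")
        elif name == "synthesis_started":
            self.synthesis[handle] = "running"
        elif name == "synthesis_completed":
            self.synthesis[handle] = "completed"
        elif name == "synthesis_failed":
            self.synthesis[handle] = "failed:" + data.get("code", "unknown")
        # Any other event name is ignored (assumption, see above)


tracker = HandleTracker()
tracker.mark_playing("h_000000000001")
tracker.on_event("playback_ended", {"handle": "h_000000000001", "reason": "completed"})
```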
Error codes
| Code | Retryable | Description |
|---|---|---|
| invalid_params | no | Missing or malformed parameters |
| method_not_found | no | Unknown method name |
| playback_not_found | no | Handle does not exist |
| device_unavailable | yes | Requested device not found or unplugged |
| seek_unsupported | no | Cannot seek this source type |
| invalid_source | no | OpenAL error or asset file not found |
| unsupported_format | no | Audio format not supported (non-PCM WAV, bad bit depth) |
| synthesis_validation_failed | no | Bad engine, voice, text, or speed value |
| synthesis_voice_not_found | no | Requested voice ID not loaded |
| synthesis_model_missing | no | ONNX model file not found |
| synthesis_model_load_failed | no | ONNX model failed to load |
| synthesis_inference_failed | yes | ONNX inference error or empty output |
| synthesis_not_configured | no | Synthesis engine not available |
| internal_error | no | Unexpected runtime error |
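The table can be mirrored as a lookup for callers that want a fallback when an error object lacks the retryable field. Preferring the field from the response and only then consulting the table is an assumption about sensible precedence; `is_retryable` is a hypothetical name.

```python
# Codes marked "yes" in the Retryable column of the table above
RETRYABLE_CODES = frozenset({"device_unavailable", "synthesis_inference_failed"})


def is_retryable(error):
    """Decide whether an error object from an error response is worth retrying."""
    if "retryable" in error:
        # The runtime's own answer takes precedence over the static table
        return bool(error["retryable"])
    return error.get("code") in RETRYABLE_CODES


retry = is_retryable({"code": "device_unavailable", "message": "unplugged"})
```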
Diagnostic output
All human-readable diagnostic messages go to stderr, prefixed with [sonic-runtime]. stdout is reserved exclusively for protocol JSON.
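Given this split, a caller can parse every stdout line as JSON and treat stderr purely as log text. The sketch below reads from any iterable of lines (in-memory streams here, a child process's pipes in practice); both function names are hypothetical.

```python
import io
import json

DIAGNOSTIC_PREFIX = "[sonic-runtime]"


def read_protocol(stdout):
    """Decode every non-blank stdout line as a protocol message."""
    return [json.loads(line) for line in stdout if line.strip()]


def read_diagnostics(stderr):
    """Return stderr lines with the diagnostic prefix stripped when present."""
    messages = []
    for line in stderr:
        text = line.rstrip("\n")
        if text.startswith(DIAGNOSTIC_PREFIX):
            text = text[len(DIAGNOSTIC_PREFIX):].lstrip()
        messages.append(text)
    return messages


# In-memory stand-ins for the child process pipes
fake_stdout = io.StringIO('{"id": 1, "result": {"protocol": "ndjson-stdio-v1"}}\n')
fake_stderr = io.StringIO("[sonic-runtime] starting\n")
protocol_messages = read_protocol(fake_stdout)
diagnostics = read_diagnostics(fake_stderr)
```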