Architecture
Vocal Synth Engine is structured around a core synthesis pipeline with REST and WebSocket interfaces, wrapped by a browser-based cockpit UI.
System overview
┌──── Cockpit UI (browser SPA) ──────┐
│  Piano Roll  │   Live   │ Renders  │
└──────────────┴──────────┴──────────┘
            │           │
        REST API    WebSocket
            │      /ws  /ws/jam
┌───────────┴───────────┴────────────┐
│           Express Server           │
│  Render API  │    Jam Sessions     │
└──────┬───────┴────────────┬────────┘
       │                    │
StreamingVocalSynthEngine   │
  LiveSynthEngine ──────────┘
             │
┌────────────┼─────────────┐
VoicePreset  DSP (FFT)    Curves (ADSR,
(.f32 blobs) Pitch Det.   vibrato, automation)

Additive synthesis design
The engine generates singing voices by combining three signal components:
- Harmonic partials — A bank of sine oscillators at integer multiples of the fundamental frequency. Each partial has its own amplitude envelope derived from the voice preset.
- Spectral envelope — A smooth curve that shapes the relative amplitudes of the harmonics, encoding formant structure (vowel identity).
- Noise residual — Broadband noise filtered through the spectral envelope, adding breathiness and consonant-like texture.
These three components are summed per sample block to produce the final audio signal. The engine processes audio in fixed-size blocks for consistent real-time performance.
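As a rough illustration of how the three components could be combined per block, here is a minimal sketch. Everything in it is an assumption made for illustration: the names `renderAdditiveBlock` and `spectralEnvelope`, the 48 kHz sample rate, and the single Gaussian bump standing in for a real formant envelope. It also adds unfiltered white noise, whereas the engine filters the noise through the spectral envelope.

```typescript
// Hypothetical sketch; names, sample rate, and envelope shape are assumptions,
// not taken from the engine's source.
const SAMPLE_RATE = 48000;

/** Assumed spectral envelope: one formant-like bump centred near 800 Hz. */
function spectralEnvelope(freqHz: number): number {
  const x = (freqHz - 800) / 600;
  return Math.exp(-0.5 * x * x);
}

/** Renders one fixed-size block as harmonic partials plus a noise residual. */
function renderAdditiveBlock(
  f0: number,
  numPartials: number,
  blockSize: number,
  startSample: number,
  noiseGain: number,
  random: () => number, // expected to return values in [0, 1)
): Float32Array {
  const out = new Float32Array(blockSize);
  for (let n = 0; n < blockSize; n++) {
    const t = (startSample + n) / SAMPLE_RATE;
    let sample = 0;
    // Harmonic partials at integer multiples of the fundamental,
    // each weighted by the spectral envelope at its frequency.
    for (let k = 1; k <= numPartials; k++) {
      const freq = k * f0;
      if (freq >= SAMPLE_RATE / 2) break; // stay below Nyquist
      const amp = spectralEnvelope(freq) / numPartials;
      sample += amp * Math.sin(2 * Math.PI * freq * t);
    }
    // Noise residual (unfiltered here for brevity).
    sample += noiseGain * (random() * 2 - 1);
    out[n] = sample;
  }
  return out;
}
```

Passing `startSample` keeps the oscillator phase continuous from one block to the next, which is one simple way to avoid clicks at block boundaries.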
Signal flow
For each audio block:
- Note scheduling resolves which notes are active based on transport position or live input
- Voice allocation assigns notes to polyphonic voice slots, using voice stealing when the limit is exceeded
- Per-voice rendering computes ADSR envelope, vibrato LFO, pitch curve (portamento), and the three synthesis components
- Mixing sums all active voices into the output buffer
- Streaming sends the buffer over WebSocket (live mode) or appends to the WAV accumulator (offline render)
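The voice allocation and mixing steps above can be sketched as follows. This is a hypothetical illustration, not the engine's code: `VoicePool`, `mixVoices`, and the "steal the oldest voice" policy are all assumptions, since the document does not say which stealing strategy is used.

```typescript
// Hypothetical types; the engine's real voice structures are not shown here.
interface Voice {
  noteId: number | null; // null marks a free slot
  startedAt: number;     // block index at which the note was assigned
}

class VoicePool {
  private voices: Voice[];

  constructor(maxVoices: number) {
    this.voices = Array.from({ length: maxVoices }, () => ({
      noteId: null,
      startedAt: 0,
    }));
  }

  /** Assigns a note to a slot and returns the slot index. */
  allocate(noteId: number, now: number): number {
    let slot = this.voices.findIndex((v) => v.noteId === null);
    if (slot === -1) {
      // Assumed stealing policy: take the voice that started earliest.
      let oldest = Infinity;
      this.voices.forEach((v, i) => {
        if (v.startedAt < oldest) {
          oldest = v.startedAt;
          slot = i;
        }
      });
    }
    this.voices[slot] = { noteId, startedAt: now };
    return slot;
  }

  /** Frees the slot holding the given note, if any. */
  release(noteId: number): void {
    for (const v of this.voices) {
      if (v.noteId === noteId) v.noteId = null;
    }
  }

  activeNoteIds(): number[] {
    return this.voices
      .map((v) => v.noteId)
      .filter((id): id is number => id !== null);
  }
}

/** Sums per-voice buffers into a single output block. */
function mixVoices(voiceBuffers: Float32Array[], blockSize: number): Float32Array {
  const out = new Float32Array(blockSize);
  for (let n = 0; n < blockSize; n++) {
    let sum = 0;
    for (const buf of voiceBuffers) sum += buf[n] ?? 0;
    out[n] = sum;
  }
  return out;
}
```

Other stealing policies (quietest voice, voice already in release) are equally plausible; the oldest-first choice here is only the simplest to show.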
Curves and modulation
- ADSR — Attack, decay, sustain, release envelope applied to each note
- Vibrato — Sinusoidal pitch modulation with configurable rate, depth, and onset delay
- Portamento — Smooth pitch glide between consecutive notes
- Automation — Per-note parameter curves (timbre, breathiness) interpolated per block
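To make the curves above concrete, here is one possible shape for the ADSR, vibrato, and portamento calculations. The function names, parameter names, units (seconds, cents, Hz), linear envelope segments, and logarithmic glide are all assumptions for illustration; the engine's actual curve shapes are not specified in this document.

```typescript
// Hypothetical parameter shapes; times are in seconds, sustain is a 0..1 level.
interface AdsrParams { attack: number; decay: number; sustain: number; release: number; }
interface VibratoParams { rateHz: number; depthCents: number; onsetDelay: number; }

/** Envelope level at time t after note-on, with note-off at noteOffTime. */
function adsrLevel(p: AdsrParams, t: number, noteOffTime: number): number {
  const held = (time: number): number => {
    if (time < p.attack) return time / p.attack;              // attack ramp
    if (time < p.attack + p.decay) {
      const d = (time - p.attack) / p.decay;
      return 1 + (p.sustain - 1) * d;                         // decay to sustain
    }
    return p.sustain;                                         // sustain plateau
  };
  if (t < 0) return 0;
  if (t < noteOffTime) return held(t);
  if (p.release <= 0) return 0;
  const r = (t - noteOffTime) / p.release;
  return r >= 1 ? 0 : held(noteOffTime) * (1 - r);            // release ramp
}

/** Sinusoidal pitch modulation that starts after an onset delay. */
function vibratoFrequency(baseHz: number, v: VibratoParams, t: number): number {
  if (t < v.onsetDelay) return baseHz;
  const cents = v.depthCents * Math.sin(2 * Math.PI * v.rateHz * (t - v.onsetDelay));
  return baseHz * Math.pow(2, cents / 1200);
}

/** Pitch glide between two notes, interpolated in log-frequency. */
function portamentoFrequency(fromHz: number, toHz: number, glideTime: number, t: number): number {
  if (glideTime <= 0 || t >= glideTime) return toHz;
  const x = Math.max(0, t) / glideTime;
  return fromHz * Math.pow(toHz / fromHz, x);
}
```

Interpolating the glide in log-frequency makes it move at a constant rate in semitones rather than in Hz, which is a common choice for pitch curves.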
Directory layout
| Directory | Purpose |
|---|---|
| src/engine/ | Core synth: block renderer, streaming engine, ADSR, vibrato curves |
| src/dsp/ | Signal processing: FFT, pitch detection |
| src/preset/ | VoicePreset schema, loader, and resolver |
| src/server/ | Express + WebSocket API server, jam session manager |
| src/types/ | Shared TypeScript types: scores, jam protocol, presets |
| src/cli/ | CLI tools and integration test suites |
| apps/cockpit/ | Browser cockpit UI (Vite + vanilla TS) |
| presets/ | 15 bundled voice presets with binary timbre data |
Determinism
The engine uses a seeded pseudo-random number generator for any stochastic elements (noise generation, jitter). Given the same seed, score, and preset, the engine produces bit-identical output. Renders include provenance metadata: commit SHA, score hash, WAV hash, and full configuration.
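The seeding idea can be illustrated with a small sketch. The document does not say which generator the engine uses; the code below substitutes mulberry32, a widely used 32-bit PRNG, purely as a stand-in. The helper names `noiseBlock` and `sameBits` are hypothetical.

```typescript
/** mulberry32: a small seeded PRNG, used here only as a stand-in. */
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // value in [0, 1)
  };
}

/** Hypothetical helper: fills a block with seeded noise in [-1, 1). */
function noiseBlock(seed: number, length: number): Float32Array {
  const random = mulberry32(seed);
  const out = new Float32Array(length);
  for (let i = 0; i < length; i++) out[i] = random() * 2 - 1;
  return out;
}

/** Compares two buffers bit for bit rather than by numeric value. */
function sameBits(a: Float32Array, b: Float32Array): boolean {
  if (a.length !== b.length) return false;
  const ua = new Uint32Array(a.buffer, a.byteOffset, a.length);
  const ub = new Uint32Array(b.buffer, b.byteOffset, b.length);
  for (let i = 0; i < ua.length; i++) {
    if (ua[i] !== ub[i]) return false;
  }
  return true;
}
```

Because all randomness flows from the seed rather than from `Math.random()`, two calls to `noiseBlock` with the same seed and length yield buffers for which `sameBits` returns true, while a different seed yields a different buffer.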