Configuration
Presets
ThrottleAI ships three presets that cover the most common scenarios. Each preset returns a config object that you can spread and override.
quiet
Single user, CLI tools: 1 call at a time, 10 requests per minute.

    import { createGovernor, presets } from "@mcptoolshop/throttleai";

    createGovernor(presets.quiet());

balanced
SaaS backend: 5 concurrent (2 interactive reserve), 60 requests per minute, fairness enabled.

    createGovernor(presets.balanced());

aggressive
Batch processing: 20 concurrent, 300 requests per minute, fairness + adaptive tuning.

    createGovernor(presets.aggressive());

Override any field
Each preset call returns a plain object. Spread it and override any field. The spread is shallow, so overriding concurrency replaces the whole nested object rather than merging into it:

    createGovernor({
      ...presets.balanced(),
      leaseTtlMs: 30_000,
      concurrency: { maxInFlight: 10, interactiveReserve: 3 },
    });

Full configuration reference

    createGovernor({
      // Concurrency (optional)
      concurrency: {
        maxInFlight: 5,            // max simultaneous weight
        interactiveReserve: 1,     // slots reserved for interactive priority
      },

      // Rate limiting (optional)
      rate: {
        requestsPerMinute: 60,     // request-rate cap
        tokensPerMinute: 100_000,  // token-rate cap
        windowMs: 60_000,          // rolling window (default 60s)
      },

      // Fairness (optional)
      fairness: true,              // enable with defaults
      // or:
      fairness: {
        softCapRatio: 0.6,         // max share of capacity per actor (default 0.6)
        starvationWindowMs: 5_000, // denied actors get priority (default 5s)
      },

      // Adaptive tuning (optional)
      adaptive: true,              // enable with defaults
      // or:
      adaptive: {
        targetDenyRate: 0.05,      // target deny ratio (default 0.05)
        latencyThreshold: 1.5,     // EMA ratio that triggers reduction (default 1.5)
        alpha: 0.2,                // EMA smoothing factor (default 0.2)
        adjustIntervalMs: 5_000,   // how often to re-evaluate (default 5s)
        minConcurrency: 1,         // floor for effective concurrency (default 1)
      },

      // Strict mode (optional)
      strict: true,                // throw on double release / unknown ID (dev mode)

      // Lease settings
      leaseTtlMs: 60_000,          // auto-expire (default 60s)
      reaperIntervalMs: 5_000,     // sweep interval (default 5s)

      // Observability
      onEvent: (e) => { /* acquire, deny, release, expire, warn */ },
    });

Configuration options in detail
Concurrency
| Option | Type | Default | Description |
|---|---|---|---|
| maxInFlight | number | required | Maximum simultaneous in-flight weight. This is the most important setting. |
| interactiveReserve | number | 0 | Slots reserved exclusively for priority: "interactive" requests. Background requests are denied when available slots drop to this level. |
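
To make the reserve rule concrete, here is a minimal sketch of the admission check the table describes. It is not ThrottleAI's implementation: `canAdmit`, `ConcurrencyConfig`, and the assumption that every request has weight 1 are all hypothetical and exist only for illustration.

```typescript
type Priority = "interactive" | "background";

interface ConcurrencyConfig {
  maxInFlight: number;
  interactiveReserve?: number; // defaults to 0, per the table above
}

// Hypothetical helper (not a ThrottleAI export): decides whether one more
// request of weight 1 fits, given how many are already in flight.
function canAdmit(cfg: ConcurrencyConfig, inFlight: number, priority: Priority): boolean {
  const reserve = cfg.interactiveReserve ?? 0;
  const available = cfg.maxInFlight - inFlight;
  if (priority === "interactive") {
    // Interactive work may use any free slot, including the reserved ones.
    return available >= 1;
  }
  // Background work is denied once free slots drop to the reserve level.
  return available > reserve;
}
```

With the balanced preset's numbers (5 concurrent, 2 reserved), this sketch lets background work occupy at most 3 slots, while interactive requests can still use the remaining 2.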
Rate limiting
| Option | Type | Default | Description |
|---|---|---|---|
| requestsPerMinute | number | - | Maximum requests per rolling window. |
| tokensPerMinute | number | - | Maximum tokens per rolling window. |
| windowMs | number | 60_000 | Rolling window duration in milliseconds. |
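
The rolling-window behavior can be pictured with a small, self-contained sketch. This is an illustration under assumptions, not the library's code: `createRollingWindow` and `RateConfig` are hypothetical names, and the current time is passed in explicitly so the logic is easy to follow.

```typescript
interface RateConfig {
  requestsPerMinute: number; // cap per rolling window
  windowMs?: number;         // defaults to 60_000, per the table above
}

// Hypothetical sketch: remembers the timestamps of admitted requests and
// admits a new one only while fewer than the cap fall inside the window.
function createRollingWindow(cfg: RateConfig) {
  const windowMs = cfg.windowMs ?? 60_000;
  let stamps: number[] = [];
  return {
    tryAcquire(nowMs: number): boolean {
      // Forget requests that have aged out of the window.
      stamps = stamps.filter((t) => nowMs - t < windowMs);
      if (stamps.length >= cfg.requestsPerMinute) return false;
      stamps.push(nowMs);
      return true;
    },
  };
}
```

Because the window rolls, capacity comes back one request at a time as old timestamps age out, rather than all at once on a minute boundary.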
Fairness
| Option | Type | Default | Description |
|---|---|---|---|
| softCapRatio | number | 0.6 | Maximum share of maxInFlight any single actor can hold. With maxInFlight: 20 and softCapRatio: 0.3, each actor caps at 6 slots. |
| starvationWindowMs | number | 5_000 | Actors denied within this window get a priority pass when slots free up. |
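
The per-actor cap in the table is plain arithmetic, sketched below. `actorSoftCap` is a hypothetical helper, and rounding down to a whole slot is an assumption; the library may round differently.

```typescript
// Hypothetical helper: the number of slots one actor may hold under a soft cap.
// Rounding down is an assumption made for this sketch.
function actorSoftCap(maxInFlight: number, softCapRatio: number = 0.6): number {
  // The tiny epsilon guards against floating-point products such as 56.99999999999999.
  return Math.floor(maxInFlight * softCapRatio + 1e-9);
}
```

For example, `actorSoftCap(20, 0.3)` gives 6, matching the table, and `actorSoftCap(5)` gives 3 with the default ratio.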
Adaptive
| Option | Type | Default | Description |
|---|---|---|---|
| targetDenyRate | number | 0.05 | Target deny ratio. Higher values allow more throughput but more denials. |
| latencyThreshold | number | 1.5 | EMA ratio that triggers concurrency reduction. Lower values react faster to latency spikes. |
| alpha | number | 0.2 | EMA smoothing factor. Lower values produce smoother, slower-reacting signals. |
| adjustIntervalMs | number | 5_000 | How often the adaptive controller re-evaluates. |
| minConcurrency | number | 1 | Floor for effective concurrency. Adaptive never reduces below this. |
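
To see why a lower alpha reacts more slowly, here is the standard exponential moving average written out. This is the textbook formula, not a claim about how ThrottleAI's controller is written internally; `emaSeries` is a hypothetical helper.

```typescript
// Standard EMA update: next = alpha * sample + (1 - alpha) * previous.
function emaStep(previous: number, sample: number, alpha: number = 0.2): number {
  return alpha * sample + (1 - alpha) * previous;
}

// Hypothetical helper: smooths a series of samples, seeding the average
// with the first sample.
function emaSeries(samples: number[], alpha: number = 0.2): number[] {
  const out: number[] = [];
  let current: number | undefined;
  for (const s of samples) {
    current = current === undefined ? s : emaStep(current, s, alpha);
    out.push(current);
  }
  return out;
}
```

If latency jumps from 100 ms to 200 ms and stays there, `emaSeries([100, 200, 200], 0.5)` moves through 100, 150, 175, while the default alpha of 0.2 only reaches about 136 after the same two samples.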
Lease settings
| Option | Type | Default | Description |
|---|---|---|---|
| leaseTtlMs | number | 60_000 | Leases auto-expire after this duration. Set to just above your expected p99 latency. |
| reaperIntervalMs | number | 5_000 | How often the reaper sweeps for expired leases. |
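
The interaction between the two lease settings can be sketched as a periodic sweep. Everything here is hypothetical (`Lease`, `sweep`, `worstCaseHoldMs`), and it assumes that an expired lease only releases its slot when a sweep runs, which may not match the library's internals.

```typescript
interface Lease {
  id: string;
  expiresAtMs: number; // acquire time + leaseTtlMs
}

// Hypothetical sweep: splits leases into those still live and those expired.
function sweep(leases: Lease[], nowMs: number): { live: Lease[]; expired: Lease[] } {
  return {
    live: leases.filter((l) => l.expiresAtMs > nowMs),
    expired: leases.filter((l) => l.expiresAtMs <= nowMs),
  };
}

// Under the sweep-only assumption, a leaked lease can hold its slot for
// up to one TTL plus one sweep interval.
function worstCaseHoldMs(leaseTtlMs: number = 60_000, reaperIntervalMs: number = 5_000): number {
  return leaseTtlMs + reaperIntervalMs;
}
```

With the defaults, that bound is 65 seconds, which is one reason to keep leaseTtlMs close to your real p99 latency instead of leaving a large margin.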
Decision tree
Use this to decide which limiters to enable:

    Is your app user-facing?
    +-- YES --> Set interactiveReserve >= 1, consider adaptive: true
    +-- NO  --> Skip interactiveReserve

    Does the upstream have a rate limit?
    +-- YES --> Set requestsPerMinute to match (leave 10-20% headroom)
    +-- NO  --> Skip rate config

    Do you have multiple actors/users?
    +-- YES --> Enable fairness: true (or tune softCapRatio)
    +-- NO  --> Skip fairness

    Is latency unpredictable?
    +-- YES --> Enable adaptive: true
    +-- NO  --> Skip adaptive (manual tuning is simpler)

When adaptive helps vs hurts
Adaptive helps when:
- Upstream latency is variable (cloud APIs, shared GPUs)
- You do not know the right concurrency up front
- Load patterns change throughout the day
Adaptive hurts when:
- Latency is constant (local model with fixed batch size)
- You know the exact capacity (you own the hardware)
- Traffic is bursty and low-volume (not enough samples for a good EMA)
If adaptive oscillates, increase adjustIntervalMs (slower reactions) or lower alpha (smoother signal).
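
One way to make the decision tree and the adaptive guidance above concrete is a small helper that turns the four answers into a starting config. This is only a sketch: `suggestConfig`, `Answers`, and `SuggestedConfig` are hypothetical, the field names mirror the configuration reference, and the 15% default headroom is simply a value inside the suggested 10-20% range.

```typescript
interface Answers {
  userFacing: boolean;
  upstreamRequestsPerMinute?: number; // leave undefined if the upstream has no rate limit
  multipleActors: boolean;
  unpredictableLatency: boolean;
}

interface SuggestedConfig {
  concurrency: { maxInFlight: number; interactiveReserve?: number };
  rate?: { requestsPerMinute: number };
  fairness?: boolean;
  adaptive?: boolean;
}

// Hypothetical helper: applies each branch of the decision tree in order.
function suggestConfig(a: Answers, maxInFlight: number, headroomPercent: number = 15): SuggestedConfig {
  const cfg: SuggestedConfig = { concurrency: { maxInFlight } };
  if (a.userFacing) {
    cfg.concurrency.interactiveReserve = 1; // the tree asks for at least 1
  }
  if (a.upstreamRequestsPerMinute !== undefined) {
    // Stay below the upstream limit by the chosen headroom.
    const capped = Math.floor((a.upstreamRequestsPerMinute * (100 - headroomPercent)) / 100);
    cfg.rate = { requestsPerMinute: capped };
  }
  if (a.multipleActors) cfg.fairness = true;
  if (a.unpredictableLatency) cfg.adaptive = true;
  return cfg;
}
```

The result is meant as a starting point to spread into createGovernor and then tune by hand, not as a recommendation the library itself makes.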
Tuning quick reference
| You see this | Adjust this |
|---|---|
| reason: "concurrency" | Increase maxInFlight or decrease call duration |
| reason: "rate" | Increase requestsPerMinute / tokensPerMinute |
| reason: "policy" (fairness) | Lower softCapRatio or increase maxInFlight |
| High retryAfterMs | Reduce leaseTtlMs so expired leases free faster |
| Background tasks starved | Increase maxInFlight or reduce interactiveReserve |
| Interactive latency high | Increase interactiveReserve |
| Adaptive shrinks too fast | Lower alpha or raise targetDenyRate |
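
When many denials arrive, it can help to find the dominant reason before touching any setting. The sketch below assumes you have already collected deny reasons as strings (for example from your own logging); `DenyReason`, `tuningHints`, and `dominantDenyReason` are hypothetical names, and the hint text is copied from the table above.

```typescript
type DenyReason = "concurrency" | "rate" | "policy";

// Hints copied from the quick-reference table.
const tuningHints: Record<DenyReason, string> = {
  concurrency: "Increase maxInFlight or decrease call duration",
  rate: "Increase requestsPerMinute / tokensPerMinute",
  policy: "Lower softCapRatio or increase maxInFlight",
};

const knownReasons: DenyReason[] = ["concurrency", "rate", "policy"];

// Hypothetical helper: returns the most frequent deny reason, or undefined
// when nothing was denied. Ties go to the reason listed first.
function dominantDenyReason(observed: DenyReason[]): DenyReason | undefined {
  let best: DenyReason | undefined;
  let bestCount = 0;
  for (const reason of knownReasons) {
    const count = observed.filter((o) => o === reason).length;
    if (count > bestCount) {
      best = reason;
      bestCount = count;
    }
  }
  return best;
}
```

Looking up `tuningHints[reason]` for the returned value gives the first adjustment to try.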