ThrottleAI Handbook
ThrottleAI is a zero-dependency governor for concurrency, rate, and token budgets. It sits between your code and the model call, enforcing hard limits so stampedes never happen.
This handbook covers everything you need to go from first install to production deployment.
What you will learn
- Getting Started — Install, run the 60-second quickstart, and choose the right limiter for your workload.
- Patterns — Server 429 vs queue, interactive vs background priority, streaming calls, and observability.
- Configuration — Presets (quiet / balanced / aggressive), full config reference, and the tuning decision tree.
- API Reference — Every function, type, and return shape in the public API.
- Adapters — Drop-in wrappers for fetch, OpenAI, tool calls, Express, and Hono.
- Reference — Troubleshooting, testing, stability promise, security posture, and examples.
- Beginners Guide — New to rate limiting? Start here for a step-by-step introduction.
Core idea
AI applications hit rate limits, blow budgets, and create stampedes. ThrottleAI prevents all three with a lease-based model: callers acquire a lease before making a call, then release it when done. No lease, no call. Leases auto-expire if you forget to release.
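The lease lifecycle described above can be sketched in a few lines. This is an illustrative toy, not ThrottleAI's implementation or public API: `LeaseGovernor`, `Lease`, and every method and parameter name below are hypothetical, and expiry is assumed to be checked lazily on each call rather than by a timer.

```typescript
// Hypothetical sketch of a lease-based concurrency cap.
// None of these names come from ThrottleAI; see the API Reference for the real ones.
interface Lease {
  id: number;
  expiresAt: number; // timestamp in milliseconds
}

class LeaseGovernor {
  private maxInFlight: number;
  private leaseTtlMs: number;
  private now: () => number;
  private nextId = 1;
  private leases = new Map<number, Lease>();

  constructor(maxInFlight: number, leaseTtlMs: number, now: () => number = () => Date.now()) {
    this.maxInFlight = maxInFlight;
    this.leaseTtlMs = leaseTtlMs;
    this.now = now;
  }

  // Drop leases whose TTL has passed, so a forgotten release() cannot hold a slot forever.
  private sweep(): void {
    const t = this.now();
    this.leases.forEach((lease, id) => {
      if (lease.expiresAt <= t) this.leases.delete(id);
    });
  }

  // Immediate decision: returns a lease, or null when the cap is reached. Never waits.
  acquire(): Lease | null {
    this.sweep();
    if (this.leases.size >= this.maxInFlight) return null;
    const lease: Lease = { id: this.nextId++, expiresAt: this.now() + this.leaseTtlMs };
    this.leases.set(lease.id, lease);
    return lease;
  }

  // Releasing an unknown or already-expired lease is a no-op.
  release(id: number): void {
    this.leases.delete(id);
  }

  inFlight(): number {
    this.sweep();
    return this.leases.size;
  }
}
```

A caller would check the result of `acquire()`, make the model call only when a lease was returned, and call `release()` in a `finally` block. The injectable `now` parameter exists only to make the sketch testable without real timers.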
The governor tracks three independent dimensions:
| Dimension | What it caps |
|---|---|
| Concurrency | Simultaneous in-flight calls (weighted slots + interactive reserve) |
| Rate | Requests per minute and tokens per minute (rolling windows) |
| Fairness | Per-actor share of capacity (prevents monopolization) |
All three are optional and composable. Start with concurrency alone — it handles most workloads.
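To make the rate dimension concrete, here is a minimal sketch of a rolling-window counter. It is an assumption about how such a cap can be built, not a description of ThrottleAI's internals: `RollingWindow` and `tryConsume` are hypothetical names, and the 60-second default simply mirrors the per-minute limits in the table.

```typescript
// Hypothetical rolling-window counter; not part of ThrottleAI's API.
interface UsageEntry {
  at: number; // when the usage was recorded (ms)
  amount: number; // how much was used (1 per request, or a token count)
}

class RollingWindow {
  private limit: number;
  private windowMs: number;
  private now: () => number;
  private entries: UsageEntry[] = [];
  private total = 0;

  constructor(limit: number, windowMs: number = 60000, now: () => number = () => Date.now()) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Forget usage that has slid out of the window.
  private evict(): void {
    const cutoff = this.now() - this.windowMs;
    for (;;) {
      const oldest = this.entries[0];
      if (oldest === undefined || oldest.at > cutoff) break;
      this.total -= oldest.amount;
      this.entries.shift();
    }
  }

  // Records the usage and returns true if it fits; returns false without recording otherwise.
  tryConsume(amount: number = 1): boolean {
    this.evict();
    if (this.total + amount > this.limit) return false;
    this.entries.push({ at: this.now(), amount: amount });
    this.total += amount;
    return true;
  }
}
```

Under these assumptions, one instance with `amount = 1` would model requests per minute, and a second instance fed an estimated token count would model tokens per minute.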
Design principles
- Zero dependencies. Pure TypeScript. Ships as ESM + CJS. Runs in Node.js 18+ or any fetch-capable runtime.
- Tree-shakeable. Import only the adapters you use.
- Lease-based, not queue-based. Callers get an immediate yes/no decision. No hidden queues, no unbounded memory growth.
- Observable. snapshot() gives a point-in-time view. onEvent streams every acquire, deny, release, and expiry. formatEvent() and formatSnapshot() produce human-readable one-liners.
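The observability principle pairs a pull-style snapshot with a push-style event stream. The sketch below shows that pairing in isolation. The names `snapshot` and `onEvent` are borrowed from the list above, but every signature and type here (`GovernorEvent`, `ObservableCounter`, `describeEvent`) is an assumption made for illustration; the API Reference is the authority on the real shapes.

```typescript
// Hypothetical shapes; ThrottleAI's actual event and snapshot types may differ.
type EventType = "acquire" | "deny" | "release";

interface GovernorEvent {
  type: EventType;
  at: number; // timestamp in milliseconds
}

class ObservableCounter {
  private max: number;
  private onEvent: (e: GovernorEvent) => void;
  private now: () => number;
  private inFlight = 0;

  constructor(max: number, onEvent: (e: GovernorEvent) => void, now: () => number = () => Date.now()) {
    this.max = max;
    this.onEvent = onEvent;
    this.now = now;
  }

  private emit(type: EventType): void {
    this.onEvent({ type: type, at: this.now() });
  }

  // Every decision, granted or denied, is reported to the listener.
  acquire(): boolean {
    if (this.inFlight >= this.max) {
      this.emit("deny");
      return false;
    }
    this.inFlight += 1;
    this.emit("acquire");
    return true;
  }

  release(): void {
    if (this.inFlight === 0) return; // nothing to release
    this.inFlight -= 1;
    this.emit("release");
  }

  // Point-in-time view: a plain copy of the counters, safe to log or serialize.
  snapshot(): { inFlight: number; max: number } {
    return { inFlight: this.inFlight, max: this.max };
  }
}

// One-line rendering of an event, in the spirit of a human-readable formatter.
function describeEvent(e: GovernorEvent): string {
  return `[${e.at}] ${e.type}`;
}
```

Passing `(e) => console.log(describeEvent(e))` as the listener would print one line per decision, while `snapshot()` can be polled from a health endpoint or metrics loop.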