# Beginner's Guide
## What is ThrottleAI?

ThrottleAI is a lightweight TypeScript library that controls how many AI calls your application makes at the same time. It acts as a traffic cop between your code and AI model APIs (OpenAI, Anthropic, Ollama, or any HTTP endpoint), preventing stampedes that blow budgets, trigger rate limits, or crash local GPUs.
ThrottleAI uses a lease-based model: before making an AI call, your code requests a lease from a governor. The governor checks concurrency, rate limits, and fairness rules, then returns an immediate yes or no. If granted, you make the call and release the lease when done. If denied, you get a recommendation for when to retry.
The library has zero dependencies, runs in Node.js 18+, and operates entirely in memory — no network calls, no telemetry, no persistent state.
## Who is this for?

ThrottleAI is for any TypeScript or JavaScript developer who calls AI APIs and needs to control:
- Concurrency — how many calls run at the same time
- Rate limits — how many requests per minute the upstream API allows
- Fairness — preventing one user from hogging all capacity in multi-tenant apps
- Cost — staying within token budgets to avoid surprise bills
If you have ever seen a 429 Too Many Requests error from an AI provider, or had a local model run out of memory from too many simultaneous requests, ThrottleAI solves that problem.
## Key concepts

### Governor

The governor is the central object. You create one with `createGovernor()` and it tracks all active leases, rate windows, and fairness state. Most applications need exactly one governor.
### Leases

A lease is a permit to make one AI call. You acquire a lease before calling the model and release it when done. Leases auto-expire after a configurable timeout (default: 60 seconds) as a safety net.
### Presets

ThrottleAI ships three presets that cover common scenarios:
- quiet — 1 concurrent call, 10 requests/min. For CLI tools and single-user apps.
- balanced — 5 concurrent calls with 2 reserved for interactive users, 60 requests/min, fairness enabled. For SaaS backends.
- aggressive — 20 concurrent calls, 300 requests/min, fairness + adaptive tuning. For batch processing.
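Presets are plain option objects, so you can start from one and override individual fields by spreading it into your own configuration, the same pattern the tutorial uses to attach an event handler. A sketch (the `concurrency.maxInFlight` override is the only tuning field shown in this guide):

```typescript
import { createGovernor, presets } from "@mcptoolshop/throttleai";

// Start from the balanced preset, but allow more simultaneous calls.
const gov = createGovernor({
  ...presets.balanced(),
  concurrency: { maxInFlight: 10 },
});
```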
### Adapters

Adapters are optional wrappers for popular frameworks (`fetch`, the OpenAI SDK, Express, Hono). They handle acquire/release automatically, so you do not need to manage leases manually.
## Prerequisites

- Node.js 18 or later — ThrottleAI uses modern JavaScript features.
- TypeScript recommended — The library is written in TypeScript and provides full type definitions. JavaScript works too, but you lose type safety.
- A package manager: npm, pnpm, or yarn.
## Step-by-step tutorial

### 1. Install

```sh
npm install @mcptoolshop/throttleai
```

### 2. Create a governor

```ts
import { createGovernor, presets } from "@mcptoolshop/throttleai";

// Start with the quiet preset for learning
const gov = createGovernor(presets.quiet());
```

The quiet preset allows 1 concurrent call and 10 requests per minute. This is the safest starting point.
### 3. Make a governed call with `withLease`

`withLease` is the recommended way to use ThrottleAI. It handles acquire and release automatically, including on errors.
```ts
import { createGovernor, withLease, presets } from "@mcptoolshop/throttleai";

const gov = createGovernor(presets.quiet());

// Simulate an AI call
async function callMyModel(prompt: string): Promise<string> {
  // Replace this with your actual AI SDK call
  return `Response to: ${prompt}`;
}

const result = await withLease(
  gov,
  { actorId: "my-app", action: "chat" },
  async () => await callMyModel("Hello, world!"),
);

if (result.granted) {
  console.log("Got a response:", result.result);
} else {
  console.log("Throttled! Retry in", result.decision.retryAfterMs, "ms");
  console.log("Reason:", result.decision.reason);
}
```

What happens here:
- `withLease` calls `gov.acquire()` to request a lease.
- If granted, it runs your function and then calls `gov.release()` automatically.
- If denied, it returns the denial with a reason and retry recommendation.
- If your function throws an error, the lease is still released (with outcome `"error"`).
### 4. Add observability

To see what the governor is doing, add an event handler:
```ts
import { createGovernor, formatEvent, presets } from "@mcptoolshop/throttleai";

const gov = createGovernor({
  ...presets.quiet(),
  onEvent: (e) => console.log(formatEvent(e)),
});
```

You will see output like:

```
[acquire] actor=my-app action=chat leaseId=abc123
[release] leaseId=abc123 outcome=success
```

If a call is denied, you will see:

```
[deny] actor=my-app action=chat reason=concurrency retryAfterMs=500
```

### 5. Check governor state

At any time, you can inspect the governor:
```ts
import { formatSnapshot } from "@mcptoolshop/throttleai";

console.log(formatSnapshot(gov.snapshot()));
// concurrency=0/1 rate=3/10 leases=0
```

### 6. Clean up on shutdown

The governor runs a background interval to expire stale leases. Call `dispose()` when your app shuts down to stop it:
process.on("SIGINT", () => { gov.dispose(); process.exit(0);});Common mistakes
Section titled “Common mistakes”Forgetting to release leases
Section titled “Forgetting to release leases”If you use gov.acquire() directly (instead of withLease), you must always release the lease, even when an error occurs:
```ts
const decision = gov.acquire({ actorId: "my-app", action: "chat" });
if (!decision.granted) return;

try {
  const result = await callMyModel();
  gov.release(decision.leaseId, { outcome: "success" });
} catch (err) {
  gov.release(decision.leaseId, { outcome: "error" });
  throw err;
}
```

If you forget the catch branch, leaked leases consume concurrency slots until they expire. Use `withLease` to avoid this entirely.
### Setting `maxInFlight` too low

With `maxInFlight: 1` (the quiet preset), only one call runs at a time. This is safe but slow. Once you are comfortable, increase it:

```ts
createGovernor({
  concurrency: { maxInFlight: 5 },
});
```

### Not calling `dispose`

If you do not call `gov.dispose()`, the reaper interval keeps your Node.js process alive after your work is done. This is especially noticeable in scripts and tests.
## FAQ

Q: Does ThrottleAI make network calls?
No. ThrottleAI is a pure in-memory library. It does not call any APIs, send telemetry, or access the filesystem. It only tracks state in JavaScript objects.
Q: Can I use ThrottleAI with any AI provider?
Yes. ThrottleAI is provider-agnostic. It governs when calls happen, not how they are made. Use it with OpenAI, Anthropic, Google, Ollama, vLLM, or any HTTP API. The built-in adapters handle OpenAI and fetch patterns automatically, but the core acquire/release API works with anything.
Q: What happens when a lease expires?
The governor fires an `expire` event and frees the concurrency slot. The in-flight operation continues running — the governor just stops tracking it. Frequent expirations indicate that `leaseTtlMs` is too short for your workload.
Q: Do I need a separate governor per API?
Not necessarily. One governor can manage all your AI calls. If you call multiple APIs with different rate limits, you may want separate governors — one per API.
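Splitting by API is just a matter of creating multiple governors; the preset choices below are illustrative, not recommendations:

```typescript
import { createGovernor, presets } from "@mcptoolshop/throttleai";

// One governor per upstream quota, so each enforces its own limits.
const openaiGov = createGovernor(presets.balanced());
const ollamaGov = createGovernor(presets.quiet()); // local GPU: keep it serial

// Pass the matching governor to withLease at each call site.
```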
Q: What is the difference between withLease and raw acquire/release?
`withLease` wraps `acquire` and `release` with automatic error handling. It is the recommended approach for most use cases. Use raw `acquire`/`release` only when you need fine-grained control, such as holding a lease across multiple async steps or streaming responses.
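Holding a lease across a streamed response might look like the sketch below. The governor and decision types are structural stand-ins matching the acquire/release API described in this guide, and the streaming details are illustrative:

```typescript
// Structural stand-ins for the governor and decision shapes in this guide.
interface Decision {
  granted: boolean;
  leaseId?: string;
  retryAfterMs?: number;
}

interface GovernorLike {
  acquire(req: { actorId: string; action: string }): Decision;
  release(leaseId: string, info: { outcome: "success" | "error" }): void;
}

// Hold a single lease for the lifetime of a streamed response:
// acquire before the first chunk, release only when the stream ends.
async function* governedStream(
  gov: GovernorLike,
  req: { actorId: string; action: string },
  stream: AsyncIterable<string>,
): AsyncGenerator<string> {
  const decision = gov.acquire(req);
  if (!decision.granted) {
    throw new Error(`throttled; retry in ${decision.retryAfterMs ?? 0} ms`);
  }
  let outcome: "success" | "error" = "error";
  try {
    for await (const chunk of stream) {
      yield chunk;
    }
    outcome = "success";
  } finally {
    // Runs on normal completion, thrown errors, and early consumer break.
    gov.release(decision.leaseId!, { outcome });
  }
}
```

The `finally` block matters here: a consumer that stops iterating early still triggers generator cleanup, so the lease is released rather than leaked until it expires.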