Stop AI stampedes before they start.
Token-based lease governor for AI calls — small enough to embed anywhere, strict enough to enforce real limits on concurrency, tokens, and spend.
Install
npm install throttleai
Govern
import { ThrottleAI } from 'throttleai';
const gov = new ThrottleAI({ rpm: 60, tpm: 100_000 });
await gov.acquire(estimatedTokens);
Wrap
import { withThrottle } from 'throttleai/adapters/openai';
const openai = withThrottle(new OpenAI(), gov);
Features
Governance that actually holds.
Lease-Based Flow
Callers acquire a lease before any call is made. No lease, no call. Stampedes are structurally impossible, not just unlikely.
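The acquire-then-call discipline boils down to a counting gate: a caller either holds a lease or waits for one. The sketch below is an illustrative stand-in for that pattern, not throttleai's internals; the `LeaseGate` class and its queueing strategy are assumptions made for the example.

```typescript
// Minimal lease gate: callers must hold a lease before doing any work.
// Illustrative sketch only, not throttleai's implementation.
class LeaseGate {
  private inFlight = 0;
  private waiters: Array<() => void> = [];

  constructor(private readonly limit: number) {}

  async acquire(): Promise<{ release: () => void }> {
    // Park the caller until a lease frees up: no lease, no call.
    while (this.inFlight >= this.limit) {
      await new Promise<void>((resolve) => this.waiters.push(resolve));
    }
    this.inFlight++;
    return {
      release: () => {
        this.inFlight--;
        this.waiters.shift()?.(); // wake the next parked caller, if any
      },
    };
  }
}
```

Because the call site cannot run without first resolving `acquire()`, overload is prevented by construction rather than by best-effort backoff.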
Token + Rate Aware
Tracks RPM, TPM, and concurrent request counts independently. Enforce all three, any two, or just one — your choice.
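Enforcing a subset of limits looks like passing only the fields you care about to the constructor. This fragment assumes that omitted limits are simply left unenforced, which is how the "any two, or just one" claim reads:

```typescript
import { ThrottleAI } from 'throttleai';

// Assumption: limits you omit are not enforced.
const spendGuard = new ThrottleAI({ tpm: 50_000 });              // token budget only
const burstGuard = new ThrottleAI({ rpm: 120, concurrency: 8 }); // rate + in-flight only
```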
Zero Dependencies
Pure TypeScript, ships as ESM + CJS, runs in Node 18+ or any fetch-capable runtime. Nothing to install but the package itself.
Adapters
Drop-in wrappers for the tools you already use.
Usage
Core governor
import { ThrottleAI } from 'throttleai';
const gov = new ThrottleAI({
rpm: 60, // max requests per minute
tpm: 100_000, // max tokens per minute
concurrency: 5, // max in-flight at once
});
// Acquire before every call
const lease = await gov.acquire(estimatedTokens);
const result = await myAICall();
lease.release(actualTokensUsed);
OpenAI adapter
import { withThrottle } from 'throttleai/adapters/openai';
const client = withThrottle(new OpenAI(), gov);
// Use exactly like the normal OpenAI client
const res = await client.chat.completions.create({
model: 'gpt-4o',
messages: [{ role: 'user', content: 'Hello' }],
});
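Adapters of this kind generally reduce to one move: intercept each call, take a lease, then delegate. The sketch below shows that wrapper pattern in isolation; the `Governor` interface and `governed` helper are hypothetical names for the example, not throttleai's adapter code.

```typescript
// Hypothetical sketch of the adapter pattern: every call goes
// through a governor before it reaches the underlying function.
interface Governor {
  acquire(tokens: number): Promise<{ release: (used: number) => void }>;
}

function governed<A extends unknown[], R>(
  gov: Governor,
  estimate: number,
  fn: (...args: A) => Promise<R>,
): (...args: A) => Promise<R> {
  return async (...args: A) => {
    const lease = await gov.acquire(estimate); // no lease, no call
    try {
      return await fn(...args);
    } finally {
      lease.release(estimate); // settle the lease even if the call throws
    }
  };
}
```

Wrapping at this boundary is what keeps the client "drop-in": the caller's signature and return type are unchanged, so existing code compiles as before.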