# Adapters
Adapters are tree-shakeable wrappers that integrate ThrottleAI with common tools and frameworks. Import only what you use. Each adapter handles acquire, release, outcome reporting, and latency tracking automatically.
All adapters return a consistent shape:

```ts
// Granted
{ ok: true, result: T, latencyMs: number }

// Denied
{ ok: false, decision: AcquireDecision }
```

## Adapter overview

| Adapter | Import | Auto-reports |
|---|---|---|
| fetch | `@mcptoolshop/throttleai/adapters/fetch` | outcome (from HTTP status) + latency |
| OpenAI | `@mcptoolshop/throttleai/adapters/openai` | outcome + latency + token usage |
| Tool | `@mcptoolshop/throttleai/adapters/tools` | outcome + latency + custom weight |
| Express | `@mcptoolshop/throttleai/adapters/express` | outcome (from `res.statusCode`) + latency |
| Hono | `@mcptoolshop/throttleai/adapters/hono` | outcome + latency |
## fetch adapter

Wraps any fetch-compatible function with governor-controlled leases. The outcome is automatically derived from the HTTP status code.
```ts
import { wrapFetch } from "@mcptoolshop/throttleai/adapters/fetch";

const throttledFetch = wrapFetch(fetch, { governor: gov });

const r = await throttledFetch("https://api.example.com/v1/chat");

if (r.ok) {
  console.log(r.response.status); // the original Response
} else {
  console.log("Denied:", r.decision.retryAfterMs);
}
```

### Options
- `governor` — the governor instance (required)
- `actorId` — default actor ID for all requests (default: `"default"`)
- `priority` — default priority (default: `"interactive"`)
- `classifyAction` — function to derive the action from the request (default: URL pathname)
- `estimate` — function to provide a token estimate for the request
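For illustration, the default action classification behaves roughly like this sketch. The `classifyAction` function below is a hypothetical local re-implementation of the documented default (URL pathname), not the adapter's actual source:

```typescript
// Hypothetical re-implementation of the default classifyAction:
// derive the action name from the request URL's pathname.
const classifyAction = (input: string | URL): string =>
  new URL(String(input)).pathname;

console.log(classifyAction("https://api.example.com/v1/chat")); // "/v1/chat"
```

Passing your own `classifyAction` in the options replaces this default, which is useful for grouping several URLs under one logical action.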
### Outcome mapping

| HTTP Status | Outcome |
|---|---|
| 200-399 | "success" |
| 400-499 | "error" |
| 500-599 | "error" |
| Network error | "error" |
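The mapping above can be sketched as a single function. This is a hypothetical illustration of the rule, not the adapter's source; network errors never produce a status code, so they are reported as `"error"` directly:

```typescript
// Map an HTTP status to an outcome: 2xx-3xx succeed, everything else errors.
function outcomeFromStatus(status: number): "success" | "error" {
  return status >= 200 && status < 400 ? "success" : "error";
}

console.log(outcomeFromStatus(204)); // "success"
console.log(outcomeFromStatus(429)); // "error"
```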
## OpenAI adapter

Wraps an OpenAI-compatible `chat.completions.create` function. Automatically reports token usage from the response.
```ts
import { wrapChatCompletions } from "@mcptoolshop/throttleai/adapters/openai";

const chat = wrapChatCompletions(
  (params) => openai.chat.completions.create(params),
  { governor: gov },
);

const r = await chat({
  model: "gpt-4",
  messages: [{ role: "user", content: "Hello" }],
});

if (r.ok) {
  console.log(r.result.choices[0].message.content);
  console.log("Tokens used:", r.result.usage?.total_tokens);
}
```

### What it auto-reports
- Outcome: `"success"` if the call completes, `"error"` on exception
- Latency: wall-clock time of the API call
- Token usage: extracted from `response.usage.total_tokens` if present
This means the governor’s token-rate limiter stays accurate without you manually tracking tokens.
### Estimation helpers

The OpenAI adapter exports two utility functions for rough token estimation:
```ts
import {
  estimateTokensFromChars,
  estimateTokensFromMessages,
} from "@mcptoolshop/throttleai/adapters/openai";

// ~4 chars per token heuristic
const tokens = estimateTokensFromChars(2000); // 500

// Sum message content + per-message overhead
const promptTokens = estimateTokensFromMessages([
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Hello" },
]);
```

These are intentionally simple estimates. For accurate counts, use a real tokenizer (tiktoken) and pass the result as the `estimate` parameter.
## Tool adapter

Wraps any async function as a governed tool call. Useful for MCP tools, embedding functions, or any custom async work.
```ts
import { wrapTool } from "@mcptoolshop/throttleai/adapters/tools";

const embed = wrapTool(myEmbedFn, {
  governor: gov,
  toolId: "embed",
  costWeight: 2, // this tool uses 2 concurrency slots
});

const r = await embed("hello world");

if (r.ok) {
  console.log(r.result); // the embedding vector
}
```

### Options
- `governor` — the governor instance (required)
- `toolId` — identifier for this tool (used as the `action` in acquire requests)
- `costWeight` — concurrency weight per call (default: 1). Heavier tools can consume multiple slots.
- `actorId` — default actor ID
The `costWeight` option is particularly useful when different tools have different resource costs. An embedding call that hits a GPU might cost 2 slots while a simple metadata lookup costs 1.
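To make the slot arithmetic concrete, here is a hypothetical sketch of weighted concurrency accounting. The `SlotBudget` class is illustrative only, not part of the library:

```typescript
// Illustrative weighted-slot accounting: each call reserves costWeight
// slots from a fixed capacity, so heavier tools exhaust it sooner.
class SlotBudget {
  private used = 0;
  constructor(private readonly capacity: number) {}

  acquire(weight: number): boolean {
    if (this.used + weight > this.capacity) return false;
    this.used += weight;
    return true;
  }

  release(weight: number): void {
    this.used = Math.max(0, this.used - weight);
  }
}

const budget = new SlotBudget(5);
console.log(budget.acquire(2)); // GPU embed call:   true  (2/5 used)
console.log(budget.acquire(2)); // second embed:     true  (4/5 used)
console.log(budget.acquire(2)); // third embed:      false (would need 6/5)
console.log(budget.acquire(1)); // metadata lookup:  true  (5/5 used)
```

With a capacity of 5, a `costWeight: 2` tool admits at most two concurrent calls while leaving room for one weight-1 call, which is the behavior the option is designed to express.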
## Express adapter

Middleware for Express that automatically governs incoming requests. Denied requests receive a 429 response with a `Retry-After` header.
```ts
import { throttleMiddleware } from "@mcptoolshop/throttleai/adapters/express";

app.use("/ai", throttleMiddleware({ governor: gov }));
```

### What happens on deny
When the governor denies a request, the middleware responds with:

- Status: `429 Too Many Requests`
- Header: `Retry-After` (in seconds, derived from `retryAfterMs`)
- Body: JSON with the deny reason, recommendation, and retry timing

```json
{
  "error": "throttled",
  "reason": "concurrency",
  "retryAfterMs": 500,
  "recommendation": "All 5 slots in use. Try again in ~500ms."
}
```

### Options
- `governor` — the governor instance (required)
- `getActorId` — function to extract the actor ID from the request (default: `x-actor-id` header, then `req.ip`, then `"anonymous"`)
- `getAction` — function to extract the action from the request (default: `req.path`)
- `getPriority` — function to extract the priority from the request (default: `"interactive"`)
- `getEstimate` — function to derive a token estimate from the request
- `onDeny` — custom handler for denied requests (default: 429 JSON response)
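The default actor-ID resolution chain can be sketched like this. The `resolveActorId` function and `MinimalRequest` shape are hypothetical stand-ins for the middleware's internal default, shown here only to illustrate the fallback order:

```typescript
// Hypothetical default getActorId: x-actor-id header, then req.ip, then "anonymous".
interface MinimalRequest {
  headers: Record<string, string | undefined>;
  ip?: string;
}

function resolveActorId(req: MinimalRequest): string {
  return req.headers["x-actor-id"] ?? req.ip ?? "anonymous";
}

console.log(resolveActorId({ headers: { "x-actor-id": "team-42" } })); // "team-42"
console.log(resolveActorId({ headers: {}, ip: "10.0.0.1" }));          // "10.0.0.1"
console.log(resolveActorId({ headers: {} }));                          // "anonymous"
```

Supplying your own `getActorId` replaces this chain entirely, for example to key limits on an authenticated user ID instead of the client IP.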
### Outcome mapping

The middleware reports outcomes based on `res.statusCode` after the handler completes:
| Status Code | Outcome |
|---|---|
| < 400 | "success" |
| >= 400 | "error" |
## Hono adapter

Middleware for the Hono framework, designed for edge-compatible runtimes.
```ts
import { throttle } from "@mcptoolshop/throttleai/adapters/hono";

app.use("/ai/*", throttle({ governor: gov }));
```

### Behavior
- Denied requests return 429 JSON with the same shape as the Express adapter.
- The `leaseId` is stored on the Hono context, allowing downstream handlers to access it if needed.
- Outcomes are reported automatically from the response status.
### Options

- `governor` — the governor instance (required)
- `getActorId` — function to extract the actor ID from the context (default: `x-actor-id` header or `"anonymous"`)
- `getAction` — function to extract the action from the context (default: `req.path`)
- `getPriority` — function to extract the priority from the context (default: `"interactive"`)
- `getEstimate` — function to derive a token estimate from the context
- `onDeny` — custom handler for denied requests (return a Response to override the default 429 JSON)
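As one example of overriding the default, an `onDeny` handler can construct its own `Response`. This sketch assumes a decision object carrying `retryAfterMs`; check the exact decision shape against the library's types before relying on it:

```typescript
// Hypothetical custom onDeny: 429 with a Retry-After header in whole seconds.
// Uses the standard Response global (available in edge runtimes and Node 18+).
const onDeny = (decision: { retryAfterMs?: number }): Response =>
  new Response(JSON.stringify({ error: "throttled" }), {
    status: 429,
    headers: {
      "Content-Type": "application/json",
      "Retry-After": String(Math.ceil((decision.retryAfterMs ?? 0) / 1000)),
    },
  });

const res = onDeny({ retryAfterMs: 500 });
console.log(res.status); // 429
console.log(res.headers.get("Retry-After")); // "1"
```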
## Writing a custom adapter

If your framework or client is not covered by the built-in adapters, the pattern is straightforward:
```ts
async function myAdapter(gov, request, fn) {
  const decision = gov.acquire({
    actorId: request.actorId,
    action: request.action,
  });

  if (!decision.granted) {
    return { ok: false, decision };
  }

  const start = Date.now();
  try {
    const result = await fn();
    gov.release(decision.leaseId, {
      outcome: "success",
      latencyMs: Date.now() - start,
    });
    return { ok: true, result, latencyMs: Date.now() - start };
  } catch (err) {
    gov.release(decision.leaseId, {
      outcome: "error",
      latencyMs: Date.now() - start,
    });
    throw err;
  }
}
```

The key contract: acquire before, release after, always release on error, and report the outcome.
const start = Date.now(); try { const result = await fn(); gov.release(decision.leaseId, { outcome: "success", latencyMs: Date.now() - start, }); return { ok: true, result, latencyMs: Date.now() - start }; } catch (err) { gov.release(decision.leaseId, { outcome: "error", latencyMs: Date.now() - start, }); throw err; }}The key contract: acquire before, release after, always release on error, and report the outcome.