Adapters

Adapters are tree-shakeable wrappers that integrate ThrottleAI with common tools and frameworks. Import only what you use. Each adapter handles acquire, release, outcome reporting, and latency tracking automatically.

All adapters return a consistent shape:

```ts
// Granted
{ ok: true, result: T, latencyMs: number }
// Denied
{ ok: false, decision: AcquireDecision }
```
| Adapter | Import | Auto-reports |
| --- | --- | --- |
| fetch | `@mcptoolshop/throttleai/adapters/fetch` | outcome (from HTTP status) + latency |
| OpenAI | `@mcptoolshop/throttleai/adapters/openai` | outcome + latency + token usage |
| Tool | `@mcptoolshop/throttleai/adapters/tools` | outcome + latency + custom weight |
| Express | `@mcptoolshop/throttleai/adapters/express` | outcome (from `res.statusCode`) + latency |
| Hono | `@mcptoolshop/throttleai/adapters/hono` | outcome + latency |

The fetch adapter wraps any fetch-compatible function with governor-controlled leases. The outcome is derived automatically from the HTTP status code.

```ts
import { wrapFetch } from "@mcptoolshop/throttleai/adapters/fetch";

const throttledFetch = wrapFetch(fetch, { governor: gov });

const r = await throttledFetch("https://api.example.com/v1/chat");
if (r.ok) {
  console.log(r.response.status); // the original Response
} else {
  console.log("Denied:", r.decision.retryAfterMs);
}
```
  • `governor` — the governor instance (required)
  • `actorId` — default actor ID for all requests (default: `"default"`)
  • `priority` — default priority (default: `"interactive"`)
  • `classifyAction` — function to derive the action from the request (default: URL pathname)
  • `estimate` — function to provide a token estimate for the request
| HTTP Status | Outcome |
| --- | --- |
| 200-399 | `"success"` |
| 400-499 | `"error"` |
| 500-599 | `"error"` |
| Network error | `"error"` |
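The status-to-outcome mapping above collapses to a single predicate. As a minimal sketch of the same rule (an illustrative re-implementation, not the adapter's source):

```typescript
// Outcome derivation from an HTTP status code, mirroring the mapping above.
// Network errors never produce a status at all, so they are handled in the
// adapter's catch path rather than here.
type Outcome = "success" | "error";

function outcomeFromStatus(status: number): Outcome {
  // 200-399 counts as success; everything else (4xx, 5xx) is an error.
  return status >= 200 && status <= 399 ? "success" : "error";
}

console.log(outcomeFromStatus(304)); // "success"
console.log(outcomeFromStatus(429)); // "error"
```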

The OpenAI adapter wraps an OpenAI-compatible `chat.completions.create` function and automatically reports token usage from the response.

```ts
import { wrapChatCompletions } from "@mcptoolshop/throttleai/adapters/openai";

const chat = wrapChatCompletions(
  (params) => openai.chat.completions.create(params),
  { governor: gov },
);

const r = await chat({
  model: "gpt-4",
  messages: [{ role: "user", content: "Hello" }],
});
if (r.ok) {
  console.log(r.result.choices[0].message.content);
  console.log("Tokens used:", r.result.usage?.total_tokens);
}
```
  • Outcome: `"success"` if the call completes, `"error"` on exception
  • Latency: wall-clock time of the API call
  • Token usage: extracted from `response.usage.total_tokens` if present

This means the governor’s token-rate limiter stays accurate without you manually tracking tokens.

The OpenAI adapter exports two utility functions for rough token estimation:

```ts
import {
  estimateTokensFromChars,
  estimateTokensFromMessages,
} from "@mcptoolshop/throttleai/adapters/openai";

// ~4 chars per token heuristic
const tokens = estimateTokensFromChars(2000); // 500

// Sum message content + per-message overhead
const promptTokens = estimateTokensFromMessages([
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "Hello" },
]);
```

These are intentionally simple estimates. For accurate counts, use a real tokenizer (such as tiktoken) and pass the result as the `estimate` parameter.
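As a rough sketch of how such heuristics work — this is an illustration under assumed constants (the per-message overhead in particular is a guess), not the library's implementation:

```typescript
type ChatMessage = { role: string; content: string };

// ~4 characters per token; the per-message overhead constant is an assumption.
const CHARS_PER_TOKEN = 4;
const PER_MESSAGE_OVERHEAD = 4;

function estimateFromChars(chars: number): number {
  return Math.ceil(chars / CHARS_PER_TOKEN);
}

function estimateFromMessages(messages: ChatMessage[]): number {
  // Sum each message's content estimate plus a fixed per-message overhead.
  return messages.reduce(
    (sum, m) => sum + estimateFromChars(m.content.length) + PER_MESSAGE_OVERHEAD,
    0,
  );
}

console.log(estimateFromChars(2000)); // 500
```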

The tool adapter wraps any async function as a governed tool call. It is useful for MCP tools, embedding functions, or any custom async work.

```ts
import { wrapTool } from "@mcptoolshop/throttleai/adapters/tools";

const embed = wrapTool(myEmbedFn, {
  governor: gov,
  toolId: "embed",
  costWeight: 2, // this tool uses 2 concurrency slots
});

const r = await embed("hello world");
if (r.ok) {
  console.log(r.result); // the embedding vector
}
```
  • `governor` — the governor instance (required)
  • `toolId` — identifier for this tool (used as the action in acquire requests)
  • `costWeight` — concurrency weight per call (default: `1`). Heavier tools can consume multiple slots.
  • `actorId` — default actor ID

The `costWeight` option is particularly useful when different tools have different resource costs. An embedding call that hits a GPU might cost 2 slots while a simple metadata lookup costs 1.
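The weighted-slot idea is easy to picture in isolation. A minimal sketch of the mechanism (not the governor's implementation):

```typescript
// A pool with 5 concurrency slots; each call consumes its tool's costWeight.
class WeightedSlots {
  private used = 0;
  constructor(private readonly capacity: number) {}

  tryAcquire(weight: number): boolean {
    if (this.used + weight > this.capacity) return false; // denied: not enough slots
    this.used += weight;
    return true;
  }

  release(weight: number): void {
    this.used = Math.max(0, this.used - weight);
  }
}

const slots = new WeightedSlots(5);
console.log(slots.tryAcquire(2)); // GPU embed: true (2/5 used)
console.log(slots.tryAcquire(2)); // second embed: true (4/5 used)
console.log(slots.tryAcquire(2)); // third embed: false — only 1 slot left
console.log(slots.tryAcquire(1)); // metadata lookup: true (5/5 used)
```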

The Express adapter provides middleware that automatically governs incoming requests. Denied requests receive a `429` response with a `Retry-After` header.

```ts
import { throttleMiddleware } from "@mcptoolshop/throttleai/adapters/express";

app.use("/ai", throttleMiddleware({ governor: gov }));
```

When the governor denies a request, the middleware responds with:

  • Status: `429 Too Many Requests`
  • Header: `Retry-After` (in seconds, derived from `retryAfterMs`)
  • Body: JSON with the deny reason, recommendation, and retry timing

```json
{
  "error": "throttled",
  "reason": "concurrency",
  "retryAfterMs": 500,
  "recommendation": "All 5 slots in use. Try again in ~500ms."
}
```
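Since `Retry-After` is expressed in whole seconds while the governor reports milliseconds, the conversion presumably rounds up so a sub-second hint doesn't become "retry immediately". A sketch of that conversion (the exact rounding is an assumption):

```typescript
// Convert a millisecond retry hint to the whole-seconds form the
// Retry-After header requires, rounding up to avoid early retries.
function retryAfterSeconds(retryAfterMs: number): number {
  return Math.ceil(retryAfterMs / 1000);
}

console.log(retryAfterSeconds(500));  // 1
console.log(retryAfterSeconds(2400)); // 3
```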
  • `governor` — the governor instance (required)
  • `getActorId` — function to extract the actor ID from the request (default: `x-actor-id` header, then `req.ip`, then `"anonymous"`)
  • `getAction` — function to extract the action from the request (default: `req.path`)
  • `getPriority` — function to extract the priority from the request (default: `"interactive"`)
  • `getEstimate` — function to derive a token estimate from the request
  • `onDeny` — custom handler for denied requests (default: 429 JSON response)

The middleware reports outcomes based on `res.statusCode` after the handler completes:

| Status Code | Outcome |
| --- | --- |
| < 400 | `"success"` |
| >= 400 | `"error"` |

The Hono adapter provides middleware for the Hono framework, designed for edge-compatible runtimes.

```ts
import { throttle } from "@mcptoolshop/throttleai/adapters/hono";

app.use("/ai/*", throttle({ governor: gov }));
```
  • Denied requests return 429 JSON with the same shape as the Express adapter.
  • The `leaseId` is stored on the Hono context, allowing downstream handlers to access it if needed.
  • Outcomes are reported automatically from the response status.
  • `governor` — the governor instance (required)
  • `getActorId` — function to extract the actor ID from the context (default: `x-actor-id` header or `"anonymous"`)
  • `getAction` — function to extract the action from the context (default: `req.path`)
  • `getPriority` — function to extract the priority from the context (default: `"interactive"`)
  • `getEstimate` — function to derive a token estimate from the context
  • `onDeny` — custom handler for denied requests (return a `Response` to override the default 429 JSON)

If your framework or client is not covered by the built-in adapters, the pattern is straightforward:

```ts
async function myAdapter(gov, request, fn) {
  const decision = gov.acquire({
    actorId: request.actorId,
    action: request.action,
  });
  if (!decision.granted) {
    return { ok: false, decision };
  }
  const start = Date.now();
  try {
    const result = await fn();
    const latencyMs = Date.now() - start;
    gov.release(decision.leaseId, { outcome: "success", latencyMs });
    return { ok: true, result, latencyMs };
  } catch (err) {
    gov.release(decision.leaseId, {
      outcome: "error",
      latencyMs: Date.now() - start,
    });
    throw err;
  }
}
```

The key contract: acquire before, release after, always release on error, and report the outcome.
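A custom adapter can be sanity-checked against a stub governor that grants every lease and tracks which are outstanding. This is illustrative only; the real governor's signatures may differ:

```typescript
type Report = { outcome: "success" | "error"; latencyMs: number };

// Stub governor: always grants, and remembers unreleased leases.
const stubGov = {
  active: new Set<string>(),
  nextId: 0,
  acquire(_req: { actorId: string; action: string }) {
    const leaseId = `lease-${this.nextId++}`;
    this.active.add(leaseId);
    return { granted: true as const, leaseId };
  },
  release(leaseId: string, _report: Report) {
    this.active.delete(leaseId);
  },
};

async function govern<T>(fn: () => Promise<T>) {
  const decision = stubGov.acquire({ actorId: "demo", action: "work" });
  const start = Date.now();
  try {
    const result = await fn();
    stubGov.release(decision.leaseId, { outcome: "success", latencyMs: Date.now() - start });
    return { ok: true as const, result };
  } catch (err) {
    // The contract: always release, even on error, so slots never leak.
    stubGov.release(decision.leaseId, { outcome: "error", latencyMs: Date.now() - start });
    throw err;
  }
}

govern(async () => 42).then((r) => {
  console.log(r.result);            // 42
  console.log(stubGov.active.size); // 0 — the lease was released
});
```

After every call, granted or failed, the stub's `active` set should be empty; if it is not, the adapter is leaking leases.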