MP-301c · Module 2
Queue Management & Backpressure
Queues decouple tool call acceptance from execution. When a tool call arrives, the server adds it to a queue and acknowledges receipt immediately. A worker processes the queue at a controlled rate, executing tool calls in order with a bounded concurrency. This prevents overload: if tool calls arrive faster than the server can process them, the queue absorbs the burst. If the queue fills up, the server rejects new calls with a "server busy" error instead of degrading performance for everyone.
Backpressure is the mechanism that tells upstream senders to slow down. In MCP, backpressure manifests as intentional error responses: "Server at capacity, 3 tool calls queued ahead of you, estimated wait 2 seconds." The LLM reads this and can decide to wait, reduce the number of parallel calls, or switch to a different approach. Without backpressure, the server accepts everything, memory grows unbounded, latency spikes for all clients, and eventually the process crashes. Backpressure trades individual request rejection for system stability.
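On the wire, a rejection like this can be surfaced as an ordinary tool result with `isError` set, so the model sees the message as text it can act on. A minimal sketch, using the MCP tool-result shape (`content` array plus `isError`); the helper name and the exact wording are illustrative:

```typescript
// Sketch: turning a backpressure rejection into an MCP-style tool result.
// The { content, isError } shape follows MCP's CallToolResult; the numbers
// passed in would come from the server's queue stats.
function backpressureResult(queued: number, estimatedWaitSec: number) {
  return {
    isError: true,
    content: [
      {
        type: "text" as const,
        text:
          `Server at capacity. ${queued} tool calls queued ahead of you; ` +
          `estimated wait ${estimatedWaitSec} seconds. Retry then, or reduce parallel calls.`,
      },
    ],
  };
}
```

Because the error text is effectively a prompt, concrete numbers ("3 queued, ~2 seconds") give the model something to reason about, where a bare "server busy" does not.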
// Bounded work queue with backpressure
class WorkQueue {
  private queue: (() => Promise<void>)[] = [];
  private running = 0;

  constructor(
    private maxConcurrency: number,
    private maxQueueSize: number,
  ) {}

  async enqueue<T>(fn: () => Promise<T>): Promise<T> {
    // Reject immediately when the queue is full — this is the backpressure signal.
    if (this.queue.length >= this.maxQueueSize) {
      throw new BackpressureError(
        `Server at capacity. ${this.queue.length} operations queued, ` +
          `${this.running} running. Retry after 2 seconds.`,
      );
    }
    return new Promise<T>((resolve, reject) => {
      this.queue.push(async () => {
        try {
          resolve(await fn());
        } catch (err) {
          reject(err);
        }
      });
      this.drain();
    });
  }

  // Start queued tasks until the concurrency limit is reached.
  private drain(): void {
    while (this.running < this.maxConcurrency && this.queue.length > 0) {
      const task = this.queue.shift()!;
      this.running++;
      task().finally(() => {
        this.running--;
        this.drain(); // a finished task frees a slot for the next one
      });
    }
  }

  get stats() {
    return {
      queued: this.queue.length,
      running: this.running,
      capacity: this.maxConcurrency,
      queueCapacity: this.maxQueueSize,
    };
  }
}

class BackpressureError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "BackpressureError";
  }
}

// Wrap tool handlers with the queue so every call shares the same bounds
const workQueue = new WorkQueue(5, 20);

async function queuedHandler(args: unknown) {
  return workQueue.enqueue(() => expensiveToolHandler(args));
}
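On the caller side, a backpressure error is recoverable: wait the suggested interval and try again, up to a bounded number of attempts. A minimal sketch of such a retry helper — the function name, retry count, and delay are illustrative, and it recognizes backpressure by the error's `name` field as set by the `BackpressureError` class above:

```typescript
// Hypothetical helper: retry an operation when the server signals backpressure.
// Any error whose name is "BackpressureError" is treated as a full-queue
// rejection; all other errors propagate immediately.
async function withRetry<T>(
  fn: () => Promise<T>,
  retries = 3,
  delayMs = 2000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const isBackpressure =
        err instanceof Error && err.name === "BackpressureError";
      if (!isBackpressure || attempt >= retries) throw err;
      // Honor the server's suggested wait before retrying
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

Bounding the retries matters: unbounded retries against an overloaded server just convert the rejected load back into queued load.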
- Set queue bounds: Choose maxConcurrency based on your server's resource limits (CPU, connections, memory). Choose maxQueueSize based on acceptable wait time — 20 items at 200ms each means 4 seconds max wait.
- Return actionable backpressure errors: When the queue is full, tell the LLM how many items are queued, the estimated wait, and when to retry. This is a prompt — make it specific.
- Monitor queue metrics: Track queue depth, drain rate, and rejection count over time. Alert when rejection rate exceeds 5% — this means sustained overload, not just bursts.
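The 5% alert threshold can be computed from simple accept/reject counters. A minimal sketch — the class and method names are illustrative, and a production version would track these counts per time window rather than over the process lifetime:

```typescript
// Sketch of a rejection-rate monitor for queue metrics.
// A sustained rate above 5% indicates overload rather than a transient burst.
class QueueMetrics {
  private accepted = 0;
  private rejected = 0;

  recordAccept(): void {
    this.accepted++;
  }

  recordReject(): void {
    this.rejected++;
  }

  // Fraction of all enqueue attempts that were rejected
  get rejectionRate(): number {
    const total = this.accepted + this.rejected;
    return total === 0 ? 0 : this.rejected / total;
  }

  get overloaded(): boolean {
    return this.rejectionRate > 0.05;
  }
}
```

Call `recordAccept()` on every successful enqueue and `recordReject()` in the catch path for `BackpressureError`, then export `rejectionRate` to whatever alerting system you already use.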