MP-301c · Module 2

Queue Management & Backpressure


Queues decouple tool call acceptance from execution. When a tool call arrives, the server adds it to a queue and acknowledges receipt immediately. A worker processes the queue at a controlled rate, executing tool calls in order with a bounded concurrency. This prevents overload: if tool calls arrive faster than the server can process them, the queue absorbs the burst. If the queue fills up, the server rejects new calls with a "server busy" error instead of degrading performance for everyone.

Backpressure is the mechanism that tells upstream senders to slow down. In MCP, backpressure manifests as intentional error responses: "Server at capacity, 3 tool calls queued ahead of you, estimated wait 2 seconds." The LLM reads this and can decide to wait, reduce the number of parallel calls, or switch to a different approach. Without backpressure, the server accepts everything, memory grows unbounded, latency spikes for all clients, and eventually the process crashes. Backpressure trades individual request rejection for system stability.

// Bounded work queue with backpressure
class WorkQueue {
  private queue: (() => Promise<void>)[] = [];
  private running = 0;

  constructor(
    private maxConcurrency: number,
    private maxQueueSize: number,
  ) {}

  async enqueue<T>(fn: () => Promise<T>): Promise<T> {
    if (this.queue.length >= this.maxQueueSize) {
      throw new BackpressureError(
        `Server at capacity. ${this.queue.length} operations queued, ` +
        `${this.running} running. Retry after 2 seconds.`
      );
    }

    return new Promise<T>((resolve, reject) => {
      this.queue.push(async () => {
        try {
          resolve(await fn());
        } catch (err) {
          reject(err);
        }
      });
      this.drain();
    });
  }

  // Start queued tasks until the concurrency limit is reached.
  private drain() {
    while (this.running < this.maxConcurrency && this.queue.length > 0) {
      const task = this.queue.shift()!;
      this.running++;
      // When a task settles, free its slot and pull the next queued item.
      task().finally(() => {
        this.running--;
        this.drain();
      });
    }
  }

  get stats() {
    return {
      queued: this.queue.length,
      running: this.running,
      capacity: this.maxConcurrency,
      queueCapacity: this.maxQueueSize,
    };
  }
}

class BackpressureError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "BackpressureError";
  }
}

// Wrap tool handler with queue
const workQueue = new WorkQueue(5, 20);

async function queuedHandler(args: unknown) {
  return workQueue.enqueue(() => expensiveToolHandler(args));
}
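On the client side, a backpressure error is a signal, not a failure: the caller can wait and retry. A minimal sketch of that retry loop is below; withBackpressureRetry is a hypothetical helper, not part of the module's code, and it identifies backpressure rejections by the error's name.

```typescript
// Hypothetical client-side retry helper: retries a call that fails with a
// BackpressureError, waiting between attempts; any other error is rethrown.
async function withBackpressureRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  waitMs = 2000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Only retry backpressure rejections, and only up to maxRetries times.
      if ((err as Error).name !== "BackpressureError" || attempt >= maxRetries) {
        throw err;
      }
      await new Promise((resolve) => setTimeout(resolve, waitMs));
    }
  }
}
```

The fixed waitMs here stands in for the "retry after 2 seconds" hint in the server's error message; a fuller version would parse the estimated wait out of the error text.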
  1. Set queue bounds. Choose maxConcurrency based on your server's resource limits (CPU, connections, memory). Choose maxQueueSize based on acceptable wait time: roughly queueSize / concurrency × average duration. At a concurrency of 1, 20 items at 200 ms each means a 4-second max wait; at a concurrency of 5, the same queue drains in about 800 ms.
  2. Return actionable backpressure errors. When the queue is full, tell the LLM how many items are queued, the estimated wait, and when to retry. The error message is effectively a prompt, so make it specific.
  3. Monitor queue metrics. Track queue depth, drain rate, and rejection count over time. Alert when the rejection rate exceeds 5%: that indicates sustained overload, not just a burst.
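The rejection-rate alert in item 3 can be sketched as a small counter alongside the queue. QueueMetrics below is an illustrative name, not part of the WorkQueue above; it just tallies accepted and rejected enqueues and flags sustained overload against the 5% guideline.

```typescript
// Minimal metrics sketch: count accepted vs. rejected enqueue attempts and
// compute the rejection rate. A production version would use a sliding
// window so old traffic does not mask current overload.
class QueueMetrics {
  accepted = 0;
  rejected = 0;

  recordAccepted() { this.accepted++; }
  recordRejected() { this.rejected++; }

  // Fraction of recorded calls that were rejected; 0 before any traffic.
  rejectionRate(): number {
    const total = this.accepted + this.rejected;
    return total === 0 ? 0 : this.rejected / total;
  }

  // Sustained overload per the guideline above: rejection rate over 5%.
  isOverloaded(threshold = 0.05): boolean {
    return this.rejectionRate() > threshold;
  }
}
```

Wiring it in is one line on each path: call recordAccepted() when enqueue succeeds and recordRejected() in the catch branch that handles BackpressureError.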