OC-301i · Module 1

Throughput Engineering

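Throughput is the number of tasks completed per unit of time. Latency optimization makes individual tasks faster; throughput optimization lets the system complete more tasks in the same period, typically by processing them concurrently. They are different levers: optimizing one does not necessarily improve the other. Adding agents, for example, can raise throughput while each individual task still takes just as long.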

Throughput is constrained by four things: API rate limits (the LLM provider caps requests per minute), agent concurrency (how many agents can process tasks simultaneously), queue capacity (how many tasks can wait without being dropped), and downstream bottlenecks (if the output delivery system can send only 10 emails per minute, processing 100 tasks per minute just builds a backlog at delivery). The optimization strategy is iterative: identify the binding constraint, increase its capacity, then identify the next binding constraint. This is the Theory of Constraints applied to agent systems. Raising the capacity of a non-binding constraint produces no throughput improvement.
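The delivery example above can be made concrete with a little arithmetic. The sketch below is illustrative only: the helper names `backlogGrowthPerMinute` and `minutesUntilQueueFull` and the queue capacity of 450 tasks are hypothetical, and it assumes both rates stay constant.

```typescript
// Hypothetical helpers. All rates are in tasks per minute and assumed constant.

// Net rate at which finished work piles up in front of a slower stage.
function backlogGrowthPerMinute(arrivalRate: number, drainRate: number): number {
  return Math.max(0, arrivalRate - drainRate);
}

// Minutes until a queue with `queueCapacity` free slots starts dropping tasks.
// Returns Infinity when the downstream stage keeps up.
function minutesUntilQueueFull(
  queueCapacity: number,
  arrivalRate: number,
  drainRate: number
): number {
  const growthRate = backlogGrowthPerMinute(arrivalRate, drainRate);
  return growthRate === 0 ? Infinity : queueCapacity / growthRate;
}

// From the text: 100 tasks/min processed, 10 emails/min delivered.
const exampleGrowth = backlogGrowthPerMinute(100, 10);       // 100 - 10 = 90 tasks/min
const exampleMinutes = minutesUntilQueueFull(450, 100, 10);  // 450 / 90 = 5 minutes
console.log(exampleGrowth, exampleMinutes);
```

Under these assumptions a larger queue only delays the overflow. Sustained throughput still equals the delivery rate until that constraint is raised.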

// Throughput is limited by the tightest constraint
interface SystemCapacity {
  apiRateLimit: number;     // requests per minute (assumes one request per task)
  agentConcurrency: number; // simultaneous agents
  avgTaskDuration: number;  // seconds per task
  deliveryCapacity: number; // outputs per minute
}

function maxThroughput(cap: SystemCapacity): number {
  const agentThroughput = (cap.agentConcurrency * 60) / cap.avgTaskDuration;
  return Math.min(
    cap.apiRateLimit,       // API constraint
    agentThroughput,        // compute constraint
    cap.deliveryCapacity    // delivery constraint
  ); // tasks per minute
}

// Example: 60 RPM API, 5 agents, 12s avg, 100 delivery/min
// agentThroughput = (5 * 60) / 12 = 25 tasks/min
// maxThroughput = min(60, 25, 100) = 25 (agent-bound)
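
The "identify the binding constraint, raise it, repeat" loop could be sketched as follows. This is one possible approach rather than the module's own implementation: `bindingConstraint`, `Bottleneck`, and the constraint labels are hypothetical names, and the interface is repeated only so the block runs on its own. As in the function above, it assumes one API request per task.

```typescript
// Same shape as the SystemCapacity interface above, repeated for self-containment.
interface SystemCapacity {
  apiRateLimit: number;     // requests per minute
  agentConcurrency: number; // simultaneous agents
  avgTaskDuration: number;  // seconds per task
  deliveryCapacity: number; // outputs per minute
}

type ConstraintName = "api" | "agents" | "delivery"; // hypothetical labels

interface Bottleneck {
  name: ConstraintName;
  tasksPerMinute: number;
}

// Returns the constraint with the lowest tasks-per-minute ceiling.
// On a tie, the entry listed first wins.
function bindingConstraint(cap: SystemCapacity): Bottleneck {
  const ceilings: Bottleneck[] = [
    { name: "api", tasksPerMinute: cap.apiRateLimit },
    { name: "agents", tasksPerMinute: (cap.agentConcurrency * 60) / cap.avgTaskDuration },
    { name: "delivery", tasksPerMinute: cap.deliveryCapacity },
  ];
  return ceilings.reduce((lowest, c) =>
    c.tasksPerMinute < lowest.tasksPerMinute ? c : lowest
  );
}

// The example system: 60 RPM API, 5 agents, 12s avg, 100 delivery/min.
const base: SystemCapacity = {
  apiRateLimit: 60,
  agentConcurrency: 5,
  avgTaskDuration: 12,
  deliveryCapacity: 100,
};

const before = bindingConstraint(base);                                     // agents, 25/min
const moreDelivery = bindingConstraint({ ...base, deliveryCapacity: 200 }); // still agents, 25/min
const moreAgents = bindingConstraint({ ...base, agentConcurrency: 15 });    // api, 60/min
console.log(before, moreDelivery, moreAgents);
```

Doubling delivery capacity leaves throughput at 25 tasks per minute because delivery was not the binding constraint. Tripling the agents lifts the agent ceiling to 75, at which point the 60 RPM API limit becomes the next constraint to address.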