CDX-301i · Module 1

Agent Pools & Capacity Planning


An agent pool is a managed set of agent instances with defined capabilities, capacity limits, and lifecycle management. Instead of creating agents ad hoc, the pool pre-provisions agents of each type and tracks which ones are available. When the dispatcher needs a code review agent, it draws one from the review pool; when that agent completes its task, it returns to the pool. This pattern, borrowed from thread pools and connection pools in systems engineering, takes agent creation off the request path (at least up to the pre-provisioned minimum) and provides predictable capacity.

Capacity planning for agent pools requires balancing three variables: cost (more agents means higher API spend), throughput (more agents means faster task processing), and latency (more agents means shorter queue wait times). The right pool size depends on your task arrival rate, average task duration, and SLA requirements. A pool that is too small builds a growing backlog; a pool that is too large wastes API budget on idle agents. Auto-scaling (adding agents when queue depth exceeds a threshold and removing them once the queue is empty) adjusts the cost-throughput tradeoff as load changes.
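As a starting point for the sizing question, Little's Law gives the average number of busy agents as arrival rate × average task duration. The sketch below, a hypothetical `size_pool` helper, divides that figure by a target utilization to leave headroom for bursts. The 0.7 default is an assumption for illustration, not a value this module prescribes.

```python
import math


def size_pool(arrival_rate_per_min: float, avg_task_secs: float,
              target_utilization: float = 0.7) -> int:
    """Estimate a pool size that keeps average utilization at or below target.

    Hypothetical helper: applies Little's Law (busy agents = arrival rate x
    time in service) and ignores burstiness and queue-wait percentiles.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    arrivals_per_sec = arrival_rate_per_min / 60.0
    busy_agents = arrivals_per_sec * avg_task_secs  # average concurrency
    return max(1, math.ceil(busy_agents / target_utilization))


# 12 tasks/min at 90 s each keeps 18 agents busy on average;
# at a 70% target utilization that rounds up to 26 agents.
print(size_pool(12, 90))  # 26
```

Average utilization says nothing about tail latency. If the SLA is stated as a queue-wait percentile, a queueing model such as Erlang C (M/M/c) is the more appropriate tool.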

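import itertools
import threading
from dataclasses import dataclass

from agents import Agent  # OpenAI Agents SDK

@dataclass
class PoolConfig:
    agent_type: str            # "reviewer", "implementer", "tester"
    model: str                 # "o3", "codex-1", "gpt-4.1"
    min_instances: int         # Always available
    max_instances: int         # Scale-up ceiling
    scale_up_threshold: int    # Queue depth to trigger scale-up
    scale_down_idle_secs: int  # Idle time before scale-down

class AgentPool:
    def __init__(self, config: PoolConfig):
        self.config = config
        self.agents: list[Agent] = []
        self._busy: set[str] = set()   # Names of agents currently checked out
        self._ids = itertools.count()  # Unique suffixes for agent names
        self._lock = threading.Lock()  # acquire/release may run concurrently
        self._init_min_instances()

    def _init_min_instances(self):
        for _ in range(self.config.min_instances):
            self._scale_up()

    def _scale_up(self) -> Agent:
        # Create one more agent; callers are responsible for max_instances
        agent = Agent(
            name=f"{self.config.agent_type}-{next(self._ids)}",
            model=self.config.model,
        )
        self.agents.append(agent)
        return agent

    def acquire(self) -> Agent | None:
        with self._lock:
            for agent in self.agents:
                if agent.name not in self._busy:
                    self._busy.add(agent.name)  # Check out an idle agent
                    return agent
            if len(self.agents) < self.config.max_instances:
                agent = self._scale_up()
                self._busy.add(agent.name)
                return agent
            return None  # Pool exhausted: caller should queue the task

    def release(self, agent: Agent):
        with self._lock:
            self._busy.discard(agent.name)  # Return to pool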
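The `AgentPool` above only grows, and only inside `acquire`; `scale_up_threshold` and `scale_down_idle_secs` are never consulted. A periodic controller could use them along the lines of the sketch below. `ScalingPolicy` and `plan_scaling` are hypothetical names, and two choices are assumptions rather than anything this module specifies: scale-up adds up to one agent per queued task, and scale-down retires the longest-idle agents first.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    min_instances: int
    max_instances: int
    scale_up_threshold: int      # queue depth that triggers scale-up
    scale_down_idle_secs: float  # idle time before an agent may be retired


def plan_scaling(policy: ScalingPolicy, queue_depth: int, pool_size: int,
                 idle_since: dict[str, float], now: float) -> tuple[int, list[str]]:
    """Return (agents_to_add, names_of_idle_agents_to_retire).

    idle_since maps each currently idle agent's name to the time it went idle.
    """
    if queue_depth > policy.scale_up_threshold:
        # Assumption: add up to one agent per queued task, capped at the ceiling
        headroom = max(0, policy.max_instances - pool_size)
        return min(queue_depth, headroom), []
    if queue_depth == 0:
        # Retire agents idle past the threshold, longest-idle first,
        # but never shrink below min_instances
        expired = sorted(
            (since, name) for name, since in idle_since.items()
            if now - since >= policy.scale_down_idle_secs
        )
        removable = max(0, pool_size - policy.min_instances)
        return 0, [name for _, name in expired[:removable]]
    return 0, []  # Queue is short but non-empty: hold steady


policy = ScalingPolicy(min_instances=2, max_instances=10,
                       scale_up_threshold=5, scale_down_idle_secs=300)
print(plan_scaling(policy, queue_depth=8, pool_size=4, idle_since={}, now=0.0))
# (6, [])
print(plan_scaling(policy, queue_depth=0, pool_size=4,
                   idle_since={"reviewer-0": 0.0, "reviewer-1": 100.0,
                               "reviewer-2": 900.0}, now=1000.0))
# (0, ['reviewer-0', 'reviewer-1'])
```

Keeping the decision in a pure function, called on a timer rather than inside `acquire`, keeps scaling off the hot path and makes the policy easy to test in isolation.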

Do This

  • Pre-provision minimum instances to eliminate cold-start latency for common task types
  • Set max instance limits to cap costs — auto-scaling without a ceiling is a budget risk
  • Monitor pool utilization — consistently over 80% means the pool is undersized
  • Size pools based on measured task arrival rates, not estimates — instrument before sizing
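Acting on the 80% guideline requires a utilization signal. Below is a minimal sketch, assuming you can sum the agent-seconds spent busy during each sampling interval. `UtilizationMonitor` is a hypothetical helper, and reading "consistently" as "every sample in the last N intervals" is an assumption.

```python
from collections import deque


class UtilizationMonitor:
    """Hypothetical helper: flags a pool whose utilization stays above a threshold."""

    def __init__(self, threshold: float = 0.8, windows: int = 5):
        self.threshold = threshold
        self.samples: deque[float] = deque(maxlen=windows)  # most recent intervals only

    def record(self, busy_agent_secs: float, pool_size: int,
               interval_secs: float) -> float:
        # Utilization = agent-seconds spent busy / agent-seconds available
        util = busy_agent_secs / (pool_size * interval_secs)
        self.samples.append(util)
        return util

    def undersized(self) -> bool:
        # "Consistently" = every sample in a full window exceeds the threshold
        return (len(self.samples) == self.samples.maxlen
                and all(u > self.threshold for u in self.samples))


monitor = UtilizationMonitor(threshold=0.8, windows=3)
for busy in (216.0, 204.0, 228.0):  # 4 agents, 60 s intervals
    monitor.record(busy, pool_size=4, interval_secs=60.0)
print(monitor.undersized())  # True: 90%, 85%, 95%
```

The same busy-time instrumentation also yields the measured arrival rates and task durations that sizing should be based on.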

Avoid This

  • Create agents on-demand for every task — the startup latency adds up at scale
  • Set minimum instances too high — idle agents cost money without producing value
  • Ignore pool metrics — an undersized pool silently degrades SLA compliance
  • Use one pool for all task types — different tasks need different models and capabilities
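One pool per task type can be as simple as a registry keyed by agent type, with the dispatcher routing each task to the matching pool. The sketch below repeats the `PoolConfig` fields so it runs on its own. `POOL_CONFIGS` and `route` are hypothetical names, and the model assignments and limits are illustrative values, not recommendations. In a full system each entry would back its own `AgentPool`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolConfig:  # same fields as the PoolConfig in this module
    agent_type: str
    model: str
    min_instances: int
    max_instances: int
    scale_up_threshold: int
    scale_down_idle_secs: int


# Illustrative values only: each task type gets its own model and limits
POOL_CONFIGS = {
    cfg.agent_type: cfg
    for cfg in (
        PoolConfig("reviewer", "o3", 2, 6, 4, 300),
        PoolConfig("implementer", "codex-1", 3, 12, 5, 600),
        PoolConfig("tester", "gpt-4.1", 1, 8, 6, 300),
    )
}


def route(task_type: str) -> PoolConfig:
    """Pick the pool for a task type; unknown types fail loudly instead of
    falling back to a shared catch-all pool."""
    try:
        return POOL_CONFIGS[task_type]
    except KeyError:
        raise ValueError(f"no pool configured for task type {task_type!r}") from None


print(route("reviewer").model)  # o3
```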