CDX-301i · Module 1
Agent Pools & Capacity Planning
3 min read
An agent pool is a managed set of agent instances with defined capabilities, capacity limits, and lifecycle management. Instead of creating agents ad-hoc, the pool pre-provisions agents of each type and manages their availability. When the dispatcher needs a code review agent, it draws from the review pool. When the agent completes its task, it returns to the pool. This pattern — borrowed from thread pools and connection pools in systems engineering — eliminates the startup latency of agent creation and provides predictable capacity.
Capacity planning for agent pools requires balancing three variables: cost (more agents = higher API spend), throughput (more agents = faster task processing), and latency (more agents = shorter queue wait times). The right pool size depends on your task arrival rate, average task duration, and SLA requirements. A pool that is too small creates a growing backlog; a pool that is too large wastes API budget on idle agents. Auto-scaling — adding agents when queue depth exceeds a threshold and removing them when the queue is empty — optimizes the cost-throughput tradeoff dynamically.
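The relationship between arrival rate, task duration, and required capacity can be sketched with Little's Law (steady-state concurrent tasks ≈ arrival rate × average duration). The function below is an illustrative sizing sketch, not part of any pool API; the `headroom` factor for absorbing bursts is an assumption:

```python
import math

def size_pool(arrival_rate_per_min: float, avg_task_mins: float,
              headroom: float = 1.25) -> int:
    """Estimate pool size from Little's Law: L = lambda * W.

    arrival_rate_per_min: measured tasks arriving per minute
    avg_task_mins: measured average task duration in minutes
    headroom: burst-absorbing multiplier (illustrative default)
    """
    concurrent = arrival_rate_per_min * avg_task_mins  # steady-state in-flight tasks
    return math.ceil(concurrent * headroom)

# 12 tasks/min at ~2.5 min each -> 30 concurrent, 38 with 25% headroom
print(size_pool(12, 2.5))  # 38
```

Note that both inputs should come from instrumentation, not guesses, which is the same point the guidance below makes about measuring before sizing.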
from dataclasses import dataclass
from agents import Agent

@dataclass
class PoolConfig:
    agent_type: str            # "reviewer", "implementer", "tester"
    model: str                 # "o3", "codex-1", "gpt-4.1"
    min_instances: int         # Always available
    max_instances: int         # Scale-up ceiling
    scale_up_threshold: int    # Queue depth that triggers scale-up
    scale_down_idle_secs: int  # Idle time before scale-down

class AgentPool:
    def __init__(self, config: PoolConfig):
        self.config = config
        self.agents: list[Agent] = []
        self._init_min_instances()

    def _init_min_instances(self):
        for i in range(self.config.min_instances):
            self.agents.append(Agent(
                name=f"{self.config.agent_type}-{i}",
                model=self.config.model,
            ))

    def _scale_up(self) -> Agent:
        agent = Agent(
            name=f"{self.config.agent_type}-{len(self.agents)}",
            model=self.config.model,
        )
        self.agents.append(agent)
        return agent

    def acquire(self) -> Agent | None:
        idle = [a for a in self.agents if not a.busy]
        if idle:
            agent = idle[0]
        elif len(self.agents) < self.config.max_instances:
            agent = self._scale_up()
        else:
            return None  # Pool exhausted
        agent.busy = True  # Reserve before handing out
        return agent

    def release(self, agent: Agent):
        agent.busy = False  # Return to pool
Do This
- Pre-provision minimum instances to eliminate cold-start latency for common task types
- Set max instance limits to cap costs — auto-scaling without a ceiling is a budget risk
- Monitor pool utilization — consistently over 80% means the pool is undersized
- Size pools based on measured task arrival rates, not estimates — instrument before sizing
Avoid This
- Create agents on-demand for every task — the startup latency adds up at scale
- Set minimum instances too high — idle agents cost money without producing value
- Ignore pool metrics — an undersized pool silently degrades SLA compliance
- Use one pool for all task types — different tasks need different models and capabilities
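The last point above, one pool per task type, can be made concrete with a small registry keyed by agent type. This is an illustrative sketch: `PoolRegistry` and the specific configs are assumptions, and the trimmed `PoolConfig` here stands in for the fuller one defined earlier:

```python
from dataclasses import dataclass

@dataclass
class PoolConfig:  # trimmed mirror of the earlier config; values are illustrative
    agent_type: str
    model: str
    min_instances: int
    max_instances: int

class PoolRegistry:
    """Routes each task type to its own pool config instead of one shared pool."""

    def __init__(self):
        self._configs: dict[str, PoolConfig] = {}

    def register(self, config: PoolConfig):
        self._configs[config.agent_type] = config

    def config_for(self, task_type: str) -> PoolConfig:
        try:
            return self._configs[task_type]
        except KeyError:
            raise KeyError(f"no pool registered for task type {task_type!r}")

registry = PoolRegistry()
registry.register(PoolConfig("reviewer", "o3", min_instances=2, max_instances=8))
registry.register(PoolConfig("tester", "gpt-4.1", min_instances=1, max_instances=4))
```

Separate configs per type let a cheap, fast model back the tester pool while a stronger model backs the reviewer pool, with independent cost ceilings for each.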