OC-301i · Module 3

Capacity Planning


Capacity planning for agent systems answers: how much infrastructure do you need to handle expected load at acceptable latency and quality? The answer depends on three inputs: expected task volume (tasks per hour by type), per-task resource requirements (API calls, compute time, memory), and service level objectives (maximum latency, minimum quality scores).

The planning model: for each task type, multiply the expected volume by the per-task requirement for each resource. Sum across all task types to get total demand per resource, then compare each total against available capacity: API rate limits, compute nodes, memory. If demand exceeds capacity, there are three choices: add capacity (more infrastructure, a higher API tier), reduce demand (deprioritize low-value tasks, increase caching), or relax SLOs (accept higher latency or lower quality on non-critical tasks). Capacity planning is not a one-time exercise. Review monthly as task volumes change, new task types are added, and model pricing evolves.
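
The arithmetic of the planning model can be sketched in a few lines of Python. This is only an illustration: the task types, resource names, and every number below are hypothetical, and a real plan would use measured volumes and per-task costs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskType:
    name: str
    tasks_per_hour: int          # expected volume
    per_task: dict[str, float]   # resource name -> units consumed per task


def total_demand(task_types: list[TaskType]) -> dict[str, float]:
    """Sum (volume x per-task requirement) across all task types, per resource."""
    demand: dict[str, float] = {}
    for task in task_types:
        for resource, units in task.per_task.items():
            demand[resource] = demand.get(resource, 0.0) + task.tasks_per_hour * units
    return demand


def shortfalls(demand: dict[str, float], capacity: dict[str, float]) -> dict[str, float]:
    """Return each resource where demand exceeds capacity, with the size of the gap."""
    return {
        resource: needed - capacity.get(resource, 0.0)
        for resource, needed in demand.items()
        if needed > capacity.get(resource, 0.0)
    }


# Hypothetical workload; the numbers are illustrative, not benchmarks.
TASKS = [
    TaskType("summarize", 200, {"api_calls": 3, "compute_seconds": 12}),
    TaskType("triage", 500, {"api_calls": 1, "compute_seconds": 2}),
]
# Hypothetical hourly capacity: an API rate limit and available compute time.
CAPACITY = {"api_calls": 1000, "compute_seconds": 7200}

DEMAND = total_demand(TASKS)
GAPS = shortfalls(DEMAND, CAPACITY)
print(DEMAND)  # {'api_calls': 1100.0, 'compute_seconds': 3400.0}
print(GAPS)    # {'api_calls': 100.0}
```

In this made-up example compute is comfortably within capacity, but API calls exceed the rate limit by 100 per hour, so one of the three choices (add capacity, reduce demand, relax SLOs) would have to be applied to that resource.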

Do This

  • Plan capacity based on projected peak demand, not average demand — peaks cause failures, averages do not
  • Include headroom: provision so that planned peak load uses about 70% of maximum capacity; the remaining 30% absorbs unexpected spikes
  • Review capacity monthly — task volumes change, new tasks are added, and pricing evolves
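
The peak-plus-headroom guidance above can be turned into a small sizing check. The sketch below assumes the 70% figure is read as "projected peak demand should use at most 70% of provisioned capacity"; the function names and the demand numbers are hypothetical. Integer percentages are used so that the result is not thrown off by floating-point rounding.

```python
TARGET_UTILIZATION_PCT = 70  # assumption: peak demand should use at most 70% of capacity


def required_capacity(peak_demand: int, target_pct: int = TARGET_UTILIZATION_PCT) -> int:
    """Smallest capacity at which peak demand stays at or below the target utilization."""
    if not 0 < target_pct <= 100:
        raise ValueError("target_pct must be between 1 and 100")
    # Integer ceiling division of (peak_demand * 100) by target_pct.
    return (peak_demand * 100 + target_pct - 1) // target_pct


def has_headroom(peak_demand: int, capacity: int, target_pct: int = TARGET_UTILIZATION_PCT) -> bool:
    """True if peak demand uses no more than target_pct percent of capacity."""
    return peak_demand * 100 <= capacity * target_pct


# Hypothetical numbers: average of 600 API calls/hour, projected peak of 1,400.
AVERAGE_CALLS_PER_HOUR = 600
PEAK_CALLS_PER_HOUR = 1400

print(required_capacity(AVERAGE_CALLS_PER_HOUR))  # 858: sized for the average
print(required_capacity(PEAK_CALLS_PER_HOUR))     # 2000: sized for the peak
print(has_headroom(PEAK_CALLS_PER_HOUR, 858))     # False: the peak overwhelms average-based sizing
```

Sizing from the average would leave the projected peak well above what is provisioned, which is the failure mode the first bullet warns about.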

Avoid This

  • Plan for average load and hope peaks do not happen — they will, and the system will degrade
  • Provision at 100% utilization — zero headroom means every spike causes latency degradation or dropped tasks
  • Set capacity once and forget it — the load profile six months from now will not match today's