OC-301i · Module 3
Capacity Planning
Capacity planning for agent systems answers: how much infrastructure do you need to handle expected load at acceptable latency and quality? The answer depends on three inputs: expected task volume (tasks per hour by type), per-task resource requirements (API calls, compute time, memory), and service level objectives (maximum latency, minimum quality scores).
The planning model: for each task type, multiply the expected volume by the per-task resource requirement, then sum across all task types to get total demand for each resource. Compare each total against the corresponding available capacity: API rate limits, compute nodes, memory. If demand exceeds capacity, there are three choices: add capacity (more infrastructure, a higher API tier), reduce demand (deprioritize low-value tasks, increase caching), or relax SLOs (accept higher latency or lower quality on non-critical tasks).

Capacity planning is not a one-time exercise. Review it monthly as task volumes change, new task types are added, and model pricing evolves.
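The following is a minimal Python sketch of the planning model, assuming both demand and capacity are expressed per hour. The `TaskType` fields, the resource names, and all numbers are hypothetical and only illustrate the arithmetic. Memory is left out because it depends on how many tasks run concurrently rather than on hourly totals.

```python
from dataclasses import dataclass


@dataclass
class TaskType:
    """One task type and its per-task resource requirements (hypothetical fields)."""
    name: str
    tasks_per_hour: float        # expected volume for the hour being planned
    api_calls_per_task: float    # model/API calls consumed per task
    compute_sec_per_task: float  # compute-seconds consumed per task


def total_demand(task_types):
    """Multiply volume by per-task requirement, then sum across task types, per resource."""
    demand = {"api_calls": 0.0, "compute_sec": 0.0}
    for t in task_types:
        demand["api_calls"] += t.tasks_per_hour * t.api_calls_per_task
        demand["compute_sec"] += t.tasks_per_hour * t.compute_sec_per_task
    return demand


def shortfalls(demand, capacity):
    """Return how far each over-subscribed resource exceeds its capacity."""
    return {r: demand[r] - capacity[r] for r in demand if demand[r] > capacity[r]}


# Illustrative numbers only, not measurements.
tasks = [
    TaskType("triage", tasks_per_hour=600, api_calls_per_task=2, compute_sec_per_task=3),
    TaskType("research", tasks_per_hour=50, api_calls_per_task=20, compute_sec_per_task=90),
]
# Hypothetical hourly capacity: an API limit of 2,000 calls/hour and 4 compute nodes,
# each assumed to supply 3,600 compute-seconds per hour.
capacity = {"api_calls": 2000, "compute_sec": 4 * 3600}

demand = total_demand(tasks)
print(demand)                        # → {'api_calls': 2200.0, 'compute_sec': 6300.0}
print(shortfalls(demand, capacity))  # → {'api_calls': 200.0}
```

In this example API calls are the binding constraint while compute sits well under its limit. The remedies are a higher API tier, fewer or cached calls, or relaxed SLOs on non-critical tasks, not more compute nodes.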
Do This
- Plan capacity based on projected peak demand, not average demand — peaks cause failures, averages do not
- Include headroom: size capacity so that projected peak demand uses no more than 70% of it, leaving the remaining 30% to absorb spikes
- Review capacity monthly — task volumes change, new tasks are added, and pricing evolves
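The peak-versus-average and headroom guidance reduces to a few lines of arithmetic. This minimal Python sketch reads "provision at 70% of maximum capacity" as "peak demand should use at most 70% of capacity", so the required capacity is peak / 0.70. The hourly series and the names `required_capacity`, `utilization`, and `TARGET_UTILIZATION` are hypothetical, chosen only for illustration.

```python
import math

TARGET_UTILIZATION = 0.70  # assumption: peak demand should use at most 70% of capacity


def required_capacity(peak_demand, target=TARGET_UTILIZATION):
    """Smallest whole-unit capacity at which peak demand stays at or below the target."""
    return math.ceil(peak_demand / target)


def utilization(demand, capacity):
    """Fraction of capacity consumed; above 1.0 means the system is overloaded."""
    return demand / capacity


# Hypothetical projected API calls per hour over an 8-hour window (illustrative only).
hourly_api_calls = [800, 600, 500, 900, 1800, 2600, 2400, 1500]

average = sum(hourly_api_calls) / len(hourly_api_calls)  # 1387.5
peak = max(hourly_api_calls)                             # 2600

sized_for_average = required_capacity(average)  # 1983, even with 30% headroom added
sized_for_peak = required_capacity(peak)        # 3715

print(f"average-based plan: peak utilization {utilization(peak, sized_for_average):.0%}")
print(f"peak-based plan:    peak utilization {utilization(peak, sized_for_peak):.0%}")
```

Sizing from the average leaves the peak hour roughly 31% over capacity even after headroom is added. Sizing from the peak keeps utilization at the 70% target during the busiest hour.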
Avoid This
- Plan for average load and hope peaks do not happen — they will, and the system will degrade
- Provision at 100% utilization — zero headroom means every spike causes latency degradation or dropped tasks
- Set capacity once and forget it — the load profile six months from now will not match today's