OC-301i · Module 1
Performance Profiling for Agent Systems
Performance profiling for agent systems differs from traditional profiling because the bottleneck is rarely computation. Agent systems spend most of their time waiting: for LLM API responses, for external data sources, for council deliberation, for human approval. The profile of an agent task is dominated by I/O wait, not CPU time. Optimizing computation when the bottleneck is I/O produces no measurable end-to-end improvement.
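The gap between wall-clock time and CPU time can be seen directly. The following is a minimal sketch using only the Python standard library; `fake_llm_call` is a hypothetical stand-in that sleeps instead of making a real API request.

```python
import time

def wall_and_cpu(fn):
    """Run fn() and return (wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # elapsed real time
    cpu_start = time.process_time()    # CPU time used by this process
    fn()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu

def fake_llm_call():
    # Hypothetical stand-in for a blocking API request; no network involved.
    time.sleep(0.2)

wall, cpu = wall_and_cpu(fake_llm_call)
print(f"wall={wall:.3f}s  cpu={cpu:.3f}s  waiting~{wall - cpu:.3f}s")
```

Nearly all of the elapsed time shows up as waiting, which is why a CPU profiler alone says little about where an agent task spends its time.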
The profiling approach is to instrument every agent task with timing spans that capture five phases: prompt construction time (how long to assemble the context), API call time (how long the LLM takes to respond), post-processing time (how long to parse and format the response), inter-agent communication time (how long to route messages between agents), and wait time (how long the task sat in a queue before an agent picked it up). The profile reveals where time is actually spent. In most systems, 70-80% of elapsed time is API call time and queue wait time. Even a large speedup in the remaining 20-30% yields only a small end-to-end gain.
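One way to collect these spans is a small context manager that accumulates elapsed time per phase. This is a sketch under stated assumptions, not a reference implementation: `TaskProfile`, the phase names, and the sleep-based stand-ins for the queue and the API call are all hypothetical.

```python
import json
import time
from collections import defaultdict
from contextlib import contextmanager

class TaskProfile:
    """Accumulates wall-clock seconds per phase for one agent task."""

    def __init__(self):
        self.spans = defaultdict(float)

    @contextmanager
    def span(self, phase):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the span even if the phase raised an exception.
            self.spans[phase] += time.perf_counter() - start

    def record(self, phase, seconds):
        """Add a duration measured elsewhere (e.g. queue wait)."""
        self.spans[phase] += seconds

    def shares(self):
        """Fraction of total recorded time spent in each phase."""
        total = sum(self.spans.values())
        return {p: s / total for p, s in self.spans.items()} if total else {}

profile = TaskProfile()

enqueued_at = time.perf_counter()
time.sleep(0.01)                       # stand-in: task waits in a queue
profile.record("queue_wait", time.perf_counter() - enqueued_at)

with profile.span("prompt_construction"):
    prompt = "Summarize: " + "x" * 1000
with profile.span("api_call"):
    time.sleep(0.05)                   # stand-in: LLM API latency
    response = '{"summary": "ok"}'
with profile.span("post_processing"):
    parsed = json.loads(response)

for phase, share in sorted(profile.shares().items(), key=lambda kv: -kv[1]):
    print(f"{phase:22s} {share:6.1%}")
```

Queue wait is recorded from an enqueue timestamp rather than wrapped in a span, because the waiting happens before the agent starts executing the task.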
- 1. Instrument Every Phase: Add timing spans to prompt construction, API calls, response parsing, inter-agent messaging, and queue wait. Each span is a row in the profile. The span durations should sum to the end-to-end elapsed time; a gap between the two means some phase is not instrumented.
- 2. Identify the Bottleneck: The phase that consumes the most time is the bottleneck. Optimizing any other phase cannot meaningfully improve end-to-end performance. If API calls are 75% of elapsed time and prompt construction is 5%, eliminating prompt construction entirely would cut end-to-end time by at most 5%.
- 3. Measure Before and After: Before any optimization, record the current profile: p50, p95, and p99 for each phase. After the optimization, measure again under comparable load. If the bottleneck phase did not improve, the optimization failed regardless of what else changed.
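Steps 2 and 3 can be sketched with the standard library's `statistics.quantiles`. The function names and the sample durations below are made up for illustration; real profiles would come from recorded spans, with far more samples per phase than shown here.

```python
from statistics import quantiles

def percentiles(samples):
    """Return p50, p95 and p99 for a list of durations in seconds."""
    cuts = quantiles(samples, n=100, method="inclusive")  # 99 cut points
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}

def bottleneck(profile):
    """profile maps phase -> list of durations; pick the largest total."""
    return max(profile, key=lambda phase: sum(profile[phase]))

def bottleneck_improved(before, after, stat="p95"):
    """True only if the bottleneck phase got faster at the chosen percentile."""
    phase = bottleneck(before)
    return percentiles(after[phase])[stat] < percentiles(before[phase])[stat]

# Illustrative numbers only (seconds per task), not measurements.
before = {
    "api_call": [1.0, 1.2, 1.1, 3.0, 1.3],
    "prompt_construction": [0.05, 0.06, 0.05, 0.07, 0.05],
}
# Prompt construction got faster, API calls unchanged.
after_prompt_only = {
    "api_call": [1.0, 1.2, 1.1, 3.0, 1.3],
    "prompt_construction": [0.01, 0.01, 0.01, 0.02, 0.01],
}
# API calls got faster (e.g. through caching or a smaller prompt).
after_api = {
    "api_call": [0.5, 0.6, 0.55, 1.5, 0.65],
    "prompt_construction": [0.05, 0.06, 0.05, 0.07, 0.05],
}

print(bottleneck(before))
print(bottleneck_improved(before, after_prompt_only))
print(bottleneck_improved(before, after_api))
```

The comparison deliberately looks only at the bottleneck phase, so a change that speeds up prompt construction while leaving API latency untouched is reported as no improvement.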