OC-301g · Module 1
Metrics That Matter
3 min read
Agent systems generate thousands of potential metrics. Monitoring all of them is noise. Monitoring the wrong ones is worse — you build confidence in a dashboard that does not detect the failures that actually matter. The metrics that matter for agent systems fall into three categories: health, quality, and business impact.
Health metrics: is the agent functioning? Task completion rate (should be >98%), error rate (should be <2%), response time p50 and p99, memory usage, and queue depth.
Quality metrics: is the agent's output good? Automated quality score per dimension, persona consistency score, factual accuracy rate, and human correction rate.
Business metrics: is the agent producing value? Deliverables produced, time saved, escalation resolution rate, and stakeholder satisfaction.
Health metrics are the heartbeat. Quality metrics are the diagnostic. Business metrics are the justification. All three are necessary.
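As a rough illustration of the health category, the sketch below computes completion rate, error rate, and latency p50/p99 from a batch of task records, then checks the two targets stated above (>98% completion, <2% errors). The `TaskRecord` fields and the function names are assumptions made for this example, not part of any particular monitoring tool.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass(frozen=True)
class TaskRecord:
    """One agent task. Field names are illustrative, not a real schema."""
    completed: bool
    errored: bool
    latency_ms: float


def health_summary(records: list[TaskRecord]) -> dict[str, float]:
    """Compute completion rate, error rate, and latency p50/p99."""
    if len(records) < 2:
        raise ValueError("need at least two records to compute percentiles")
    total = len(records)
    latencies = [r.latency_ms for r in records]
    # quantiles(n=100) returns 99 cut points: index 49 is p50, index 98 is p99.
    cuts = quantiles(latencies, n=100, method="inclusive")
    return {
        "completion_rate": sum(r.completed for r in records) / total,
        "error_rate": sum(r.errored for r in records) / total,
        "latency_p50_ms": cuts[49],
        "latency_p99_ms": cuts[98],
    }


def health_alerts(summary: dict[str, float]) -> list[str]:
    """Return the names of health metrics that miss the stated targets."""
    alerts = []
    if summary["completion_rate"] <= 0.98:  # target: above 98%
        alerts.append("completion_rate")
    if summary["error_rate"] >= 0.02:  # target: below 2%
        alerts.append("error_rate")
    return alerts
```

Completion and error rates are tracked separately here because a task can fail to complete without raising an error (for example, by timing out); whether that distinction applies depends on how a given system defines each term.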
Do This
- Track health, quality, and business metrics — each category catches a different failure class
- Set thresholds on quality metrics, not just health metrics — an agent can be healthy and produce bad output
- Review metric relevance quarterly — metrics that never trigger alerts and never inform decisions are clutter
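The quarterly relevance review can be partly mechanized. The sketch below assumes that, for each metric, the team records how many alerts it fired and how many decisions cited it during the review window. Both counters, and the `MetricUsage` type itself, are hypothetical bookkeeping invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricUsage:
    """Usage of one metric over a review window (hypothetical bookkeeping)."""
    name: str
    alerts_fired: int        # alerts this metric triggered in the window
    decisions_informed: int  # decisions that cited this metric (logged by hand)


def find_clutter(usage: list[MetricUsage]) -> list[str]:
    """Return metrics that neither fired an alert nor informed a decision."""
    return [
        m.name
        for m in usage
        if m.alerts_fired == 0 and m.decisions_informed == 0
    ]
```

A metric on this list is a candidate for removal, not an automatic deletion: a metric that guards against a rare failure may legitimately stay silent for a whole quarter.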
Avoid This
- Track only infrastructure metrics — CPU and memory do not tell you whether the agent is producing good output
- Track everything and dashboard nothing — data without visibility is storage cost without value
- Set the same thresholds for all agents — a customer-facing agent needs tighter quality thresholds than an internal one
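One way to avoid a single set of thresholds is to key quality thresholds by agent profile. The sketch below is a minimal illustration: the profile names, metric keys, and every number in `QUALITY_THRESHOLDS` are placeholder assumptions chosen to show the shape of the idea, not recommended values.

```python
# Illustrative thresholds only: tune these per system; none come from a standard.
QUALITY_THRESHOLDS = {
    "customer_facing": {"factual_accuracy_min": 0.99, "human_correction_max": 0.02},
    "internal": {"factual_accuracy_min": 0.95, "human_correction_max": 0.10},
}


def quality_violations(profile: str, metrics: dict[str, float]) -> list[str]:
    """Return the quality metrics that violate the thresholds for a profile."""
    limits = QUALITY_THRESHOLDS[profile]
    violations = []
    if metrics["factual_accuracy"] < limits["factual_accuracy_min"]:
        violations.append("factual_accuracy")
    if metrics["human_correction_rate"] > limits["human_correction_max"]:
        violations.append("human_correction_rate")
    return violations


# The same output quality passes for an internal agent but not a customer-facing one.
observed = {"factual_accuracy": 0.97, "human_correction_rate": 0.05}
internal_result = quality_violations("internal", observed)
customer_result = quality_violations("customer_facing", observed)
```

In this example `internal_result` is empty while `customer_result` lists both metrics. The check never looks at health metrics, so it can flag an agent that is completing every task while still producing output that is not good enough for its audience.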