OC-301g · Module 1

Metrics That Matter

3 min read

Agent systems generate thousands of potential metrics. Monitoring all of them is noise. Monitoring the wrong ones is worse — you build confidence in a dashboard that does not detect the failures that actually matter. The metrics that matter for agent systems fall into three categories: health, quality, and business impact.

Health metrics: is the agent functioning? Task completion rate (should be >98%), error rate (should be <2%), response time at p50 and p99, memory usage, and queue depth.

Quality metrics: is the agent's output good? Automated quality score per dimension, persona consistency score, factual accuracy rate, and human correction rate.

Business metrics: is the agent producing value? Deliverables produced, time saved, escalation resolution rate, and stakeholder satisfaction.

Health metrics are the heartbeat. Quality metrics are the diagnostic. Business metrics are the justification. All three are necessary.
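As a rough illustration of the health category, the sketch below computes completion rate, error rate, and p50/p99 response time from a batch of task records, then checks the two targets given above (>98% completion, <2% errors). The `TaskRecord` shape and the function names are assumptions made for this example, not part of any particular monitoring tool.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass(frozen=True)
class TaskRecord:
    """Hypothetical record of one agent task (field names are assumptions)."""
    completed: bool    # the task ran to completion
    errored: bool      # the task raised or returned an error
    latency_ms: float  # wall-clock response time in milliseconds


def health_summary(records):
    """Summarize completion rate, error rate, and p50/p99 latency."""
    if len(records) < 2:
        raise ValueError("need at least two records to estimate percentiles")
    total = len(records)
    # quantiles(..., n=100) returns 99 cut points; index 49 is p50, index 98 is p99.
    cuts = quantiles([r.latency_ms for r in records], n=100, method="inclusive")
    return {
        "task_completion_rate": sum(r.completed for r in records) / total,
        "error_rate": sum(r.errored for r in records) / total,
        "p50_ms": cuts[49],
        "p99_ms": cuts[98],
    }


def health_alerts(summary):
    """Return the names of health metrics that miss the targets from the text."""
    alerts = []
    if summary["task_completion_rate"] <= 0.98:  # target: >98%
        alerts.append("task_completion_rate")
    if summary["error_rate"] >= 0.02:            # target: <2%
        alerts.append("error_rate")
    return alerts
```

Latency, memory usage, and queue depth have no targets in the text, so the sketch reports latency percentiles without alerting on them.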

Do This

Track health, quality, and business metrics — each category catches a different failure class
Set thresholds on quality metrics, not just health metrics — an agent can be healthy and produce bad output
Review metric relevance quarterly — metrics that never trigger alerts and never inform decisions are clutter
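To make the "healthy but producing bad output" case concrete, here is a minimal sketch that groups threshold breaches by category. The health limits come from the targets stated earlier. The quality limits (0.95 and 0.10) and all names are hypothetical placeholders, not recommended values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str
    category: str           # "health", "quality", or "business"
    limit: float
    higher_is_better: bool

    def breached(self, value):
        # Landing exactly on the limit is treated as a breach.
        if self.higher_is_better:
            return value <= self.limit
        return value >= self.limit


THRESHOLDS = [
    Threshold("task_completion_rate", "health", 0.98, True),
    Threshold("error_rate", "health", 0.02, False),
    # Quality limits below are illustrative assumptions only.
    Threshold("factual_accuracy_rate", "quality", 0.95, True),
    Threshold("human_correction_rate", "quality", 0.10, False),
]


def breaches_by_category(snapshot):
    """Map each category to the metrics in `snapshot` that breach a threshold."""
    result = {}
    for threshold in THRESHOLDS:
        value = snapshot.get(threshold.metric)
        if value is not None and threshold.breached(value):
            result.setdefault(threshold.category, []).append(threshold.metric)
    return result


# An agent that looks fine on health metrics but is failing on quality.
snapshot = {
    "task_completion_rate": 0.995,
    "error_rate": 0.004,
    "factual_accuracy_rate": 0.81,
    "human_correction_rate": 0.22,
}
breaches = breaches_by_category(snapshot)
```

With this snapshot, `breaches` contains only a `"quality"` entry. A dashboard limited to health thresholds would show nothing wrong.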

Avoid This

Track only infrastructure metrics — CPU and memory do not tell you whether the agent is producing good output
Track everything and dashboard nothing — data without visibility is storage cost without value
Set the same thresholds for all agents — a customer-facing agent needs tighter quality thresholds than an internal one
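One way to avoid a single shared threshold is to key the quality floor on the agent's audience. The sketch below assumes a quality score between 0.0 and 1.0. The audience labels, floor values, and agent names are hypothetical and would need to be set per deployment.

```python
from dataclasses import dataclass

# Hypothetical minimum acceptable quality score per audience (assumed values).
QUALITY_FLOOR_BY_AUDIENCE = {
    "customer_facing": 0.95,  # tighter: output reaches customers directly
    "internal": 0.85,         # looser: output is used inside the organization
}


@dataclass(frozen=True)
class AgentProfile:
    name: str
    audience: str  # must be a key of QUALITY_FLOOR_BY_AUDIENCE


def quality_floor(agent):
    """Look up the quality floor for this agent's audience."""
    try:
        return QUALITY_FLOOR_BY_AUDIENCE[agent.audience]
    except KeyError:
        raise ValueError(
            f"no quality floor configured for audience {agent.audience!r}"
        ) from None


def below_floor(agent, quality_score):
    """True when the score falls under the floor for this agent's audience."""
    return quality_score < quality_floor(agent)


support = AgentProfile("support-bot", "customer_facing")
triage = AgentProfile("triage-bot", "internal")
```

The same score of 0.90 would flag `support` but not `triage`, which is the asymmetry the last item describes.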