CDX-301i · Module 2
Performance Dashboards & Anomaly Detection
3 min read
Performance dashboards for agent systems track four golden signals: throughput (tasks completed per hour), latency (time from submission to completion), error rate (percentage of tasks that fail after all retries), and cost efficiency (cost per successfully completed task). These signals apply at every level — per agent, per pipeline, per team, and system-wide. A dashboard that only shows system-wide metrics hides problems; a dashboard that only shows agent-level metrics drowns in noise. Layer the views.
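One way to layer the views is to compute the same signal at every level from the same agent-level samples, so each layer is one rollup away from the next. A minimal sketch, where the team and agent names are illustrative:

```python
from collections import defaultdict

# Hypothetical agent-level throughput samples (tasks/hr).
# "agent-b" is an outlier that a system-wide total alone would hide.
agent_throughput = {
    ("search-team", "agent-a"): 45,
    ("search-team", "agent-b"): 12,
    ("review-team", "agent-c"): 50,
}

# Roll the same signal up per team and system-wide.
team_totals: dict[str, int] = defaultdict(int)
for (team, _agent), tasks in agent_throughput.items():
    team_totals[team] += tasks

system_total = sum(agent_throughput.values())
print(dict(team_totals))  # per-team layer
print(system_total)       # system-wide layer
```

The agent-level dict is the operator view, the team totals the manager view, and the system total the executive view; all three are derived from one set of samples, so they never disagree.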
Anomaly detection automates the pattern recognition a human operator would otherwise do by watching dashboards. Establish statistical baselines from historical data: a code review agent typically takes 30-60 seconds and costs $0.02-0.05. When an observation falls outside the expected range — a review taking 300 seconds or costing $0.50 — an anomaly alert fires. The alert should include the trace ID, the deviation magnitude, and possible causes (e.g., "the input PR was 10x larger than average").
from dataclasses import dataclass
import statistics

@dataclass
class Metric:
    name: str
    values: list[float]  # Historical observations

    @property
    def mean(self) -> float:
        return statistics.mean(self.values)

    @property
    def stddev(self) -> float:
        return statistics.stdev(self.values) if len(self.values) > 1 else 0.0

    def is_anomaly(self, value: float, threshold: float = 2.0) -> bool:
        """Flag if value exceeds mean ± threshold × stddev."""
        if self.stddev == 0:
            return False
        z_score = abs(value - self.mean) / self.stddev
        return z_score > threshold

# Golden signals dashboard
dashboard = {
    "throughput": Metric("tasks/hr", [45, 52, 48, 50, 47]),
    "latency_p95": Metric("seconds", [62, 58, 65, 60, 63]),
    "error_rate": Metric("percent", [3.2, 2.8, 3.5, 3.0, 2.9]),
    "cost_per_task": Metric("usd", [0.04, 0.03, 0.05, 0.04, 0.03]),
}

# Check latest observation
latest_latency = 180  # seconds
if dashboard["latency_p95"].is_anomaly(latest_latency):
    print(f"⚠ Latency anomaly: {latest_latency}s "
          f"(baseline: {dashboard['latency_p95'].mean:.0f}s)")
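The alert body described earlier (trace ID, deviation magnitude, possible causes) can be sketched as a structured payload. The field names and the trace ID below are illustrative, not a standard schema:

```python
import statistics

# Same baseline as the latency_p95 metric above.
baseline = [62, 58, 65, 60, 63]   # historical p95 latency, seconds
observed = 180

mean = statistics.mean(baseline)
stddev = statistics.stdev(baseline)

alert = {
    "metric": "latency_p95",
    "trace_id": "trace-1234",     # hypothetical ID linking to the slow run
    "observed": observed,
    "baseline_mean": round(mean, 1),
    "deviation_sigma": round((observed - mean) / stddev, 1),
    "possible_cause": "input PR was 10x larger than average",
}
print(alert)
```

Including the trace ID lets the on-call operator jump straight from the alert to the offending run instead of reconstructing it from the dashboard.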
Do This
- Track the four golden signals: throughput, latency, error rate, cost efficiency
- Layer dashboards: system-wide for executives, per-team for managers, per-agent for operators
- Set anomaly thresholds based on historical baselines, not arbitrary values
- Alert on trends (2-week moving average) in addition to point anomalies
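The trend-alert idea above can be sketched by comparing a recent moving-average window against the long-run baseline. The window sizes, sample values, and the 10% threshold are all illustrative:

```python
import statistics

# Two weeks of daily p95 latency (seconds): a stable first week,
# then a slow upward drift no single day would flag as a point anomaly.
daily_latency = [60, 61, 59, 62, 60, 61, 60,
                 63, 66, 68, 70, 73, 75, 78]

baseline_avg = statistics.mean(daily_latency[:7])   # first week
recent_avg = statistics.mean(daily_latency[-7:])    # most recent week

# Fire a trend alert when the recent average drifts >10% above baseline.
drifting = recent_avg > baseline_avg * 1.1
print(drifting)  # True
```

Point-anomaly checks miss this pattern because every individual day sits close to its neighbors; only the windowed comparison reveals the drift.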
Avoid This
- Build dashboards without alerting — dashboards nobody watches are decoration
- Set static thresholds ("alert if latency > 60s") without baseline data — you will get false positives
- Ignore cost efficiency metrics — a fast, reliable system that costs 10x too much is not healthy
- Alert on every anomaly with the same severity — prioritize by business impact, not statistical deviation
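Prioritizing by business impact rather than raw statistical deviation can be sketched as a simple routing table. The pipeline names, impact weights, and severity tiers below are hypothetical:

```python
# Map each pipeline to its business impact; route anomalies accordingly.
PIPELINE_IMPACT = {
    "checkout-agent": "critical",    # customer-facing revenue path
    "code-review-agent": "medium",
    "doc-summary-agent": "low",
}

def severity(pipeline: str, z_score: float) -> str:
    """Route an anomaly by business impact first, deviation second."""
    impact = PIPELINE_IMPACT.get(pipeline, "low")
    if impact == "critical":
        return "page"                # wake someone up, even for mild anomalies
    if impact == "medium" and z_score > 4:
        return "ticket"              # file for next business day
    return "log"                     # record it, don't interrupt anyone

print(severity("checkout-agent", 2.5))    # page
print(severity("doc-summary-agent", 6.0)) # log
```

A 6-sigma blip on a low-impact summarizer stays in the logs, while a 2.5-sigma wobble on the checkout path pages someone: the routing encodes business impact, not just deviation.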