CDX-301i · Module 2
Performance Dashboards & Anomaly Detection
3 min read
Performance dashboards for agent systems track four golden signals: throughput (tasks completed per hour), latency (time from submission to completion), error rate (percentage of tasks that fail after all retries), and cost efficiency (cost per successfully completed task). These signals apply at every level — per agent, per pipeline, per team, and system-wide. A dashboard that only shows system-wide metrics hides problems; a dashboard that only shows agent-level metrics drowns in noise. Layer the views.
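One way to layer the views is to compute the same signal at every level from the same agent-level samples, so each layer is one rollup away from the next. A minimal sketch, where the team and agent names are illustrative:

```python
from collections import defaultdict

# Hypothetical agent-level throughput samples (tasks/hr).
# "agent-b" is an outlier that a system-wide total alone would hide.
agent_throughput = {
    ("search-team", "agent-a"): 45,
    ("search-team", "agent-b"): 12,
    ("review-team", "agent-c"): 50,
}

# Roll the same signal up per team and system-wide.
team_totals: dict[str, int] = defaultdict(int)
for (team, _agent), tasks in agent_throughput.items():
    team_totals[team] += tasks

system_total = sum(agent_throughput.values())
print(dict(team_totals))  # per-team layer
print(system_total)       # system-wide layer
```

The agent-level dict is the operator view, the team totals the manager view, and the system total the executive view; all three are derived from one set of samples, so they never disagree.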
Anomaly detection automates the pattern recognition a human operator would otherwise do by watching dashboards. Establish statistical baselines from historical data: a code review agent typically takes 30-60 seconds and costs $0.02-0.05. When an observation falls outside the expected range — a review taking 300 seconds or costing $0.50 — an anomaly alert fires. The alert should include the trace ID, the deviation magnitude, and possible causes (e.g., "the input PR was 10x larger than average").
from dataclasses import dataclass
import statistics

@dataclass
class Metric:
    name: str
    values: list[float]  # Historical observations

    @property
    def mean(self) -> float:
        return statistics.mean(self.values)

    @property
    def stddev(self) -> float:
        return statistics.stdev(self.values) if len(self.values) > 1 else 0.0

    def is_anomaly(self, value: float, threshold: float = 2.0) -> bool:
        """Flag if value exceeds mean ± threshold × stddev."""
        if self.stddev == 0:
            return False
        z_score = abs(value - self.mean) / self.stddev
        return z_score > threshold

# Golden signals dashboard
dashboard = {
    "throughput": Metric("tasks/hr", [45, 52, 48, 50, 47]),
    "latency_p95": Metric("seconds", [62, 58, 65, 60, 63]),
    "error_rate": Metric("percent", [3.2, 2.8, 3.5, 3.0, 2.9]),
    "cost_per_task": Metric("usd", [0.04, 0.03, 0.05, 0.04, 0.03]),
}

# Check latest observation
latest_latency = 180  # seconds
if dashboard["latency_p95"].is_anomaly(latest_latency):
    print(f"⚠ Latency anomaly: {latest_latency}s "
          f"(baseline: {dashboard['latency_p95'].mean:.0f}s)")
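The alert body described earlier (trace ID, deviation magnitude, possible causes) can be sketched as a structured payload. The field names and the trace ID below are illustrative, not a standard schema:

```python
import statistics

# Same baseline as the latency_p95 metric above.
baseline = [62, 58, 65, 60, 63]   # historical p95 latency, seconds
observed = 180

mean = statistics.mean(baseline)
stddev = statistics.stdev(baseline)

alert = {
    "metric": "latency_p95",
    "trace_id": "trace-1234",     # hypothetical ID linking to the slow run
    "observed": observed,
    "baseline_mean": round(mean, 1),
    "deviation_sigma": round((observed - mean) / stddev, 1),
    "possible_cause": "input PR was 10x larger than average",
}
print(alert)
```

Including the trace ID lets the on-call operator jump straight from the alert to the offending run instead of reconstructing it from the dashboard.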
Do This
- Track the four golden signals: throughput, latency, error rate, cost efficiency
- Layer dashboards: system-wide for executives, per-team for managers, per-agent for operators
- Set anomaly thresholds based on historical baselines, not arbitrary values
- Alert on trends (2-week moving average) in addition to point anomalies
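The trend-alert idea above can be sketched by comparing a recent moving-average window against the long-run baseline. The window sizes, sample values, and the 10% threshold are all illustrative:

```python
import statistics

# Two weeks of daily p95 latency (seconds): a stable first week,
# then a slow upward drift no single day would flag as a point anomaly.
daily_latency = [60, 61, 59, 62, 60, 61, 60,
                 63, 66, 68, 70, 73, 75, 78]

baseline_avg = statistics.mean(daily_latency[:7])   # first week
recent_avg = statistics.mean(daily_latency[-7:])    # most recent week

# Fire a trend alert when the recent average drifts >10% above baseline.
drifting = recent_avg > baseline_avg * 1.1
print(drifting)  # True
```

Point-anomaly checks miss this pattern because every individual day sits close to its neighbors; only the windowed comparison reveals the drift.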
Avoid This
- Build dashboards without alerting — dashboards nobody watches are decoration
- Set static thresholds ("alert if latency > 60s") without baseline data — you will get false positives
- Ignore cost efficiency metrics — a fast, reliable system that costs 10x too much is not healthy
- Alert on every anomaly with the same severity — prioritize by business impact, not statistical deviation
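Prioritizing by business impact rather than raw statistical deviation can be sketched as a simple routing table. The pipeline names, impact weights, and severity tiers below are hypothetical:

```python
# Map each pipeline to its business impact; route anomalies accordingly.
PIPELINE_IMPACT = {
    "checkout-agent": "critical",    # customer-facing revenue path
    "code-review-agent": "medium",
    "doc-summary-agent": "low",
}

def severity(pipeline: str, z_score: float) -> str:
    """Route an anomaly by business impact first, deviation second."""
    impact = PIPELINE_IMPACT.get(pipeline, "low")
    if impact == "critical":
        return "page"                # wake someone up, even for mild anomalies
    if impact == "medium" and z_score > 4:
        return "ticket"              # file for next business day
    return "log"                     # record it, don't interrupt anyone

print(severity("checkout-agent", 2.5))    # page
print(severity("doc-summary-agent", 6.0)) # log
```

A 6-sigma blip on a low-impact summarizer stays in the logs, while a 2.5-sigma wobble on the checkout path pages someone: the routing encodes business impact, not just deviation.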