PM-301i · Module 2

Alerting Thresholds

4 min read

Alert thresholds exist in a tension: too sensitive and they fire constantly on noise, training the team to ignore them. Not sensitive enough and they miss real degradation until it is severe. Setting meaningful thresholds requires understanding what normal looks like for each metric — the baseline — and what deviation from normal is significant enough to warrant investigation.

Baseline establishment: collect 30 days of production data before setting any alerts. Calculate the mean and standard deviation for each metric over that window. The alert threshold is a function of those statistics, not a fixed number applied uniformly. A format compliance rate of 95% might be baseline for one prompt and 72% for another (a more complex prompt with a harder output requirement). Alerting when compliance drops 10 points is correct for the first prompt and a false positive factory for the second.

Alert tiers: use two levels. Warning: the metric has deviated in a way that warrants monitoring and investigation but may self-correct. Escalate to an on-call rotation who will investigate during business hours. Critical: the metric has deviated severely enough that user impact is likely or confirmed. Escalate immediately. Use warning to detect drift early; use critical for incidents.

# Prompt monitoring alert configuration
# All thresholds are relative to the 30-day rolling baseline
# Baselines recalculate nightly

prompts:
  sales-email-007:
    baseline_window_days: 30
    alerts:
      format_compliance_rate:
        warning:
          condition: "value < baseline * 0.95"   # 5% drop from baseline
          window: "1h rolling"
          message: "Format compliance declining. Investigate input distribution or model behavior."
        critical:
          condition: "value < baseline * 0.85"   # 15% drop from baseline
          window: "30m rolling"
          message: "Significant format compliance failure. User impact likely."
          escalation: "page-on-call"

      output_token_count_avg:
        warning:
          condition: "value > baseline * 1.2 OR value < baseline * 0.8"
          window: "2h rolling"
          message: "Output length has shifted. Check for verbosity change or truncation."
        critical:
          condition: "value > baseline * 1.5 OR value < baseline * 0.6"
          window: "1h rolling"
          escalation: "page-on-call"

      api_error_rate:
        warning:
          condition: "value > 0.02"              # absolute threshold, not relative
          window: "15m rolling"
        critical:
          condition: "value > 0.05"
          window: "10m rolling"
          escalation: "page-on-call"

      user_correction_rate:
        warning:
          condition: "value > baseline * 1.25"   # 25% increase in corrections
          window: "24h rolling"                  # slower signal, longer window
          message: "User corrections increasing. Quality drift likely."
        critical:
          condition: "value > baseline * 1.50"
          window: "24h rolling"