DS-301f · Module 2

Anomaly Detection for Data Quality

3 min read

Validation rules catch known error patterns. Anomaly detection catches unknown ones. The system monitors data distributions over time and flags deviations. If the average deal value has been $85K for six months and suddenly shifts to $120K, anomaly detection flags it — not necessarily as an error, but as a pattern change that warrants investigation. It could be a data quality issue (someone entered values in thousands instead of dollars). It could be a real business change (the team is closing larger deals). The detection surfaces the anomaly. The investigation determines the cause. Without detection, the anomaly goes unnoticed until someone happens to look at the data — which might be next quarter.
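The deal-value shift above can be sketched as a simple z-score check: compare a new observation against the mean and standard deviation of the historical window. This is a minimal illustration with hypothetical numbers, not a production monitor.

```python
import statistics

def detect_anomaly(history, new_value, threshold=2.0):
    """Flag new_value if it sits more than `threshold` standard
    deviations from the historical mean. Returns (is_anomaly, z)."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    z = (new_value - mean) / stdev
    return abs(z) > threshold, z

# Hypothetical six months of average deal values hovering around $85K
history = [84_000, 86_500, 85_200, 83_900, 85_800, 84_700]

flagged, z = detect_anomaly(history, 120_000)
# flagged is True: a shift to $120K is far outside the historical band,
# whether the cause is a unit-entry mistake or genuinely larger deals
```

A typical value near $85K (say $85,500) would not be flagged, which is the point: the check surfaces pattern changes, not routine fluctuation.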

Do This

  • Monitor key data distributions continuously and alert on deviations beyond two standard deviations
  • Investigate every anomaly — the cause is either a data quality issue or a business change, both worth knowing
  • Calibrate anomaly thresholds using historical data to minimize false positives
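One way to calibrate a threshold from historical data, as the last bullet suggests, is to replay the history and pick the z-score cutoff that would have flagged only a small target fraction of past points. The function name, the quantile approach, and the 1% target rate below are illustrative assumptions, not a prescribed method.

```python
import statistics

def calibrate_threshold(history, target_fp_rate=0.01):
    """Choose a z-score threshold so that roughly target_fp_rate of
    historical points would have been flagged as anomalies.
    (Hypothetical helper: takes the appropriate quantile of the
    absolute z-scores seen in the history.)"""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    zs = sorted(abs((x - mean) / stdev) for x in history)
    # Index of the (1 - target_fp_rate) quantile, clamped to the list
    idx = min(int(len(zs) * (1 - target_fp_rate)), len(zs) - 1)
    return zs[idx]

# Hypothetical daily averages, including normal day-to-day wobble
history = [84_000, 86_500, 85_200, 83_900, 85_800, 84_700, 87_100]

threshold = calibrate_threshold(history)
# Alert only when a new point's |z| exceeds this calibrated threshold
```

Calibrating against what the data actually did, rather than assuming a fixed 2σ everywhere, keeps noisy metrics from flooding the alert channel while still catching genuine shifts.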

Avoid This

  • Rely solely on validation rules — they catch known patterns, not novel errors
  • Ignore anomalies because "the data passed validation" — anomalies are signals that validation rules missed
  • Set anomaly thresholds so tight that every daily fluctuation triggers an alert — threshold calibration prevents alert fatigue