DS-301a · Module 2

Anomaly Detection Systems

A metric that deviates from its expected range is a signal. The question is whether anyone notices, and whether they notice in time to act. Anomaly detection automates the noticing. Instead of a human scanning forty dashboards every morning looking for something unusual, the system monitors every metric continuously and surfaces only the deviations that exceed a defined threshold. The human reviews five alerts instead of forty dashboards. The response time drops from "whenever someone happens to check" to "within minutes of the deviation."

Statistical anomaly detection works on a simple principle: establish what "normal" looks like, then flag what deviates. The challenge is defining normal. A naive fixed threshold ("alert when revenue drops below $X") generates false positives on weekends, holidays, and seasonal dips, because the metric is expected to be low at those times. A better approach uses dynamic baselines: the expected value adjusts for day of week, seasonality, trend, and known events, and the deviation is measured against that adjusted baseline rather than a fixed number. AI-enhanced detection goes further, learning the metric's behavior over time and adapting the baseline as the underlying pattern evolves. The goal is fewer false positives and faster detection of real anomalies.
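To make the dynamic-baseline idea concrete, here is a minimal sketch that adjusts for day of week only. It is one possible approach, not a prescribed method: the function names, the z-score threshold of 3, and the revenue history are all hypothetical, and a production system would also account for seasonality, trend, and known events.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean, stdev


def weekday_baselines(history):
    """Return {weekday: (mean, stdev)} from (date, value) pairs.

    Assumption: each weekday has at least two observations.
    """
    by_weekday = defaultdict(list)
    for day, value in history:
        by_weekday[day.weekday()].append(value)
    return {wd: (mean(vals), stdev(vals)) for wd, vals in by_weekday.items()}


def is_anomalous(day, value, baselines, z_threshold=3.0):
    """Flag a value that sits far from the baseline for its own weekday."""
    expected, spread = baselines[day.weekday()]
    if spread == 0:
        return value != expected
    return abs(value - expected) / spread > z_threshold


# Hypothetical revenue: eight weeks, weekdays near 1000, weekends near 400.
start = date(2024, 1, 1)  # a Monday
history = []
for i in range(56):
    day = start + timedelta(days=i)
    base = 400 if day.weekday() >= 5 else 1000
    history.append((day, base + (i % 5) * 10 - 20))  # small deterministic wobble

baselines = weekday_baselines(history)
next_monday = start + timedelta(days=56)
next_saturday = start + timedelta(days=61)

# The same value of 410 is normal on a Saturday but a sharp drop on a Monday.
saturday_flag = is_anomalous(next_saturday, 410, baselines)
monday_flag = is_anomalous(next_monday, 410, baselines)
print(saturday_flag, monday_flag)
```

A fixed threshold such as "alert below 800" would fire every weekend in this data. Measuring against the per-weekday baseline flags the Monday drop and leaves the ordinary Saturday alone.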

The classification of anomalies matters as much as the detection. Not every deviation is a problem. Revenue spiking 30% is an anomaly — it is also good news. Support tickets dropping 50% might mean your product improved, or it might mean the ticket system is down. Context determines interpretation. The anomaly detection system flags the deviation. The human determines the meaning. The best systems provide context alongside the alert: what changed, when it changed, and what correlated changes occurred in other metrics at the same time.
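One way to attach that context is to bundle each alert with the other metrics that moved at the same time. The sketch below is illustrative only: the `Alert` structure, the `build_alert` helper, the 20% cutoff for a "notable" related change, and the metric names and values are all assumptions, not part of any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    metric: str
    timestamp: str
    expected: float
    observed: float
    # Other metrics that moved notably at the same time: {name: percent change}
    related_changes: dict = field(default_factory=dict)

    @property
    def change_pct(self):
        # Assumption: expected is nonzero.
        return (self.observed - self.expected) / self.expected * 100


def build_alert(metric, timestamp, expected, observed, other_metrics,
                min_related_pct=20.0):
    """Build an alert and attach other metrics that changed by at least the cutoff."""
    related = {}
    for name, (exp, obs) in other_metrics.items():
        pct = (obs - exp) / exp * 100
        if abs(pct) >= min_related_pct:
            related[name] = round(pct, 1)
    return Alert(metric, timestamp, expected, observed, related)


# Hypothetical readings taken at the same moment as the ticket drop.
other_metrics = {
    "ticket_api_errors": (5, 200),   # (expected, observed)
    "active_users": (5000, 5050),
}
alert = build_alert("support_tickets", "2024-03-04T09:15", 200, 100, other_metrics)
print(alert.change_pct, alert.related_changes)
```

In this made-up case the ticket drop arrives alongside a surge in ticket API errors while active users stay flat, which points the reviewer toward an outage rather than a product improvement. The system still only flags and correlates; the interpretation stays with the human.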