AS-201c · Module 1

Anomaly Detection Patterns

4 min read

Before we get to detection techniques, let me distinguish three categories of anomalies in AI systems. Point anomalies are single events that deviate from the baseline: one unusually long output, or one query containing a known injection pattern. Contextual anomalies are events that are normal in isolation but anomalous in context: a database query at 3 AM from a customer support agent. Collective anomalies are patterns that emerge across multiple events: a user who sends ten normal queries, then a progressive series of boundary-testing queries over twenty minutes.

With those categories in mind, detection itself comes in three layers:

  1. Rule-Based Detection: The simplest and fastest detection layer. Define explicit rules: "alert if output contains a pattern matching an API key format," "alert if input exceeds 5000 tokens," "alert if the model uses a tool more than 50 times in one session." Rules catch the known-knowns — the attack patterns you have already identified.
  2. Statistical Detection: Compute rolling statistics on your monitored dimensions and alert when values exceed the baseline thresholds. Average output length suddenly doubles? Alert. Refusal rate drops to zero? Alert. Tool usage spikes by three standard deviations? Alert. Statistical detection catches the known-unknowns — things you know could go wrong but cannot enumerate in advance.
  3. Sequence Detection: Analyze patterns across multiple events in a session or from a single user. A sequence of increasingly aggressive prompts, a progression from innocent questions to boundary-testing queries, a series of tool calls that together constitute a privilege escalation path — none of these is detectable by analyzing events one at a time. Sequence detection catches the sophisticated attacks.
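The rule layer can be sketched as a handful of explicit checks. This is a minimal sketch: the key-format regex, token limit, and tool-call cap below are illustrative assumptions drawn from the examples above, not a vetted rule set.

```python
import re

# Illustrative rules only; the pattern and thresholds are assumptions,
# not a production configuration.
API_KEY_LIKE = re.compile(r"\b[A-Za-z0-9_\-]{32,}\b")  # crude key-shaped token
MAX_INPUT_TOKENS = 5000
MAX_TOOL_CALLS_PER_SESSION = 50

def rule_alerts(event: dict) -> list[str]:
    """Return an alert message for every rule the event violates."""
    alerts = []
    if API_KEY_LIKE.search(event.get("output", "")):
        alerts.append("output contains an API-key-like token")
    if event.get("input_tokens", 0) > MAX_INPUT_TOKENS:
        alerts.append("input exceeds 5000 tokens")
    if event.get("tool_calls", 0) > MAX_TOOL_CALLS_PER_SESSION:
        alerts.append("more than 50 tool calls in one session")
    return alerts
```

Rules like these run in microseconds, so they can sit inline on every request; the trade-off is that each rule only fires on an attack shape you have already named.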
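One way to implement the statistical layer is a rolling window with a z-score cutoff. The window size, warm-up length, and three-sigma threshold below are assumed defaults, not tuned values.

```python
from collections import deque
from statistics import mean, stdev

class RollingDetector:
    """Alert when a value deviates from the rolling baseline by more than
    `z_threshold` standard deviations (defaults are assumptions)."""

    def __init__(self, window: int = 200, warmup: int = 30, z_threshold: float = 3.0):
        self.values = deque(maxlen=window)  # rolling baseline of recent values
        self.warmup = warmup                # observations needed before alerting
        self.z_threshold = z_threshold

    def observe(self, value: float) -> bool:
        """Record a new observation; return True if it is anomalous."""
        anomalous = False
        if len(self.values) >= self.warmup:
            mu, sigma = mean(self.values), stdev(self.values)
            if sigma > 0 and abs(value - mu) / sigma > self.z_threshold:
                anomalous = True
        self.values.append(value)  # outliers still enter the baseline here
        return anomalous
```

Note the design choice on the last line: this sketch folds outliers into the baseline, which keeps the detector adaptive but lets a patient attacker drift the baseline; some systems exclude alerting values instead, at the cost of a baseline that never adjusts to legitimate shifts.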
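The sequence layer needs per-event labels plus session-level logic. Here is a minimal sketch, assuming an upstream classifier has already tagged each query as "benign" or "boundary"; the label names and run-length threshold are invented for illustration.

```python
def escalation_detected(session_labels: list[str], run_threshold: int = 3) -> bool:
    """Flag a session containing `run_threshold` consecutive
    boundary-testing queries, even when every individual query
    would pass rule-based and statistical checks on its own."""
    run = 0
    for label in session_labels:
        run = run + 1 if label == "boundary" else 0  # reset on benign queries
        if run >= run_threshold:
            return True
    return False
```

Against the collective-anomaly example above, ten benign queries followed by a run of boundary-testing queries trips this check, while an isolated boundary-testing query does not.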

Do This

  • Layer all three detection types: rules for known attacks, statistics for unknown anomalies, sequences for sophisticated threats
  • Start with rules — they are fast to implement and catch the highest-frequency attacks
  • Tune alert thresholds aggressively in the first month — false positives destroy trust in monitoring systems

Avoid This

  • Rely solely on rule-based detection — it only catches attacks you already know about
  • Set alert thresholds so sensitive that every shift change triggers a notification — alert fatigue kills response capability
  • Ignore collective anomalies because individual events look benign — the most dangerous attacks are gradual