AS-301g · Module 2

Detection Rule Categories

AI-specific detection rules fall into three categories, each catching a different class of threat. Signature rules match known attack patterns — specific injection phrases, known malicious tool parameters, banned output content. Behavioral rules detect deviations from established baselines — unusual output length, unexpected tool usage, anomalous session duration. Correlation rules connect events across agents and systems — a guardrail trigger on Agent A followed by an unusual database query on Agent B within the same time window.

Signature Rules

Pattern-match on known indicators: injection phrases in input logs, sensitive data patterns in output logs, banned destinations in tool invocation logs. Signature rules are fast to write, fast to execute, and effective against known attack patterns. They miss novel attacks by definition: you can only detect what you have already seen.
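As a rough illustration of a signature rule, the sketch below runs a handful of regular expressions over one log event. The event schema (`input`, `output`, `tool_args`), the rule names, and the patterns themselves (including the `evil.example` destination) are all hypothetical placeholders, not a recommended signature set.

```python
import re

# Hypothetical signatures; a real set would be maintained from observed attacks.
SIGNATURES = {
    "prompt_injection": re.compile(
        r"ignore (all )?(previous|prior) instructions", re.IGNORECASE
    ),
    "ssn_in_output": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "banned_destination": re.compile(
        r"https?://(?:[\w-]+\.)*evil\.example\b", re.IGNORECASE
    ),
}

# Which field of the (assumed) log event each signature inspects.
FIELD_FOR_RULE = {
    "prompt_injection": "input",
    "ssn_in_output": "output",
    "banned_destination": "tool_args",
}


def match_signatures(event: dict) -> list[str]:
    """Return the names of all signatures that match the given log event."""
    hits = []
    for name, pattern in SIGNATURES.items():
        text = event.get(FIELD_FOR_RULE[name], "")
        if pattern.search(text):
            hits.append(name)
    return hits
```

A phrase such as "Disregard what you were told earlier" passes through this check untouched, which is the blind spot described above: the rule only knows the wording it was given.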
Behavioral Rules

Compare current activity against statistical baselines. Alert when an agent's output length is more than two standard deviations above its baseline mean, when tool invocation frequency spikes, when a session runs abnormally long, or when the guardrail trigger rate for a user exceeds a set threshold. Behavioral rules can catch attacks that do not match any known signature.
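One minimal way to express the "two standard deviations" check is a one-sided z-score against recent history. The function name, the default threshold, and the choice to flag only values above the mean are assumptions made for this sketch; a production baseline would likely be per-agent and time-windowed.

```python
from statistics import mean, stdev


def is_anomalous(value: float, baseline: list[float], z_threshold: float = 2.0) -> bool:
    """Flag a value more than z_threshold sample standard deviations above the baseline mean."""
    if len(baseline) < 2:
        # Not enough history to estimate spread, so do not alert.
        return False
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # Perfectly flat baseline: any increase is a deviation.
        return value > mu
    return (value - mu) / sigma > z_threshold
```

For example, with a baseline of output lengths `[100, 110, 90, 105, 95]`, a 103-token response is unremarkable while a 400-token response is flagged, regardless of whether its content matches any signature.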
Correlation Rules

Connect events across agents, systems, and time windows. A guardrail trigger on one agent, followed by a credential rotation on a related service account, followed by an unusual egress pattern: individually, these events might not trigger alerts. Correlated together, they describe an attack sequence. Correlation rules are the most powerful and the most expensive to develop and tune.
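The attack sequence described above can be sketched as an ordered match over a merged event stream. The `Event` fields, the event kind names, and the 15-minute default window are assumptions for illustration. The sketch also does not model whether the sources are actually related, which a real correlation rule would need to do.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    ts: float    # seconds since some common epoch
    source: str  # agent or system that emitted the event
    kind: str    # hypothetical event type label


# Hypothetical sequence mirroring the example in the text.
SEQUENCE = ["guardrail_trigger", "credential_rotation", "unusual_egress"]


def find_sequence(
    events: list[Event],
    sequence: list[str] = SEQUENCE,
    window_s: float = 900.0,
) -> Optional[list[Event]]:
    """Return the first run of events matching `sequence` in order within window_s seconds."""
    ordered = sorted(events, key=lambda e: e.ts)
    for i, first in enumerate(ordered):
        if first.kind != sequence[0]:
            continue
        matched = [first]
        for later in ordered[i + 1:]:
            # Stop once the sequence is complete or the window has elapsed.
            if len(matched) == len(sequence) or later.ts - first.ts > window_s:
                break
            if later.kind == sequence[len(matched)]:
                matched.append(later)
        if len(matched) == len(sequence):
            return matched
    return None
```

Unrelated events interleaved in the stream are skipped, so the rule fires only when all three steps occur in order inside the window. Choosing that window and deciding which sources count as related is where most of the tuning cost comes from.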