AS-301e · Module 3

Exfiltration Detection Patterns

3 min read

Exfiltration through AI systems produces detectable signals — but the signals are different from traditional DLP alerts. Traditional DLP catches files leaving the network. AI exfiltration detection catches semantic anomalies in model outputs, unusual context window compositions, and tool usage patterns that deviate from baseline.

Output Volume Anomalies An agent that normally produces 200-word responses suddenly generating 2,000-word responses is a signal. The additional content may contain exfiltrated data embedded in verbose output. Track output length by agent and task type, and alert on statistical deviations.
Context Composition Anomalies Track what data enters the context window for each session. If an agent that normally processes three data fields suddenly retrieves twenty fields including sensitive ones not needed for the task, the retrieval pattern is anomalous. Context composition monitoring catches injection attacks that manipulate the RAG pipeline.
Outbound Communication Anomalies If the agent has any external communication capability — API calls, webhooks, email — monitor the destination, frequency, and payload size. A sudden new destination, an increase in call frequency, or an unusually large payload are exfiltration indicators.