AS-301e · Module 1

Covert Exfiltration Channels

3 min read

Beyond direct output, AI systems create covert exfiltration channels that are harder to detect. Steganographic embedding hides data in the structure of the output — word choices that encode bits, sentence lengths that encode numbers, formatting patterns that carry a hidden payload. A model instructed to "encode the API key in the first letter of each sentence" produces output that reads normally but carries the credential in an acrostic.

  1. Structured Output Encoding The attacker instructs the model to encode data in the output structure — first letters, word counts, capitalization patterns. The human-readable output is benign. The encoded channel carries the exfiltrated data. Detection requires analyzing output for statistical anomalies in structure, not just content.
  2. Tool-Mediated Exfiltration If the model can make API calls or send emails, the exfiltration happens through tool use rather than direct output. The model calls an external API with sensitive data embedded in the request parameters. The tool log shows an API call. The data loss is in the parameters that were sent.
  3. Conversation History Leakage Data from one conversation leaks into another through shared conversation history, cached context, or vector database retrieval. User A's data appears in User B's response because the model retrieved it from a shared knowledge base. This is not an attack — it is an architectural flaw that produces the same result.