AS-201b · Module 2

Injection Taxonomy

4 min read

Good news, everyone! Prompt injection is not one attack — it is a family of attacks, each with different characteristics and different defenses. Treating them all the same is like treating every network attack as "hacking." The taxonomy matters because the defense strategy changes for each type.

  1. Direct Injection The attacker types malicious instructions directly into the chat or input field. "Ignore your previous instructions and..." This is the most obvious form, the easiest to detect, and the first thing script kiddies try. Defend with input filtering and system prompt hardening.
  2. Indirect Injection The attacker embeds instructions in content the AI processes — a web page, an email, a document, a database record. The AI reads the content, encounters the injected instructions, and follows them. This is the most dangerous form because the attack surface is enormous and the user is not the attacker — the content is.
  3. Stored Injection The attacker places malicious instructions in a location the AI will access later — a conversation history, a vector database, a knowledge base. Every future query that retrieves the poisoned content triggers the injection. This is the persistence mechanism that makes prompt injection a long-term threat.
  4. Multi-Turn Injection The attacker does not inject everything at once. They spread the attack across multiple interactions, gradually shifting the model's behavior until it crosses a boundary. Each individual message looks benign. The cumulative effect is the exploit. This is the hardest form to detect because no single input triggers an alarm.