AS-201b · Module 1
Mapping Trust Boundaries
3 min read
This is the part most people skip. This is the part that matters.
A trust boundary is the line between trusted and untrusted. In traditional systems, the boundary is clear: inside the firewall is trusted, outside is untrusted. User input is untrusted. Server code is trusted. But in AI systems, the trust boundaries are blurred — and the blurring is where the vulnerabilities live.
Consider an AI agent that reads emails and drafts responses. The system prompt is trusted — you wrote it. The email content is untrusted — an attacker could send a crafted email. But both the system prompt and the email content arrive in the same context window. The model processes them identically. There is no trust boundary between your instructions and the attacker's injected instructions at the model level. This is the fundamental trust boundary problem in AI systems: the model flattens your security hierarchy into a single input stream.
Do This
- Draw explicit trust boundaries for every component in your AI system architecture
- Treat all user-supplied content as untrusted input — including content the user did not write directly (emails, web pages, uploaded files)
- Validate and sanitize at every trust boundary crossing, not just at the user interface
- Assume the model cannot enforce trust boundaries — build enforcement into the application layer
Avoid This
- Assume the system prompt is "more trusted" than user input from the model's perspective — the model sees both as text
- Trust the model to follow instructions like "never reveal this information" — these are guidelines, not enforcement
- Skip trust boundary analysis because "we trust our users" — the threat is not your users, it is the content they forward
- Place trust boundaries only at the perimeter — AI systems need internal trust boundaries at every integration point