AS-301a · Module 1
Zero-Trust Principles for AI
Traditional perimeter security assumes that everything inside the network is trusted. That assumption was already dangerous with human users. With autonomous AI agents, systems that operate around the clock, make decisions without human oversight, and can be manipulated through prompt injection, the perimeter model is not merely outdated; it is catastrophically insufficient. Zero-trust flips the assumption: nothing is trusted by default, regardless of where the request originates.
Zero-trust for AI agents operates on three principles. Verify explicitly: every request an agent makes — to a database, an API, a file system, another agent — is authenticated and authorized at the point of access. No implicit trust based on network location or prior authentication. Use least privilege: agents receive the minimum permissions required for their current task, and those permissions expire when the task completes. Not minimum permissions for their role — minimum permissions for their current action. Assume breach: architect every system as if an agent has already been compromised. Segment networks, limit blast radius, encrypt everything, and log everything. The question is not whether an agent will be compromised. The question is how much damage a compromised agent can do.
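The verify-explicitly principle can be sketched in a few lines. This is a minimal illustration, not a production design: the agent keys, the in-memory policy table, and the `sign_request`/`authorize` helpers are all hypothetical names invented for this example. Every request carries a cryptographic proof of identity and is checked against policy at the point of access.

```python
import hashlib
import hmac

# Hypothetical agent identities and policy table -- illustrative only.
AGENT_KEYS = {"agent-sales": b"s3cret-key"}
POLICY = {("agent-sales", "read", "sales_db")}  # allowed (agent, action, resource) triples

def sign_request(agent_id: str, action: str, resource: str, key: bytes) -> str:
    """The agent signs each request with its own key (its cryptographic identity)."""
    msg = f"{agent_id}:{action}:{resource}".encode()
    return hmac.new(key, msg, hashlib.sha256).hexdigest()

def authorize(agent_id: str, action: str, resource: str, signature: str) -> bool:
    """Verify explicitly: authenticate AND authorize on every single request."""
    key = AGENT_KEYS.get(agent_id)
    if key is None:
        return False  # unknown identity: deny
    expected = sign_request(agent_id, action, resource, key)
    if not hmac.compare_digest(expected, signature):
        return False  # authentication failed: deny
    return (agent_id, action, resource) in POLICY  # authorization check

sig = sign_request("agent-sales", "read", "sales_db", AGENT_KEYS["agent-sales"])
print(authorize("agent-sales", "read", "sales_db", sig))   # True
print(authorize("agent-sales", "write", "sales_db", sig))  # False: signature covers "read"
```

Note that network location never appears in `authorize`: the decision depends only on the proven identity, the requested action, and the policy, which is exactly the "no implicit trust" property the principle demands.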
- Verify explicitly: Every agent request is authenticated with a cryptographic identity and authorized against a policy engine. No request is approved based on "it came from inside the network" or "this agent was trusted yesterday." Authentication happens on every single request, every single time.
- Use least privilege: Agents receive just-in-time, just-enough-access permissions scoped to the current task. An agent analyzing sales data gets read access to the sales database for the duration of the analysis. When the task completes, the access expires. Permanent standing access is the enemy of zero-trust.
- Assume breach: Architect every system assuming one agent is already compromised. If Agent A is compromised, can it reach Agent B's data? Can it escalate its own permissions? Can it modify its own logs? If the answer to any of these is yes, the architecture has a blast radius problem.
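The just-in-time, just-enough-access pattern from the least-privilege bullet can be sketched as a short-lived, narrowly scoped grant. The `Grant` structure and helper functions below are hypothetical illustrations, assuming a single broker that issues and checks grants; a real system would bind grants to the agent's cryptographic identity as well.

```python
import time
from dataclasses import dataclass

@dataclass(frozen=True)
class Grant:
    """A just-in-time permission: one agent, one action, one resource, one deadline."""
    agent_id: str
    action: str
    resource: str
    expires_at: float

def issue_grant(agent_id: str, action: str, resource: str,
                ttl_seconds: float = 60.0) -> Grant:
    # Just-enough access: the grant names a single action on a single resource,
    # and it expires on its own -- no permanent standing access.
    return Grant(agent_id, action, resource, time.monotonic() + ttl_seconds)

def check_grant(grant: Grant, agent_id: str, action: str, resource: str) -> bool:
    if time.monotonic() > grant.expires_at:
        return False  # the task window has closed; access died with it
    return (grant.agent_id, grant.action, grant.resource) == (agent_id, action, resource)

# The sales-analysis example from above: read access for the duration of the task.
grant = issue_grant("agent-sales", "read", "sales_db", ttl_seconds=0.05)
print(check_grant(grant, "agent-sales", "read", "sales_db"))   # True while the task runs
print(check_grant(grant, "agent-sales", "write", "sales_db"))  # False: not in scope
time.sleep(0.1)
print(check_grant(grant, "agent-sales", "read", "sales_db"))   # False: grant expired
```

The key design choice is that expiry is the default: nobody has to remember to revoke the grant, because doing nothing already revokes it.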
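One concrete answer to the "can it modify its own logs?" question is a tamper-evident log. The hash-chained structure below is a minimal sketch of the idea (the class name and methods are invented for this example): each entry commits to the hash of the previous one, so a compromised agent that rewrites any past entry breaks the chain and the tampering becomes detectable.

```python
import hashlib

class HashChainedLog:
    """Append-only, tamper-evident log: every entry commits to its predecessor."""

    GENESIS = "0" * 64  # hash placeholder before the first entry

    def __init__(self) -> None:
        self._entries: list[tuple[str, str]] = []  # (entry, chained hash)
        self._prev_hash = self.GENESIS

    def append(self, entry: str) -> None:
        h = hashlib.sha256((self._prev_hash + entry).encode()).hexdigest()
        self._entries.append((entry, h))
        self._prev_hash = h

    def verify(self) -> bool:
        """Recompute the chain; any rewritten entry breaks it."""
        prev = self.GENESIS
        for entry, h in self._entries:
            if hashlib.sha256((prev + entry).encode()).hexdigest() != h:
                return False
            prev = h
        return True

log = HashChainedLog()
log.append("agent-sales: read sales_db")
log.append("agent-sales: task complete")
print(log.verify())  # True: chain intact

# A compromised agent rewrites its history but cannot forge the chained hashes.
log._entries[0] = ("agent-sales: nothing to see here", log._entries[0][1])
print(log.verify())  # False: tampering detected
```

In practice the chain head would also be shipped to a store the agent cannot write to at all, so detection does not depend on the agent's own host.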