AT-301h · Module 1

Multi-Agent Failure Modes

4 min read

Single-agent systems fail simply: the agent produces bad output. Multi-agent systems fail in ways that are unique to distributed operations — and frequently, the agent that produces the bad output is not the agent that caused the failure.

Six failure modes are specific to multi-agent systems.

The Cascade: Agent A produces a subtly incorrect output. Agent B consumes it without validation and amplifies the error. Agent C consumes B's output, and the error compounds. By the time the failure is visible, the root cause is three agents upstream.

The Deadlock: Agent A waits for Agent B's output. Agent B waits for input from Agent A. Neither progresses. The system appears healthy, since every agent reports as "running", but throughput drops to zero.

The Race Condition: Two agents modify the same artifact simultaneously, producing conflicting changes. The last write wins, and the first agent's work is silently overwritten.
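To make the Deadlock and the Race Condition concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a reference implementation: `find_wait_cycle`, `Artifact`, and `StaleWriteError` are hypothetical names, each blocked agent is assumed to wait on exactly one other agent, and the artifact is assumed to carry a version counter. The first function looks for a cycle in a "who waits on whom" map. The class contrasts a last-write-wins update with a version-checked one.

```python
from __future__ import annotations

from dataclasses import dataclass


def find_wait_cycle(waits_on: dict[str, str]) -> list[str] | None:
    """Return a list of agents that wait on each other in a cycle, or None.

    waits_on maps each blocked agent to the single agent it waits for
    (an assumption made to keep the sketch short).
    """
    for start in waits_on:
        path = [start]
        current = waits_on.get(start)
        # Follow the chain of waits until it ends or revisits an agent.
        while current is not None and current not in path:
            path.append(current)
            current = waits_on.get(current)
        if current is not None:
            # current is already on the path, so the cycle starts there.
            return path[path.index(current):]
    return None


class StaleWriteError(Exception):
    """Raised when a write is based on a version that is no longer current."""


@dataclass
class Artifact:
    """Hypothetical shared artifact with a version counter."""

    content: str = ""
    version: int = 0

    def write_last_wins(self, content: str) -> None:
        # No check: a later writer silently replaces earlier work.
        self.content = content
        self.version += 1

    def write_if_unchanged(self, content: str, base_version: int) -> None:
        # Optimistic check: reject the write if another agent wrote first.
        if base_version != self.version:
            raise StaleWriteError(
                f"write based on v{base_version}, artifact is at v{self.version}"
            )
        self.content = content
        self.version += 1
```

With these definitions, `find_wait_cycle({"A": "B", "B": "A"})` returns `["A", "B"]`, even though both agents would report as running. If two agents both read version 0 and each call `write_if_unchanged(..., 0)`, the second call raises `StaleWriteError` instead of silently overwriting the first agent's work.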

The Stale Context: An agent operates on data that was current when its task started but became outdated during execution.

The Silent Failure: An agent fails but does not report the failure; it simply produces nothing. Downstream agents wait indefinitely.

The Hallucination Propagation: One agent hallucinates a fact and passes it downstream as context. Subsequent agents treat it as ground truth because it arrived through the trusted inter-agent channel.
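The Stale Context and the Silent Failure can both be surfaced with small checks, sketched below in Python. Everything here is an assumption made for illustration: `SharedState`, `TaskContext`, `UpstreamSilent`, and `await_upstream` are hypothetical names, the shared data is assumed to expose a version counter, and agents are assumed to hand results to each other through a standard-library `queue.Queue`.

```python
from __future__ import annotations

import queue
from dataclasses import dataclass


class UpstreamSilent(Exception):
    """Raised when an upstream agent produced nothing within the deadline."""


def await_upstream(inbox: queue.Queue, timeout_s: float) -> object:
    """Wait for one upstream message, but never indefinitely."""
    try:
        return inbox.get(timeout=timeout_s)
    except queue.Empty:
        # Turn silence into an explicit failure that can be reported.
        raise UpstreamSilent(f"no upstream output after {timeout_s}s") from None


@dataclass
class SharedState:
    """Hypothetical shared data source with a version counter."""

    value: str
    version: int = 0

    def update(self, value: str) -> None:
        self.value = value
        self.version += 1


@dataclass
class TaskContext:
    """What an agent captured when its task started."""

    value: str
    version: int

    @classmethod
    def capture(cls, state: SharedState) -> TaskContext:
        return cls(value=state.value, version=state.version)

    def is_stale(self, state: SharedState) -> bool:
        # True if the source changed after this context was captured.
        return state.version != self.version
```

A downstream agent that calls `await_upstream(inbox, 30.0)` gets either a message or an `UpstreamSilent` exception, rather than waiting forever. An agent that captured a `TaskContext` at the start of a long task can call `is_stale(state)` before publishing and decide whether to re-run on fresh data.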