AT-301a · Module 1

Probability Compounding

Here is the math that nobody talks about until they get the bill. If each sub-agent has a 95% success rate on its task (which is optimistic) and failures are independent, running 10 in parallel gives you a 0.95^10 ≈ 60% chance that all 10 succeed without errors. At 20 agents, that drops to about 36%. This is not a scaling problem you can ignore. It is the fundamental constraint on agent team architecture.
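The arithmetic above is easy to check and to invert. The sketch below assumes agent failures are independent and that every agent has the same success rate. The function names `all_succeed_probability` and `max_agents_for_target` are illustrative, not part of any framework.

```python
def all_succeed_probability(per_agent_success: float, n_agents: int) -> float:
    """Probability that all n agents succeed, assuming independent failures."""
    return per_agent_success ** n_agents


def max_agents_for_target(per_agent_success: float, target: float) -> int:
    """Largest team size whose all-succeed probability still meets the target."""
    if not (0.0 < per_agent_success < 1.0) or not (0.0 < target <= 1.0):
        raise ValueError("expected 0 < per_agent_success < 1 and 0 < target <= 1")
    n = 0
    # Grow the team one agent at a time until the next agent would break the target.
    while all_succeed_probability(per_agent_success, n + 1) >= target:
        n += 1
    return n


for n in (1, 5, 8, 10, 20):
    print(f"{n:>2} agents: {all_succeed_probability(0.95, n):.1%}")

# Largest team that still succeeds as a whole at least half the time.
print(max_agents_for_target(0.95, 0.5))
```

At 95% per-agent reliability, the second function shows how quickly the budget runs out: a team that must fully succeed at least half the time cannot exceed 13 agents.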

The implication is architectural: you must design for partial failure. Every orchestration pattern needs to handle the case where Agent 7 out of 10 fails. Does the lead retry? Skip and continue? Escalate to human review? If your orchestration treats any single failure as a total failure, your system will fail more often than it succeeds once the team is large enough. At 95% per-agent reliability, that crossover comes at 14 agents (0.95^14 ≈ 49%).
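To see how much tolerating a few failures buys you, compare the all-or-nothing probability with the probability that at least k of N agents succeed. This is a minimal sketch using the binomial distribution, again assuming independent agents with identical reliability. The helper name `at_least_k_succeed` is made up for illustration.

```python
from math import comb


def at_least_k_succeed(p: float, n: int, k: int) -> float:
    """P(at least k of n independent agents succeed), each with success rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n, p = 10, 0.95
for tolerated in range(3):
    prob = at_least_k_succeed(p, n, n - tolerated)
    print(f"tolerate {tolerated} failure(s): {prob:.1%}")
```

With 10 agents at 95%, a design that can absorb one failure is usable roughly 91% of the time, and one that can absorb two is usable roughly 99% of the time, compared with about 60% for an all-or-nothing design.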

  1. Calculate Your Expected Success Rate: Before spawning N agents, compute the probability that all N succeed. If per-agent reliability is 95% and you need 8 agents, your all-succeed probability is 0.95^8 ≈ 66%. If that is not acceptable, reduce N or increase per-agent reliability through better prompts and constraints.
  2. Design for Partial Results: The lead should aggregate whatever succeeded and flag what needs re-running. "6 of 8 agents completed successfully. Here are the 6 results synthesized. Agents 4 and 7 failed with the following errors." Partial results are almost always more valuable than a total failure.
  3. Implement Retry Logic: Failed agents can be re-spawned with adjusted prompts. The lead should analyze why an agent failed (ambiguous prompt, missing context, tool error) and adjust before retrying. Blind retries waste tokens. Informed retries fix root causes.
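Steps 2 and 3 can be sketched as a small lead loop. Everything here is hypothetical: `AgentResult`, `PROMPT_FIXES`, `run_team`, and the error labels are illustrative names, and `fake_agent` is a deterministic stand-in for a real sub-agent call. Agents run sequentially for simplicity, and the sketch assumes a tool error has no prompt-level fix, so it is flagged instead of retried.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class AgentResult:
    agent_id: int
    output: Optional[str] = None
    error: Optional[str] = None  # hypothetical labels, e.g. "ambiguous_prompt"

    @property
    def ok(self) -> bool:
        return self.error is None


# Hypothetical mapping from a diagnosed failure cause to a prompt adjustment.
PROMPT_FIXES = {
    "ambiguous_prompt": "Clarification: return exactly one answer in the requested format.",
    "missing_context": "Additional context: (the lead attaches the missing material here)",
}


def adjust_prompt(prompt: str, error: str) -> Optional[str]:
    """Return an adjusted prompt, or None when no informed fix is known."""
    fix = PROMPT_FIXES.get(error)
    return f"{prompt}\n{fix}" if fix else None


def run_team(
    prompts: Dict[int, str],
    run_agent: Callable[[int, str], AgentResult],
    max_retries: int = 1,
) -> Dict[int, AgentResult]:
    results = {}
    for agent_id, prompt in prompts.items():
        result = run_agent(agent_id, prompt)
        attempts = 0
        while not result.ok and attempts < max_retries:
            new_prompt = adjust_prompt(prompt, result.error)
            if new_prompt is None:
                break  # no informed fix available: do not retry blindly
            prompt = new_prompt
            result = run_agent(agent_id, prompt)
            attempts += 1
        results[agent_id] = result
    return results


def summarize(results: Dict[int, AgentResult]) -> str:
    """Report what succeeded and flag what still needs attention."""
    failed = [r for r in results.values() if not r.ok]
    lines = [f"{len(results) - len(failed)} of {len(results)} agents completed successfully."]
    lines += [f"Agent {r.agent_id} failed: {r.error}" for r in failed]
    return "\n".join(lines)


def fake_agent(agent_id: int, prompt: str) -> AgentResult:
    """Stand-in agent: 4 fails until its prompt is clarified, 7 always fails."""
    if agent_id == 4 and "Clarification" not in prompt:
        return AgentResult(agent_id, error="ambiguous_prompt")
    if agent_id == 7:
        return AgentResult(agent_id, error="tool_error")
    return AgentResult(agent_id, output=f"result from agent {agent_id}")


results = run_team({i: f"task {i}" for i in range(1, 9)}, fake_agent)
print(summarize(results))
```

In this run, Agent 4 recovers after one informed retry, while Agent 7 is reported as failed alongside the seven results that did complete. A real lead would synthesize those seven outputs and escalate the remaining failure.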