PM-301e · Module 2
Error Recovery in Agentic Prompts
Agentic systems fail in ways that single-turn prompts do not. A tool call can fail, a result can be incomplete, or an action can succeed but produce the wrong output. Without explicit error recovery instructions, agents exhibit predictable failure modes: they retry the same failing action indefinitely, they proceed as if a failed call had succeeded, or they stop with a generic error message.
- Define Error Response Behavior: Specify in the system prompt what to do when a tool returns an error: "If a tool call returns an error, do not retry immediately. Assess the error type: (1) transient (timeout, rate limit): wait and retry once; (2) parameter error (wrong input): fix the parameters and retry; (3) permanent (resource not found, permission denied): stop and explain the failure to the user."
- Set a Retry Limit: Unlimited retries create infinite loops. Set a maximum: "If the same tool call fails 3 times with similar errors, stop retrying and report the failure." This limit is what keeps the agent from getting stuck. Without it, an agent that hits a persistent error will keep retrying until the context window is exhausted. The exact number is a design choice; the protocol at the end of this section uses a stricter cap of two attempts per step.
- Distinguish Error Types: Transient errors (timeouts, rate limits) benefit from a retry. Parameter errors (wrong input format) require fixing the input before retrying. Permanent errors (not found, permission denied) should not be retried. Each type requires a different response, so specify all three in the error handling instruction.
- Graceful Degradation: When recovery is not possible, the agent must degrade gracefully: state what it was able to complete, state what failed and why, and state what information the user needs to provide before it can retry. A message such as "I completed steps 1-3 successfully. Step 4 failed because [reason]. To continue, I need [information]." is more useful than a generic error.
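The classification and retry rules above live in the prompt, but the same policy can also be mirrored in the code that executes tool calls. The following is a minimal sketch under stated assumptions, not a reference implementation: `ErrorKind`, `ToolError`, `call_with_recovery`, and `fix_params` are hypothetical names invented for illustration, and the sketch assumes the tool wrapper has already classified each error into one of the three types.

```python
import time
from enum import Enum
from typing import Any, Callable, Dict, Optional


class ErrorKind(Enum):
    TRANSIENT = "transient"  # timeout, rate limit
    PARAMETER = "parameter"  # wrong input
    PERMANENT = "permanent"  # not found, permission denied


class ToolError(Exception):
    """Hypothetical exception: a tool failure that is already classified."""

    def __init__(self, kind: ErrorKind, message: str) -> None:
        super().__init__(message)
        self.kind = kind


def call_with_recovery(
    tool: Callable[[Dict[str, Any]], Any],
    params: Dict[str, Any],
    fix_params: Optional[Callable[[Dict[str, Any], ToolError], Dict[str, Any]]] = None,
    max_attempts: int = 3,  # assumed cap, matching the "fails 3 times" example
    wait_seconds: float = 2.0,  # assumed pause before a transient retry
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call a tool, retrying only when the error type allows it."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return {"ok": True, "result": tool(params), "attempts": attempts}
        except ToolError as err:
            failure = {
                "ok": False,
                "kind": err.kind.value,
                "error": str(err),
                "attempts": attempts,
            }
            # Permanent errors are never retried; the cap stops every loop.
            if err.kind is ErrorKind.PERMANENT or attempts >= max_attempts:
                return failure
            if err.kind is ErrorKind.PARAMETER:
                # Retrying with unchanged input would fail the same way.
                if fix_params is None:
                    return failure
                params = fix_params(params, err)
            else:
                # Transient error: pause, then try the same call again.
                sleep(wait_seconds)
```

Returning a structured failure instead of raising gives the agent something concrete to report: the error type, the message, and how many attempts were made.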
# Error recovery instructions in system prompt
Error handling protocol:
1. On tool error: classify before responding
   - Timeout / rate limit → retry once after 2 seconds. If retry fails, report and stop.
   - Invalid parameters → fix the parameters, retry once. If retry fails, report the parameter issue.
   - Not found → do not retry. Report: "Resource [X] was not found. Verify [Y] and try again."
   - Permission denied → do not retry. Report: "Access to [X] is restricted. Contact [Y]."
   - Unknown error → do not retry. Report the full error message.
2. Maximum attempts: 2 per step (1 initial call + 1 retry). Never attempt a third call on the same step.
3. On persistent failure: stop the agentic loop. Report:
   "I completed: [list of successful steps]
   I was unable to complete: [failed step]. Reason: [error description]
   To continue, the following is needed: [specific requirement]"
4. Do not continue to dependent steps after a failure. A failed prerequisite invalidates all dependent steps.
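
Rules 3 and 4 of the protocol can also be enforced by the loop that drives the agent. The sketch below assumes a simple linear plan in which every step depends on the one before it; `StepResult`, `format_failure_report`, and `run_steps` are hypothetical names used only for illustration, and the success message is an assumed format.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class StepResult:
    ok: bool
    reason: str = ""  # why the step failed
    needed: str = ""  # what the user must supply to continue


def format_failure_report(
    completed: List[str], failed: str, reason: str, needed: str
) -> str:
    """Build the three-part report from rule 3 of the protocol."""
    done = ", ".join(completed) if completed else "no steps"
    return (
        f"I completed: {done}\n"
        f"I was unable to complete: {failed}. Reason: {reason}\n"
        f"To continue, the following is needed: {needed}"
    )


def run_steps(steps: List[Tuple[str, Callable[[], StepResult]]]) -> str:
    """Run steps in order and stop at the first failure (rule 4)."""
    completed: List[str] = []
    for name, action in steps:
        result = action()
        if not result.ok:
            # Later steps depend on this one, so none of them are run.
            return format_failure_report(
                completed, name, result.reason, result.needed
            )
        completed.append(name)
    return "I completed: " + ", ".join(completed)
```

Stopping at the first failure keeps the agent from building on a result that does not exist, and the report tells the user exactly where the plan halted.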