AT-301e · Module 3

Drop Detection & Recovery

A dropped handoff is a message that was sent but never acknowledged: the context package entered the void. In a 20-agent system processing approximately 200 handoffs per day, even a 0.3% drop rate works out to about 0.6 lost handoffs per day, or one missing handoff roughly every other day. That is not acceptable.
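The arithmetic behind that figure can be checked with a short sketch. The function names here are illustrative only and are not part of any orchestrator API; the inputs are the numbers quoted above.

```python
def expected_drops_per_day(handoffs_per_day: float, drop_rate: float) -> float:
    """Average number of dropped handoffs per day at a given drop rate."""
    return handoffs_per_day * drop_rate


def days_between_drops(handoffs_per_day: float, drop_rate: float) -> float:
    """Average number of days between two dropped handoffs."""
    return 1.0 / expected_drops_per_day(handoffs_per_day, drop_rate)


# Figures from the text: about 200 handoffs per day, 0.3% drop rate.
drops = expected_drops_per_day(200, 0.003)   # about 0.6 per day
interval = days_between_drops(200, 0.003)    # about 1.67 days
print(f"{drops:.2f} drops/day, one every {interval:.2f} days")
```

An interval of about 1.67 days is what the text rounds to "every other day".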

Drop detection uses the TTL field from the message envelope, which every handoff message carries. If no acknowledgment is received within the TTL window, the orchestrator triggers the recovery protocol:

  Step 1: Retry delivery to the original recipient.
  Step 2: If the retry fails, check agent health. Is the recipient operational?
  Step 3: If the agent is healthy but unresponsive, escalate to CLAWMANDER for rerouting to an alternate handler.
  Step 4: If no alternate exists, escalate to Tier 4 with the full context package preserved.
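The detection check itself can be sketched as follows. This is a minimal illustration, not the orchestrator's actual code: the `Envelope` fields (`handoff_id`, `recipient`, `sent_at`, `ttl_seconds`, `acknowledged`) and both function names are assumptions made for the example, since the source only states that the envelope carries a TTL.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Envelope:
    """Hypothetical message envelope; only the TTL is named in the source."""
    handoff_id: str
    recipient: str
    sent_at: float        # send time in seconds, on any consistent clock
    ttl_seconds: float    # how long to wait for an acknowledgment
    acknowledged: bool = False


def is_dropped(env: Envelope, now: float) -> bool:
    """A handoff is potentially dropped when its TTL expired without an ack."""
    return (not env.acknowledged) and (now - env.sent_at) > env.ttl_seconds


def find_dropped(pending: List[Envelope], now: float) -> List[str]:
    """Return the IDs of all handoffs that should enter recovery."""
    return [env.handoff_id for env in pending if is_dropped(env, now)]


pending = [
    Envelope("h-1", "agent-a", sent_at=0.0, ttl_seconds=30.0),
    Envelope("h-2", "agent-b", sent_at=0.0, ttl_seconds=30.0, acknowledged=True),
    Envelope("h-3", "agent-c", sent_at=20.0, ttl_seconds=30.0),
]
dropped = find_dropped(pending, now=45.0)
print(dropped)
```

Here only `h-1` is flagged: `h-2` was acknowledged, and `h-3` is still inside its TTL window at `now=45.0`.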

The key design principle: the context package is never lost. Even if delivery fails three times, the package persists in the orchestrator's buffer until a handler accepts it. Information loss is the one failure mode we do not tolerate.
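One way to picture this principle is a buffer that releases a package only when a handler accepts it, never when a delivery fails. The `HandoffBuffer` class below is a hypothetical sketch written for this example; the source does not describe how the orchestrator's buffer is actually implemented.

```python
from typing import Dict


class HandoffBuffer:
    """Hypothetical in-memory buffer: packages leave only on acceptance."""

    def __init__(self) -> None:
        self._packages: Dict[str, dict] = {}
        self._failures: Dict[str, int] = {}

    def store(self, handoff_id: str, package: dict) -> None:
        """Hold a context package until some handler accepts it."""
        self._packages[handoff_id] = package
        self._failures[handoff_id] = 0

    def record_failure(self, handoff_id: str) -> int:
        """Count a failed delivery. The package itself is left untouched."""
        self._failures[handoff_id] += 1
        return self._failures[handoff_id]

    def accept(self, handoff_id: str) -> dict:
        """Release the package to the handler that acknowledged it."""
        self._failures.pop(handoff_id, None)
        return self._packages.pop(handoff_id)

    def __contains__(self, handoff_id: str) -> bool:
        return handoff_id in self._packages


buffer = HandoffBuffer()
buffer.store("h-1", {"task": "summarize", "notes": "full context"})

# Three failed deliveries do not remove the package.
for _ in range(3):
    failures = buffer.record_failure("h-1")
still_buffered = "h-1" in buffer

# Only acceptance by a handler releases it.
package = buffer.accept("h-1")
```

The design choice worth noting is that no failure path calls `pop`: the only way out of the buffer is `accept`.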

  1. Detect the Drop: TTL expires with no acknowledgment. The orchestrator flags the handoff as potentially dropped and initiates recovery.
  2. Retry Delivery: Re-send the original context package to the intended recipient. If acknowledged within the retry TTL, the handoff completes normally and the incident is logged as a transient delivery failure.
  3. Reroute or Escalate: If retry fails, the orchestrator identifies an alternate handler based on the task type and role compatibility. If no alternate exists, escalate to Tier 4. In all cases, the context package persists: nothing is lost, only delayed.
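The retry and reroute stages above can be sketched as a single function. This is an illustration under stated assumptions: `recover`, the `deliver` and `find_alternate` callables, and the outcome strings are all hypothetical names, and the agent health check is left out for brevity. What happens when an alternate handler also fails to acknowledge is not specified in the source; this sketch assumes it escalates to Tier 4.

```python
from typing import Callable, Optional


def recover(
    package: dict,
    recipient: str,
    deliver: Callable[[str, dict], bool],
    find_alternate: Callable[[dict, str], Optional[str]],
) -> str:
    """Run recovery for a handoff already flagged as dropped.

    `deliver` returns True when the recipient acknowledges within the
    retry TTL. `find_alternate` returns a compatible handler or None.
    The package is passed along unchanged on every path.
    """
    # Retry delivery to the original recipient.
    if deliver(recipient, package):
        return "transient_delivery_failure"

    # Reroute to an alternate handler if one exists.
    alternate = find_alternate(package, recipient)
    if alternate is not None and deliver(alternate, package):
        return f"rerouted:{alternate}"

    # No alternate (or, by assumption, the alternate also failed): escalate.
    return "escalated_tier_4"


# Example: agent-a never acknowledges, agent-b does.
def deliver(recipient: str, package: dict) -> bool:
    return recipient == "agent-b"


package = {"task_type": "review", "context": "full context package"}
rerouted = recover(package, "agent-a", deliver, lambda pkg, failed: "agent-b")
escalated = recover(package, "agent-a", deliver, lambda pkg, failed: None)
transient = recover(package, "agent-b", deliver, lambda pkg, failed: None)
```

In all three outcomes the `package` dictionary is never modified or discarded, which mirrors the "only delayed" guarantee.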