OC-301h · Module 3

Recovery & Reprocessing


Recovery from an agent incident goes beyond restarting the service. Every output produced during the incident window is suspect, and every downstream decision based on those outputs is potentially wrong. Recovery requires four things: identifying all affected outputs, reprocessing them with corrected data or logic, delivering the corrected versions to stakeholders, and verifying that downstream systems have updated their state based on the corrections.

The reprocessing workflow starts from the decision records: pull the list of tasks completed during the incident window. For each task, determine whether the root cause affected it. Reprocess the affected tasks through the corrected system. Compare each reprocessed output against the original to identify material differences. Deliver corrections for any output with material differences. Log each correction with a reference to the incident number for the audit trail.
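The first part of this workflow, building the reprocessing queue, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `DecisionRecord` shape, the `build_reprocessing_queue` helper, the `is_affected` predicate, and the sample records are all hypothetical, and a real decision-record store will look different.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class DecisionRecord:
    # Hypothetical record shape; real decision records will carry more fields.
    task_id: str
    completed_at: datetime
    data_source: str
    output: str


def build_reprocessing_queue(
    records: Iterable[DecisionRecord],
    onset: datetime,
    containment: datetime,
    is_affected: Callable[[DecisionRecord], bool],
) -> List[DecisionRecord]:
    """Return records completed between onset and containment that the root cause touched."""
    in_window = (r for r in records if onset <= r.completed_at <= containment)
    return [r for r in in_window if is_affected(r)]


# Made-up sample data, for illustration only.
records = [
    DecisionRecord("t1", datetime(2024, 5, 1, 9, 0), "pricing_feed", "approve"),
    DecisionRecord("t2", datetime(2024, 5, 1, 10, 30), "pricing_feed", "reject"),
    DecisionRecord("t3", datetime(2024, 5, 1, 10, 45), "inventory_feed", "approve"),
    DecisionRecord("t4", datetime(2024, 5, 1, 12, 0), "pricing_feed", "approve"),
]

queue = build_reprocessing_queue(
    records,
    onset=datetime(2024, 5, 1, 10, 0),
    containment=datetime(2024, 5, 1, 11, 0),
    # Assumed root cause for this example: a bad pricing feed.
    is_affected=lambda r: r.data_source == "pricing_feed",
)
print([r.task_id for r in queue])  # ['t2']
```

Keeping the root-cause test as a separate predicate means the same window query can be reused if the root-cause analysis later widens or narrows the set of affected tasks.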

  1. Identify Affected Outputs. Query the decision records for the incident window: all tasks completed between incident onset and containment. Filter for tasks affected by the root cause. This is the reprocessing queue.
  2. Reprocess and Compare. Run each affected task through the corrected system. Compare original output against reprocessed output. Flag material differences. Minor formatting differences are not corrections; factual changes, different recommendations, and changed decisions are.
  3. Deliver and Verify. Deliver corrected outputs to stakeholders with incident context. Verify that downstream systems updated their state based on corrections. Confirm with stakeholders that they received and applied the corrections.
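The compare and verify steps above might look like the following minimal sketch. It assumes outputs are plain text and that a difference in whitespace or letter case counts as formatting only; that is a crude proxy, and where case or layout carries meaning the `normalize` function would need to change. The `Correction` record, the `find_corrections` helper, the sample outputs, and the incident number `INC-0000` are hypothetical placeholders.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


def normalize(text: str) -> str:
    """Collapse whitespace and case so formatting-only changes compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def is_material(original: str, reprocessed: str) -> bool:
    """True when the outputs differ in more than whitespace or letter case."""
    return normalize(original) != normalize(reprocessed)


@dataclass
class Correction:
    task_id: str
    incident_id: str  # reference kept for the audit trail
    original: str
    corrected: str
    acknowledged: bool = False  # set once the stakeholder confirms receipt


def find_corrections(
    originals: Dict[str, str],
    reprocess: Callable[[str], str],
    incident_id: str,
) -> List[Correction]:
    """Reprocess each task and keep only the materially different outputs."""
    corrections = []
    for task_id, original in originals.items():
        corrected = reprocess(task_id)
        if is_material(original, corrected):
            corrections.append(Correction(task_id, incident_id, original, corrected))
    return corrections


# Made-up outputs: t2 changes its recommendation, t5 changes only formatting.
originals = {"t2": "Recommend: REJECT", "t5": "Recommend:  approve "}
fixed_outputs = {"t2": "Recommend: approve", "t5": "recommend: approve"}

corrections = find_corrections(originals, fixed_outputs.__getitem__, "INC-0000")

# Corrections still awaiting stakeholder confirmation.
pending = [c.task_id for c in corrections if not c.acknowledged]
print(pending)  # ['t2']
```

Tracking an `acknowledged` flag per correction gives a simple way to see which deliveries still need confirmation before the incident is closed.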