AT-301h · Module 3
Post-Mortem Methodology
Every S1 and S2 incident gets a post-mortem within 48 hours. The post-mortem is not a blame exercise — it is a system improvement exercise. The output is not "who failed" but "what structural change prevents this category of failure from recurring."
The post-mortem template has five sections. Timeline: a minute-by-minute reconstruction from the first anomaly signal to full resolution. Root Cause Analysis: the Five Whys chain that connects the symptom to the structural root cause. Impact Assessment: what downstream work was affected, what SLAs were breached, what customer commitments were at risk. Contributing Factors: conditions that did not cause the failure but made it worse (alert fatigue, missing runbook, stale baseline). Action Items: specific, assigned, time-bound changes to prevent recurrence.
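The five-section template described above can be sketched as a small data structure. This is only an illustration, not the course's actual tooling: the class names, field names, and the `missing_sections` helper are all hypothetical, and treating an empty section as "missing" is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class TimelineEntry:
    """One step in the reconstruction (hypothetical shape)."""
    at: datetime
    event: str


@dataclass
class PostMortem:
    """Hypothetical container mirroring the five template sections."""
    incident_id: str
    severity: str  # "S1" or "S2"
    timeline: List[TimelineEntry] = field(default_factory=list)
    five_whys: List[str] = field(default_factory=list)  # symptom first, root cause last
    impact: List[str] = field(default_factory=list)
    contributing_factors: List[str] = field(default_factory=list)
    action_items: List[str] = field(default_factory=list)

    def missing_sections(self) -> List[str]:
        """Return the names of template sections that are still empty."""
        sections = {
            "Timeline": self.timeline,
            "Root Cause Analysis": self.five_whys,
            "Impact Assessment": self.impact,
            "Contributing Factors": self.contributing_factors,
            "Action Items": self.action_items,
        }
        return [name for name, content in sections.items() if not content]

    def is_complete(self) -> bool:
        """Assumption: a post-mortem is complete once no section is empty."""
        return not self.missing_sections()
```

A draft that has only a timeline would report the other four sections as missing, which makes a convenient checklist before the document is circulated.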
The action items are the only part that matters operationally. A post-mortem without action items is just storytelling. Each action item has an owner, a deadline, and a verification criterion — how will we know the fix worked? Open action items are tracked in the weekly coordination review until closed.
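As a rough sketch of how the owner, deadline, and verification criterion might be tracked until closure, the following models an action item and the list of open items a weekly review would walk through. The names `ActionItem` and `open_items_for_review` are hypothetical, and sorting overdue items first is an assumption rather than a stated rule.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ActionItem:
    """Hypothetical record for one post-mortem action item."""
    description: str
    owner: str
    deadline: date
    verification: str  # how we will know the fix worked
    closed_on: Optional[date] = None

    def is_open(self) -> bool:
        return self.closed_on is None

    def is_overdue(self, today: date) -> bool:
        # Only open items can be overdue.
        return self.is_open() and today > self.deadline


def open_items_for_review(items: List[ActionItem], today: date) -> List[ActionItem]:
    """Return open items, overdue ones first, then by earliest deadline."""
    open_items = [item for item in items if item.is_open()]
    return sorted(
        open_items,
        key=lambda item: (not item.is_overdue(today), item.deadline),
    )
```

Closed items drop out of the list on their own, so the review agenda shrinks only when a fix has actually been marked closed.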