OC-301h · Module 1
Severity & Escalation Framework
Severity determines response speed, team composition, and communication cadence. Misclassification is costly in both directions: treating a SEV-1 as a SEV-3 delays containment while the damage continues, and treating a SEV-3 as a SEV-1 pulls responders away from other work and erodes trust in the framework. The criteria must therefore be explicit enough that the on-call engineer can classify an incident in under 2 minutes without consulting a manager.
SEV-1 (Critical): An external stakeholder received incorrect output, autonomous actions were taken based on wrong data, or a safety boundary was violated. Response: immediate containment, an incident commander assigned, stakeholder notification within 1 hour, and status updates every 30 minutes.
SEV-2 (Major): Internal operations are affected, output quality has degraded across multiple agents, or wrong decisions were made but not yet acted upon. Response: containment within 30 minutes, root-cause investigation begins immediately, and status updates every 2 hours.
SEV-3 (Minor): An anomaly was detected, a quality score fell below threshold, or a single agent is affected with no external impact. Response: investigation within 4 hours and resolution within 24 hours.
SEV-4 (Watch): Monitoring detected a potential issue that is not yet confirmed. Response: investigate during business hours and document the findings.
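Because classification has to happen in under 2 minutes, it can help to reduce the criteria above to a few yes/no questions checked from most to least severe. The following is a minimal sketch, assuming a hypothetical `IncidentFacts` record; its field names are illustrative and not part of any real tooling.

```typescript
// Hypothetical facts the on-call engineer can answer quickly.
interface IncidentFacts {
  externalStakeholderReceivedBadOutput: boolean;
  autonomousActionOnWrongData: boolean;
  safetyBoundaryViolated: boolean;
  internalOperationsAffected: boolean;
  agentsWithDegradedQuality: number; // count of agents whose output quality dropped
  wrongDecisionPendingAction: boolean; // wrong decision made, not yet acted upon
  anomalyConfirmed: boolean; // anomaly or below-threshold quality score confirmed
}

type Severity = 1 | 2 | 3 | 4;

// Checks the most severe criteria first, so an incident matching
// several levels is classified at the highest one.
function classifySeverity(f: IncidentFacts): Severity {
  if (
    f.externalStakeholderReceivedBadOutput ||
    f.autonomousActionOnWrongData ||
    f.safetyBoundaryViolated
  ) {
    return 1;
  }
  if (
    f.internalOperationsAffected ||
    f.agentsWithDegradedQuality > 1 ||
    f.wrongDecisionPendingAction
  ) {
    return 2;
  }
  if (f.anomalyConfirmed || f.agentsWithDegradedQuality === 1) {
    return 3;
  }
  // Anything left is an unconfirmed monitoring signal.
  return 4;
}

const example: IncidentFacts = {
  externalStakeholderReceivedBadOutput: false,
  autonomousActionOnWrongData: false,
  safetyBoundaryViolated: false,
  internalOperationsAffected: false,
  agentsWithDegradedQuality: 3,
  wrongDecisionPendingAction: false,
  anomalyConfirmed: true,
};
console.log(classifySeverity(example)); // 2
```

Ordering the checks this way errs toward over-response: a quality drop across three agents is SEV-2 even though an anomaly was also confirmed, which on its own would only be SEV-3.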
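```typescript
// One row of the severity matrix. Every field is human-readable text
// except `level`, which is restricted to the four defined severities.
interface SeverityLevel {
  level: 1 | 2 | 3 | 4;
  name: string;
  criteria: string;
  responseTime: string;
  team: string;
  updateCadence: string;
}

// Ordered from most to least severe; criteria mirror the definitions above.
const severityMatrix: SeverityLevel[] = [
  {
    level: 1,
    name: 'Critical',
    criteria: 'External impact OR autonomous actions on wrong data OR safety violation',
    responseTime: 'Immediate',
    team: 'Incident commander + on-call + stakeholder comms',
    updateCadence: 'Every 30 minutes (stakeholders notified within 1 hour)',
  },
  {
    level: 2,
    name: 'Major',
    criteria: 'Internal impact OR multi-agent quality degradation OR wrong decisions not yet acted upon',
    responseTime: '30 minutes',
    team: 'On-call + relevant module owner',
    updateCadence: 'Every 2 hours',
  },
  {
    level: 3,
    name: 'Minor',
    criteria: 'Single agent affected OR quality below threshold, no external impact',
    responseTime: '4 hours (resolve within 24 hours)',
    team: 'On-call',
    updateCadence: 'Daily',
  },
  {
    level: 4,
    name: 'Watch',
    criteria: 'Potential issue flagged by monitoring, not yet confirmed',
    responseTime: 'Business hours',
    team: 'On-call', // assumption: the framework does not name a SEV-4 team
    updateCadence: 'None; document findings',
  },
];
```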
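The matrix stores response times and cadences as display strings. To drive timers or an escalation check, they could be restated as minutes. The sketch below assumes that policy: `RESPONSE_MINUTES`, `UPDATE_MINUTES`, `updateSchedule`, and `isResponseOverdue` are hypothetical names. It also assumes that an unacknowledged incident past its response target should be escalated, which the framework does not spell out. SEV-4 has no fixed clock, so both helpers treat it as untimed.

```typescript
type Severity = 1 | 2 | 3 | 4;

// Numeric restatement of the matrix; null means no fixed target.
const RESPONSE_MINUTES: Record<Severity, number | null> = {
  1: 0, // immediate
  2: 30, // containment within 30 minutes
  3: 4 * 60, // investigation within 4 hours
  4: null, // business hours, no fixed clock
};

const UPDATE_MINUTES: Record<Severity, number | null> = {
  1: 30,
  2: 2 * 60,
  3: 24 * 60, // 'Daily'
  4: null,
};

const MINUTE_MS = 60 * 1000;

// Status-update due times after `openedAt`, up to and including `until`.
function updateSchedule(level: Severity, openedAt: Date, until: Date): Date[] {
  const step = UPDATE_MINUTES[level];
  if (step === null) return [];
  const times: Date[] = [];
  for (
    let t = openedAt.getTime() + step * MINUTE_MS;
    t <= until.getTime();
    t += step * MINUTE_MS
  ) {
    times.push(new Date(t));
  }
  return times;
}

// Hypothetical escalation trigger: response target passed with no acknowledgement.
function isResponseOverdue(
  level: Severity,
  openedAt: Date,
  now: Date,
  acknowledged: boolean,
): boolean {
  const limit = RESPONSE_MINUTES[level];
  if (acknowledged || limit === null) return false;
  return now.getTime() - openedAt.getTime() > limit * MINUTE_MS;
}
```

For a SEV-1 opened at 00:00, `updateSchedule` returns updates due at 00:30, 01:00, 01:30, and 02:00 when asked for the first two hours. Keeping the numbers in a separate table leaves the human-readable matrix untouched.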