AS-201c · Module 2

AI Incident Classification

Good news, everyone! Not every alert is an incident. Not every incident is a crisis. The difference between a monitoring system that improves security and a monitoring system that drowns your team in noise is classification — the structured process of determining what happened, how severe it is, and what response it demands.

Severity 1: Critical (Active Data Breach)
Confirmed exfiltration of sensitive data through the AI system. Customer PII in model outputs. Credentials exposed through conversation logs. Unauthorized tool execution that accessed restricted systems.
Response: immediate containment, incident commander activation, stakeholder notification within one hour.

Severity 2: High (Confirmed Exploit)
Successful prompt injection that changed model behavior. System prompt extracted by a user. Model consistently bypassing safety constraints. Unauthorized access to AI admin functions.
Response: containment within one hour, root cause analysis within four hours, stakeholder notification within 24 hours.

Severity 3: Medium (Attempted Exploit)
Detected injection attempts that were blocked by defenses. Anomalous query patterns suggesting reconnaissance. Guardrail triggers that prevented data leakage.
Response: log analysis within 24 hours, defense validation, pattern added to detection rules.

Severity 4: Low (Suspicious Activity)
Statistical anomalies without confirmed malicious intent. Unusual usage patterns from known users. Single guardrail triggers without follow-up activity.
Response: logged for trend analysis, reviewed in weekly security review, no immediate action required.
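
One way to make the four tiers above repeatable is to write them down as code. The following is a minimal sketch, not a definitive implementation: the `Alert` fields, the `classify` function, and the `FIRST_ACTION_DEADLINE` table are hypothetical names introduced for illustration, and reducing each tier to a single boolean signal is a simplifying assumption. The deadlines are taken from the response lines above, with "weekly" approximated as seven days.

```python
from dataclasses import dataclass
from datetime import timedelta
from enum import IntEnum


class Severity(IntEnum):
    CRITICAL = 1  # active data breach
    HIGH = 2      # confirmed exploit
    MEDIUM = 3    # attempted exploit that was blocked
    LOW = 4       # suspicious activity, no confirmed intent


@dataclass(frozen=True)
class Alert:
    # Hypothetical signals; a real pipeline would derive these from logs.
    data_exfiltrated: bool = False   # e.g. PII seen in model output
    exploit_succeeded: bool = False  # e.g. system prompt extracted
    attack_blocked: bool = False     # e.g. guardrail stopped an injection


def classify(alert: Alert) -> Severity:
    """Check the most severe condition first so it always wins."""
    if alert.data_exfiltrated:
        return Severity.CRITICAL
    if alert.exploit_succeeded:
        return Severity.HIGH
    if alert.attack_blocked:
        return Severity.MEDIUM
    return Severity.LOW


# Time allowed before the first response step for each tier.
FIRST_ACTION_DEADLINE = {
    Severity.CRITICAL: timedelta(0),      # immediate containment
    Severity.HIGH: timedelta(hours=1),    # containment within one hour
    Severity.MEDIUM: timedelta(hours=24), # log analysis within 24 hours
    Severity.LOW: timedelta(days=7),      # assumption: "weekly" review
}
```

Checking conditions from most to least severe means an alert that matches several tiers is assigned the highest one, so `classify(Alert(data_exfiltrated=True, attack_blocked=True))` returns `Severity.CRITICAL`.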

Do This

Classify every alert before deciding on a response — severity determines the urgency and scope of action
Document classification criteria so different team members reach the same severity level for the same event
Review classifications retroactively — was the severity accurate? Would you classify it differently now?
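
Retroactive review is easier to act on when it produces numbers. The sketch below assumes each reviewed incident is recorded as a pair of severity levels, the one assigned at triage and the one assigned in hindsight, with 1 as the most severe. The function name `review_summary` and the record format are hypothetical.

```python
def review_summary(pairs):
    """Count how often the initial severity matched the reviewed one.

    pairs: iterable of (initial, reviewed) severity numbers, 1 = most severe.
    """
    summary = {"accurate": 0, "over_classified": 0, "under_classified": 0}
    for initial, reviewed in pairs:
        if initial == reviewed:
            summary["accurate"] += 1
        elif initial < reviewed:
            # A lower number is more severe, so triage overstated the event.
            summary["over_classified"] += 1
        else:
            # Triage understated the event; these are the dangerous misses.
            summary["under_classified"] += 1
    return summary


# Four reviewed incidents: two accurate, one overstated, one understated.
history = [(3, 3), (2, 3), (4, 2), (1, 1)]
print(review_summary(history))
# {'accurate': 2, 'over_classified': 1, 'under_classified': 1}
```

A rising under-classified count suggests the written criteria are too lenient, while a rising over-classified count points toward the alert fatigue described in this module.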

Avoid This

Treat every alert as a critical incident — alert fatigue leads to real incidents being ignored
Leave classification to individual judgment without criteria — inconsistency creates security gaps
Skip classification and go straight to response — mismatched response severity wastes resources or misses threats