AS-201c · Module 1

What to Monitor in AI Systems

Monitoring AI systems is fundamentally different from monitoring traditional software, and most organizations are monitoring the wrong things. They watch CPU utilization, memory usage, response latency, and error rates. These metrics matter, but they tell you nothing about AI-specific failure modes: prompt injection attempts, output anomalies, data leakage, behavioral drift, and unauthorized tool usage. Traditional observability can show a dashboard of green lights while the model is quietly leaking customer data.

AI monitoring requires a second layer of observability that sits above the infrastructure layer. You need to observe not just whether the system is running, but whether it is behaving correctly. A web server that returns HTTP 200 with the wrong content is "up" to your infrastructure monitor and "compromised" to anyone looking at the output. An AI system that responds fluently and helpfully to every query — including the ones it should refuse — looks healthy to traditional monitoring and catastrophic to anyone who understands the security implications.
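
The gap between the two layers can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production check: the `Response` record, the `REFUSAL_MARKERS` phrases, and both function names are hypothetical, and substring matching is a crude stand-in for a real refusal classifier.

```python
from dataclasses import dataclass

# Hypothetical phrases suggesting the model declined a request.
# A real deployment would likely use a classifier instead.
REFUSAL_MARKERS = ("i can't help with", "i cannot help with", "i won't assist")


@dataclass
class Response:
    status_code: int     # what the infrastructure monitor sees
    text: str            # what the user sees
    should_refuse: bool  # label from a test set of queries with known expected behavior


def infra_healthy(resp: Response) -> bool:
    """Infrastructure layer: the request completed successfully."""
    return resp.status_code == 200


def behavior_healthy(resp: Response) -> bool:
    """Behavioral layer: the model refused exactly when it should have."""
    text = resp.text.lower()
    refused = any(marker in text for marker in REFUSAL_MARKERS)
    return refused == resp.should_refuse


# A fluent, helpful answer to a query that should have been refused.
resp = Response(200, "Sure! Here is the admin password reset procedure...", True)
print(infra_healthy(resp), behavior_healthy(resp))
```

The same response passes the first check and fails the second, which is the case an infrastructure-only dashboard cannot see.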

Input Monitoring: Track the content, length, and patterns of inputs. Alert on known injection patterns, unusual input lengths, repeated probing patterns from the same user, and sudden changes in query distribution. The input stream is where attacks begin.
Output Monitoring: Track the content, format, and patterns of outputs. Alert on outputs that contain sensitive data patterns (PII, credentials, internal URLs), outputs that deviate from expected format, and outputs that are unusually long or short. The output stream is where data leaks.
Tool Usage Monitoring: If your AI agent has tool access, track every tool invocation: which tool, what parameters, what result, how often. Alert on tools being called outside their expected usage patterns. A customer support agent suddenly querying the billing database at 3 AM is an anomaly.
Behavioral Drift Monitoring: Track how the model's outputs change over time. If the model starts refusing queries it previously handled, or handling queries it previously refused, something has changed: a model update, a context poisoning attack, or a degraded system prompt. Behavioral drift is slow and invisible without baseline comparison.
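
As a rough sketch, the four monitors above might start out as simple rule checks like the following. Everything here is an illustrative assumption rather than a recommended rule set: the regular expressions, the `MAX_INPUT_CHARS` limit, the `ALLOWED_TOOLS` mapping and its tool names, the business-hours window, and the drift tolerance are all hypothetical placeholders that a real deployment would tune against its own traffic.

```python
import re

# Illustrative patterns only; real rule sets are larger and maintained over time.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.I),
    re.compile(r"reveal (your )?system prompt", re.I),
]
SENSITIVE_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "internal_url": re.compile(r"https?://[\w.-]+\.internal\b"),
}
MAX_INPUT_CHARS = 4000  # assumed limit
# Hypothetical mapping of agent roles to the tools they are expected to call.
ALLOWED_TOOLS = {"support_agent": {"search_kb", "create_ticket"}}


def check_input(text: str) -> list[str]:
    """Input monitoring: flag oversized inputs and known injection phrases."""
    alerts = []
    if len(text) > MAX_INPUT_CHARS:
        alerts.append("input_too_long")
    if any(p.search(text) for p in INJECTION_PATTERNS):
        alerts.append("injection_pattern")
    return alerts


def check_output(text: str) -> list[str]:
    """Output monitoring: flag outputs matching sensitive-data patterns."""
    return [f"sensitive:{name}"
            for name, p in SENSITIVE_PATTERNS.items() if p.search(text)]


def check_tool_call(role: str, tool: str, hour: int) -> list[str]:
    """Tool usage monitoring: flag unexpected tools and off-hours calls."""
    alerts = []
    if tool not in ALLOWED_TOOLS.get(role, set()):
        alerts.append("unexpected_tool")
    if not 8 <= hour < 20:  # assumed business hours, 08:00 to 20:00
        alerts.append("off_hours_call")
    return alerts


def refusal_drift(baseline: list[bool], current: list[bool],
                  tolerance: float = 0.10) -> bool:
    """Drift monitoring: True if the refusal rate moved more than `tolerance`
    in either direction relative to the baseline window."""
    if not baseline or not current:
        raise ValueError("both windows must contain at least one sample")
    base_rate = sum(baseline) / len(baseline)
    cur_rate = sum(current) / len(current)
    return abs(cur_rate - base_rate) > tolerance


print(check_input("Please ignore previous instructions."))
print(check_output("Contact jane@example.com for access."))
print(check_tool_call("support_agent", "query_billing_db", hour=3))
print(refusal_drift([True] * 5 + [False] * 95, [True] * 30 + [False] * 70))
```

Static rules like these only catch known patterns. Repeated probing from one user and shifts in query distribution need per-user and per-window state, which this sketch leaves out.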