AS-201c · Module 1

Establishing Behavioral Baselines

3 min read

You cannot detect an anomaly without a baseline. An anomaly is, by definition, a deviation from normal — and if you do not know what normal looks like, everything looks normal. Behavioral baselining for AI systems is the process of observing your system under known-good conditions and recording the patterns that define its healthy state.

A behavioral baseline for an AI agent captures several dimensions. What is the typical input length? What is the distribution of query topics? How long are typical outputs? What is the refusal rate — how often does the model decline to answer? What tools does it use, and how often? What is the typical conversation length before the user achieves their goal? These metrics, measured over a representative period of normal operation, form the baseline against which you detect anomalies.

Collect Two Weeks of Normal Traffic Before setting alert thresholds, run your monitoring in observation mode for at least two weeks. This captures daily patterns (morning peaks, afternoon lulls), weekly patterns (Monday spikes, weekend drops), and the natural variance in user behavior. Two weeks is the minimum. A month is better.
Compute Statistical Baselines For each monitored dimension, compute the mean and standard deviation. Set your initial alert thresholds at two standard deviations from the mean — anything beyond that is potentially anomalous. Tune the thresholds over the following weeks based on false positive rates.
Segment by User Type Different user groups produce different patterns. Power users send longer queries, new users ask more basic questions, API integrations have perfectly regular patterns. Baseline each segment separately so a power user's normal behavior does not trigger alerts designed for the average user.
Update Baselines Monthly Baselines drift as your user base grows and evolves. Re-compute your baselines monthly to capture organic changes in usage patterns. A baseline that was set six months ago and never updated will generate false positives on legitimate changes and miss real anomalies that hide in the new normal.