RC-401b · Module 2
Monitoring & Observability
A production agent system you cannot observe is a production agent system you cannot trust. Monitoring is not a dashboard you check once a day. It is the continuous feedback loop that tells you whether your agents are operating within specification or drifting toward failure.
Full-stack observability for agent operations spans three layers that map directly to the three domains this capstone integrates. Layer one is AT monitoring: team-level metrics like orchestration completion rates, per-agent success rates, inter-agent handoff latency, and token spend per session. Layer two is AS security monitoring: SIEM integration that correlates agent actions with security events, detects anomalous behavior patterns, and flags prompt injection attempts in real time. Layer three is operational telemetry: system-level metrics from your CC and OC infrastructure — CPU, memory, API latency, error rates, queue depth.
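One way to keep the three layers correlatable is to give every signal the same envelope and tag it with its layer. The sketch below is illustrative only: `Layer`, `TelemetryEvent`, `to_json`, and all field names are hypothetical and not part of any course tooling or SIEM product.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum


class Layer(str, Enum):
    """The three observability layers (names are assumptions for this sketch)."""
    AT_MONITORING = "at_monitoring"
    AS_SECURITY = "as_security"
    OPERATIONAL = "operational"


@dataclass(frozen=True)
class TelemetryEvent:
    """A single measurement, tagged so all layers can share one pipeline."""
    layer: Layer
    session_id: str
    metric: str        # e.g. "handoff_latency_ms", "cpu_percent"
    value: float
    timestamp: float   # seconds since the Unix epoch


def to_json(event: TelemetryEvent) -> str:
    """Serialize an event to a JSON line suitable for shipping to a log sink."""
    record = asdict(event)
    record["layer"] = event.layer.value  # store the plain string, not the Enum
    return json.dumps(record, sort_keys=True)
```

Sharing a `session_id` across layers is what would let a security alert be lined up against the orchestration metrics and infrastructure load recorded for the same session.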
- Instrument AT Monitoring: Track four metrics for every orchestration: completion rate (did all agents finish?), quality score (did the critic approve on first pass?), token spend (total and per-agent), and wall-clock time. Log these per session. After 50 sessions, you have enough data to establish baselines. From then on, any session in which one of these metrics deviates by more than two standard deviations from its baseline triggers an investigation.
- Integrate AS Security Monitoring: Feed agent action logs into your SIEM. Define correlation rules for three threat patterns: privilege escalation (agent attempts to access resources outside its boundary), data exfiltration (agent outputs contain patterns matching sensitive data formats), and prompt injection (agent behavior changes abruptly mid-session, indicating injected instructions). Each pattern maps to a specific alert severity and response playbook.
- Establish Operational Baselines: Run your agent system for two weeks in a monitored staging environment before production. Record every metric at 5-minute intervals. Compute means, standard deviations, and percentile distributions. These baselines become your anomaly detection thresholds. Without baselines, you cannot distinguish normal variance from genuine drift: thresholds set too tight bury you in false-positive noise, and thresholds set too loose miss critical events.
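The baseline-and-threshold logic in the steps above can be sketched in a few lines. This is a minimal illustration, assuming a single metric collected as a list of floats. `Baseline`, `build_baseline`, and `deviates` are hypothetical names, and the 50-sample minimum and two-standard-deviation threshold come from the text above.

```python
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class Baseline:
    """Summary statistics for one metric (hypothetical helper type)."""
    mean: float
    stdev: float


def build_baseline(samples: list[float], min_samples: int = 50) -> Baseline:
    """Compute a baseline once enough observations have been logged."""
    if len(samples) < min_samples:
        raise ValueError(
            f"need at least {min_samples} samples, got {len(samples)}"
        )
    # statistics.stdev is the sample standard deviation (n - 1 denominator).
    return Baseline(statistics.fmean(samples), statistics.stdev(samples))


def deviates(value: float, baseline: Baseline, k: float = 2.0) -> bool:
    """Return True if value lies more than k standard deviations from the mean."""
    if baseline.stdev == 0:
        # A perfectly constant baseline: any change at all is a deviation.
        return value != baseline.mean
    return abs(value - baseline.mean) > k * baseline.stdev


# Example: token spend per session over the first 50 sessions (made-up data).
token_spend = [1000.0 + (i % 5) * 10 for i in range(50)]
baseline = build_baseline(token_spend)
needs_investigation = deviates(1100.0, baseline)
```

A mean-and-standard-deviation threshold fits roughly symmetric metrics. For skewed ones such as latency, the percentile distributions mentioned above may be a better basis for thresholds.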