RC-401b · Module 2

Monitoring & Observability

A production agent system you cannot observe is a production agent system you cannot trust. Monitoring is not a dashboard you check once a day. It is the continuous feedback loop that tells you whether your agents are operating within specification or drifting toward failure.

Full-stack observability for agent operations spans three layers that map directly to the three domains this capstone integrates. Layer one is AT monitoring: team-level metrics like orchestration completion rates, per-agent success rates, inter-agent handoff latency, and token spend per session. Layer two is AS security monitoring: SIEM integration that correlates agent actions with security events, detects anomalous behavior patterns, and flags prompt injection attempts in real time. Layer three is operational telemetry: system-level metrics from your CC and OC infrastructure — CPU, memory, API latency, error rates, queue depth.
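One way to make the three layers concrete is to tag every telemetry record with its layer, so that each record can be routed to the right backend. This is a minimal sketch. The metric names, the sink names (`agent-metrics-store`, `siem`, `infra-monitoring`), and the `route` helper are all hypothetical. Only the three-way layer split comes from the text above.

```python
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    """The three observability layers described above."""
    AT_TEAM = "at_team"            # orchestration and per-agent metrics
    AS_SECURITY = "as_security"    # security events bound for the SIEM
    OPERATIONAL = "operational"    # CC / OC infrastructure telemetry


# Hypothetical metric names grouped by layer (illustrative only).
LAYER_METRICS = {
    Layer.AT_TEAM: {"orchestration_completion_rate", "agent_success_rate",
                    "handoff_latency_ms", "tokens_per_session"},
    Layer.AS_SECURITY: {"anomalous_action", "prompt_injection_flag"},
    Layer.OPERATIONAL: {"cpu_pct", "memory_mb", "api_latency_ms",
                        "error_rate", "queue_depth"},
}

# Hypothetical destination for each layer.
SINKS = {
    Layer.AT_TEAM: "agent-metrics-store",
    Layer.AS_SECURITY: "siem",
    Layer.OPERATIONAL: "infra-monitoring",
}


@dataclass(frozen=True)
class TelemetryRecord:
    metric: str
    value: float
    session_id: str


def layer_of(metric: str) -> Layer:
    """Return the layer a metric belongs to; raise if it is unmapped."""
    for layer, names in LAYER_METRICS.items():
        if metric in names:
            return layer
    raise KeyError(f"unmapped metric: {metric}")


def route(record: TelemetryRecord) -> str:
    """Choose the sink for a record based on its layer."""
    return SINKS[layer_of(record.metric)]
```

For example, `route(TelemetryRecord("handoff_latency_ms", 420.0, "s-1"))` returns `"agent-metrics-store"`. Raising on an unmapped metric, rather than silently dropping it, surfaces instrumentation gaps early.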

  1. Instrument AT Monitoring. Track four metrics for every orchestration: completion rate (did all agents finish?), quality score (did the critic approve on the first pass?), token spend (total and per agent), and wall-clock time. Log these per session. After 50 sessions, you have enough data to establish baselines. Any session in which one of these metrics deviates from its baseline mean by more than two standard deviations triggers an investigation.
  2. Integrate AS Security Monitoring. Feed agent action logs into your SIEM. Define correlation rules for three threat patterns: privilege escalation (an agent attempts to access resources outside its boundary), data exfiltration (agent outputs contain patterns matching sensitive data formats), and prompt injection (agent behavior changes abruptly mid-session, which may indicate injected instructions). Map each pattern to a specific alert severity and response playbook.
  3. Establish Operational Baselines. Run your agent system for two weeks in a monitored staging environment before going to production. Record every metric at 5-minute intervals. Compute means, standard deviations, and percentile distributions. These baselines become your anomaly detection thresholds. Without them, you have no way to distinguish normal variance from genuine drift, so every alert is either false-positive noise or a missed critical event.
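The session check from step 1 can be sketched in Python as follows. This is a minimal sketch under several assumptions. `SessionMetrics`, `build_baseline`, and `deviations` are hypothetical names. The quality score is modeled as 1.0 or 0.0 for first-pass critic approval. Only total token spend is tracked; per-agent spend would add one field per agent. Once 50 sessions exist, the code computes a mean and sample standard deviation per metric. It then flags any metric that lies more than two standard deviations from that mean.

```python
import statistics
from dataclasses import dataclass


@dataclass
class SessionMetrics:
    """The four per-session metrics from step 1 (simplified)."""
    completion_rate: float      # fraction of agents that finished (1.0 = all)
    first_pass_approved: float  # 1.0 if the critic approved on first pass, else 0.0
    total_tokens: float         # total token spend for the session
    wall_clock_s: float         # wall-clock duration in seconds


MIN_SESSIONS = 50   # sessions required before a baseline is trusted
SIGMA_LIMIT = 2.0   # deviation threshold, in standard deviations
FIELDS = ("completion_rate", "first_pass_approved", "total_tokens", "wall_clock_s")


def build_baseline(history: list[SessionMetrics]) -> dict[str, tuple[float, float]]:
    """Compute (mean, sample stdev) for each metric across past sessions."""
    if len(history) < MIN_SESSIONS:
        raise ValueError(f"need at least {MIN_SESSIONS} sessions, got {len(history)}")
    baseline = {}
    for name in FIELDS:
        values = [getattr(s, name) for s in history]
        baseline[name] = (statistics.mean(values), statistics.stdev(values))
    return baseline


def deviations(session: SessionMetrics,
               baseline: dict[str, tuple[float, float]]) -> list[str]:
    """Return the metrics on which this session exceeds SIGMA_LIMIT standard deviations."""
    flagged = []
    for name, (mean, stdev) in baseline.items():
        value = getattr(session, name)
        if stdev == 0:
            # Constant baseline: a z-score is undefined, so any change counts.
            if value != mean:
                flagged.append(name)
        elif abs(value - mean) / stdev > SIGMA_LIMIT:
            flagged.append(name)
    return flagged
```

A non-empty return value is the trigger for investigation. A metric with zero baseline variance, such as a completion rate that has always been 1.0, is flagged on any change at all.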
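The three correlation rules from step 2 would normally be written in your SIEM's own rule language. The Python below only sketches their logic over a list of action-log events. It assumes a hypothetical `AgentAction` record, hypothetical severities and playbook IDs, and a per-agent allow-list of resources as the agent's boundary. The regexes are illustrative sensitive-data formats, not a vetted list. The prompt-injection check is a deliberately crude stand-in for an abrupt behavior change: it fires when an agent, after a warm-up period, starts calling a tool it has not used earlier in the session.

```python
import re
from dataclasses import dataclass

# Hypothetical rule -> (severity, playbook ID); substitute your SIEM's own.
RULES = {
    "privilege_escalation": ("high", "PB-PRIV-01"),
    "data_exfiltration": ("critical", "PB-EXFIL-01"),
    "prompt_injection": ("high", "PB-INJ-01"),
}

# Illustrative sensitive-data formats only.
SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),      # US SSN-like
    re.compile(r"\b(?:\d[ -]?){15}\d\b"),      # 16-digit card-like
]


@dataclass
class AgentAction:
    agent: str
    tool: str
    resource: str
    output: str


@dataclass
class Alert:
    rule: str
    severity: str
    playbook: str
    detail: str


def _alert(rule: str, detail: str) -> Alert:
    severity, playbook = RULES[rule]
    return Alert(rule, severity, playbook, detail)


def correlate(actions: list[AgentAction], boundaries: dict[str, set[str]],
              warmup: int = 5) -> list[Alert]:
    """Apply the three sketch rules to one session's actions, in order."""
    alerts: list[Alert] = []
    seen_tools: dict[str, set[str]] = {}
    counts: dict[str, int] = {}
    for a in actions:
        # Privilege escalation: resource outside the agent's allow-list.
        if a.resource not in boundaries.get(a.agent, set()):
            alerts.append(_alert("privilege_escalation", f"{a.agent} -> {a.resource}"))
        # Data exfiltration: output matches a sensitive-data pattern.
        if any(p.search(a.output) for p in SENSITIVE_PATTERNS):
            alerts.append(_alert("data_exfiltration", f"{a.agent} output matched a sensitive pattern"))
        # Prompt injection proxy: a never-before-used tool after warm-up.
        n = counts.get(a.agent, 0)
        tools = seen_tools.setdefault(a.agent, set())
        if n >= warmup and a.tool not in tools:
            alerts.append(_alert("prompt_injection", f"{a.agent} switched to new tool {a.tool}"))
        tools.add(a.tool)
        counts[a.agent] = n + 1
    return alerts
```

Each `Alert` carries its severity and playbook ID, which mirrors the one-pattern, one-playbook mapping that step 2 calls for.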
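For step 3, two weeks sampled every 5 minutes yields 14 × 24 × 60 / 5 = 4,032 samples per metric. The sketch below summarizes one metric's samples into a mean, a standard deviation, and p50/p95/p99 using Python's standard `statistics` module (`quantiles` requires Python 3.8 or later). `Baseline`, `compute_baseline`, `is_anomalous`, and the choice of the staging p99 as the alert threshold are assumptions for illustration.

```python
import statistics
from dataclasses import dataclass

SAMPLE_INTERVAL_MIN = 5
STAGING_DAYS = 14
EXPECTED_SAMPLES = STAGING_DAYS * 24 * 60 // SAMPLE_INTERVAL_MIN  # 4032 per metric


@dataclass(frozen=True)
class Baseline:
    mean: float
    stdev: float
    p50: float
    p95: float
    p99: float


def compute_baseline(samples: list[float]) -> Baseline:
    """Summarize one metric's 5-minute samples from the staging run."""
    # quantiles(n=100) returns 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return Baseline(
        mean=statistics.mean(samples),
        stdev=statistics.stdev(samples),
        p50=cuts[49],
        p95=cuts[94],
        p99=cuts[98],
    )


def is_anomalous(value: float, b: Baseline) -> bool:
    """Hypothetical threshold: flag values above the staging p99."""
    return value > b.p99
```

Percentile thresholds suit skewed metrics such as API latency or queue depth, where a mean and standard deviation can describe the tail poorly. For roughly symmetric metrics, the two-standard-deviation rule from step 1 works on the same `Baseline`.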