OC-301f · Module 3

Test Observability & Reporting


A failing test that nobody notices is no better than a test that does not exist. Test observability ensures that test results are visible, tracked over time, and acted on. The observability stack for agent testing has three components: dashboards, alerts, and trends.

Dashboards show the current state: the latest results for each suite, quality scores by dimension, regression alerts by change, and test coverage by module.

Alerts fire when a test suite fails, a quality dimension drops below its threshold, a regression is detected, or the nightly behavioral suite has not run (which indicates an infrastructure failure).

Trends show the trajectory: quality scores over time by dimension, suite execution time, failure rate by module, and regression frequency by change type. Trends are where you see whether the testing infrastructure is improving the system or just documenting its decline.
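
The four alert conditions can be sketched as a single check over the latest run of a suite. This is a minimal illustration, not a reference implementation: the `SuiteRun` record, the `evaluate_alerts` function, the 26-hour staleness window, and the 0.05 regression tolerance are all assumptions made for the example, and quality scores are assumed to lie between 0 and 1.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class SuiteRun:
    """Hypothetical record of one suite execution."""
    suite: str
    finished_at: datetime
    passed: bool
    quality: dict[str, float]  # dimension name -> score, assumed in [0, 1]


def evaluate_alerts(
    latest: Optional[SuiteRun],
    previous: Optional[SuiteRun],
    now: datetime,
    thresholds: dict[str, float],
    max_age: timedelta = timedelta(hours=26),  # assumed window for a nightly suite
    max_drop: float = 0.05,                    # assumed regression tolerance
) -> list[str]:
    """Return human-readable alerts for the latest run of one suite."""
    # A missing run is itself an alert: the infrastructure may be broken.
    if latest is None:
        return ["no run recorded"]

    alerts = []
    if now - latest.finished_at > max_age:
        alerts.append(f"{latest.suite}: last run is stale")
    if not latest.passed:
        alerts.append(f"{latest.suite}: suite failed")

    for dim, minimum in sorted(thresholds.items()):
        score = latest.quality.get(dim)
        if score is not None and score < minimum:
            alerts.append(f"{latest.suite}: {dim} below threshold")

    # Regression: a dimension dropped by more than max_drop since the previous run.
    if previous is not None:
        for dim, old in sorted(previous.quality.items()):
            new = latest.quality.get(dim)
            if new is not None and old - new > max_drop:
                alerts.append(f"{latest.suite}: {dim} regressed")
    return alerts
```

Checking the age of the last run, rather than waiting for a failure report, is what lets the check notice a suite that silently stopped running.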

Do This

Build dashboards that show quality scores over time — trends reveal whether the system is improving or degrading
Alert on quality threshold breaches and missing test runs — both indicate problems that need attention
Track test execution time as a metric — tests that get slower over time become tests that get skipped
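
One way to turn a series of quality scores or execution times into a trend is to fit a straight line through the recent runs and look at its slope. The sketch below assumes evenly spaced runs and uses an ordinary least-squares fit; `trend_slope`, `classify_trend`, and the `tolerance` parameter are hypothetical names chosen for illustration.

```python
def trend_slope(values: list[float]) -> float:
    """Least-squares slope of values against run index (units per run)."""
    n = len(values)
    if n < 2:
        return 0.0  # not enough points to define a trend
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def classify_trend(
    values: list[float],
    tolerance: float,
    higher_is_better: bool = True,
) -> str:
    """Label a metric series as 'improving', 'degrading', or 'flat'."""
    slope = trend_slope(values)
    if abs(slope) <= tolerance:
        return "flat"
    improving = slope > 0 if higher_is_better else slope < 0
    return "improving" if improving else "degrading"


# Quality scores: higher is better.
quality_trend = classify_trend([0.90, 0.88, 0.85, 0.80], tolerance=0.01)

# Suite execution time in seconds: lower is better.
runtime_trend = classify_trend([60.0, 75.0, 90.0], tolerance=1.0, higher_is_better=False)
```

The same function serves both metrics because the direction of "better" is passed in explicitly. A suite whose runtime is classified as degrading is a candidate for attention before people start skipping it.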

Avoid This

Run tests without reviewing results — green dashboards that nobody looks at are decoration
Track only pass/fail without quality scores — a test can pass while quality gradually degrades
Ignore flaky tests — a test that sometimes fails for no apparent reason is hiding an intermittent bug
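
Flaky tests can be surfaced from run history instead of being left to anecdote. The sketch below assumes each recorded result carries a test name, the code revision it ran against, and a pass/fail outcome. It flags any test that both passed and failed on the same revision, since the code did not change between those runs. `find_flaky_tests` and the tuple layout are hypothetical.

```python
from collections import defaultdict


def find_flaky_tests(runs: list[tuple[str, str, bool]]) -> list[str]:
    """Return names of tests with mixed outcomes on a single revision.

    Each run is (test_name, revision, passed).
    """
    outcomes: dict[tuple[str, str], set[bool]] = defaultdict(set)
    for test, revision, passed in runs:
        outcomes[(test, revision)].add(passed)
    # Both True and False seen for one (test, revision) pair means the
    # outcome changed without a code change.
    flaky = {test for (test, _), seen in outcomes.items() if len(seen) == 2}
    return sorted(flaky)


history = [
    ("test_tool_call", "abc123", True),
    ("test_tool_call", "abc123", False),  # same revision, different outcome
    ("test_summary", "abc123", True),
    ("test_summary", "def456", False),    # failed only after a code change
]
flagged = find_flaky_tests(history)
```

A test that fails only after the revision changes is not flagged here, because that pattern looks like a regression, not flakiness.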