OC-301f · Module 1
Agent Test Taxonomy
3 min read
Agent testing divides into five test types, each catching a different failure class. Unit tests: does the module's logic layer produce correct results given controlled inputs? These are deterministic and use mocked data, with no LLM calls. Integration tests: does the module integrate correctly with the framework lifecycle, event system, and other modules? These test plumbing, not intelligence. Behavioral tests: does the agent's output meet quality criteria for format, content, tone, and accuracy? These are non-deterministic and use behavioral assertions. Regression tests: do previously fixed bugs stay fixed? Each one is a snapshot of the bug-triggering input and the expected behavior, re-run after every change. Chaos tests: does the system handle failures gracefully, such as network outages, API timeouts, corrupted memory, and conflicting instructions?
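As a rough illustration of the last two types, the sketch below pairs a regression snapshot with a simple chaos test. Everything in it is hypothetical: `answer`, `lookup_weather`, `flaky_tool`, and the trailing-question-mark bug are invented for the example and do not come from any particular framework.

```python
from typing import Callable


def lookup_weather(city: str) -> str:
    """Stand-in for a network call the agent depends on (hypothetical)."""
    return f"Sunny in {city}"


def answer(query: str, tool: Callable[[str], str] = lookup_weather) -> str:
    """Hypothetical agent step: call a tool and degrade gracefully on failure."""
    # Take the text after the last " in " as the city, dropping a trailing "?".
    city = query.strip().rstrip("?").split(" in ")[-1]
    try:
        return tool(city)
    except (TimeoutError, ConnectionError):
        return "Sorry, the weather service is unavailable right now."


# Regression snapshot: the input that once triggered a bug, plus the expected
# behavior. The bug is invented: a trailing "?" leaking into the city name.
REGRESSION_CASES = [
    {"input": "weather in Oslo?", "expected": "Sunny in Oslo"},
]


def run_regressions() -> bool:
    """Re-run every stored case; intended to run after every change."""
    return all(answer(case["input"]) == case["expected"] for case in REGRESSION_CASES)


def flaky_tool(city: str) -> str:
    """Chaos injection: simulate an API timeout instead of answering."""
    raise TimeoutError("simulated API timeout")


def run_chaos() -> bool:
    """The agent should return a fallback message rather than crash."""
    reply = answer("weather in Oslo?", tool=flaky_tool)
    return "unavailable" in reply


regressions_ok = run_regressions()
chaos_ok = run_chaos()
```

Passing the failing dependency in as a parameter is one simple way to inject faults. A real suite might instead patch the network layer or corrupt stored memory directly.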
- 1. Unit Tests (Deterministic): Test the logic layer with mocked inputs and outputs. No LLM calls. These run fast, are fully deterministic, and catch computation errors. Target: 90% coverage of the logic layer.
- 2. Integration Tests (Framework): Test lifecycle hooks, event handling, and module-to-module communication. Verify that the module initializes, responds to events, and shuts down cleanly. These test system behavior, not output quality.
- 3. Behavioral Tests (Quality): Test output quality with behavioral assertions: content completeness, format compliance, persona consistency, factual accuracy. Run each behavioral test 3 to 5 times to account for non-determinism. All runs must pass.
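The contrast between a deterministic unit test and a repeated behavioral test can be sketched as follows. This is a minimal illustration under stated assumptions, not a framework API: `summarize_usage`, `passes_all_runs`, the sample assertions, and the mocked agent are all hypothetical names introduced here.

```python
from typing import Callable, List
from unittest.mock import Mock


# --- Unit test: logic layer only, mocked data, no LLM call ---

def summarize_usage(records: List[dict]) -> dict:
    """Hypothetical logic-layer function: total token usage per user."""
    totals: dict = {}
    for rec in records:
        totals[rec["user"]] = totals.get(rec["user"], 0) + rec["tokens"]
    return totals


def test_summarize_usage() -> None:
    mocked_records = [
        {"user": "ana", "tokens": 120},
        {"user": "ben", "tokens": 80},
        {"user": "ana", "tokens": 30},
    ]
    # Same input always gives the same output, so an exact match is safe.
    assert summarize_usage(mocked_records) == {"ana": 150, "ben": 80}


# --- Behavioral test: assert on properties of the output, over several runs ---

Assertion = Callable[[str], bool]


def passes_all_runs(agent: Callable[[str], str], prompt: str,
                    assertions: List[Assertion], runs: int = 3) -> bool:
    """Call the agent `runs` times; every run must satisfy every assertion."""
    for _ in range(runs):
        output = agent(prompt)
        if not all(check(output) for check in assertions):
            return False
    return True


# Stand-in for a real agent: a Mock that returns a different reply on each
# call, imitating non-deterministic LLM output.
fake_agent = Mock(side_effect=[
    "Hello! Your refund was issued on 3 May.",
    "Hi there. The refund has been issued; expect it within 5 days.",
    "Hello, the refund is issued and on its way.",
])

behavioral_assertions: List[Assertion] = [
    lambda out: "refund" in out.lower(),  # content completeness
    lambda out: len(out) <= 200,          # format compliance (length limit)
    lambda out: out[0].isupper(),         # simple style check
]

test_summarize_usage()
all_passed = passes_all_runs(fake_agent, "Where is my refund?",
                             behavioral_assertions, runs=3)
```

The unit test compares against an exact expected value, while the behavioral test only checks properties that every acceptable reply should share. That is why the behavioral test tolerates differently worded outputs yet still fails if any single run breaks an assertion.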