OC-301f · Module 1

Agent Test Taxonomy


Agent testing uses five test types, each catching a different class of failure. Unit tests ask whether the module's logic layer produces correct results given controlled inputs; they are deterministic and use mocked data, with no LLM calls. Integration tests ask whether the module works correctly with the framework lifecycle, the event system, and other modules; they test plumbing, not intelligence. Behavioral tests ask whether the agent's output meets quality criteria for format, content, tone, and accuracy; because the output is non-deterministic, they rely on behavioral assertions rather than exact matches. Regression tests ask whether previously fixed bugs stay fixed: each one is a snapshot of the bug-triggering input and the expected behavior, re-run after every change. Chaos tests ask whether the system handles failures gracefully, including network outages, API timeouts, corrupted memory, and conflicting instructions.
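As one illustration of the regression idea described above, the snapshot of a bug-triggering input and its expected behavior can be stored as data and replayed after every change. This is only a sketch under stated assumptions: `RegressionCase`, `normalize_amount`, `run_regressions`, and the bug IDs are hypothetical names invented for this example, not part of any framework.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RegressionCase:
    """Snapshot of one fixed bug: the input that triggered it and a check on the result."""
    bug_id: str                       # hypothetical tracker ID, for illustration only
    trigger_input: str
    expected: Callable[[str], bool]   # expected behavior expressed as a predicate


def normalize_amount(text: str) -> str:
    """Hypothetical logic-layer function that (we assume) once failed on '1,000'."""
    return f"{float(text.replace(',', '')):.2f}"


# Hypothetical snapshots; a real suite would load these from version control.
CASES = [
    RegressionCase("hypothetical-bug-1", "1,000", lambda out: out == "1000.00"),
    RegressionCase("hypothetical-bug-2", " 7 ", lambda out: out == "7.00"),
]


def run_regressions(fn: Callable[[str], str], cases: list[RegressionCase]) -> list[str]:
    """Replay every snapshot and return the IDs of cases that fail.

    An exception counts as a failure, since a crash means the bug is back.
    """
    failures = []
    for case in cases:
        try:
            ok = case.expected(fn(case.trigger_input))
        except Exception:
            ok = False
        if not ok:
            failures.append(case.bug_id)
    return failures
```

An empty list from `run_regressions(normalize_amount, CASES)` means every previously fixed bug is still fixed; any returned ID points at the snapshot that regressed.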

  1. Unit Tests (Deterministic): Test the logic layer with mocked inputs and outputs. No LLM calls. These run fast, are fully deterministic, and catch computation errors. Target: 90% coverage of the logic layer.
  2. Integration Tests (Framework): Test lifecycle hooks, event handling, and module-to-module communication. Verify that the module initializes, responds to events, and shuts down cleanly. These test system behavior, not output quality.
  3. Behavioral Tests (Quality): Test output quality with behavioral assertions: content completeness, format compliance, persona consistency, factual accuracy. Run each behavioral test 3-5 times to account for non-determinism. All runs must pass.
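The repeated-run rule for behavioral tests can be sketched as follows. This is a minimal illustration, not a framework API: `run_agent`, `check_format`, and `passes_all_runs` are hypothetical names, and `run_agent` is a stub that uses a random greeting to stand in for a real non-deterministic LLM call.

```python
import random
from typing import Callable


def run_agent(prompt: str, rng: random.Random) -> str:
    """Hypothetical stand-in for an agent call; the greeting varies between runs."""
    greeting = rng.choice(["Hello", "Hi", "Good day"])
    return f"{greeting}! Summary: {prompt.strip().lower()}."


def check_format(output: str) -> bool:
    """Behavioral assertion: checks properties of the output, not an exact string."""
    return "Summary:" in output and output.endswith(".")


def passes_all_runs(
    prompt: str,
    check: Callable[[str], bool],
    runs: int = 5,
    seed: int = 0,
) -> bool:
    """Run the agent `runs` times; the test passes only if every run passes."""
    rng = random.Random(seed)  # seeded here only so the sketch is reproducible
    return all(check(run_agent(prompt, rng)) for _ in range(runs))
```

The assertion inspects properties that should hold on every run (a required section, a terminating period) instead of comparing against one fixed string, which is what lets it tolerate varied wording while still failing on a single bad run.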