OC-301f · Module 1

Testing Non-Deterministic Systems

4 min read

Traditional software testing asserts exact output: given input X, expect output Y. Agent testing cannot do this because the same input can produce multiple valid outputs. Ask an agent to summarize a document and run it twice — you get two different summaries, both correct. Traditional assertion fails on the first run even when the output is valid. Agent testing requires a different paradigm: behavioral assertions that verify properties of the output rather than its exact content.

Behavioral assertions check: Does the output contain the required information? Does it conform to the specified format? Is the tone consistent with the agent's persona? Are the factual claims supported by the input data? Does the output length fall within the specified range? Each assertion is a property of the output, not a comparison to a reference output. This approach accommodates non-determinism while still catching genuine failures — an output that is missing required information, uses the wrong format, or violates the persona is a failure regardless of the specific words used.

// Traditional test — fails on valid non-deterministic output
assert.equal(output, "Expected exact summary text");

// Behavioral assertions — pass on any valid output
assert.includes(output, requiredKeyTerms);    // contains key info
assert.matches(output.format, expectedSchema); // correct structure
assert.between(output.wordCount, 100, 200);    // length in range
assert.score(output.tone, persona, 0.8);       // persona alignment
assert.every(output.claims, claim =>
  sourceData.supports(claim)                   // factual accuracy
);