PM-301c · Module 3

Format Testing

4 min read

Format compliance must be tested systematically and separately from output quality. A response can be factually correct and semantically excellent while violating the format specification. These are independent failure dimensions that require independent test coverage.

Define the Format Test Suite For each format requirement, write a test case: is the JSON parseable, does it contain all required keys, are the types correct, are the enum values within the permitted set, are count constraints satisfied, are delimiter sections present. Automate these checks — do not rely on human review.
Run on Representative Sample Run format tests on at least 50 representative inputs. Format compliance on 3 test cases is not sufficient — the model can produce edge-case violations that only appear on specific input types. 50 inputs reveals patterns; 3 inputs reveals nothing.
Track Compliance Rate Your target is 99%+ format compliance in production. Below 95% is a prompt engineering problem. Measure compliance rate per field, not just per response — a response can be compliant on 9 of 10 fields and still require a fix.
Detect Drift Format compliance rates can drift when the model is updated, when the input distribution shifts, or when the prompt is edited. Run format compliance checks continuously in production. Alert on compliance drops of more than 2 percentage points from baseline.

Format Test Suite: ContractReview JSON output

Required checks (all must pass for compliance):
□ Response is valid JSON (JSON.parse succeeds)
□ "summary" key present, value is string
□ "summary" value length <= 200 characters
□ "risk_level" key present, value is one of: "low", "medium", "high", "critical"
□ "recommendations" key present, value is array
□ "recommendations" array length between 1 and 5
□ All "recommendations" values are strings
□ No keys present outside the defined schema
□ No prose before the opening brace
□ No prose after the closing brace

Test inputs must include:
- Straightforward contract clause (baseline)
- Ambiguous contract clause (tests handling of uncertainty)
- Contract clause with no clear recommendations (edge case)
- Very short clause (format regression test)
- Clause with multiple risk levels (tests risk_level disambiguation)