CW-301h · Module 1
Prompt Testing & Validation
3 min read
A prompt that works once is an anecdote. A prompt that works consistently across different inputs, different users, and different context documents is a library-grade prompt. Testing validates consistency. Without it, the library is a collection of prompts that might work.
The testing protocol: run each prompt with three different inputs that represent the range of expected usage.

- Input one, the standard case: clean data, well-defined question.
- Input two, the edge case: incomplete data, ambiguous question.
- Input three, the adversarial case: contradictory data, out-of-scope question.

A library-grade prompt produces good output on input one, graceful degradation on input two, and a clear failure message on input three. If it produces confident but wrong output on input three, the prompt needs guardrails.
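The protocol can be sketched as a small test harness. This is a minimal sketch under stated assumptions: `run_prompt` is a placeholder for whatever model call the library actually uses, and the grading rules stand in for real quality checks.

```python
# A minimal sketch of the three-input protocol. `run_prompt` is a
# placeholder for the real model call; the grading rules are stand-ins
# for whatever quality checks the library actually applies.

def run_prompt(prompt: str, input_text: str) -> str:
    """Placeholder model call; swap in the real API client."""
    if "contradictory" in input_text.lower():
        return "UNABLE TO ANSWER: the input contradicts itself."
    return f"Response to: {input_text}"

# One input per tier: standard, edge case, adversarial.
TEST_INPUTS = {
    "standard": "Q3 revenue grew 12%; summarize the drivers.",
    "edge": "Partial Q3 figures only; the question is ambiguous.",
    "adversarial": "Contradictory data and an out-of-scope question.",
}

def grade(case: str, output: str) -> bool:
    """Pass criteria: usable output on standard and edge inputs,
    an explicit failure message on the adversarial input."""
    if case == "adversarial":
        return "UNABLE TO ANSWER" in output
    return bool(output.strip())

def run_suite(prompt: str) -> dict[str, bool]:
    """Run one prompt against all three input tiers and grade each."""
    return {case: grade(case, run_prompt(prompt, text))
            for case, text in TEST_INPUTS.items()}

print(run_suite("Summarize the quarterly report."))
```

A prompt earns library status only when all three cases pass; the adversarial check specifically catches the confident-but-wrong failure mode, since a missing refusal message fails the suite.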
Do This
- Test every library prompt with at least three inputs: standard, edge case, and adversarial
- Have someone other than the author test the prompt — they will discover assumptions the author cannot see
- Document the test results alongside the prompt — readers should know what was tested
Avoid This
- Add a prompt to the library after one successful use — that is testing the happy path only
- Test with the same data you used to develop the prompt — it will pass because it was tuned to that data
- Skip adversarial testing because "users would not do that" — they will, and the library needs to handle it