CW-301h · Module 1
Prompt Testing & Validation
3 min read
A prompt that works once is an anecdote. A prompt that works consistently across different inputs, different users, and different context documents is a library-grade prompt. Testing validates consistency. Without it, the library is a collection of prompts that might work.
The testing protocol: run each prompt with three different inputs that represent the range of expected usage.

- Input one, the standard case: clean data, well-defined question.
- Input two, the edge case: incomplete data, ambiguous question.
- Input three, the adversarial case: contradictory data, out-of-scope question.

A library-grade prompt produces good output on input one, graceful degradation on input two, and a clear failure message on input three. If it produces confident but wrong output on input three, the prompt needs guardrails.
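The protocol can be sketched as a small test harness. This is a minimal sketch under stated assumptions: `run_prompt` is a placeholder for whatever model call the library actually uses, and the grading rules stand in for real quality checks.

```python
# A minimal sketch of the three-input protocol. `run_prompt` is a
# placeholder for the real model call; the grading rules are stand-ins
# for whatever quality checks the library actually applies.

def run_prompt(prompt: str, input_text: str) -> str:
    """Placeholder model call; swap in the real API client."""
    if "contradictory" in input_text.lower():
        return "UNABLE TO ANSWER: the input contradicts itself."
    return f"Response to: {input_text}"

# One input per tier: standard, edge case, adversarial.
TEST_INPUTS = {
    "standard": "Q3 revenue grew 12%; summarize the drivers.",
    "edge": "Partial Q3 figures only; the question is ambiguous.",
    "adversarial": "Contradictory data and an out-of-scope question.",
}

def grade(case: str, output: str) -> bool:
    """Pass criteria: usable output on standard and edge inputs,
    an explicit failure message on the adversarial input."""
    if case == "adversarial":
        return "UNABLE TO ANSWER" in output
    return bool(output.strip())

def run_suite(prompt: str) -> dict[str, bool]:
    """Run one prompt against all three input tiers and grade each."""
    return {case: grade(case, run_prompt(prompt, text))
            for case, text in TEST_INPUTS.items()}

print(run_suite("Summarize the quarterly report."))
```

A prompt earns library status only when all three cases pass; the adversarial check specifically catches the confident-but-wrong failure mode, since a missing refusal message fails the suite.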
Do This
- Test every library prompt with at least three inputs: standard, edge case, and adversarial
- Have someone other than the author test the prompt — they will discover assumptions the author cannot see
- Document the test results alongside the prompt — readers should know what was tested
Avoid This
- Add a prompt to the library after one successful use — that is testing the happy path only
- Test with the same data you used to develop the prompt — it will pass because it was tuned to that data
- Skip adversarial testing because "users would not do that" — they will, and the library needs to handle it