CW-301h · Module 1

Prompt Testing & Validation

3 min read

A prompt that works once is an anecdote. A prompt that works consistently across different inputs, different users, and different context documents is a library-grade prompt. Testing validates consistency. Without it, the library is a collection of prompts that might work.

The testing protocol: run each prompt with three different inputs that represent the range of expected usage. Input one: the standard case — clean data, well-defined question. Input two: the edge case — incomplete data, ambiguous question. Input three: the adversarial case — contradictory data, out-of-scope question. A library-grade prompt produces good output on input one, graceful degradation on input two, and a clear failure message on input three. If it produces confident but wrong output on input three, the prompt needs guardrails.

Do This

  • Test every library prompt with at least three inputs: standard, edge case, and adversarial
  • Have someone other than the author test the prompt — they will discover assumptions the author cannot see
  • Document the test results alongside the prompt — readers should know what was tested
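
Documenting results alongside the prompt can be as simple as a small record stored next to the prompt file. The sketch below assumes a few illustrative field names (`prompt_name`, `tested_by`, `results`); adapt them to whatever metadata your library already tracks.

```python
# Sketch of a test record kept next to a prompt in the library, so readers
# can see what was tested, by whom, and what happened. Field names and
# values are illustrative assumptions, not a required schema.
import json
from dataclasses import asdict, dataclass, field

@dataclass
class PromptTestRecord:
    prompt_name: str
    tested_by: str                               # someone other than the author
    results: dict = field(default_factory=dict)  # case name -> observed behavior

record = PromptTestRecord(
    prompt_name="quarterly-revenue-summary",
    tested_by="reviewer (not the author)",
    results={
        "standard": "good output",
        "edge": "graceful degradation",
        "adversarial": "clear failure message",
    },
)

# Serialize and store this JSON beside the prompt file.
print(json.dumps(asdict(record), indent=2))
```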

Avoid This

  • Add a prompt to the library after one successful use — one success tests only the happy path
  • Test with the same data you used to develop the prompt — it will pass because it was tuned to that data
  • Skip adversarial testing because "users would not do that" — they will, and the library needs to handle it