DR-301b · Module 2

Prompt Testing & Calibration

3 min read

A prompt is not production-ready until it has been tested against known-answer cases. Known-answer testing takes a research question where you already know the correct answer and evaluates whether the prompt produces output that matches. If you ask the prompt to assess five competitors and you know the correct ranking from your own domain expertise, the prompt's output should align with your assessment. Deviations reveal structural weaknesses in the prompt design.
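The idea above can be sketched as a small harness. This is a minimal illustration, not a prescribed implementation: `run_prompt` is a hypothetical stand-in for your actual model call (stubbed here so the example runs end to end), and the vendor names are invented.

```python
# Minimal sketch of a known-answer check for a ranking prompt.
# `run_prompt` is a hypothetical placeholder for the prompt under test.

def run_prompt(question: str) -> list[str]:
    # Placeholder: in practice, call your model and parse the ranking.
    return ["VendorA", "VendorC", "VendorB", "VendorD", "VendorE"]

def pairwise_agreement(expected: list[str], actual: list[str]) -> float:
    """Fraction of item pairs ordered the same way in both rankings."""
    pos = {item: i for i, item in enumerate(actual)}
    pairs = [(a, b) for i, a in enumerate(expected) for b in expected[i + 1:]]
    agree = sum(1 for a, b in pairs if pos[a] < pos[b])
    return agree / len(pairs)

expected = ["VendorA", "VendorB", "VendorC", "VendorD", "VendorE"]
actual = run_prompt("Rank the five competitors by market share.")
print(f"ranking agreement: {pairwise_agreement(expected, actual):.0%}")
```

Here the stubbed output swaps two adjacent vendors, so the harness reports 90% agreement rather than a pass/fail, which makes near-misses visible instead of hiding them behind an exact-match check.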

  1. **Known-Answer Testing.** Run the prompt against 3-5 scenarios where you know the correct answer. Score each output on accuracy, completeness, format compliance, and confidence calibration. If the prompt averages below 80% on any dimension across the test cases, it needs structural revision.
  2. **Edge Case Testing.** Feed the prompt scenarios that should produce uncertainty: insufficient data, contradictory sources, ambiguous criteria. A well-calibrated prompt should express uncertainty in these cases, not fabricate confident answers. Edge case testing reveals whether your epistemic constraints are working.
  3. **Version Regression.** When you modify a prompt, rerun all previous test cases to ensure the modification did not degrade performance on scenarios it previously handled correctly. Prompt development follows the same regression discipline as software development.
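The three steps can be tied together in a regression report that reruns every stored case after a prompt change and flags any dimension averaging below 80%. The case data and scores below are illustrative assumptions; in practice the scores would come from a rubric or an automated grader.

```python
# Sketch of a prompt regression report: average each scoring dimension
# across all stored test cases and flag anything under the 80% bar.
# The test cases and their scores are invented for illustration.

THRESHOLD = 0.8
DIMENSIONS = ("accuracy", "completeness", "format", "calibration")

test_cases = [
    {"name": "known-answer-1",
     "scores": {"accuracy": 0.9, "completeness": 1.0,
                "format": 1.0, "calibration": 0.9}},
    {"name": "edge-missing-data",
     "scores": {"accuracy": 0.8, "completeness": 0.7,
                "format": 1.0, "calibration": 0.6}},
]

def regression_report(cases):
    """Return per-dimension averages and the dimensions below threshold."""
    averages = {
        dim: sum(c["scores"][dim] for c in cases) / len(cases)
        for dim in DIMENSIONS
    }
    failing = {dim: avg for dim, avg in averages.items() if avg < THRESHOLD}
    return averages, failing

averages, failing = regression_report(test_cases)
for dim, avg in averages.items():
    status = "FAIL" if dim in failing else "ok"
    print(f"{dim:13s} {avg:.0%}  {status}")
```

Keeping old cases in the suite and rerunning them on every prompt revision is what gives you the regression guarantee described in step 3: a change that helps one scenario but quietly degrades another shows up immediately in the averages.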