PM-201c · Module 3

Monitoring Production Prompts


A prompt that performs well in testing can degrade in production without warning. Input distributions shift. User behavior changes. The model is updated. A downstream system changes the format it expects. Production monitoring is the system that tells you when any of these changes has caused output quality to drift, ideally before users report it or the business consequences become visible.

Metric 1: Format compliance rate
What percentage of outputs comply with the specified format: correct structure, required fields present, length within range? This is automatable. Run a format validator against every output and track the pass rate over time. A declining pass rate suggests that input drift or a model change is affecting the prompt.
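A format validator can be quite small. The sketch below assumes, purely for illustration, that the prompt is supposed to return a JSON object with `summary` and `category` fields and a summary of at most 280 characters. The field names, the limit, and the function names are hypothetical and should be replaced with your own format spec.

```python
import json

# Hypothetical format spec: required keys and a length limit for one field.
REQUIRED_FIELDS = {"summary", "category"}
MAX_SUMMARY_CHARS = 280

def is_compliant(raw_output: str) -> bool:
    """Return True if the output parses as a JSON object and meets the spec."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    if not REQUIRED_FIELDS.issubset(data):
        return False
    summary = data["summary"]
    return isinstance(summary, str) and 0 < len(summary) <= MAX_SUMMARY_CHARS

def compliance_rate(outputs: list[str]) -> float:
    """Fraction of outputs that pass the validator (0.0 for an empty batch)."""
    if not outputs:
        return 0.0
    return sum(is_compliant(o) for o in outputs) / len(outputs)

batch = [
    '{"summary": "Refund issued.", "category": "billing"}',
    '{"summary": "Missing category"}',
    'not json at all',
    '{"summary": "Password reset sent.", "category": "account"}',
]
rate = compliance_rate(batch)
print(f"format compliance: {rate:.0%}")  # prints "format compliance: 50%"
```

Computing this rate per day or per deployment gives the time series to watch for decline.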
Metric 2: Error rate
What percentage of outputs are flagged as errors, such as hallucinations, missing required content, or constraint violations? Establish a baseline at launch and track the rate against it. A rising error rate is a signal to trigger a prompt review before users feel the impact.
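One way to turn "baseline and track" into a concrete check is to compare the current window's error rate with the launch baseline. In this sketch the `Window` class, the `needs_review` function, and the 2-percentage-point tolerance are all illustrative assumptions; how outputs get flagged (human review, automated checks) is outside its scope.

```python
from dataclasses import dataclass

@dataclass
class Window:
    total: int    # outputs evaluated in the window
    flagged: int  # outputs flagged as errors in the window

    @property
    def error_rate(self) -> float:
        return self.flagged / self.total if self.total else 0.0

def needs_review(baseline: Window, current: Window, tolerance: float = 0.02) -> bool:
    """True when the current error rate exceeds the baseline by more than `tolerance`."""
    return current.error_rate - baseline.error_rate > tolerance

baseline = Window(total=1000, flagged=30)   # 3.0% at launch
this_week = Window(total=1200, flagged=72)  # 6.0% now
print(needs_review(baseline, this_week))    # prints True
```

An absolute tolerance is the simplest choice. With small windows, a statistical test on the two proportions would be less prone to false alarms.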
Metric 3: Output length drift
Track the distribution of output lengths over time. A significant shift in average length, in either direction, indicates a change in prompt behavior. Length drift often precedes quality drift.
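As a minimal sketch of a length drift check, the code below compares the mean output length of a recent window with a baseline window and flags a relative shift in either direction. The function names and the 20% threshold are assumptions. Comparing means is a simplification of tracking the full distribution, so percentiles or a histogram comparison may be worth adding.

```python
from statistics import mean

def length_drift(baseline_lengths: list[int], recent_lengths: list[int]) -> float:
    """Relative change in mean output length (+0.25 means 25% longer)."""
    base = mean(baseline_lengths)
    return (mean(recent_lengths) - base) / base

def has_drifted(baseline_lengths: list[int], recent_lengths: list[int],
                threshold: float = 0.20) -> bool:
    """Flag a shift larger than `threshold` in either direction."""
    return abs(length_drift(baseline_lengths, recent_lengths)) > threshold

baseline_lengths = [100, 120, 110, 90, 80]  # mean 100 (e.g. words per output)
recent_lengths = [60, 70, 65, 75, 80]       # mean 70
print(has_drifted(baseline_lengths, recent_lengths))  # prints True (30% shorter)
```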
Metric 4: Latency
Track response time. A significant latency increase can indicate a model change, an infrastructure change, or an unintended increase in prompt complexity.
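Latency is often tracked as a high percentile rather than an average, because slow outliers are what users notice. The sketch below uses a nearest-rank 95th percentile and flags a regression when the recent p95 exceeds 1.5 times the baseline p95. The choice of p95, the 1.5 ratio, and the sample values are illustrative assumptions, not recommendations from this module.

```python
import math

def percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def latency_regressed(baseline_ms: list[float], recent_ms: list[float],
                      max_ratio: float = 1.5) -> bool:
    """True when recent p95 latency exceeds the baseline p95 by more than `max_ratio`."""
    return percentile(recent_ms, 95) > max_ratio * percentile(baseline_ms, 95)

baseline_ms = [400, 420, 450, 480, 500, 510, 530, 560, 600, 900]
recent_ms = [700, 720, 750, 800, 820, 850, 900, 950, 1200, 1500]
print(latency_regressed(baseline_ms, recent_ms))  # prints True (1500 ms vs 900 ms)
```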