OC-301f · Module 2

Quality Regression Detection

3 min read

Quality regression is when an agent's output quality declines after a change — a prompt update, a model upgrade, a memory system modification, or a plugin installation. The regression may be subtle: the agent still produces valid output, but the output is slightly less thorough, slightly less well-structured, or slightly less aligned with the persona. These subtle regressions compound until the output is noticeably degraded, at which point the root cause is buried under weeks of accumulated changes.

Regression detection requires a quality baseline: a set of reference inputs with scored reference outputs. After every system change, run the baseline inputs through the modified system and compare the quality scores. If any dimension drops by more than 10%, the change caused a regression that needs investigation before the change ships to production. The baseline should be updated quarterly to reflect evolving quality standards, but the comparison protocol remains constant.

Do This

Maintain a quality baseline: 20-30 reference inputs with scored reference outputs, updated quarterly
Run the baseline after every system change — prompt updates, model upgrades, plugin installations
Flag any quality dimension that drops by 10%+ as a regression requiring investigation

Avoid This

Assume system changes are safe because they do not cause errors — quality degradation is not an error
Run quality checks only after major changes — minor changes accumulate into major regressions
Update the baseline to match the new output — that normalizes the regression instead of catching it