OC-301i · Module 3
The Quality-Speed Tradeoff
3 min read
Every performance optimization has a quality risk. Reducing context improves speed but may remove information the model needs. Using a cheaper model reduces cost but may reduce output quality. Caching responses improves latency but may serve stale data. The quality-speed tradeoff is not a single decision — it is a continuous calibration that depends on the task's quality requirements.
The calibration framework: for each task type, define the minimum acceptable quality score per dimension (factual accuracy, format compliance, persona consistency). Then apply performance optimizations one at a time, measuring quality after each. When any dimension drops below its minimum, the optimization has gone too far; back it off. The result is a task-specific optimization profile: one task type might tolerate context pruning to 30 days (quality stable) but not to 7 days (accuracy drops), while another might tolerate the mid-tier model (quality within 3%) but not the cheap model (quality drops 15%). Each task type has its own optimal operating point.
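The calibration loop above can be sketched in a few lines of Python. The quality dimensions, threshold values, and the `measure_quality` callback are illustrative assumptions, not a prescribed harness:

```python
# Sketch of the calibration loop: apply one optimization at a time,
# keep it only if every quality dimension stays at or above its floor.
# Dimension names and floor values below are hypothetical examples.

QUALITY_MINIMUMS = {
    "factual_accuracy": 0.90,
    "format_compliance": 0.95,
    "persona_consistency": 0.85,
}

def meets_minimums(scores, minimums=QUALITY_MINIMUMS):
    """True only if every dimension meets or exceeds its floor."""
    return all(scores.get(dim, 0.0) >= floor for dim, floor in minimums.items())

def calibrate(base_config, optimizations, measure_quality):
    """Apply optimizations incrementally; revert any that push a
    quality dimension below its minimum."""
    config = dict(base_config)
    for name, apply_opt in optimizations:
        candidate = apply_opt(dict(config))
        if meets_minimums(measure_quality(candidate)):
            config = candidate  # quality held: keep this optimization
        # otherwise fall through, i.e. revert to the previous config
    return config
```

Because each optimization is checked against the previous accepted configuration, a rejected step leaves earlier accepted steps in place, matching the "back it off" rule.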
- **1. Define Quality Minimums.** For each task type, set minimum acceptable quality scores per dimension. These are non-negotiable thresholds; no optimization may cross them.
- **2. Optimize Incrementally.** Apply one optimization at a time. Measure quality after each. If quality stays above minimum, keep the optimization. If it drops below, revert. Each optimization is independently evaluated.
- **3. Document the Operating Point.** For each task type, document: model tier, context window, output constraints, caching policy, and the resulting quality scores. This is the task's performance contract: the optimized configuration that meets quality minimums.
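The output of step 3 can be captured as a small record per task type. The field names and example values below are illustrative assumptions, not a required schema:

```python
from dataclasses import dataclass

# Sketch of a per-task "performance contract": the documented
# operating point that met all quality minimums during calibration.
# All field names and values here are hypothetical.

@dataclass(frozen=True)
class OperatingPoint:
    task_type: str
    model_tier: str           # e.g. "mid" rather than "large"
    context_window_days: int  # how far back context is pruned
    max_output_tokens: int    # output constraint
    cache_ttl_seconds: int    # 0 means no caching
    quality_scores: dict      # measured scores at this configuration

summarization = OperatingPoint(
    task_type="ticket_summarization",
    model_tier="mid",
    context_window_days=30,
    max_output_tokens=400,
    cache_ttl_seconds=3600,
    quality_scores={
        "factual_accuracy": 0.93,
        "format_compliance": 0.97,
        "persona_consistency": 0.90,
    },
)
```

Keeping the record frozen signals that the contract should change only through a fresh calibration pass, not ad hoc edits.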