OC-301i · Module 3
The Quality-Speed Tradeoff
3 min read
Every performance optimization has a quality risk. Reducing context improves speed but may remove information the model needs. Using a cheaper model reduces cost but may reduce output quality. Caching responses improves latency but may serve stale data. The quality-speed tradeoff is not a single decision — it is a continuous calibration that depends on the task's quality requirements.
The calibration framework: for each task type, define the minimum acceptable quality score per dimension (factual accuracy, format compliance, persona consistency). Then apply performance optimizations one at a time, measuring quality after each. When any dimension drops below its minimum, the optimization has gone too far; back it off. The result is a task-specific optimization profile: one task type might tolerate context pruning to 30 days (quality stable) but not to 7 days (accuracy drops), while another might tolerate the mid-tier model (quality within 3%) but not the cheap model (quality drops 15%). Each task type has its own optimal operating point.
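The calibration loop above can be sketched in a few lines of Python. The quality dimensions, threshold values, and the `measure_quality` callback are illustrative assumptions, not a prescribed harness:

```python
# Sketch of the calibration loop: apply one optimization at a time,
# keep it only if every quality dimension stays at or above its floor.
# Dimension names and floor values below are hypothetical examples.

QUALITY_MINIMUMS = {
    "factual_accuracy": 0.90,
    "format_compliance": 0.95,
    "persona_consistency": 0.85,
}

def meets_minimums(scores, minimums=QUALITY_MINIMUMS):
    """True only if every dimension meets or exceeds its floor."""
    return all(scores.get(dim, 0.0) >= floor for dim, floor in minimums.items())

def calibrate(base_config, optimizations, measure_quality):
    """Apply optimizations incrementally; revert any that push a
    quality dimension below its minimum."""
    config = dict(base_config)
    for name, apply_opt in optimizations:
        candidate = apply_opt(dict(config))
        if meets_minimums(measure_quality(candidate)):
            config = candidate  # quality held: keep this optimization
        # otherwise fall through, i.e. revert to the previous config
    return config
```

Because each optimization is checked against the previous accepted configuration, a rejected step leaves earlier accepted steps in place, matching the "back it off" rule.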
- **1. Define Quality Minimums.** For each task type, set minimum acceptable quality scores per dimension. These are non-negotiable thresholds; no optimization may cross them.
- **2. Optimize Incrementally.** Apply one optimization at a time. Measure quality after each. If quality stays above minimum, keep the optimization. If it drops below, revert. Each optimization is independently evaluated.
- **3. Document the Operating Point.** For each task type, document: model tier, context window, output constraints, caching policy, and the resulting quality scores. This is the task's performance contract: the optimized configuration that meets quality minimums.
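The output of step 3 can be captured as a small record per task type. The field names and example values below are illustrative assumptions, not a required schema:

```python
from dataclasses import dataclass

# Sketch of a per-task "performance contract": the documented
# operating point that met all quality minimums during calibration.
# All field names and values here are hypothetical.

@dataclass(frozen=True)
class OperatingPoint:
    task_type: str
    model_tier: str           # e.g. "mid" rather than "large"
    context_window_days: int  # how far back context is pruned
    max_output_tokens: int    # output constraint
    cache_ttl_seconds: int    # 0 means no caching
    quality_scores: dict      # measured scores at this configuration

summarization = OperatingPoint(
    task_type="ticket_summarization",
    model_tier="mid",
    context_window_days=30,
    max_output_tokens=400,
    cache_ttl_seconds=3600,
    quality_scores={
        "factual_accuracy": 0.93,
        "format_compliance": 0.97,
        "persona_consistency": 0.90,
    },
)
```

Keeping the record frozen signals that the contract should change only through a fresh calibration pass, not ad hoc edits.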