LR-301b · Module 3
Reviewer Calibration
3 min read
A scoring model is only as consistent as the people who use it. Two reviewers assessing the same provision should produce scores within half a point of each other. If they do not, the model is not miscalibrated; the reviewers are. Calibration exercises align reviewers on how to interpret the scoring dimensions and how to apply the scale consistently. Without calibration, the model produces numbers that look precise but are actually noisy.
- Quarterly Calibration Sessions: Present a set of five to ten reference provisions to all reviewers. Each reviewer scores independently. Compare scores. Discuss divergences. The discussion is the calibration: it surfaces different interpretations of the dimensions and aligns the team on a common application. [RECOMMEND]: Rotate the reference provisions each quarter to prevent memorization.
- Inter-Reviewer Agreement Tracking: Track disagreement between reviewers over time. Decreasing disagreement indicates improving calibration; increasing disagreement indicates drift that needs a calibration session. The metric is simple: the average absolute difference between reviewers on the same provision, as sketched after this list. Target: below 0.5 points on a 5-point scale.
- New Reviewer Onboarding: New reviewers score a standardized set of twenty provisions and compare their scores against the calibrated baseline, as in the second sketch below. Divergences are discussed individually. The new reviewer is calibrated before their scores enter the production scoring system. Uncalibrated reviewers produce uncalibrated scores. [CLEARED]: Calibration onboarding takes two hours and prevents months of inconsistent scoring.
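A minimal sketch of the agreement metric in Python, assuming scores are kept as simple per-reviewer dictionaries. The reviewer names, provision IDs, and the `mean_absolute_disagreement` helper are illustrative; the 0.5-point target on a 5-point scale comes from the tracking bullet above.

```python
from itertools import combinations

def mean_absolute_disagreement(scores: dict[str, dict[str, float]]) -> float:
    """Average absolute score difference across all reviewer pairs.

    `scores` maps reviewer -> {provision_id: score}. Only provisions
    scored by both reviewers in a pair are compared.
    """
    diffs = []
    for a, b in combinations(scores.values(), 2):
        shared = a.keys() & b.keys()
        diffs.extend(abs(a[p] - b[p]) for p in shared)
    if not diffs:
        raise ValueError("no overlapping provisions to compare")
    return sum(diffs) / len(diffs)

# Three reviewers scoring the same reference provisions on a 5-point scale.
scores = {
    "reviewer_a": {"P-01": 3.0, "P-02": 4.5, "P-03": 2.0},
    "reviewer_b": {"P-01": 3.5, "P-02": 4.0, "P-03": 2.5},
    "reviewer_c": {"P-01": 3.0, "P-02": 5.0, "P-03": 2.0},
}

disagreement = mean_absolute_disagreement(scores)
print(f"mean absolute disagreement: {disagreement:.2f}")  # 0.44
if disagreement >= 0.5:
    print("above the 0.5-point target: schedule a calibration session")
```

Running this on each quarter's reference set gives the trend the tracking bullet calls for: a falling value means calibration is holding, a rising one means drift.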
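And a companion sketch for onboarding, comparing a new reviewer's scores on the standardized set against the calibrated baseline and flagging provisions for the individual discussion. The `onboarding_divergences` helper and the scores are hypothetical, and reusing the 0.5-point agreement target as the discussion threshold is an assumption; the module does not fix a numeric cutoff for onboarding.

```python
def onboarding_divergences(
    baseline: dict[str, float],
    candidate: dict[str, float],
    threshold: float = 0.5,  # assumed cutoff, borrowed from the agreement target
) -> list[tuple[str, float]]:
    """Provisions where the new reviewer diverges from the calibrated
    baseline by at least `threshold` points, largest divergence first."""
    flagged = [
        (provision, abs(candidate[provision] - baseline[provision]))
        for provision in baseline
        if provision in candidate
        and abs(candidate[provision] - baseline[provision]) >= threshold
    ]
    return sorted(flagged, key=lambda item: item[1], reverse=True)

# Calibrated baseline for part of the standardized set vs. a new reviewer.
baseline = {"P-01": 3.0, "P-02": 4.5, "P-03": 2.0, "P-04": 3.5}
new_reviewer = {"P-01": 3.0, "P-02": 3.0, "P-03": 2.5, "P-04": 3.5}

for provision, gap in onboarding_divergences(baseline, new_reviewer):
    print(f"{provision}: off baseline by {gap:.1f} points; discuss before production")
```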