DS-201c · Module 2
Model Validation
3 min read
A model that performs well only on the data it was trained on is not a good model. It is a model that memorized the answers. The only validation that matters is performance on data the model has never seen.
I have rejected more models for overfitting than for any other reason. The pattern is always the same: 95% accuracy on training data, 68% on new data. The team is excited about 95%. I am looking at 68%. Because 68% is the number that matters in production.
MODEL VALIDATION FRAMEWORK
===========================
STEP 1: TRAIN/TEST SPLIT
Reserve 20-30% of data as a test set.
NEVER touch the test set during model development.
The test set is your final exam. You take it once.
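A minimal sketch of Step 1 using scikit-learn. The data, variable names, and 25% split (within the 20-30% range above) are illustrative:

```python
from sklearn.model_selection import train_test_split

X = [[i] for i in range(100)]     # toy feature rows
y = [i % 2 for i in range(100)]   # toy labels

# Reserve 25% as the held-out test set. stratify keeps class balance;
# the test set is then locked away until Step 4.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)
print(len(X_train), len(X_test))  # 75 25
```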
STEP 2: CROSS-VALIDATION (during development)
Split training data into 5 folds.
Train on 4 folds, validate on 1. Rotate 5 times.
Average the 5 validation scores. This is your
development accuracy estimate.
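Step 2 can be sketched like this; the synthetic data and model choice (logistic regression) are illustrative, not part of the module:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=42)
model = LogisticRegression(max_iter=1000)

# cv=5: train on 4 folds, validate on the 5th, rotate 5 times.
scores = cross_val_score(model, X, y, cv=5)

# The average of the 5 validation scores is the development estimate.
print(f"dev accuracy estimate: {scores.mean():.3f} (+/- {scores.std():.3f})")
```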
STEP 3: TEMPORAL VALIDATION (for time series)
Train on months 1-12. Test on months 13-15.
NEVER shuffle time series data randomly.
The model must predict the future from the past,
not the past from the future.
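For Step 3, scikit-learn's TimeSeriesSplit enforces this ordering: every training index precedes every validation index, so nothing is shuffled. The 24-point toy series is illustrative:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(24).reshape(-1, 1)  # 24 "months" of toy data, in time order

tscv = TimeSeriesSplit(n_splits=3)
for train_idx, test_idx in tscv.split(X):
    # The past predicts the future: the last training month always
    # comes before the first test month.
    assert train_idx.max() < test_idx.min()
    print(f"train months {train_idx.min()}-{train_idx.max()}, "
          f"test months {test_idx.min()}-{test_idx.max()}")
```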
STEP 4: FINAL TEST
Run the model on the held-out test set. Once.
This is your production accuracy estimate.
If it drops > 5% from cross-validation, you overfit.
Go back to Step 2 and simplify the model.
STEP 5: PRODUCTION MONITORING
Track prediction accuracy weekly in production.
If accuracy degrades > 5% from test performance,
the model needs retraining. Data drift is real.
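The two 5% rules from Steps 4 and 5 reduce to simple comparisons. A sketch, assuming "5%" means 5 percentage points (the module does not specify); function names are illustrative:

```python
def overfit_check(cv_accuracy: float, test_accuracy: float) -> bool:
    """Step 4: flag overfitting if test accuracy drops > 5 points from CV."""
    return cv_accuracy - test_accuracy > 0.05

def needs_retraining(test_accuracy: float, production_accuracy: float) -> bool:
    """Step 5: flag data drift if production drops > 5 points from test."""
    return test_accuracy - production_accuracy > 0.05

print(overfit_check(0.95, 0.68))     # True: the classic overfit pattern
print(needs_retraining(0.82, 0.80))  # False: within tolerance
```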
METRICS TO REPORT:
Accuracy: Overall correctness (misleading if imbalanced)
Precision: Of predicted positives, how many are correct
Recall: Of actual positives, how many did we catch
AUC-ROC: Overall discrimination ability (best summary)
Calibration: Does 70% predicted probability = 70% actual?
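The first four metrics in the table map directly onto scikit-learn. The toy labels, predictions, and probabilities below are illustrative:

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, roc_auc_score)

y_true  = [1, 0, 1, 1, 0, 0, 1, 0]                     # actual outcomes
y_pred  = [1, 0, 1, 0, 0, 1, 1, 0]                     # hard predictions
y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.6, 0.7, 0.3]     # predicted probabilities

print("accuracy: ", accuracy_score(y_true, y_pred))    # overall correctness
print("precision:", precision_score(y_true, y_pred))   # of predicted positives
print("recall:   ", recall_score(y_true, y_pred))      # of actual positives
print("auc-roc:  ", roc_auc_score(y_true, y_score))    # uses probabilities, not labels
```

Note that AUC-ROC is computed from the probability scores, not the thresholded predictions.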
Calibration is the metric most teams ignore and the one I care about most. A well-calibrated model means: when it says "70% probability of closing," roughly 70 out of 100 such deals actually close. A poorly calibrated model might say "70% probability" but only 45 out of 100 close. The predictions look confident. They are wrong.
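One way to run the calibration check described above is scikit-learn's calibration_curve, which bins predictions and compares each bin's mean predicted probability to its observed positive rate. The simulated, deliberately well-calibrated data is illustrative:

```python
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(42)
y_prob = rng.uniform(0, 1, 2000)                          # predicted probabilities
y_true = (rng.uniform(0, 1, 2000) < y_prob).astype(int)   # outcomes drawn to match

# frac_pos: actual positive rate per bin; mean_pred: mean prediction per bin.
frac_pos, mean_pred = calibration_curve(y_true, y_prob, n_bins=5)
for pred, actual in zip(mean_pred, frac_pos):
    # Well calibrated: predicted ~70% should see ~70% actual positives.
    print(f"predicted {pred:.2f} -> actual {actual:.2f}")
```

On a miscalibrated model the two columns diverge, which is exactly the "says 70%, closes 45%" failure described above.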
CLOSER relies on calibrated predictions for pipeline forecasting. If my model says a deal has 85% probability and that number is calibrated, he can plan around it. If it is uncalibrated, the forecast is meaningless. Calibration is what makes prediction actionable.
Do This
- Validate on held-out data the model never saw during training — this is your real accuracy
- Use temporal validation for any time-based prediction — train on the past, test on the future
- Track calibration alongside accuracy — a model that says 70% should be right 70% of the time
Avoid This
- Report training accuracy as model performance — that measures memorization, not prediction
- Shuffle time series data randomly — the model will learn from the future to predict the past
- Deploy a model without production monitoring — all models degrade over time as data drifts