DS-301a · Module 1
Forecasting Fundamentals
4 min read
Nearly every prediction problem falls into one of three archetypes. Time series: what will this metric be next month? Regression: what drives this outcome, and by how much? Classification: which category does this observation belong to? Most forecasting models you will build, from revenue projections to churn prediction to lead scoring, are variations of one of these three. Identifying which archetype fits your problem is the first decision, and getting it wrong is among the most expensive mistakes in analytics, because every later choice of method, data, and metric inherits the error.
Time series is the workhorse of business forecasting. Revenue by month, active users by week, support tickets by day: any metric tracked at regular intervals can be framed as a time series problem. The historical pattern contains the signal: trend, seasonality, and longer irregular cycles. Traditional statistical methods like ARIMA and exponential smoothing extract these patterns mechanically. AI adds the ability to detect non-linear patterns and to incorporate many external variables (economic indicators, competitor actions, weather data) that the basic univariate forms of those methods do not use, although extensions such as ARIMA with exogenous regressors can include some of them. The result is not magic. It is pattern recognition operating at a scale and speed that human analysts cannot match.
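To make "extracting patterns mechanically" concrete, here is a minimal sketch of simple exponential smoothing, the most basic member of the exponential smoothing family. It tracks only a level (no trend or seasonality), and the function name, the revenue figures, and the choice of alpha are illustrative assumptions rather than recommended values.

```python
def simple_exponential_smoothing(series, alpha):
    """Return a one-step-ahead forecast from simple exponential smoothing.

    alpha near 1 weights recent observations heavily;
    alpha near 0 produces a smoother, slower-moving level.
    """
    if not series:
        raise ValueError("series must contain at least one observation")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the interval (0, 1]")

    level = series[0]  # initialise the level with the first observation
    for value in series[1:]:
        # Blend the new observation with the previous level.
        level = alpha * value + (1 - alpha) * level
    return level


# Hypothetical monthly revenue figures, for illustration only.
monthly_revenue = [100.0, 110.0, 105.0, 120.0]
forecast = simple_exponential_smoothing(monthly_revenue, alpha=0.5)
print(f"Next-month forecast: {forecast}")
```

With alpha set to 1 the forecast collapses to the last observed value, which is a useful sanity check before moving on to models that also capture trend and seasonality.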
Regression answers the "what drives this" question. Customer lifetime value is not random; it is driven by factors such as onboarding completion rate, first-week engagement, plan tier, and industry vertical. Regression quantifies the relationship between inputs and a numeric output, giving you both a prediction and an estimate of how much each input contributes. Classification answers the "which bucket" question: will this lead convert or not? Is this transaction fraudulent or legitimate? Will this customer churn this quarter or stay? The archetype determines the method. The method determines the output. Choose the wrong archetype and the model gives precise answers to the wrong question.
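The difference in output between the two archetypes can be sketched with the same input variable. Below, a one-variable least-squares fit returns a magnitude, while a threshold rule returns a bucket. The data, the function names, and the threshold are hypothetical, and the threshold rule is only a stand-in for a real classifier such as logistic regression.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def classify_churn(engagement_hours, threshold):
    """Toy classifier: a fixed cut-off on one feature (assumption, not a fitted model)."""
    return "churn" if engagement_hours < threshold else "stay"


# Hypothetical customers: first-week engagement (hours) and lifetime value.
first_week_hours = [1.0, 2.0, 3.0, 4.0]
lifetime_value = [150.0, 250.0, 350.0, 450.0]

# Regression: the output is a number, and the slope says "by how much".
slope, intercept = fit_line(first_week_hours, lifetime_value)
predicted_value = slope * 5.0 + intercept

# Classification: the output is a category.
predicted_bucket = classify_churn(1.5, threshold=2.0)

print(f"Each extra hour is associated with {slope:.0f} more lifetime value")
print(f"Predicted value at 5 hours: {predicted_value:.0f}")
print(f"Customer with 1.5 hours: {predicted_bucket}")
```

The slope describes an association in the data, not proof of causation, so treat the "explanation" a regression gives as a starting hypothesis.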
Do This
- Identify the prediction archetype before selecting a method or tool
- Start with the simplest model that fits the archetype — complexity comes later
- Validate predictions against holdout data before deploying to production
Avoid This
- Jump to a machine learning model before understanding the problem structure
- Use classification when the business question is really about magnitude
- Train on all available data without reserving a validation set
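
The holdout advice in the lists above can be sketched as a chronological split scored against a naive baseline. This is a minimal illustration, assuming time-ordered data where the most recent observations are reserved for validation. The function names and the user counts are hypothetical.

```python
def chronological_split(series, holdout_size):
    """Reserve the last `holdout_size` observations for validation."""
    if not 0 < holdout_size < len(series):
        raise ValueError("holdout_size must be between 1 and len(series) - 1")
    return series[:-holdout_size], series[-holdout_size:]


def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical weekly active users, oldest first.
weekly_active_users = [200, 210, 220, 230, 240, 250]
train, holdout = chronological_split(weekly_active_users, holdout_size=2)

# Simplest possible model: repeat the last value seen in training.
naive_forecast = [train[-1]] * len(holdout)
baseline_mae = mean_absolute_error(holdout, naive_forecast)

print(f"Naive baseline MAE on holdout: {baseline_mae}")
```

Any more complex model should beat this baseline error on the same holdout before it is worth deploying. For time series the split stays chronological rather than random, so the model is never scored on data older than what it was trained on.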