PE-301a · Module 2
Logistic Regression for Pipeline
3 min read
Logistic regression is the workhorse of propensity modeling because it is interpretable, stable, and performant at the sample sizes typical of B2B pipelines. Unlike a neural network, which is a black box, logistic regression produces coefficients that tell you exactly how much each feature contributes to the prediction. For example: decision-maker engagement increases the log-odds of winning by 1.4; each additional meeting in the last 14 days increases it by 0.3; each close-date push decreases it by 0.5. The model is transparent, and transparency drives adoption.
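To make the arithmetic concrete, the coefficients above can be turned into a win probability by summing the log-odds contributions and applying the sigmoid. A minimal sketch: the three feature weights (1.4, 0.3, -0.5) are the ones quoted above, while the intercept (-2.0, the baseline log-odds for a deal with no engagement, no meetings, and no pushes) is a hypothetical value chosen for illustration.

```python
import math

def win_probability(dm_engaged: int, meetings_14d: int, date_pushes: int) -> float:
    """Score a deal with the example coefficients from the text.

    The intercept (-2.0) is a hypothetical baseline, not from the text;
    the three feature weights are the ones quoted above.
    """
    log_odds = -2.0 + 1.4 * dm_engaged + 0.3 * meetings_14d - 0.5 * date_pushes
    return 1 / (1 + math.exp(-log_odds))  # sigmoid: log-odds -> probability

# Decision-maker engaged, two recent meetings, one close-date push:
# log-odds = -2.0 + 1.4 + 0.6 - 0.5 = -0.5, so probability ≈ 0.38
```

Because each feature enters the log-odds additively, you can read off exactly how much any single change (one more meeting, one more push) moves a deal's score.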
- Feature Encoding Convert all features to numeric values. Binary features (decision-maker engaged: yes/no) become 1/0. Categorical features (industry, source) become one-hot encoded columns. Continuous features (deal size, days in stage) are standardized to have mean 0 and standard deviation 1 so that coefficients are comparable across features.
- Train-Test Split Split your dataset 80/20: train on 80% of deals, test on 20%. The test set simulates unseen data — it tells you how the model performs on deals it has never seen. A model that performs well on training data but poorly on test data is overfitting — it memorized the training set instead of learning generalizable patterns.
- Coefficient Interpretation After training, examine the coefficients. The largest positive coefficients are your strongest win predictors. The largest negative coefficients are your strongest loss predictors. Share these with the sales team — they are a data-driven list of what actually drives deals to close and what kills them.