Overfitting Explorer

Model Fitting · Bias-Variance Tradeoff

Discover how adding complexity improves training fit while destroying predictive power. Build polynomial models, reveal holdout data, and watch overfitting happen in real time.

OVERVIEW & LEARNING OBJECTIVES

Overfitting occurs when a model learns the noise in training data rather than the underlying pattern. By incrementally adding polynomial terms (X², X³, X⁴...) and comparing training versus holdout performance, you'll develop intuition for the bias-variance tradeoff that governs all predictive modeling.

🎯 What You'll Learn
  • Training fit is seductive: Watch R² climb as you add terms. The model looks better and better... on training data.
  • Holdout is the truth: Reveal the holdout sample to see how your complex model actually performs on unseen data.
  • The U-shaped curve: Discover the classic pattern—holdout error decreases, hits a minimum, then increases as complexity grows.
  • Parsimony pays: Learn why simpler models often predict better than complex ones, even when they fit training data worse.

💡 Why This Matters: Every marketing analytics model—from customer lifetime value to attribution—faces the overfitting trap. Understanding this tradeoff is the difference between models that work in production and models that only worked on your laptop.

📐 Mathematical Foundations

Polynomial Regression Model:

$$\hat{Y} = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \cdots$$
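
If you want to reproduce the fit outside the explorer, here is a minimal sketch of polynomial least squares, assuming NumPy (the function names and data handling are illustrative, not the app's internals):

```python
import numpy as np

def fit_polynomial(x, y, degree):
    """Fit Y-hat = b0 + b1*X + ... + b_d*X^d by ordinary least squares."""
    X = np.vander(x, N=degree + 1, increasing=True)   # columns: 1, x, x^2, ..., x^d
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)      # least-squares coefficient vector
    return beta

def predict(beta, x):
    """Evaluate the fitted polynomial at new x values."""
    return np.vander(x, N=len(beta), increasing=True) @ beta
```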

Root Mean Squared Error (RMSE):

$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$$
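
Translated directly into code (a NumPy sketch):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: typical prediction error in the units of Y."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))
```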

Coefficient of Determination (R²):

$$R^2 = 1 - \frac{\text{RSS}}{\text{TSS}} = 1 - \frac{\sum(y_i - \hat{y}_i)^2}{\sum(y_i - \bar{y})^2}$$
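
And the same for R² (again a NumPy sketch):

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - RSS/TSS."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    rss = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    tss = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - rss / tss
```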

| Metric | Penalizes Complexity? | Use Case |
|--------|-----------------------|----------|
| R² | No | In-sample fit only |
| RMSE | No | Prediction error in original units |
| AIC | Yes (2k penalty) | Model selection, balanced |
| BIC | Yes (k×ln(n) penalty) | Model selection, conservative |
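
For a least-squares fit, AIC and BIC can be computed from the residual sum of squares. The sketch below uses the common Gaussian-likelihood form and drops additive constants that don't affect model ranking; whether the explorer counts the error variance in k is an assumption here, so treat the exact values as illustrative:

```python
import numpy as np

def information_criteria(y_true, y_pred, k):
    """AIC and BIC for a least-squares fit with k estimated parameters."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = y_true.size
    rss = np.sum((y_true - y_pred) ** 2)
    aic = n * np.log(rss / n) + 2 * k            # 2k penalty
    bic = n * np.log(rss / n) + k * np.log(n)    # k*ln(n) penalty, harsher for large n
    return aic, bic
```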

⚠️ The Trap: Training R² always increases (or stays flat) when you add terms. It can never decrease. This is why R² alone cannot detect overfitting—you need holdout data or penalized metrics like AIC/BIC.
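
You can see the trap numerically with simulated data. The sketch below (NumPy only; the quadratic "truth" and noise level are invented for illustration, not the app's dataset) fits degrees 1 through 6 on a training split and scores each on a holdout split. Training R² never decreases as the degree grows, while holdout RMSE typically bottoms out near the true complexity and then worsens:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 60)
y = 2 + 1.5 * x - 0.12 * x**2 + rng.normal(0, 1.5, size=x.size)  # quadratic truth + noise

# Random train / holdout split
idx = rng.permutation(x.size)
x_tr, y_tr = x[idx[:40]], y[idx[:40]]
x_ho, y_ho = x[idx[40:]], y[idx[40:]]

for degree in range(1, 7):
    coef = np.polyfit(x_tr, y_tr, degree)                  # fit on training data only
    rss = np.sum((y_tr - np.polyval(coef, x_tr)) ** 2)
    tss = np.sum((y_tr - y_tr.mean()) ** 2)
    r2_train = 1 - rss / tss                               # never decreases with degree
    rmse_hold = np.sqrt(np.mean((y_ho - np.polyval(coef, x_ho)) ** 2))
    print(f"degree {degree}: train R² = {r2_train:.3f}, holdout RMSE = {rmse_hold:.3f}")
```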

MARKETING SCENARIOS

Select a case study above to load data and see the business context.

BUILD YOUR MODEL

🔧 Model Terms

Select transformations to include in your model. Intercept is always included.

Parameters: 2 / 6
Model: Ŷ = β₀ + β₁X

📊 Model Metrics

Training Sample

R² --
RMSE --
k 2
📈 Information Criteria
AIC --
BIC --

VISUALIZATION

[Interactive chart: training data points overlaid with the current model's fitted curve]

MODEL COMPARISON

Save up to 3 models to compare their performance side-by-side.

Model 1: Not saved
Model 2: Not saved
Model 3: Not saved
💡 Comparing Models

When comparing models:

  • Training R² will almost always favor more complex models
  • Holdout RMSE reveals true predictive performance
  • AIC/BIC balance fit and complexity without needing holdout data
  • The "best" model often isn't the most complex one

THE MOMENT OF TRUTH

🔒

Save at least 2 models to unlock

Commit to your model choices before seeing how they perform on unseen data. This simulates real-world model deployment where you can't peek at future data.

0 / 2 models saved

EXPLORATORY QUESTIONS

🤔 Think About These Questions
  1. The Training Trap: Start with just X. Then add X², X³, X⁴ one at a time. What happens to training R² each time? Does it ever go down?
  2. The Holdout Truth: Now reveal the holdout data. Look at holdout RMSE as you add terms. Does it follow the same pattern as training RMSE?
  3. Finding the Sweet Spot: At what complexity level does holdout error reach its minimum? Is this the same model that has the highest training R²?
  4. The Wiggly Line: With 4+ polynomial terms, look at the fitted curve. Does it capture the "true" relationship, or is it chasing individual data points?
  5. Real-World Implications: Imagine you built the highest-R² model and deployed it for next quarter's marketing budget allocation. What might happen?
📚 Connecting to Broader Concepts
⚖️ Bias vs. Variance

Simple models have high bias (underfit)—they miss patterns. Complex models have high variance (overfit)—they memorize noise. The goal is the sweet spot in between.
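
One way to make this concrete is a small simulation: refit models of different complexity on many fresh training sets and look at how their predictions at a single point behave. The setup below is a hypothetical NumPy sketch (the "true" curve, noise level, and degrees are invented for illustration); simple fits tend to show larger squared bias, complex fits larger variance:

```python
import numpy as np

rng = np.random.default_rng(1)

def truth(x):
    """Assumed 'true' response curve (illustrative only)."""
    return 2 + 1.5 * x - 0.12 * x**2

x0 = 5.0                                  # point where we examine predictions
preds = {1: [], 2: [], 6: []}             # underfit, about right, overfit

for _ in range(500):                      # 500 fresh training sets
    x = rng.uniform(0, 10, 40)
    y = truth(x) + rng.normal(0, 1.5, size=x.size)
    for d in preds:
        preds[d].append(np.polyval(np.polyfit(x, y, d), x0))

for d, p in preds.items():
    p = np.array(p)
    bias_sq = (p.mean() - truth(x0)) ** 2   # squared bias at x0
    variance = p.var()                      # spread across training sets
    print(f"degree {d}: bias² = {bias_sq:.3f}, variance = {variance:.3f}")
```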

🔄 Cross-Validation

In practice, we use k-fold cross-validation rather than a single train/test split. This gives more reliable estimates of out-of-sample performance. Same principle, more robust execution.
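
For example, 5-fold cross-validation over polynomial degrees might look like this (a sketch assuming scikit-learn is available; the simulated data is illustrative):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 80)
y = 2 + 1.5 * x - 0.12 * x**2 + rng.normal(0, 1.5, size=x.size)
X = x.reshape(-1, 1)

for degree in range(1, 7):
    model = make_pipeline(PolynomialFeatures(degree, include_bias=False),
                          LinearRegression())
    # Each observation serves as holdout exactly once across the 5 folds
    scores = cross_val_score(model, X, y, cv=5,
                             scoring="neg_root_mean_squared_error")
    print(f"degree {degree}: mean CV RMSE = {-scores.mean():.3f}")
```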

📉 Regularization

Methods like Ridge and Lasso regression add penalty terms that shrink coefficients, effectively preventing overfitting without removing terms entirely. They automate what you're doing manually here.
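
A minimal sketch of the same idea with scikit-learn (the alpha values are arbitrary placeholders; in practice you'd tune them with cross-validation):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 80)
y = 2 + 1.5 * x - 0.12 * x**2 + rng.normal(0, 1.5, size=x.size)
X = x.reshape(-1, 1)

# Deliberately over-specified degree-8 polynomial; the penalty shrinks (Ridge)
# or zeroes out (Lasso) the unneeded terms instead of us removing them by hand.
ridge = make_pipeline(PolynomialFeatures(8, include_bias=False),
                      StandardScaler(), Ridge(alpha=1.0)).fit(X, y)
lasso = make_pipeline(PolynomialFeatures(8, include_bias=False),
                      StandardScaler(), Lasso(alpha=0.1, max_iter=10_000)).fit(X, y)

print("ridge coefficients:", ridge.named_steps["ridge"].coef_.round(2))  # shrunk toward 0
print("lasso coefficients:", lasso.named_steps["lasso"].coef_.round(2))  # can be exactly 0
```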

🎯 Marketing Applications

Every marketing model—response curves, CLV prediction, churn scoring—faces this tradeoff. A model that "fit perfectly" on last quarter's data may fail catastrophically on next quarter's.