Multiple Regression with Interactions & Non-Linear Effects

Advanced

Build on multiple regression by adding interaction terms (moderation effects) and quadratic terms (non-linear relationships). Understand how the effect of one predictor depends on another, or how relationships curve rather than staying linear.

👨‍🏫 Professor Mode: Guided Learning Experience

New to interaction effects? Enable Professor Mode for step-by-step guidance through moderation, quadratics, and simple slopes!

OVERVIEW & CONCEPTS

This tool extends multiple linear regression to handle two powerful concepts:

  1. Interactions (Moderation): When the effect of predictor X₁ on outcome Y depends on the level of another predictor X₂
  2. Non-linear effects (Quadratic): When a predictor's relationship with the outcome is curved (e.g., inverted U-shape)

Interaction Effects

With interaction term: $$ Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 (X_{1i} \times X_{2i}) + \varepsilon_i $$

The interaction coefficient (\(\beta_3\)) tells you how much the effect of X₁ changes for each unit increase in X₂. Key insight: If \(\beta_3\) is significant, the relationship between X₁ and Y is not constant—it varies depending on X₂.
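The interaction model above can be sketched with a plain least-squares fit on synthetic data. This is a minimal NumPy illustration, not the tool's actual implementation; the variable names and "true" coefficients are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(size=n)   # focal predictor (hypothetical, already centered)
x2 = rng.normal(size=n)   # moderator (hypothetical, already centered)

# Simulate the interaction model: the effect of x1 grows with x2 (beta3 = 0.5)
y = 1.0 + 2.0 * x1 + 0.3 * x2 + 0.5 * x1 * x2 + rng.normal(scale=0.5, size=n)

# Design matrix: intercept, both main effects, and the product term
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2, b3 = beta

# Key interpretation: the slope of x1 at any given x2 is b1 + b3 * x2,
# so a significant b3 means that slope is not constant.
```

The recovered `b3` should land near the simulated 0.5, and the expression `b1 + b3 * x2` makes the "effect of X₁ depends on X₂" idea concrete.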

Quadratic Effects

With quadratic term: $$ Y_i = \beta_0 + \beta_1 X_{i} + \beta_2 X_{i}^2 + \varepsilon_i $$

The quadratic coefficient (\(\beta_2\)) captures curvature. If \(\beta_2 < 0\), the relationship is an inverted U (increases then decreases). If \(\beta_2 > 0\), it's a U-shape (decreases then increases). Business relevance: Find optimal points (e.g., ideal price, optimal ad frequency).
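The sign rule for the quadratic coefficient can be checked numerically. This sketch simulates an inverted-U relationship (assumed coefficients chosen for illustration) and fits the quadratic model with ordinary least squares:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=300)   # hypothetical predictor, e.g., ad frequency

# Simulated inverted-U: response rises, peaks, then falls
y = 4.0 + 2.0 * x - 0.2 * x**2 + rng.normal(scale=0.5, size=300)

# Fit Y = b0 + b1*X + b2*X^2
X = np.column_stack([np.ones_like(x), x, x**2])
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

# b2 < 0 confirms the inverted U (increases, then decreases);
# b2 > 0 would indicate a U-shape instead.
```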

Why Interactions Matter

Real-world effects rarely operate in isolation. Moderation (statistical interaction) captures the fact that context matters:

  • Ad spend might drive revenue more during holidays than off-season
  • Price increases might hurt sales for low-quality products but enhance prestige for luxury items
  • Training programs might boost performance for junior employees but have little effect on seasoned veterans

Managerial implication: One-size-fits-all strategies miss opportunities. If an interaction is significant, you should tailor your approach based on the moderator.

Why Centering Matters for Interactions

When including interaction terms, mean-centering continuous predictors is recommended (and enabled by default in this tool). Here's why:

  • Interpretability: After centering, "main effects" represent effects when the other variable is at its mean (not at zero, which might be meaningless)
  • Multicollinearity reduction: Interaction terms are highly correlated with their component predictors. Centering reduces this correlation
  • Focal vs. moderator distinction: Centering helps you interpret which variable is the "focal predictor" (whose effect you're studying) vs. the "moderator" (what changes that effect)

Example: If studying ad_spend × seasonality, centering ad_spend means the season coefficients represent effects "at average ad spend levels," not at zero spend (which never happens).

Advanced users can disable centering, but interpretations become more complex.
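The multicollinearity-reduction point is easy to demonstrate. In this sketch (synthetic data with hypothetical means and scales), the raw product term is strongly correlated with its component predictor, while the centered product term is nearly uncorrelated with it:

```python
import numpy as np

rng = np.random.default_rng(1)
# Raw predictors far from zero, as real business variables usually are
x1 = rng.normal(loc=50, scale=5, size=500)   # e.g., ad spend
x2 = rng.normal(loc=40, scale=2, size=500)   # e.g., seasonality index

# Without centering, x1 * x2 is dominated by the means and tracks x1 closely
r_raw = np.corrcoef(x1, x1 * x2)[0, 1]

# After mean-centering, the product term is nearly orthogonal to x1
x1c, x2c = x1 - x1.mean(), x2 - x2.mean()
r_centered = np.corrcoef(x1c, x1c * x2c)[0, 1]

# r_raw is high; r_centered is close to zero
```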

Simple Slopes & Probing Interactions

When an interaction is significant, the next step is simple slopes analysis: testing whether the focal predictor's effect is significant at different levels of the moderator.

For continuous moderators, we conventionally test at three levels:

  • Low: Moderator one standard deviation below its mean
  • Average: Moderator at its mean
  • High: Moderator one standard deviation above its mean

For categorical moderators, we test the focal predictor effect within each category separately.

This tool visualizes these simple slopes in the interaction plots, making it easy to see where effects are strong vs. weak.
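The simple-slopes computation itself is a one-liner once the model is fit: the slope of the focal predictor at a chosen moderator value is b₁ + b₃ × (moderator value). A minimal NumPy sketch on synthetic data (coefficients and names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 300
x1 = rng.normal(size=n)   # focal predictor (centered)
x2 = rng.normal(size=n)   # continuous moderator (centered)
y = 1.0 + 2.0 * x1 + 0.3 * x2 + 0.5 * x1 * x2 + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
b0, b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]

# Simple slope of x1 at the conventional three moderator levels
m, s = x2.mean(), x2.std()
slopes = {level: b1 + b3 * val
          for level, val in [("low (-1 SD)", m - s),
                             ("mean", m),
                             ("high (+1 SD)", m + s)]}
```

With a positive interaction coefficient, the slope at +1 SD exceeds the slope at the mean, which exceeds the slope at -1 SD; the tool's interaction plots display exactly these three lines.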

Quadratic Terms & Finding Optimal Points

Quadratic effects capture non-monotonic relationships—where "more is better" only up to a point, then becomes "too much of a good thing."

The turning point (maximum or minimum) occurs at: $$ X^* = -\frac{\beta_1}{2 \beta_2} $$

Business applications:

  • Optimal pricing: Too low = leaving money on the table; too high = driving customers away
  • Ideal ad frequency: Too few = insufficient awareness; too many = annoyance and fatigue
  • Perfect difficulty level: Too easy = boredom; too hard = frustration

Interpretation note: Check that the optimal point falls within your observed data range. Extrapolating beyond observed values is risky.
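The turning-point formula and the range check can be combined in a few lines. This sketch simulates an inverted-U revenue curve whose true optimum sits at X = 5 (assumed values for illustration), fits the quadratic, and applies X* = -β₁ / (2β₂):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(1, 9, size=400)   # observed range of the predictor, e.g., price

# True curve: 10 + 6x - 0.6x^2, so the true optimum is -6 / (2 * -0.6) = 5
y = 10 + 6.0 * x - 0.6 * x**2 + rng.normal(scale=1.0, size=400)

X = np.column_stack([np.ones_like(x), x, x**2])
_, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

x_star = -b1 / (2 * b2)                    # estimated turning point
in_range = x.min() <= x_star <= x.max()    # only trust x_star if this is True
```

The `in_range` flag implements the interpretation note above: if the estimated optimum falls outside [Min, Max], it is an extrapolation and should not drive decisions.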

Tool Limitation: One Interaction/Quadratic at a Time

For educational clarity, this tool restricts you to one interaction or quadratic effect per model. This is not a limitation of regression in general—real models often include multiple interactions.

Why this restriction helps learning:

  • Focuses attention on understanding one moderation or non-linear effect deeply
  • Keeps visualizations clear and interpretable
  • Prevents overfitting with limited sample sizes
  • Teaches principles that extend to more complex models

For professional analysis requiring multiple interactions or three-way interactions, use statistical software like R, Python, SPSS, or Stata.

DATA SOURCE

📚

Use a Case Study

Use presets to explore realistic scenarios demonstrating interactions and non-linear effects in marketing, pricing, and gaming contexts. Each scenario can be downloaded, edited in Excel, and re-uploaded.

📂

Upload Your Data

Upload a CSV file with raw case-level data. Include one outcome variable and multiple predictors (numeric or categorical).

Drag & Drop raw data file (.csv, .tsv, .txt, .xls, .xlsx)

Include headers; at least one numeric outcome and 2+ predictors.

No file uploaded.

INPUTS & SETTINGS

Confidence Level & Reporting

Set the significance level for hypothesis tests and confidence intervals.

Advanced Analysis Settings

Centering improves interpretability and reduces multicollinearity. Main effects then represent effects "at average levels" of other variables. Disable only if you have specific reasons.

Toggle visibility of shaded confidence intervals around predicted lines. Useful for assessing uncertainty in simple slopes.

VISUAL OUTPUT

Interaction / Effect Plot

Interpretation Aid

SUMMARY STATISTICS

Summary Statistics

How to Use Summary Statistics

Review these before interpreting regression coefficients — they provide the context needed to judge practical significance:

  • Mean & Median: If they differ substantially, the distribution is skewed — regression coefficients (which minimize squared errors) may be pulled toward outliers. Consider whether a log transform is appropriate.
  • Std. Dev.: Used to evaluate practical significance. A coefficient of 0.5 on a predictor with SD = 100 is very different from a coefficient of 0.5 on a predictor with SD = 1. Use standardized betas (from ml_regression) for comparisons across predictors.
  • Min / Max: Check that your data range is plausible. Extreme outliers can severely distort regression results. Also confirms the range over which interaction plots are valid — predictions outside [Min, Max] are extrapolations.
  • Variables marked with *: These were mean-centered in the model. Their raw means are shown here so you can convert centered predictions back to original units.

Outcome & Continuous Predictors

Variable Mean Median Std. Dev. Min Max
Provide data to see summary statistics.

Categorical Predictors (% by level)

Predictor Level Percent
Provide data to see level percentages.

TEST RESULTS

Regression Equation

Provide data to see the fitted regression equation.

How to Read This Equation

The regression equation shows the mathematical relationship the model has learned. Each term contributes to the predicted outcome (Ŷ):

  • Intercept (β₀): The predicted outcome when all predictors are at zero (or at their means if mean-centering is enabled). Often not directly meaningful on its own.
  • Main effects (βᵢ × Xᵢ): The effect of each predictor, holding all others constant. With an interaction in the model, these "main effects" represent the predictor's effect when the other interacting variable is at zero (or its mean, if centered).
  • Interaction term (βᵢⱼ × Xᵢ × Xⱼ): Captures how the effect of one predictor changes depending on the level of another. Even a small interaction coefficient can be substantively important.
  • Quadratic term (β × X²): Captures curvature — whether the relationship accelerates or decelerates.

Mean-centering note: When centering is enabled, predictor values in the equation are deviations from the mean, not raw values. To make a prediction, subtract each variable's mean before plugging in. The mean-centering note above the equation (when visible) provides the means you need.

R-squared:
Adj. R-squared:
Model F:
Model p-value:
RMSE:
MAE:
Sample size (n):
Alpha:
Overall Model F (α): (omnibus test; see Model Fit Comparison for the interaction)
What Do These Model-Fit Metrics Mean?

R² (R-squared): The proportion of variance in the outcome explained by all predictors together. R² = 0.40 means the model accounts for 40% of the outcome's variability. Higher is better, but adding more predictors always increases R² — so use Adj. R² to compare models of different sizes.

Adj. R² (Adjusted R-squared): R² penalized for the number of predictors. This is the preferred fit measure when comparing models with different numbers of terms. If adding a predictor doesn't meaningfully improve fit, Adj. R² will decrease or stay flat.

Model F & p-value: The omnibus F-test checks whether the full model (all predictors together) explains significantly more variance than a model with no predictors at all. A significant p-value (< α) means at least one predictor is useful — but does not tell you which one, or whether the interaction term specifically is significant. For that, see the Model Fit Comparison card below.

RMSE (Root Mean Squared Error): The typical prediction error in the same units as your outcome. E.g., if outcome is revenue in dollars and RMSE = 12.4, predictions are off by roughly $12.40 on average (with larger errors weighted more). Lower is better.

MAE (Mean Absolute Error): Like RMSE but without squaring — the average absolute error in outcome units. Less sensitive to outliers than RMSE. If MAE ≪ RMSE, there are a few large errors pulling RMSE up.
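The RMSE/MAE relationship described above can be seen in a tiny worked example: a handful of small residuals plus one large one (made-up values for illustration).

```python
import numpy as np

# Four small residuals and one large one
resid = np.array([0.5, -0.5, 0.5, -0.5, 8.0])

rmse = np.sqrt(np.mean(resid**2))   # squaring weights the large error heavily
mae = np.mean(np.abs(resid))        # each error counts by its absolute size

# The single large error pulls RMSE well above MAE, which is exactly the
# "MAE << RMSE means a few large errors" diagnostic from the text.
```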

Sample size (n): Number of complete observations used. Missing data on any variable reduces n. Rules of thumb: n ≥ 10–20 per predictor term for stable estimates; n ≥ 50+ for reliable interaction effects.

Decision (omnibus): Based on the overall model p-value vs. your chosen α. "Reject H₀" means the full model explains a statistically significant amount of variance. This does not imply the interaction is significant — check the coefficient table and Model Fit Comparison.

APA-Style Statistical Reporting

About APA Reporting

APA (American Psychological Association) format is the standard for reporting regression results in academic and professional research. It reports F-statistics as F(df₁, df₂) = value, p = value; coefficients as b = value, SE = value, t(df) = value, p = value; and R² as a measure of overall fit. When submitting to journals or writing research reports, use this section as your template — but always double-check against the specific journal's style guide.

Managerial Interpretation

About the Managerial Summary

This section translates statistical results into plain-language strategic insights. It highlights what the model implies for decision-making — e.g., whether to segment strategies by moderator level, where diminishing returns occur, or which predictors drive the outcome most. Always combine these insights with domain knowledge and consider whether results replicate on held-out data before acting.

Coefficient Estimates

Predictor Term Estimate Std. Error t p-value Lower CI Upper CI
Provide data to see coefficient estimates.
Coefficient Interpretation Guide

How to read each column:

  • Predictor / Term: The variable or derived term (main effect, interaction, or quadratic). Interaction terms are labeled "X₁ × X₂"; quadratic terms use "X²".
  • Estimate (b): The unstandardized regression coefficient — the predicted change in the outcome for a one-unit increase in that term, holding all others constant. For interaction terms, it represents how much the effect of the focal predictor changes per unit of the moderator.
  • Std. Error: The precision of the estimate. Larger SE = more uncertainty. SE is used to compute the t-statistic and confidence interval.
  • t: The test statistic = Estimate / SE. Used to determine if the coefficient is significantly different from zero.
  • p-value: The probability of observing a t-statistic this large (in either direction) if the true coefficient were zero. Highlighted in blue when p < α (statistically significant). Remember: significance ≠ importance — always consider effect size.
  • CI Lower / Upper: The confidence interval for the estimate. If this interval does not include zero, the coefficient is significant at your chosen α level. Wider intervals = more uncertainty. Use these to judge practical significance — a significant p-value with a tiny coefficient may not be actionable.

Interaction coefficient caution: With mean-centering enabled, interaction coefficients are interpreted "at average levels" of the other variable. Without centering, the intercept and main effects change substantially but the interaction coefficient itself does not.

Categorical predictors: Dummy-coded against a reference level (shown in parentheses). Each coefficient represents the mean difference between that level and the reference, holding continuous predictors constant.

Diagnostics & Assumption Checks

Run the analysis to see checks on multicollinearity, residual patterns, and model fit.

How to Interpret These Diagnostics

Multicollinearity (VIF — Variance Inflation Factor):

  • VIF = 1: No correlation with other predictors (ideal).
  • VIF 1–5: Moderate correlation — generally acceptable.
  • VIF 5–10: High correlation — coefficients become less stable; SEs inflate. Interpret individual coefficients cautiously.
  • VIF > 10: Severe multicollinearity — coefficient estimates may be unreliable. Consider removing a collinear predictor, combining variables, or using ridge regression. Note: interaction and quadratic terms are inherently correlated with their components — this is why mean-centering is recommended. VIF for interaction terms will typically be elevated even after centering; this is expected and not a crisis unless VIF exceeds 10–15.
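VIF itself is straightforward to compute: regress each predictor on all the others and apply 1 / (1 - R²). This is a generic sketch of that definition (not the tool's code), shown on centered predictors where the product term stays only mildly collinear:

```python
import numpy as np

def vif(X):
    """VIF for each column of X: regress it on the remaining columns
    (plus an intercept) and compute 1 / (1 - R^2)."""
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1 - resid.var() / y.var()
        out.append(1.0 / (1.0 - r2))
    return out

rng = np.random.default_rng(5)
x1 = rng.normal(size=500)   # centered predictor
x2 = rng.normal(size=500)   # centered moderator
# With centered components, the product term's VIF stays low
vifs = vif(np.column_stack([x1, x2, x1 * x2]))
```

Running the same function on uncentered predictors with large means would show the elevated product-term VIF the bullet above warns about.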

Residual Checks:

  • Mean of residuals ≈ 0: OLS with an intercept guarantees this is ≈ 0 by construction, so any noticeable departure points to a computational problem rather than model bias.
  • Std. dev. of residuals: This is approximately RMSE — the typical size of prediction errors in outcome units.

Residuals vs. Fitted plot: Look for (a) a flat, horizontal scatter with no curve — curvature suggests a missing non-linear term; (b) roughly equal spread at all fitted levels — a funnel shape (heteroscedasticity) means variance grows with the outcome, which can inflate or deflate SEs; (c) no extreme outliers far from the bulk — single influential points can distort all coefficients.

Residuals vs. Fitted

Actual vs. Fitted

Interpretation Aid

Each point compares an observed outcome to its predicted value. Points near the 45° line indicate better fit. Systematic curves or funnel shapes suggest model misspecification or heteroscedasticity.