Propensity Score Matching

Causal Inference

Estimate causal treatment effects from observational marketing data using propensity score matching. Upload data with a treatment indicator and outcome, select covariates for balance, and the tool performs nearest-neighbor matching to create comparable treated and control groups.

šŸ‘Øā€šŸ« Professor Mode: Guided Learning Experience

New to propensity score matching? Enable Professor Mode for step-by-step guidance through matching methods, balance diagnostics, and interpreting causal treatment effects!

OVERVIEW & APPROACH

Propensity Score Matching (PSM) is a technique for estimating causal effects from observational data. When we cannot randomly assign treatment (e.g., who receives a marketing campaign), treated and control groups may differ systematically. PSM addresses this by matching each treated unit to control units with similar propensity scores—the probability of receiving treatment given observed covariates.

Step 1 – Propensity Score Model: $$ e(X_i) = \Pr(T_i = 1 \mid X_i) = \text{logit}^{-1}(\beta_0 + \beta_1 X_{1i} + \dots + \beta_p X_{pi}) $$ We fit a logistic regression predicting treatment \(T\) from covariates \(X\). The fitted probability \(\hat{e}(X_i)\) is the propensity score.
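
As a rough sketch of Step 1 (not the tool's exact implementation), the propensity model can be fit as a logistic regression. The example below uses statsmodels on simulated data; the covariate names (prior_spend, visits, tenure) are hypothetical.

```python
# Sketch: estimate propensity scores with a logistic regression (statsmodels).
# Data and column names are simulated/hypothetical, for illustration only.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 1000
df = pd.DataFrame({
    "prior_spend": rng.gamma(2.0, 50.0, n),
    "visits": rng.poisson(5, n),
    "tenure": rng.uniform(0, 10, n),
})
# Simulated self-selection: higher prior spend and more visits -> more likely treated
logit_true = -2.0 + 0.01 * df["prior_spend"] + 0.1 * df["visits"]
df["treatment"] = rng.binomial(1, 1 / (1 + np.exp(-logit_true)))

X = sm.add_constant(df[["prior_spend", "visits", "tenure"]])
ps_model = sm.Logit(df["treatment"], X).fit(disp=0)
df["pscore"] = ps_model.predict(X)   # hat{e}(X_i), the estimated propensity score
print(ps_model.params.round(3))
```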

Step 2 – Matching: For each treated unit, find the control unit(s) with the closest propensity score. This creates a matched sample where treated and control groups are balanced on covariates.

Step 3 – Treatment Effect: $$ \text{ATT} = \frac{1}{n_1} \sum_{T_i=1} \left[ Y_i - Y_{j(i)} \right] $$ The Average Treatment Effect on the Treated (ATT) compares outcomes between each treated unit and its matched control(s).
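
The sketch below combines Steps 2 and 3, assuming greedy 1:1 nearest-neighbor matching without replacement and a caliper on the propensity score; the tool's actual matching options may differ, and all data here are simulated.

```python
# Sketch: greedy 1:1 nearest-neighbor matching on the propensity score,
# without replacement, followed by the ATT. Simulated data for illustration.
import numpy as np

rng = np.random.default_rng(0)
n_t, n_c = 200, 800
ps_t = rng.beta(4, 3, n_t)                               # propensity scores, treated
ps_c = rng.beta(3, 4, n_c)                               # propensity scores, control
y_t = 500 + 300 * ps_t + 120 + rng.normal(0, 50, n_t)    # treated outcomes (true effect = 120)
y_c = 500 + 300 * ps_c + rng.normal(0, 50, n_c)          # control outcomes

caliper = 0.2 * np.concatenate([ps_t, ps_c]).std()       # rule of thumb (often applied on the logit scale)
used, pairs = set(), []                                  # pairs of (treated index, control index)
for i in np.argsort(ps_t):                               # greedy match, without replacement
    dist = np.abs(ps_c - ps_t[i])
    if used:
        dist[list(used)] = np.inf                        # already-used controls are unavailable
    j = int(np.argmin(dist))
    if dist[j] <= caliper:                               # only accept matches within the caliper
        pairs.append((int(i), j))
        used.add(j)

diffs = np.array([y_t[i] - y_c[j] for i, j in pairs])
att = diffs.mean()
print(f"matched pairs: {len(pairs)} of {n_t} treated; estimated ATT = {att:.1f}")
```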

When to use propensity score matching

PSM is appropriate when you have observational data where treatment was not randomly assigned (e.g., customers who self-selected into a loyalty program, users who happened to see an ad). You need: (1) a binary treatment indicator, (2) an outcome you want to measure the treatment effect on, and (3) covariates that predict treatment assignment and may confound the treatment-outcome relationship.

Key limitation: PSM only adjusts for observed covariates. If unobserved factors influence both treatment and outcome (hidden confounding), the estimated effect may still be biased. This is why good causal reasoning and domain knowledge are essential.

Interpreting the Average Treatment Effect on the Treated (ATT)

The ATT answers: "Among those who received treatment, what was the average effect compared to similar untreated individuals?" This is often the most policy-relevant question (e.g., "Did the campaign lift sales among people we actually targeted?").

Note: ATT differs from ATE (Average Treatment Effect), which estimates the effect across the entire population. PSM with 1:1 matching typically estimates ATT.
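
In potential-outcomes notation, the two estimands are $$ \text{ATE} = \mathbb{E}[Y(1) - Y(0)] \qquad \text{ATT} = \mathbb{E}[Y(1) - Y(0) \mid T = 1] $$ where \(Y(1)\) and \(Y(0)\) are the outcomes a unit would experience with and without treatment; the ATT averages that difference only over units that were actually treated.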

MARKETING SCENARIOS

Use presets to explore causal inference scenarios, such as loyalty program enrollment or email campaign exposure. Each scenario provides a raw data file with treatment, outcome, and covariates.

DATA & VARIABLE SELECTION

Upload Raw Data File

Upload a CSV file with case-level data including: a binary treatment indicator (0/1), an outcome variable, and covariates for matching. Headers are required.
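
For illustration only, a minimal input file might look like the following; the column names are hypothetical (any names are fine, as long as the treatment column is coded 0/1):

```
customer_id,treatment,spend_90d,prior_spend,visits,tenure
1001,1,412.50,350.00,12,3.2
1002,0,198.00,220.00,5,1.1
1003,1,537.25,410.75,15,6.8
1004,0,255.10,240.00,7,2.4
```

Here treatment is the binary indicator, spend_90d the outcome, and the remaining columns are candidate covariates.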

Drag & Drop raw data file (.csv, .tsv, .txt)

Include headers; must have treatment (binary), outcome, and covariate columns.

No file uploaded.

PROPENSITY SCORE DIAGNOSTICS

Propensity Score Distribution

Interpretation Aid

Technical Interpretation

What you're seeing: The histograms show the distribution of estimated propensity scores for treated (blue) and control (orange) groups. Each propensity score represents the predicted probability that a unit received treatment, based on its covariate values.

Good overlap: When both distributions cover similar ranges (e.g., both spanning 0.2–0.8), matching can find comparable units. The key assumption of PSM—that we can find similar treated and control units—is satisfied.

Warning signs: If treated units cluster near 1.0 and controls near 0.0 with little overlap, the propensity model is "perfectly predicting" treatment. This means groups are fundamentally different on observed characteristics, and matching will fail or produce few matches.

Practical Interpretation

Marketing example: In a loyalty program analysis, if customers who enrolled all have very high propensity scores while non-enrollees have very low scores, it means enrolled customers were systematically different (e.g., higher prior spend, more visits). Finding a "fair comparison" will be difficult.

What to do if overlap is poor: (1) Use fewer or different covariates, (2) collect more data, or (3) consider that the treatment effect may not be estimable with this data—the groups are too different to compare.
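
A minimal numeric version of this overlap check (the tool shows it visually as overlaid histograms), assuming estimated propensity scores are available for both groups; the scores below are simulated.

```python
# Sketch: a quick common-support check on estimated propensity scores.
import numpy as np

rng = np.random.default_rng(1)
ps_t = rng.beta(4, 3, 300)    # estimated propensity scores, treated
ps_c = rng.beta(3, 4, 900)    # estimated propensity scores, control

low = max(ps_t.min(), ps_c.min())     # common-support region: the score range where
high = min(ps_t.max(), ps_c.max())    # both groups have at least some observations
outside = np.mean((ps_t < low) | (ps_t > high))
print(f"common support: [{low:.2f}, {high:.2f}]; treated outside: {outside:.1%}")

# The overlaid histograms described above can be reproduced with matplotlib:
# import matplotlib.pyplot as plt
# plt.hist(ps_t, bins=30, alpha=0.5, label="treated")
# plt.hist(ps_c, bins=30, alpha=0.5, label="control")
# plt.xlabel("propensity score"); plt.legend(); plt.show()
```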

Covariate Balance: Before vs. After Matching

Interpretation Aid

Technical Interpretation

Standardized Mean Difference (SMD): Each point shows how different treated and control groups are on a covariate, measured in pooled standard deviation units. An SMD of 0.2 means the groups differ by 0.2 standard deviations on that variable.

Balance thresholds: SMD < 0.1 indicates excellent balance (green zone). SMD 0.1–0.25 is acceptable but shows residual imbalance (yellow). SMD > 0.25 suggests meaningful differences that may bias the ATT estimate (red).

Before vs. After: Open circles show pre-matching imbalance; filled circles show post-matching balance. Successful matching moves all points toward zero (the center dashed line).

Practical Interpretation

What it tells you: This "Love Plot" shows whether matching created comparable groups. If all filled circles (after matching) are within the ±0.1 bands, you can be confident that treated and matched control groups are similar on observed characteristics.

Marketing example: If "Prior_Spend" has SMD = 0.5 before matching (treated spent more) but SMD = 0.05 after matching, the matched comparison controls for prior spending behavior. The ATT now compares customers with similar spending histories.

Action: If any covariate remains imbalanced (SMD > 0.25) after matching, consider adding it to the outcome model or interpreting results cautiously—remaining confounding may bias the effect estimate.

MATCHING SUMMARY

Treated Units --
Control Units --
Matched Pairs --
Unmatched Treated --
Mean PS (Treated) --
Mean PS (Matched Control) --

Understanding Matching Quality

Technical Interpretation

Matched pairs: The number of treated units successfully matched to control units within the caliper threshold. This determines the effective sample size for ATT estimation.

Unmatched treated: Treated units outside the "common support" region—they have propensity scores too extreme to find comparable controls. High unmatched rates (>20%) suggest limited overlap.

Mean propensity scores: After matching, mean PS should be nearly identical between treated and matched controls. Differences > 0.05 suggest residual imbalance.
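
A minimal sketch of how the figures in the summary panel can be derived from matching output; the propensity scores and matched pairs below are hypothetical, and the mean-PS comparison is computed over the matched sample only.

```python
# Sketch: matching-summary metrics from hypothetical matching output.
import numpy as np

ps_t = np.array([0.81, 0.64, 0.92, 0.55, 0.71])          # propensity scores, treated
ps_c = np.array([0.60, 0.33, 0.78, 0.52, 0.70, 0.41])    # propensity scores, control
pairs = [(1, 4), (3, 3), (4, 2)]                          # (treated idx, control idx); 0 and 2 unmatched

matched_t = [i for i, _ in pairs]
matched_c = [j for _, j in pairs]
print("Treated Units:          ", len(ps_t))
print("Control Units:          ", len(ps_c))
print("Matched Pairs:          ", len(pairs))
print("Unmatched Treated:      ", len(ps_t) - len(pairs))
print("Mean PS (Treated):      ", round(float(ps_t[matched_t].mean()), 3))
print("Mean PS (Matched Ctrl): ", round(float(ps_c[matched_c].mean()), 3))
```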

Practical Interpretation

Sample size trade-off: PSM often discards observations (unmatched treated or unused controls). A match rate of 80%+ is typically good; below 50% suggests the groups may be too different for meaningful comparison.

Who gets excluded: Unmatched treated units are often "extreme" cases—e.g., customers with very high engagement who enrolled in a program but have no comparable non-enrollees. The ATT estimate applies only to the matched subset, which may differ from all treated units.

Generalizability: If many treated units are excluded, ask: "Can I generalize the treatment effect to the excluded cases?" If they're systematically different, the answer may be no.

COVARIATE BALANCE TABLE

Standardized Mean Differences

Covariate | Mean (Treated) | Mean (Control) | SMD Before | Mean (Matched Ctrl) | SMD After | % Reduction
Run matching to see balance diagnostics.

SMD interpretation: |SMD| < 0.1 = excellent balance (green), 0.1–0.25 = acceptable (yellow), > 0.25 = concerning (red).

About Standardized Mean Differences

Technical Interpretation

What SMD measures: The difference in means between treated and control groups, divided by the pooled standard deviation: SMD = (Mean_Treated - Mean_Control) / SD_pooled. This makes the measure unit-free and comparable across variables.

For categorical variables: Each category level is treated as a 0/1 indicator, and SMD is calculated for the proportion in each level.

% Reduction: Shows how much matching improved balance. 80%+ reduction is excellent; negative reduction means matching made things worse (rare, but possible with replacement).
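
A minimal sketch of the SMD and % Reduction calculations described above, applied to hypothetical values of a single covariate.

```python
# Sketch: standardized mean difference before/after matching and percent reduction.
import numpy as np

def smd(x_treated, x_control):
    """(mean_T - mean_C) / pooled SD: a unit-free difference in means."""
    sd_pooled = np.sqrt((x_treated.var(ddof=1) + x_control.var(ddof=1)) / 2)
    return (x_treated.mean() - x_control.mean()) / sd_pooled

rng = np.random.default_rng(7)
prior_spend_t  = rng.normal(520, 100, 200)   # treated
prior_spend_c  = rng.normal(470, 100, 800)   # all controls (before matching)
prior_spend_mc = rng.normal(515, 100, 200)   # matched controls only (after matching)

before = smd(prior_spend_t, prior_spend_c)
after  = smd(prior_spend_t, prior_spend_mc)
reduction = 100 * (1 - abs(after) / abs(before))
print(f"SMD before = {before:.2f}, SMD after = {after:.2f}, reduction = {reduction:.0f}%")
```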

Practical Interpretation

Reading the table: Compare the "SMD Before" and "SMD After" columns. Ideally, all "SMD After" values should be near zero (green). Covariates with high SMD after matching may still confound your treatment effect estimate.

Example: If "Prior_Spend" has SMD = 0.4 before matching, treated customers spent 0.4 SDs more than controls before the program. After matching (SMD = 0.05), the difference is negligible—you're now comparing customers with similar spending histories.

If balance is poor: Try adjusting matching settings (wider caliper, different covariates) or consider that selection bias may be too strong to overcome with this data.

TREATMENT EFFECT RESULTS

Average Treatment Effect on the Treated (ATT)

Estimated ATT: --
Standard Error: --
95% CI: --
t-statistic: --
p-value: --
Mean Outcome (Treated): --
Mean Outcome (Matched Control): --

Interpreting the ATT

Technical Interpretation

Average Treatment Effect on the Treated (ATT): The mean difference in outcomes between treated units and their matched controls: ATT = (1/n) Σ(Y_treated - Y_matched_control). This estimates the causal effect of treatment for those who received it.

Standard Error & 95% CI: The SE measures uncertainty in the ATT estimate. The 95% CI gives the range of effect sizes consistent with the data; under repeated sampling, 95% of such intervals would contain the true effect. If the CI excludes zero, the effect is statistically significant at α = 0.05.

t-statistic & p-value: The t-stat is ATT/SE; the p-value tests H₀: ATT = 0. p < 0.05 provides evidence of a non-zero treatment effect.
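
One common way to compute these quantities is a paired-differences (paired t-test style) calculation over the matched pairs, sketched below on hypothetical differences; the tool's standard error may be computed differently (e.g., to account for matching with replacement).

```python
# Sketch: ATT, SE, 95% CI, t and p from matched-pair differences (hypothetical data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
diffs = rng.normal(120, 400, 150)     # hypothetical Y_treated - Y_matched_control per pair

n = len(diffs)
att = diffs.mean()
se = diffs.std(ddof=1) / np.sqrt(n)                    # SE of the mean paired difference
t_stat = att / se
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)        # two-sided test of H0: ATT = 0
half_width = stats.t.ppf(0.975, df=n - 1) * se
print(f"ATT = {att:.1f}, SE = {se:.1f}, 95% CI = ({att - half_width:.1f}, {att + half_width:.1f})")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```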

Practical Interpretation

What the number means: If ATT = 120 for a spending outcome, treated customers spent an average of $120 more than similar untreated customers. This is the estimated causal impact of the treatment.

Business decision-making: Compare the ATT to the cost of treatment. If a loyalty program costs $50/customer to run but generates ATT = $120 in additional spend, the program has positive ROI.

Caution: The ATT is valid only if (1) the propensity model includes all confounders, (2) matching achieved good balance, and (3) there are no unmeasured confounders. PSM cannot prove causation—it can only adjust for observed differences.

Effect size vs. significance: A large p-value alongside a meaningful ATT may simply reflect insufficient sample size (low power). A tiny p-value with a tiny ATT may be statistically significant but practically irrelevant. Always consider both.

Statistical Interpretation

Run matching analysis to see results.

Managerial Takeaway

Run matching analysis to see actionable insights.

The downloaded file includes the matched sample with propensity scores, match IDs, and outcomes.

PROPENSITY MODEL DETAILS

View Propensity Score Model Coefficients

The propensity score model predicts treatment assignment from covariates using logistic regression.

Run matching to see the propensity model.
Covariate | Coefficient | Std. Error | z | p-value | Odds Ratio
Run matching to see coefficients.
Pseudo R²: -- Model χ²: -- p-value: --
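
For reference, a coefficient table like the one above can be produced from a fitted logistic regression; the sketch below uses statsmodels (assumed here, as in the Step 1 sketch) with hypothetical covariates and simulated data.

```python
# Sketch: propensity model coefficient table and fit statistics (statsmodels Logit).
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 1000
X = pd.DataFrame({"prior_spend": rng.normal(500, 100, n),
                  "visits": rng.poisson(5, n)})
p_treat = 1 / (1 + np.exp(-(-6 + 0.01 * X["prior_spend"] + 0.1 * X["visits"])))
treatment = rng.binomial(1, p_treat)                      # simulated treatment assignment

res = sm.Logit(treatment, sm.add_constant(X)).fit(disp=0)
table = pd.DataFrame({"Coefficient": res.params,
                      "Std. Error": res.bse,
                      "z": res.tvalues,
                      "p-value": res.pvalues,
                      "Odds Ratio": np.exp(res.params)})
print(table.round(3))
print(f"Pseudo R²: {res.prsquared:.3f}   Model χ²: {res.llr:.1f}   p-value: {res.llr_pvalue:.4g}")
```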