A/B Sample Size Planner

Design tool

Plan the required sample size for A/B tests that compare two independent groups — either two proportions (conversion rates) or two means (continuous metrics). Explore how detectable effect size, baseline rates or variability, power, and confidence level change the per-group and total sample size.

👨‍🏫 Professor Mode: Guided Learning Experience

New to sample size planning? Enable Professor Mode for step-by-step guidance through determining the right sample size for your A/B test!

TEST OVERVIEW & EQUATIONS

This planner uses large-sample normal approximations to estimate the per-group sample size for two independent groups. You can work with either (a) two proportions (e.g., conversion rates) or (b) two means (e.g., average order value), with adjustable power and confidence.

Two independent means (equal variance, normal approximation to the two-sample t-test):
$$ n_{\text{per group}} \approx \frac{2\,(z_{1-\alpha^*} + z_{1-\beta})^2 \sigma^2}{\Delta^2} $$ where \(\Delta = \mu_2 - \mu_1\) is the minimum difference you want to detect, \(\sigma\) is the common standard deviation, \(z_{1-\alpha^*}\) is the standard normal critical value for your test (\(\alpha^* = \alpha/2\) for a two-sided test, \(\alpha^* = \alpha\) for a one-sided test), and \(z_{1-\beta}\) is the standard normal quantile corresponding to the desired power \(1-\beta\).
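As a rough illustration, the means formula can be sketched in Python using only the standard library. The function name and default values are hypothetical, not taken from the planner, and the sketch assumes the usual convention that the critical value uses \(\alpha/2\) for a two-sided test and \(\alpha\) for a one-sided test. The result is rounded up to the next whole observation.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_means(delta, sigma, alpha=0.05, power=0.80, two_sided=True):
    """Per-group n for two independent means (normal approximation).

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    if delta == 0:
        raise ValueError("delta must be non-zero")
    # Assumed convention: alpha/2 for two-sided tests, alpha for one-sided.
    alpha_star = alpha / 2 if two_sided else alpha
    z_alpha = NormalDist().inv_cdf(1 - alpha_star)  # z_{1 - alpha*}
    z_beta = NormalDist().inv_cdf(power)            # z_{1 - beta}
    n = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return ceil(n)


# Example: detect a 5-unit shift when sigma is about 20,
# at 95% confidence (two-sided) and 80% power.
print(n_per_group_means(delta=5, sigma=20))  # 252
```

Switching to `two_sided=False` with the same inputs gives a smaller per-group n, which matches the planner's note that one-sided tests can lower the required sample size.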

Two independent proportions (approximate z-test):
$$ n_{\text{per group}} \approx \frac{\left[z_{1-\alpha^*}\sqrt{2\bar{p}(1-\bar{p})} + z_{1-\beta}\sqrt{p_1(1-p_1)+p_2(1-p_2)}\right]^2}{(p_2-p_1)^2} $$ where \(p_1\) and \(p_2\) are the group proportions and \(\bar{p} = (p_1 + p_2)/2\).
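The proportions formula can be sketched the same way. This is a minimal illustration under the same assumptions (hypothetical function name and defaults, \(\alpha/2\) for two-sided tests), not the planner's actual implementation.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80, two_sided=True):
    """Per-group n for two independent proportions (normal approximation).

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    alpha_star = alpha / 2 if two_sided else alpha
    z_alpha = NormalDist().inv_cdf(1 - alpha_star)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2  # average of the two proportions
    term_null = z_alpha * sqrt(2 * p_bar * (1 - p_bar))
    term_alt = z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    n = (term_null + term_alt) ** 2 / (p2 - p1) ** 2
    return ceil(n)


# Example: baseline 20% conversion, target 25%,
# 95% confidence (two-sided), 80% power.
print(n_per_group_proportions(0.20, 0.25))  # 1094
```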

Additional notes & assumptions

These formulas assume independent observations, reasonably large samples, and approximate normality of the relevant test statistic. Results are planning-level: they help you understand required sample sizes before running an experiment, but small-sample or highly imbalanced designs may need more advanced methods.

MARKETING SCENARIOS

Use presets to explore realistic A/B testing questions a marketing analyst might face, such as subject line experiments, landing page redesigns, or new upsell offers at checkout. Each scenario sets a baseline rate or mean, a minimum effect that would justify a rollout, and default confidence/power so you can see how many impressions, visits, or orders you would need before calling a winner.

INPUTS & SETTINGS

Specify the A/B test

Two proportions (e.g., A/B conversion rates)

Use last campaign's conversion rate or open rate as your baseline.

Minimum effect you care about (e.g., from 20% to 25%).

The detectable lift is \(\Delta_p = p_2 - p_1\). The planner will compute the per-group sample size needed to detect at least this difference with your chosen power and confidence.

Two means (e.g., order value or time on site)

Use prior data or a pilot to approximate variability.

Help me estimate \(\sigma\) from a range

If you only have a rough sense of the minimum and maximum values you expect, you can use the rule-of-thumb that, for approximately bell-shaped data, most observations fall within about \(\pm 2\sigma\) of the mean. That implies the total range is roughly \(4\sigma\), so \(\sigma \approx \frac{\text{max} - \text{min}}{4}\).

This will set \(\sigma\) to \((\text{max} - \text{min}) / 4\). Use it as a starting point and refine with pilot data when available. This is especially handy for bounded marketing survey scales (for example, a 1–7 satisfaction rating), where the minimum and maximum are known.
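The range rule amounts to a one-line calculation. A small sketch, with a hypothetical function name, applied to the 1–7 rating scale mentioned above:

```python
def sigma_from_range(minimum, maximum):
    """Rule-of-thumb sigma for roughly bell-shaped data: range / 4."""
    if maximum <= minimum:
        raise ValueError("maximum must exceed minimum")
    return (maximum - minimum) / 4


# A 1-7 satisfaction scale spans 6 points, so sigma is about 1.5.
print(sigma_from_range(1, 7))  # 1.5
```

Because this is only a starting value, it is worth checking how much the required sample size moves if \(\sigma\) turns out somewhat larger or smaller.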

The detectable difference in means is \(\Delta_\mu = \mu_2 - \mu_1\). The planner will estimate the per-group sample size needed to detect at least this shift.

Confidence and alpha are linked: confidence = 1 - alpha (for two-sided tests).

Common choices are 80% or 90% power.

Advanced settings

Two-sided tests are standard when any increase or decrease is important. One-sided tests focus on a directional improvement but assume you would not act on a decrease.

Use unequal allocation when you want to expose more traffic to the new variant or limit exposure to a risky treatment. Equal allocation is simplest and most efficient when costs are similar.

Additional info about these settings

Confidence level & alpha: The confidence level (for example, 95%) describes how often the resulting confidence interval would contain the true effect if you repeated the experiment many times. Alpha is the tolerated false-positive rate (for example, 5% for a two-sided test at 95% confidence).

Power: Power is the probability that your test will detect the specified effect size if it is really present in the population. Higher power reduces the risk of a false negative but requires more observations.

Test type: A two-sided test checks for any difference between groups (better or worse). A one-sided test checks only for an improvement (or only for a decline), which can lower the required sample size but must be justified by the decision context.

Allocation ratio: The ratio controls how many observations are assigned to Group B relative to Group A. A 1:1 ratio is typical for clean comparisons; unbalanced designs can be useful when one variant is more expensive, riskier, or more interesting to over-sample.
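The page does not state the formula it uses for unequal allocation. One standard generalization of the means formula, given here as an assumption rather than the planner's confirmed method, replaces the factor 2 with \(1 + 1/k\), where \(k = n_B / n_A\). With \(k = 1\) it reduces to the equal-allocation formula. The function name below is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_means_unequal(delta, sigma, k=1.0, alpha=0.05, power=0.80):
    """Return (n_A, n_B) for a two-sided test with allocation ratio k = n_B / n_A.

    Assumed formula: n_A = (1 + 1/k) * (z_alpha + z_beta)^2 * sigma^2 / delta^2.
    """
    if k <= 0 or delta == 0:
        raise ValueError("k must be positive and delta non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n_a = (1 + 1 / k) * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return ceil(n_a), ceil(k * n_a)


print(n_means_unequal(delta=5, sigma=20, k=1))  # (252, 252)
print(n_means_unequal(delta=5, sigma=20, k=2))  # (189, 377)
```

In this example the 2:1 design needs 566 observations in total versus 504 for 1:1, which illustrates why equal allocation is the more efficient choice when costs are similar.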

VISUAL OUTPUT

Required total sample vs. effect size

This chart shows how required total sample size changes as the effect size gets larger. For proportions, the x-axis is \(|p_2 - p_1|\) (difference in rates); for means, it is \(|\mu_2 - \mu_1|\).
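The shape of this curve follows from the \(\Delta^2\) in the denominator: halving the detectable effect roughly quadruples the required sample. A small sketch for the means case, with hypothetical names and illustrative inputs (\(\sigma = 20\), 95% two-sided confidence, 80% power):

```python
from statistics import NormalDist


def raw_n_per_group_means(delta, sigma, alpha=0.05, power=0.80):
    """Unrounded per-group n for two means, two-sided test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2


# Sweep a few effect sizes and report the total sample (both groups).
for delta in (10, 5, 2.5):
    total = 2 * raw_n_per_group_means(delta, sigma=20)
    print(f"delta = {delta:>4}: total N is about {total:.0f}")

# Halving delta multiplies the requirement by four.
ratio = raw_n_per_group_means(2.5, 20) / raw_n_per_group_means(5, 20)
```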

Required total sample vs. power

This chart plots required sample size against desired power, holding other inputs fixed. Higher power (greater sensitivity) always requires a larger sample size.

Required total sample vs. variability

This chart shows how assumptions about variability affect the required sample size: for proportions, different baselines \(p_1\) (the variance \(p(1-p)\) is largest near 50%); for means, different values of the common standard deviation \(\sigma\).

DESIGN SUMMARY

Required n (Group A):
Required n (Group B):
Total sample size (N):
Outcome type:
Test / Power:

APA-Style Planning Statement

Provide baseline values, effect size, confidence, and power above to generate an APA-style statement summarizing the required sample size per group.

Managerial Interpretation

This panel will translate the design into plain language, explaining how many observations you need in each arm of the A/B test, and how that depends on the effect size, baseline, power, and confidence.

DIAGNOSTICS & ASSUMPTIONS

Diagnostics & Assumption Checks

These sample size formulas assume independent observations in each group, stable baselines, and that the two groups differ only in the treatment or variant. When outcomes are rare, highly skewed, or overdispersed, or when you use repeated measures or clustering, consult more advanced designs (e.g., mixed models or cluster-randomized power calculations).

If the required sample size is infeasible, you can target a larger minimum detectable effect, lower the required power (e.g., from 90% to 80%), or accept a higher alpha (with clear communication of the tradeoffs).
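To see how much each of these levers helps, the sketch below compares a baseline design against three relaxed alternatives using the two-proportion formula. The scenario values (20% baseline, 25% target, and the relaxed settings) are illustrative assumptions, as are the function and variable names.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group n for two proportions, two-sided test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


# Hypothetical scenarios: one baseline, then one lever relaxed at a time.
scenarios = {
    "baseline (90% power, alpha 0.05)": dict(p1=0.20, p2=0.25, alpha=0.05, power=0.90),
    "lower power to 80%": dict(p1=0.20, p2=0.25, alpha=0.05, power=0.80),
    "raise alpha to 0.10": dict(p1=0.20, p2=0.25, alpha=0.10, power=0.90),
    "target a larger effect (20% to 26%)": dict(p1=0.20, p2=0.26, alpha=0.05, power=0.90),
}
results = {name: n_per_group_proportions(**kw) for name, kw in scenarios.items()}

for name, n in results.items():
    print(f"{name}: {n} per group")
```

Each relaxed scenario reduces the per-group requirement relative to the baseline, but at the cost of more missed effects, more false positives, or a test that can only detect larger lifts.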