Specify the A/B test
Two proportions (e.g., A/B conversion rates)
Use last campaign's conversion rate or open rate as your baseline.
Minimum effect you care about (e.g., from 20% to 25%).
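The detectable lift is the absolute difference \(\Delta_p = p_2 - p_1\), where \(p_1\) is the baseline rate and \(p_2\) is the rate you expect under the variant (for example, 25% - 20% = 5 percentage points). The planner will compute the per-group sample size needed to detect at least this difference with your chosen power and confidence.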
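As a rough illustration of the kind of calculation involved, the sketch below uses the common normal-approximation formula with unpooled variances, a two-sided test, and equal group sizes. The planner's exact method is not stated here, so its numbers may differ slightly; the function name `n_per_group_proportions` and its defaults (alpha = 0.05, power = 0.80) are assumptions made for this example.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided test of two proportions.

    Assumes equal allocation and the unpooled normal approximation.
    """
    if p1 == p2:
        raise ValueError("p1 and p2 must differ to define a detectable lift")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = z.inv_cdf(power)          # quantile matching the target power
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    delta_p = p2 - p1
    return ceil((z_alpha + z_power) ** 2 * variance_sum / delta_p ** 2)


# Baseline 20%, smallest lift of interest 5 percentage points.
print(n_per_group_proportions(0.20, 0.25))
```

With these inputs the result is on the order of 1,100 observations per group. Halving the lift roughly quadruples the requirement, because the lift enters the formula squared.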
Two means (e.g., order value or time on site)
Use prior data or a pilot to approximate variability.
Help me estimate \(\sigma\) from a range
If you only have a rough sense of the minimum and maximum values you expect, you can use a rule of thumb: for approximately bell-shaped data, about 95% of observations fall within \(\pm 2\sigma\) of the mean. The total range is therefore roughly \(4\sigma\), so \(\sigma \approx \frac{\text{max} - \text{min}}{4}\).
This will set \(\sigma\) to \((\text{max} - \text{min}) / 4\). Use it as a starting point and refine with pilot data when available. This is especially handy for bounded marketing survey scales (for example, a 1–7 satisfaction rating), where the minimum and maximum are known.
The detectable difference in means is \(\Delta_\mu = \mu_2 - \mu_1\). The planner will estimate the per-group sample size needed to detect at least this shift.
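The sketch below combines the range-based estimate of \(\sigma\) with a standard normal-approximation formula for comparing two means. It assumes a two-sided test, equal group sizes, and a common \(\sigma\) in both groups; the helper names `sigma_from_range` and `n_per_group_means` are hypothetical, and the planner's own calculation may differ (for example, by using a t-distribution correction).

```python
from math import ceil
from statistics import NormalDist


def sigma_from_range(min_value, max_value):
    """Range rule of thumb: sigma is roughly (max - min) / 4."""
    return (max_value - min_value) / 4


def n_per_group_means(delta_mu, sigma, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided test of two means.

    Assumes equal allocation and the same sigma in both groups.
    """
    if delta_mu == 0:
        raise ValueError("delta_mu must be non-zero")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = z.inv_cdf(power)          # quantile matching the target power
    return ceil(2 * (sigma * (z_alpha + z_power) / delta_mu) ** 2)


# A 1-7 satisfaction scale, aiming to detect a 0.5-point shift.
sigma = sigma_from_range(1, 7)
print(sigma, n_per_group_means(delta_mu=0.5, sigma=sigma))
```

Because \(\sigma\) enters the formula squared, an overestimate from the range rule inflates the sample size noticeably, which is one reason to refine it with pilot data.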
Confidence and alpha are linked: confidence = 1 - alpha, so a 95% confidence level corresponds to alpha = 0.05. In a two-sided test, alpha is split evenly between the two tails.
Common choices are 80% or 90% power.
Advanced settings
Two-sided tests are standard when either an increase or a decrease matters. One-sided tests focus on a directional improvement but assume you would not act on a decrease.
Use unequal allocation when you want to expose more traffic to the new variant or limit exposure to a risky treatment. Equal allocation is simplest and, when per-observation costs are similar, needs the smallest total sample size for a given power.
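To see how these two settings move the required sample size, here is a minimal sketch for the two-means case. It assumes the normal approximation and a common \(\sigma\); the function `group_sizes_means` and its parameters `two_sided` and `ratio` (Group B size divided by Group A size) are hypothetical names chosen for this example, not the planner's interface.

```python
from math import ceil
from statistics import NormalDist


def group_sizes_means(delta_mu, sigma, alpha=0.05, power=0.80,
                      two_sided=True, ratio=1.0):
    """Return approximate (n_a, n_b), where ratio = n_b / n_a."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    z = NormalDist()
    tail = alpha / 2 if two_sided else alpha  # all of alpha in one tail if one-sided
    z_alpha = z.inv_cdf(1 - tail)
    z_power = z.inv_cdf(power)
    # Variance of the difference in means is sigma^2 * (1/n_a + 1/n_b).
    n_a = ceil((1 + 1 / ratio) * (sigma * (z_alpha + z_power) / delta_mu) ** 2)
    n_b = ceil(ratio * n_a)
    return n_a, n_b


print(group_sizes_means(0.5, 1.5))                   # two-sided, 1:1
print(group_sizes_means(0.5, 1.5, two_sided=False))  # one-sided, 1:1
print(group_sizes_means(0.5, 1.5, ratio=2.0))        # two-sided, B twice the size of A
```

Under these assumptions the one-sided design needs fewer observations per group, while the 1:2 split shrinks Group A but raises the total across both groups.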
Additional info about these settings
Confidence level & alpha: The confidence level (for example, 95%) describes how often the interval produced by the procedure would contain the true effect if you repeated the experiment many times. Alpha is the tolerated false-positive rate, that is, the chance of declaring a difference when none exists (for example, 5% at a 95% confidence level).
Power: Power is the probability that your test will detect the specified effect size if it is really present in the population. Higher power reduces the risk of a false negative but requires more observations.
Test type: A two-sided test checks for any difference between groups (better or worse). A one-sided test checks only for an improvement (or only for a decline), which can lower the required sample size but must be justified by the decision context.
Allocation ratio: The ratio controls how many observations are assigned to Group B relative to Group A. A 1:1 ratio is typical for clean comparisons; unbalanced designs can be useful when one variant is more expensive, riskier, or more interesting to over-sample.
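The relationship between power and sample size can also be read in the other direction: given a fixed number of observations per group, how likely is the test to detect the effect? The sketch below does this for two proportions, assuming a two-sided test, equal group sizes, and the unpooled normal approximation. The function name `power_two_proportions` is hypothetical, and the planner may use a different approximation.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test of two proportions."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    # Standard error of the difference in sample proportions (unpooled).
    se = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group)
    # Chance of rejecting in the direction of the true effect; the very
    # small chance of rejecting in the wrong direction is ignored.
    return z.cdf(abs(p2 - p1) / se - z_alpha)


for n in (500, 1000, 2000):
    print(n, round(power_two_proportions(0.20, 0.25, n), 3))
```

Power rises with the number of observations but with diminishing returns, which is why moving from 80% to 90% power typically costs a substantial share of extra traffic.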