Selection Probability Lab

Sampling

Explore the probability that specific items are selected from a finite population. Adjust the population size, sample size, and number of special items, and compare exact math to simulated draws, with live equations that show how the formulas work for your inputs.

👨‍🏫 Professor Mode: Guided Learning Experience

New to selection probability? Enable Professor Mode for step-by-step guidance through understanding sampling and selection mechanics!

OVERVIEW & CONCEPTS

This lab focuses on questions like: "If I draw a sample of size \(n\) from a population of size \(N\), what is the chance that a given number of the \(r\) special items appears in my sample?" You can toggle between sampling without replacement (hypergeometric) and with replacement (binomial-style) frameworks.

Key ideas illustrated

Hypergeometric model (without replacement): draws from a finite population where each individual can only be selected once. The number of special items in the sample follows a hypergeometric distribution.

Binomial model (with replacement): repeated independent draws with the same probability of selecting a special item on each draw. The number of special items in the sample follows a binomial distribution.

"At least one" logic: the probability of seeing at least one special item is computed via the complement: \(P(\text{at least one}) = 1 - P(\text{zero specials})\). The app shows this computation numerically and symbolically.

MARKETING SCENARIOS

Use these presets to explore realistic selection questions, such as the chance a VIP customer appears in a sample, the probability of at least one high-value lead in repeated with-replacement draws, or the likelihood of finding a specific number of defects in a quality-control check.

POPULATION & SAMPLING SETTINGS

Configure the scenario

Think of this section as describing a story like: "Out of \(N\) customers, \(r\) have some outcome I care about, and I draw a sample of size \(n\). What is the chance I see this outcome \(k\) times?". The labels and numbers you choose below are the ones used throughout the visuals, equations, and narrative explanations.

Outcome label: a short phrase naming the outcome you care about (for example, "opens email", "is a VIP customer", or "contains a defect").

Population size \(N\): total number of items in the population.

Sample size \(n\): number of draws in each sample.

Special items \(r\): how many population members have the outcome you named above.

Target count \(k\): used when computing \(P(K = k)\). For "at least one", the app uses the complement rule.

Sampling mode: switch between finite-population draws (without replacement) and independent draws with constant probability (with replacement).

Event of interest: choose whether to focus on a specific count or the chance of seeing at least one special item.

What event are we measuring?

When you select Exact: \(P(K = k)\), the event is “the sample contains exactly \(k\) items with this outcome”. For example, if you set \(k = 2\), the app is computing the chance that exactly two of your \(n\) sampled units have the outcome (such as being a VIP or containing a defect).

When you select At least one: \(P(K \ge 1)\), the event is “the sample contains one or more such items”. This is the natural choice for questions like “What is the chance we see any VIP customers in this sample of size \(n\)?”.
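The two events above can be sketched in a few lines of Python. This is a minimal illustration, not the app's own code: the function names `hypergeom_pmf`, `binom_pmf`, and `at_least_one` are hypothetical, and the inputs are assumed to be valid (\(0 \le r \le N\), and \(n \le N\) when sampling without replacement).

```python
from math import comb

def hypergeom_pmf(k, N, r, n):
    """P(K = k): n draws without replacement from N items, r of them special."""
    # Outside the feasible range of k the probability is zero.
    if k < max(0, n - (N - r)) or k > min(n, r):
        return 0.0
    return comb(r, k) * comb(N - r, n - k) / comb(N, n)

def binom_pmf(k, N, r, n):
    """P(K = k): n independent draws with replacement, p = r / N on each draw."""
    if k < 0 or k > n:
        return 0.0
    p = r / N
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def at_least_one(N, r, n, with_replacement=False):
    """P(K >= 1) via the complement rule: 1 - P(K = 0)."""
    pmf = binom_pmf if with_replacement else hypergeom_pmf
    return 1.0 - pmf(0, N, r, n)

# Illustrative numbers only: N = 20, r = 5, n = 4.
print(round(hypergeom_pmf(2, 20, 5, 4), 4))   # about 0.2167
print(round(at_least_one(20, 5, 4), 4))       # about 0.7183
```

Routing "at least one" through \(P(K = 0)\) means only one term has to be evaluated instead of summing \(P(K = 1)\) through \(P(K = n)\).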

Number of simulated samples: used for Monte Carlo simulation to approximate the distribution of \(K\).

How does simulation help?

Each simulated sample is a “what if” run of your design: we draw \(n\) items using the sampling mode you chose and count how many have the outcome of interest. Repeating this many times builds up the orange bars in the distribution chart.

As you increase the number of simulated samples, the simulated probabilities \(P(K = k)\) should get closer to the exact hypergeometric / binomial probabilities shown in blue, illustrating the link between theoretical probability models and long-run frequencies.
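The simulation loop described above might look like the following sketch. It assumes, purely for illustration, that items are numbered \(0, \dots, N-1\) and that the first \(r\) of them are the special ones; the function name `simulate_counts` and its `seed` parameter are hypothetical and not taken from the app.

```python
import random
from collections import Counter

def simulate_counts(N, r, n, num_sims, with_replacement=False, seed=None):
    """Draw num_sims samples of size n and tally how many specials each contains."""
    rng = random.Random(seed)
    population = range(N)          # assumption: items 0..r-1 are the specials
    counts = Counter()
    for _ in range(num_sims):
        if with_replacement:
            draws = rng.choices(population, k=n)   # repeats allowed
        else:
            draws = rng.sample(population, n)      # each item at most once
        k = sum(1 for item in draws if item < r)   # number of specials drawn
        counts[k] += 1
    return counts

counts = simulate_counts(N=20, r=5, n=4, num_sims=20_000, seed=1)
total = sum(counts.values())
for k in sorted(counts):
    print(k, counts[k], counts[k] / total)   # simulated frequency and P(K = k)
```

With more simulated samples, the printed proportions should settle near the exact probabilities, which is the behavior the orange and blue bars are meant to show.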

Additional info about these inputs

Population size \(N\): think of this as the full list of customers, items, or units you could sample from. In many marketing problems \(N\) is large but finite (panel members, current customers, emails on a list).

Sample size \(n\): how many draws you make. When sampling without replacement, \(n\) cannot exceed \(N\). With replacement, \(n\) can be any size, because the same unit may be re-contacted or re-targeted more than once.

Special items \(r\): a fixed set you care about (VIP customers, high-value prospects, defective units). The model assumes these are known in advance and fixed inside the population.

Target count \(k\): used for \(P(K = k)\). For example, \(k = 0\) is the probability you miss all specials; \(k = 2\) is the chance you get exactly two specials in the sample.

Sampling mode: choose without replacement for one-off samples from a list, and with replacement for repeated, independent trials (like ad impressions) where the same unit could be sampled multiple times.
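The constraints on these inputs also determine which values of \(k\) are possible at all. The sketch below checks the inputs and returns the feasible range of \(K\); the function name `support_of_k` is hypothetical, and the exact validation rules the app applies are not stated here, so treat the checks as assumptions.

```python
def support_of_k(N, r, n, with_replacement=False):
    """Return (k_min, k_max), the smallest and largest feasible number of specials."""
    if N < 1 or not (0 <= r <= N) or n < 0:
        raise ValueError("need N >= 1, 0 <= r <= N, and n >= 0")
    if with_replacement:
        # Every draw can be special or not, unless r = 0 or r = N.
        if r == 0:
            return (0, 0)
        if r == N:
            return (n, n)
        return (0, n)
    if n > N:
        raise ValueError("without replacement, n cannot exceed N")
    # Without replacement: at most N - r non-specials exist, so a large
    # sample is forced to contain at least n - (N - r) specials.
    return (max(0, n - (N - r)), min(n, r))

print(support_of_k(10, 3, 4))   # (0, 3)
print(support_of_k(10, 3, 9))   # (2, 3)
```

The second call shows why \(k = 0\) can be impossible without replacement: with only 7 non-special items, a sample of 9 must include at least 2 specials.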

VISUAL OUTPUT

Population & sampled items

The grid shows the first \(N\) items in the population. Special items are highlighted; items included in the current sample are outlined.

Distribution of number of specials

The chart compares the theoretical distribution of \(K\), the number of special items in the sample, to the empirical distribution from simulated samples. Bars show probabilities for each possible value of \(K\); the orange bars are based on the number of simulated samples you have run.

How to read these visuals

The population grid shows which positions are special (highlighted) and which items were drawn in the current sample (outlined). A special item that was drawn is both highlighted and outlined. This makes it easier to connect the abstract math to concrete draws.

The distribution chart stacks probability mass across values of \(K\). Blue bars show the exact hypergeometric/binomial probabilities; orange bars show how often each value appeared in your simulated samples. As you increase the number of simulations, the orange bars should approach the blue bars.

MATH DETAILS & WORKED EXAMPLES

Exact probability (current mode):
Simulated probability (current mode):
Expected number of specials \(E[K]\):

Interpreting these metrics

Exact probability: the value from the hypergeometric (no replacement) or binomial (with replacement) formula for your chosen mode (\(P(K = k)\) or \(P(K \ge 1)\)). This is the theoretical benchmark.

Simulated probability: the proportion of simulated samples whose outcome matches the event of interest. With enough simulations, this should track the exact probability closely, illustrating the link between probability models and long-run frequencies.

Expected number of specials \(E[K]\): the average number of specials you would see if you could repeat the sampling process many times. For both models, \(E[K] = n \cdot (r/N)\), but the distribution around that mean differs depending on whether you sample with or without replacement.
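One way to see how the spread differs is to compare the standard variance formulas for the two models, writing \(p = r/N\) (a shorthand introduced here for convenience):

\[ \text{With replacement (binomial):}\quad \operatorname{Var}(K) = n\,p\,(1-p) \]

\[ \text{Without replacement (hypergeometric):}\quad \operatorname{Var}(K) = n\,p\,(1-p)\,\frac{N-n}{N-1} \]

The extra factor \(\frac{N-n}{N-1}\) is at most 1, so sampling without replacement never gives a wider distribution than sampling with replacement. When \(n\) is small relative to \(N\) the factor is close to 1 and the two models nearly agree; when \(n = N\) it is 0, because the sample is the whole population and \(K = r\) with certainty.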

Understanding the \(\binom{n}{k}\) notation

In the equations below you will see expressions written as \(\binom{n}{k}\) or \(\binom{N}{n}\). This is read out loud as “n choose k” and it counts how many different groups of \(k\) items you can form from a larger pool of \(n\) items when order does not matter.

For example, if there are \(n = 10\) customers and you want to know how many different samples of size \(k = 3\) you could draw, \(\binom{10}{3} = 120\). That means there are 120 distinct 3‑person groups you might see.

Mathematically, \(\binom{n}{k}\) is computed as \[ \binom{n}{k} = \frac{n!}{k!\,(n-k)!}, \] where \(n!\) (“n factorial”) means \(n \times (n-1) \times \dots \times 2 \times 1\). In practice, the app uses a more numerically stable version of this calculation, but the idea is the same: it is counting how many ways you can choose positions for the “special” outcomes inside the sample.
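One common way to make this calculation numerically stable is to work with logarithms of factorials via the log-gamma function, so that very large intermediate values are never formed. Whether the app uses this particular technique is an assumption; the sketch below only illustrates the idea, and the names `log_choose` and `hypergeom_pmf_stable` are hypothetical.

```python
from math import lgamma, exp

def log_choose(n, k):
    """Natural log of 'n choose k', using lgamma(x + 1) = log(x!)."""
    if k < 0 or k > n:
        return float("-inf")   # log(0): no way to choose
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def hypergeom_pmf_stable(k, N, r, n):
    """Hypergeometric P(K = k) computed in log space (assumes 0 <= n <= N, 0 <= r <= N)."""
    log_p = log_choose(r, k) + log_choose(N - r, n - k) - log_choose(N, n)
    return exp(log_p)

print(round(exp(log_choose(10, 3))))                  # 120, matching the example above
print(hypergeom_pmf_stable(50, 100_000, 1_000, 5_000))  # large inputs stay in float range
```

Adding and subtracting logs replaces multiplying and dividing huge factorials, which matters most in environments with fixed-width floating-point numbers.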

General equations
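For reference, the standard forms of the two models in this lab's notation are given here; the app's own rendering may be laid out differently.

\[ \text{Without replacement (hypergeometric):}\quad P(K = k) = \frac{\binom{r}{k}\,\binom{N-r}{n-k}}{\binom{N}{n}}, \qquad \max(0,\, n-(N-r)) \le k \le \min(n, r) \]

\[ \text{With replacement (binomial):}\quad P(K = k) = \binom{n}{k}\left(\frac{r}{N}\right)^{k}\left(1-\frac{r}{N}\right)^{n-k}, \qquad 0 \le k \le n \]

\[ \text{At least one:}\quad P(K \ge 1) = 1 - P(K = 0) = \begin{cases} 1 - \dfrac{\binom{N-r}{n}}{\binom{N}{n}} & \text{without replacement} \\[2ex] 1 - \left(1-\dfrac{r}{N}\right)^{n} & \text{with replacement} \end{cases} \]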

Worked with your numbers
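As an illustration with made-up inputs (not your current settings), take \(N = 20\), \(r = 5\), \(n = 4\), and \(k = 2\).

\[ \text{Without replacement:}\quad P(K = 2) = \frac{\binom{5}{2}\binom{15}{2}}{\binom{20}{4}} = \frac{10 \times 105}{4845} \approx 0.2167, \qquad P(K \ge 1) = 1 - \frac{\binom{15}{4}}{\binom{20}{4}} = 1 - \frac{1365}{4845} \approx 0.7183 \]

\[ \text{With replacement:}\quad P(K = 2) = \binom{4}{2}(0.25)^{2}(0.75)^{2} \approx 0.2109, \qquad P(K \ge 1) = 1 - (0.75)^{4} \approx 0.6836 \]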

Exact vs. simulated distribution

k (number of specials) | Theoretical P(K = k) | Simulated frequency | Simulated P(K = k) | Cumulative P(K ≤ k)

How to read this table

k (number of specials): each row corresponds to a specific value of \(k\), the number of times the chosen outcome (your "special" label) occurs in the sample. The example row highlighted here corresponds to \(k = 2\), meaning exactly two specials are observed in the sample.

Theoretical P(K = k): this column shows the exact probability that a random sample will contain exactly \(k\) specials, computed from the hypergeometric or binomial model. In your current table, the example row has theoretical probability 0.1800.

Simulated frequency: after you run simulations, this column counts how many of those simulated samples produced each value of \(k\). For the example row, the table shows 360 out of 2,000 simulated samples landing on that value of \(k\).

Simulated P(K = k): this is the simulated frequency divided by the total number of simulations. In your current table, the example row has simulated probability 0.1800, which should be close to the theoretical value if you have run enough simulations.

Cumulative P(K ≤ k): this column adds up the theoretical probabilities from 0 up through that row’s value of \(k\). If the cumulative value in the row for \(k = 2\) is 0.6500, you can read that as “there is a 65% chance of seeing at most two specials in the sample.”
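The five columns described above can be assembled as in the following sketch. The function name `distribution_table` is hypothetical, and the simulated counts passed in are invented for illustration (they are not output from the app); only the 360-out-of-2,000 row mirrors the example in the text.

```python
from math import comb

def distribution_table(N, r, n, sim_counts, with_replacement=False):
    """Rows of (k, theoretical P, simulated frequency, simulated P, cumulative P)."""
    total = sum(sim_counts.values())
    p = r / N
    rows, cumulative = [], 0.0
    for k in range(n + 1):
        if with_replacement:
            exact = comb(n, k) * p**k * (1 - p) ** (n - k)
        else:
            # comb returns 0 for infeasible k, so those rows get probability 0.
            exact = comb(r, k) * comb(N - r, n - k) / comb(N, n)
        cumulative += exact                      # running sum of theoretical P
        freq = sim_counts.get(k, 0)
        sim_p = freq / total if total else 0.0   # frequency / number of simulations
        rows.append((k, exact, freq, sim_p, cumulative))
    return rows

# Invented counts from 2,000 hypothetical simulated samples.
example_counts = {0: 570, 1: 930, 2: 360, 3: 130, 4: 10}
for k, exact, freq, sim_p, cum in distribution_table(20, 5, 4, example_counts):
    print(f"{k}  {exact:.4f}  {freq}  {sim_p:.4f}  {cum:.4f}")
```

The cumulative column in the last row should come out at 1 (up to rounding), since it sums the theoretical probabilities over every possible value of \(k\).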