Binomial Random Variables

Manufacturers often monitor production quality by tracking the number of defective items produced in a batch. They might want to know the answer to the question:

What is the probability of producing 3 or fewer defective items in a batch of 100?

In this section, we will introduce the binomial random variable and explore how the binomial probability distribution can be used to answer similar questions.

Binomial Setting

Random variables can only follow a binomial distribution under certain conditions. A situation that satisfies these conditions is called a binomial setting, which has the following properties:

There is a fixed number of identical trials, denoted as n.
Each trial results in one of two outcomes: success (S) or failure (F).
The probability of success for a single trial remains the same for each trial, denoted by p, where $0 \leq p \leq 1$. The probability of failure is q = (1-p).
The trials are independent.

It is important to note two things. First, a success is not necessarily a “good” result; it is simply a label for one of the two possible outcomes on a single trial. Second, even if a trial originally has more than two possible outcomes, we can often define success and failure in a way that reduces it to two categories.

For example, suppose the random variable counts the number of even rolls when rolling a six-sided die. Although there are six possible outcomes, we can define a success as rolling an even number (2, 4, or 6) and a failure as rolling an odd number (1, 3, or 5). In this way, each trial results in either a success or a failure.

Binomial Distribution

The binomial random variable, $Y$ , represents the number of successes in n trials in a binomial setting. The probability distribution of $Y$ is called the binomial distribution with parameters n and p.

How can we get the probability distribution of $Y$ ? We will use a simple example to help answer this question. Suppose a basketball player who makes 70% of free throws attempts 3 shots. The random variable $Y$ , the number of made shots, is a binomial random variable with n = 3 trials, p = 0.7, and q = 0.3. In this example, making a shot is a success (S) and missing a shot is a failure (F).

The following outcomes represent all possible sequences of makes (S) and misses (F):

\text{SSS} \quad \text{SSF} \quad \text{SFS} \quad \text{SFF} \quad \text{FSS} \quad \text{FSF} \quad \text{FFS} \quad \text{FFF}

When shooting three free throws, the possible number of makes is 0, 1, 2, or 3. Therefore, the possible values of our random variable $Y$ are 0, 1, 2, and 3.

What is the probability that the basketball player makes exactly 1 shot? We will start by finding the probability of a single specific outcome with 1 made shot. For example, let’s consider the outcome SFF. Since the shots are assumed independent,

\begin{align*} P(SFF) = P(S) \cdot P(F) \cdot P(F) &= P(S) \cdot P(F)^2 \\ &= (0.7) \cdot (0.3)^2 \end{align*}

Notice that any arrangement containing exactly one success and two failures has the same probability. There are 3 such arrangements (SFF, FSF, FFS), and they are mutually exclusive, so the overall probability of $Y = 1$ is the sum of the probabilities of these three arrangements:

\begin{align*} P(Y = 1) &= P(SFF) + P(FSF) + P(FFS) \\ &= 3 \cdot (0.7) \cdot (0.3)^2 \\ &= 0.189 \end{align*}

Similarly, we can find the probabilities of $Y = 0$ , $Y = 2$ , and $Y = 3$ to get the full probability distribution of $Y$ :

Binomial Probability Distribution

Number of Made Shots ( $Y$ )	Arrangements	Probability
0	FFF	$1 \cdot (0.3)^3 = 0.027$
1	SFF, FSF, FFS	$3 \cdot (0.7) \cdot (0.3)^2 = 0.189$
2	SSF, SFS, FSS	$3 \cdot (0.7)^2 \cdot (0.3) = 0.441$
3	SSS	$1 \cdot (0.7)^3 = 0.343$
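
The table can also be rebuilt by brute force. The following is a minimal Python sketch, not part of the original example, that lists all eight sequences of makes and misses and adds up the probability of each count; the variable names are illustrative.

```python
from itertools import product
from collections import defaultdict

p = 0.7      # probability of a made shot (success), from the example
q = 1 - p    # probability of a missed shot (failure)

# Accumulate the total probability for each possible number of made shots.
distribution = defaultdict(float)
for outcome in product("SF", repeat=3):
    makes = outcome.count("S")
    # The shots are assumed independent, so per-shot probabilities multiply.
    distribution[makes] += p ** makes * q ** (3 - makes)

for makes in sorted(distribution):
    print(makes, round(distribution[makes], 3))
```

The printed values should agree with the Probability column above, and the four probabilities sum to 1.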

In most situations it can be difficult to list and keep track of all the possible arrangements. We can use the binomial coefficient to calculate the number of arrangements of k successes in n trials.

Binomial Coefficient

\binom{n}{k} = \frac{n!}{k!(n-k)!}

for $k = 0,1,2,\ldots,n$ where

n! = n \cdot (n-1) \cdot (n-2) \cdots 3 \cdot 2 \cdot 1

and $0! = 1$ .
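
As a quick check of the definition, the sketch below computes the binomial coefficient directly from factorials and compares it with Python's built-in `math.comb`. The helper name `binomial_coefficient` is our own choice for illustration.

```python
from math import comb, factorial

def binomial_coefficient(n, k):
    """Number of ways to place k successes among n trials, via the factorial formula."""
    # Integer division is exact here because the result is always a whole number.
    return factorial(n) // (factorial(k) * factorial(n - k))

# Three free throws: number of arrangements with 0, 1, 2, and 3 makes.
counts = [binomial_coefficient(3, k) for k in range(4)]
print(counts)  # [1, 3, 3, 1]

# The standard library's math.comb gives the same values.
matches = all(binomial_coefficient(10, k) == comb(10, k) for k in range(11))
print(matches)  # True
```

The counts 1, 3, 3, 1 match the number of arrangements listed for each row of the free-throw table.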

We can now generalize our example to get the formula. The count of 3 arrangements becomes the binomial coefficient, the (0.7) term becomes p raised to the number of successes k, and the (0.3) term becomes q = (1-p) raised to the number of failures n − k. The result is the binomial probability formula.

Binomial Probability Formula

$P(Y = k) = \binom{n}{k} \cdot p^k \cdot (1-p)^{n-k}$

k = Number of successes

n = Number of independent trials

p = Probability of success for each trial

In summary, we can find the probability of exactly k successes in n trials by multiplying the number of ways to arrange k successes and n − k failures, the probability of k successes, and the probability of n − k failures.
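
The formula translates almost directly into code. Here is a minimal sketch, assuming Python 3.8 or later for `math.comb`; the function name `binomial_pmf` is hypothetical and chosen only for this illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(Y = k) for a binomial random variable with n trials and success probability p."""
    # (number of arrangements) * (probability of k successes) * (probability of n - k failures)
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Free-throw example: n = 3 shots, p = 0.7.
pmf = [binomial_pmf(k, 3, 0.7) for k in range(4)]
print([round(value, 3) for value in pmf])  # [0.027, 0.189, 0.441, 0.343]
```

Applied to the free-throw example, it reproduces the probabilities found earlier by listing arrangements.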

Mean and Standard Deviation

Using what we learned about combining random variables, we can find the mean and standard deviation of a binomial random variable by first looking at a single trial. Let’s consider a random variable $X$ that is equal to 1 if the trial is a success with probability p, and 0 if the trial is a failure with probability q = (1-p).

Then the mean of $X$ is

\begin{align*} \mu_X = \sum{x_ip_i} &= (1)(p) + (0)(1-p) \\ &= p \end{align*}

and the variance of $X$ is

\begin{align*} \sigma_X^2 = \sum{(x_i - \mu_X)^2 p_i} &= (1 - p)^2 \cdot p + (0 - p)^2 \cdot (1 - p) \\ &= p(1-p) \end{align*}

Remember that a binomial random variable counts the number of successes in a sequence of independent trials with binary outcomes and the same probability of success. If we let $Y$ = $X_1 + X_2 + \cdots + X_n$ , where the $X_i$ are independent and each has the same distribution as $X$ above, then $Y$ is a binomial random variable with parameters n and p.
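
One way to see that a sum of independent 0/1 trials is binomial is to build its distribution one trial at a time and compare the result with the binomial probability formula. The sketch below does this for the free-throw values n = 3 and p = 0.7; the helper `add_trial` is a name we introduce here, not something from the text.

```python
from math import comb

def add_trial(dist, p):
    """Distribution of (current sum + one more independent 0/1 trial)."""
    new = [0.0] * (len(dist) + 1)
    for total, prob in enumerate(dist):
        new[total] += prob * (1 - p)   # the extra trial is a failure (adds 0)
        new[total + 1] += prob * p     # the extra trial is a success (adds 1)
    return new

n, p = 3, 0.7
dist = [1.0]  # before any trials, the sum is 0 with probability 1
for _ in range(n):
    dist = add_trial(dist, p)

formula = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
print([round(x, 3) for x in dist])
print([round(x, 3) for x in formula])
```

Both printed lists should be the same, up to floating-point rounding.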

Applying the rules for combining independent random variables, the mean of a binomial random variable is

\begin{align*} \mu_Y &= \mu_{X_1} + \mu_{X_2} + \cdots + \mu_{X_n} \\ &= p + p + \cdots + p \\ &= np \end{align*}

and the variance is

\begin{align*} \sigma_Y^2 &= \sigma_{X_1}^2 + \sigma_{X_2}^2 + \cdots + \sigma_{X_n}^2 \\ &= p(1-p) + p(1-p) + \cdots + p(1-p) \\ &= np(1-p) \end{align*}

Finally, the standard deviation is the square root of the variance:

\begin{align*} \sigma_Y &= \sqrt{\sigma_Y^2} \\ &= \sqrt{np(1-p)} \end{align*}

Mean and Standard Deviation Formulas

$\mu_Y = np$

$\sigma_Y = \sqrt{np(1-p)}$

n = Number of independent trials

p = Probability of success for each trial
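
These shortcut formulas can be checked against the general definitions of mean and variance for a discrete random variable. The following sketch uses the free-throw values n = 3 and p = 0.7 as an illustration; `binomial_pmf` is again a helper name of our own.

```python
from math import comb, sqrt

def binomial_pmf(k, n, p):
    """P(Y = k) from the binomial probability formula."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 3, 0.7
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Mean and variance from the definitions: sum of y * P(Y = y), and so on.
mean = sum(k * prob for k, prob in enumerate(pmf))
variance = sum((k - mean) ** 2 * prob for k, prob in enumerate(pmf))

print(round(mean, 4), round(n * p, 4))                            # definition vs. np
print(round(sqrt(variance), 4), round(sqrt(n * p * (1 - p)), 4))  # definition vs. formula
```

Each printed pair should agree, which is what the derivation above predicts.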

The structure of these formulas reflects the underlying logic of the binomial distribution. If each trial has a constant probability of success p, for example 70%, then we would expect 70% of the trials to be successes.

Notice that as n increases, the mean and standard deviation both increase (for 0 < p < 1). With more trials, we should expect both a larger number of successes on average and a greater spread in the number of successes. When p = 0 or p = 1, we know the result of each trial (guaranteed failure or guaranteed success), so the standard deviation is 0 and the mean is either 0 or n. Finally, for a fixed n, the variance np(1-p) is maximized when p = 0.5, since at this p we are most uncertain about the outcome of each trial.
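
To illustrate these observations numerically, the sketch below tabulates the mean and standard deviation for several values of p with n held fixed. The choice n = 100 and the list of p values are arbitrary assumptions made for this illustration.

```python
from math import sqrt

n = 100  # fixed number of trials; an arbitrary value chosen for illustration
p_values = [0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0]

variances = [n * p * (1 - p) for p in p_values]
for p, var in zip(p_values, variances):
    # Columns: p, mean np, standard deviation sqrt(np(1-p))
    print(p, round(n * p, 2), round(sqrt(var), 2))

# The value of p in the list with the largest variance.
p_at_max = p_values[variances.index(max(variances))]
print(p_at_max)  # 0.5
```

The standard deviation is 0 at both endpoints and largest at p = 0.5, and the values are symmetric around 0.5 because p and 1 − p play interchangeable roles in np(1-p).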

Definitions

Binomial Setting is a situation where the following properties hold:
1. Fixed number of identical trials denoted as n.
2. Each trial results in one of two outcomes: success (S) or failure (F).
3. The probability of success for a single trial remains the same for each trial, denoted by p. The probability of failure is q = (1-p).
4. The trials are independent.
Binomial Random Variable is a random variable that counts the number of successes in n independent trials in a binomial setting.
Binomial Distribution is the probability distribution of a binomial random variable, describing the probabilities of obtaining 0, 1, 2, …, n successes.
Binomial Coefficient is the number of ways to choose or arrange k successes among n trials, given by

$\binom{n}{k} = \frac{n!}{k!(n - k)!}$
Binomial Probability Formula gives the probability of observing exactly k successes in n trials, and is calculated as

$P(Y = k) = \binom{n}{k} \cdot p^k \cdot (1-p)^{n-k}$
Binomial Random Variable Mean can be calculated by the equation

$\mu_Y = np$
Binomial Random Variable Standard Deviation can be calculated by the equation

$\sigma_Y = \sqrt{np(1-p)}$

Practice Problems

In a certain state, approximately 85% of adults with a valid driver’s license own a smartphone. Suppose 100 licensed drivers are randomly selected. For each random variable below, determine whether it is a binomial random variable. If so, give the values of n and p.

a. The number of drivers in the sample who own a smartphone
Solution
Recall that a binomial random variable counts the number of successes in n independent trials in a binomial setting. We start by checking each property of a binomial setting:
1. Fixed number of identical trials — 100 drivers are selected, so there are 100 trials.
2. Two outcomes per trial — We can define success as owning a smartphone and failure as not owning one.
3. Constant probability of success — Each driver has the same probability of owning a smartphone, which is 0.85.
4. Independent trials — Drivers are randomly selected, so the trials are independent.
The properties of the binomial setting are satisfied. Next we want to make sure we are being asked to count the number of successes in a certain number of trials. Indeed, the random variable counts the number of drivers in the sample who own a smartphone, which is the number of successes in 100 trials.
Therefore, this is a binomial random variable with parameters n = 100 and p = 0.85.
b. The ages of the 100 drivers in the sample
Solution
We start by checking each property of a binomial setting:
1. Fixed number of identical trials — 100 drivers are selected, so there are 100 trials.
2. Two outcomes per trial — This property does not hold, since the driver’s age can be many different values and is not a binary outcome.
The properties of the binomial setting are not satisfied. Even though the remaining properties may hold, we can stop here.
Therefore, this is not a binomial random variable, because each trial does not have only two possible outcomes.
c. The number of drivers in the sample who are younger than 40 years old
Solution
We start by checking each property of a binomial setting:
1. Fixed number of identical trials — 100 drivers are selected, so there are 100 trials.
2. Two outcomes per trial — We can define success as being younger than 40 years old and failure as being 40 years old or older.
3. Constant probability of success — While not given, we can assume that each driver has the same probability of being younger than 40 years old, which is the proportion of licensed drivers who are younger than 40.
4. Independent trials — Drivers are randomly selected, so the trials are independent.
The properties of the binomial setting are satisfied. Next we want to make sure we are being asked to count the number of successes in a certain number of trials. Indeed, the random variable counts the number of drivers in the sample under the age of 40, which is the number of successes in 100 trials.
Therefore, this is a binomial random variable with parameters n = 100 and an unknown probability of success p.
d. The number of miles each driver traveled during the past year
Solution
We start by checking each property of a binomial setting:
1. Fixed number of identical trials — 100 drivers are selected, so there are 100 trials.
2. Two outcomes per trial — This property does not hold, since the number of miles traveled can be many different values and is not a binary outcome.
The properties of the binomial setting are not satisfied. Even though the remaining properties may hold, we can stop here.
Therefore, this is not a binomial random variable, because each trial does not have only two possible outcomes.
e. The number of drivers in the sample who own exactly two vehicles
Solution
We start by checking each property of a binomial setting:
1. Fixed number of identical trials — 100 drivers are selected, so there are 100 trials.
2. Two outcomes per trial — We can define success as having exactly two vehicles and failure as having a different number of vehicles.
3. Constant probability of success — While not given, we can assume that each driver has the same probability of having exactly two vehicles, which is the proportion of licensed drivers who own exactly two vehicles.
4. Independent trials — Drivers are randomly selected, so the trials are independent.
The properties of the binomial setting are satisfied. Next we want to make sure we are being asked to count the number of successes in a certain number of trials. Indeed, the random variable counts the number of drivers in the sample who have exactly two vehicles, which is the number of successes in 100 trials.
Therefore, this is a binomial random variable with parameters n = 100 and an unknown probability of success p.

A chocolate bar factory purchases a new machine to improve production. Quality control engineers find that the machine produces defective chocolate bars at a rate of 5%. To test if the machine is working correctly, the factory independently produces a sample of 200 chocolate bars through the machine.

Let $Y$ = the number of defective chocolate bars produced.

a. Check that $Y$ can be modeled as a binomial random variable. Identify parameters n and p.
Solution
Recall that a binomial random variable counts the number of successes in n independent trials in a binomial setting. We start by checking each property of a binomial setting:
1. Fixed number of identical trials — 200 chocolate bars are produced, so there are 200 trials.
2. Two outcomes per trial — We can define success as a defective chocolate bar and failure as a non-defective chocolate bar.
3. Constant probability of success — Each chocolate bar has the same probability of being defective, which is 5% or 0.05.
4. Independent trials — The production of each chocolate bar is independent of the others.
The properties of the binomial setting are satisfied. Next we want to make sure we are being asked to count the number of successes in a certain number of trials. The random variable counts the number of defective chocolate bars produced, which is the number of successes in 200 trials.
Therefore, this is a binomial random variable with parameters n = 200 and p = 0.05.
b. Calculate the mean $\mu_Y$ and standard deviation $\sigma_Y$ of $Y$ . Interpret each value in context.

Solution
The mean is given by
$\mu_Y = np = (200)(0.05) = 10$
This means that, on average, the machine will produce 10 defective chocolate bars out of 200.
The standard deviation is given by
$\sigma_Y = \sqrt{np(1-p)} = \sqrt{(200)(0.05)(0.95)} = \sqrt{9.5} \approx 3.08$
This means that the number of defective chocolate bars produced will typically vary by about 3.08 from the mean of 10.

c. The factory determines that the machine is underperforming if the number of defective bars is more than one standard deviation above the mean. Find the number of defective bars at which the machine is considered underperforming.

Solution
$\mu_Y + \sigma_Y = 10 + 3.08 = 13.08$
In this context, the number of defective bars must be a whole number, and the smallest whole number greater than 13.08 is 14. Therefore, if the machine produces 14 or more defective chocolate bars out of 200, it is considered underperforming.
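
The arithmetic for this problem can be reproduced in a few lines. This is a minimal sketch with variable names of our own; it finds the smallest whole number of defective bars that lies strictly above the cutoff of one standard deviation above the mean.

```python
from math import floor, sqrt

n, p = 200, 0.05             # number of bars and defect rate from the problem
mean = n * p                 # expected number of defective bars
sd = sqrt(n * p * (1 - p))   # standard deviation of the number of defective bars
cutoff = mean + sd           # one standard deviation above the mean

# Smallest whole number of defective bars strictly greater than the cutoff.
threshold = floor(cutoff) + 1
print(round(cutoff, 2), threshold)  # 13.08 14
```

Using `floor(cutoff) + 1` rather than rounding up keeps the comparison strict: if the cutoff were itself a whole number, a count exactly equal to it would not count as "more than" one standard deviation above the mean.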

Alyssa is taking a multiple-choice practice exam with 10 questions, each having 4 answer choices. She guesses on all questions.

Assume this is a binomial setting. Let $Y$ = the number of correct answers.

a. Calculate the mean $\mu_Y$ and standard deviation $\sigma_Y$ of $Y$ . Interpret each value in context.

Solution
There are n = 10 questions, and a random guess among 4 answer choices is correct with probability p = 1/4 = 0.25. The mean is given by
$\mu_Y = np = (10)(0.25) = 2.5$
This means that, on average, Alyssa will get 2.5 questions correct out of 10.
The standard deviation is given by
$\sigma_Y = \sqrt{np(1-p)} = \sqrt{(10)(0.25)(0.75)} = \sqrt{1.875} \approx 1.37$
This means that the number of correct answers will typically vary by about 1.37 from the mean of 2.5.

b. What is the probability that she gets exactly 3 questions correct?

Solution
We start by identifying the parameters of the binomial distribution. Since there are 10 questions, n = 10. Each question has 4 answer choices, so the probability of guessing correctly is p = 1/4 = 0.25.
We are asked to find $P(Y = 3)$ . Using the binomial probability formula:
$\begin{align*} P(Y = 3) &= \binom{10}{3} (0.25)^3 (0.75)^7 \\ &=(120) (0.25)^3 (0.75)^7 \\ &\approx 0.2503 \end{align*}$
The probability that Alyssa gets exactly 3 questions correct is approximately 0.2503 or 25.03%.

c. What is the probability that Alyssa gets between 2 and 5 questions correct, inclusive?

Solution
We are asked to find $P(2 \leq Y \leq 5)$ . This is the sum of the probabilities of getting exactly 2, 3, 4, or 5 questions correct.
Using the binomial probability formula:
$P(2 \leq Y \leq 5) = P(Y = 2) + P(Y = 3) + P(Y = 4) + P(Y = 5)$
Calculating each term:
$P(Y = 2) = \binom{10}{2} (0.25)^2 (0.75)^8 \approx 0.2816$
$P(Y = 3) = \binom{10}{3} (0.25)^3 (0.75)^7 \approx 0.2503$
$P(Y = 4) = \binom{10}{4} (0.25)^4 (0.75)^6 \approx 0.1460$
$P(Y = 5) = \binom{10}{5} (0.25)^5 (0.75)^5 \approx 0.0584$
Adding these probabilities:
$P(2 \leq Y \leq 5) \approx 0.2816 + 0.2503 + 0.1460 + 0.0584 = 0.7363$
Therefore, the probability that Alyssa gets between 2 and 5 questions correct is approximately 0.7363 or 73.63%.
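
A range probability like this one is a natural place to let a short script do the bookkeeping. The sketch below, using a hypothetical helper `binomial_pmf`, computes each term and their sum for n = 10 and p = 0.25.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(Y = k) from the binomial probability formula."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.25

# Probabilities of exactly 2, 3, 4, and 5 correct answers.
terms = {k: binomial_pmf(k, n, p) for k in range(2, 6)}
for k, prob in terms.items():
    print(k, round(prob, 4))

total = sum(terms.values())
print(round(total, 4))
```

Summing the unrounded terms gives a total that rounds to 0.7362, while adding the four already-rounded values, as in the solution, gives 0.7363. The difference comes only from rounding the individual terms before adding.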

d. What is the probability that Alyssa gets at least two questions correct?

Solution
We are asked to find $P(Y \geq 2)$ . We could individually calculate the probabilities of getting 2, 3, 4, 5, 6, 7, 8, 9, or 10 questions correct and add them, but this would be tedious. Instead, we note that the probabilities of all possible values sum to 1, so we can calculate the probability of getting 0 or 1 questions correct and subtract it from 1.
$P(Y = 0) = \binom{10}{0} (0.25)^0 (0.75)^{10} \approx 0.0563$ $P(Y = 1) = \binom{10}{1} (0.25)^1 (0.75)^9 \approx 0.1877$
Then
$\begin{align*} P(Y \geq 2) &= 1 - P(Y = 0) - P(Y = 1) \\ &= 1 - 0.0563 - 0.1877 = 0.7560 \end{align*}$
Therefore, the probability that Alyssa gets at least two questions correct is approximately 0.7560 or 75.60%.
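
The complement shortcut can be checked against the longer direct sum. The following sketch computes the probability both ways for n = 10 and p = 0.25; as before, `binomial_pmf` is an illustrative helper name.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(Y = k) from the binomial probability formula."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.25

# Direct approach: add the probabilities of 2, 3, ..., 10 correct answers.
direct = sum(binomial_pmf(k, n, p) for k in range(2, n + 1))

# Complement approach: subtract the probabilities of 0 and 1 correct answers from 1.
complement = 1 - binomial_pmf(0, n, p) - binomial_pmf(1, n, p)

print(round(direct, 4), round(complement, 4))
```

Both approaches give the same value up to floating-point rounding, but the complement needs only two terms instead of nine.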