4.5: The Binomial Probability Distribution
Learning Objectives
- Know the characteristics of a binomial random variable.
- Understand a binomial probability distribution.
- Know the definitions of the mean, the variance, and the standard deviation of a binomial random variable.
- Identify the type of statistical situation to which a binomial distribution can be applied.
- Use a binomial distribution to solve statistical problems.
Introduction
Many experiments result in responses for which there are only two possible outcomes, such as 'yes' or 'no', 'pass' or 'fail', 'good' or 'defective', 'male' or 'female', etc. A simple example is the toss of a coin. Say, for example, that we toss the coin five times. In each toss, we will observe either a head, \begin{align*}H\end{align*}, or a tail, \begin{align*}T\end{align*}. We might be interested in the probability distribution of \begin{align*}X\end{align*}, the number of heads observed. In this case, the possible values of \begin{align*}X\end{align*} range from 0 to 5. It is scenarios like this that we will examine in this lesson.
Binomial Experiments
Example: Suppose we select 100 students from a large university campus and ask them whether they are in favor of a certain issue that is being debated on their campus. The students are to answer with either a 'yes' or a 'no'. Here, we are interested in \begin{align*}X\end{align*}, the number of students who favor the issue (a 'yes'). If each student is randomly selected from the total population of the university, and the proportion of students who favor the issue is \begin{align*}p\end{align*}, then the probability that any randomly selected student favors the issue is \begin{align*}p\end{align*}. The probability that a selected student does not favor the issue is \begin{align*}1 - p\end{align*}. Sampling 100 students in this way is equivalent to tossing a coin 100 times, where the coin lands heads with probability \begin{align*}p\end{align*}. This experiment is an example of a binomial experiment.
Characteristics of a Binomial Experiment
- The experiment consists of \begin{align*}n\end{align*} independent, identical trials.
- There are only two possible outcomes on each trial: \begin{align*}S\end{align*} (for success) or \begin{align*}F\end{align*} (for failure).
- The probability of \begin{align*}S\end{align*} remains constant from trial to trial. We will denote it by \begin{align*}p\end{align*}. We will denote the probability of \begin{align*}F\end{align*} by \begin{align*}q\end{align*}. Thus, \begin{align*}q = 1 - p\end{align*}.
- The binomial random variable \begin{align*}X\end{align*} is the number of successes in \begin{align*}n\end{align*} trials.
Example: In the following two examples, decide whether \begin{align*}X\end{align*} is a binomial random variable.
Suppose a university decides to give two scholarships to two students. The pool of applicants is ten students: six males and four females. All ten of the applicants are equally qualified, and the university decides to randomly select two. Let \begin{align*}X\end{align*} be the number of female students who receive the scholarship.
If the first student selected is a female, then the probability that the second student is a female is \begin{align*}\frac{3}{9}\end{align*}. Here we have a conditional probability: the success of choosing a female student on the second trial depends on the outcome of the first trial. Therefore, the trials are not independent, and \begin{align*}X\end{align*} is not a binomial random variable.
A company decides to conduct a survey of customers to see if its new product, a new brand of shampoo, will sell well. The company chooses 100 randomly selected customers and asks them to state their preference among the new shampoo and two other leading shampoos on the market. Let \begin{align*}X\end{align*} be the number of the 100 customers who choose the new brand over the other two.
In this experiment, each customer either states a preference for the new shampoo or does not. The customers’ preferences are independent of each other, and therefore, \begin{align*}X\end{align*} is a binomial random variable.
Let’s examine an actual binomial situation. Suppose we present four people with two cups of coffee (one percolated and one instant) to discover the answer to this question: “If we ask four people which is percolated coffee and none of them can tell the percolated coffee from the instant coffee, what is the probability that two of the four will guess correctly?” We will present each of four people with percolated and instant coffee and ask them to identify the percolated coffee. The outcomes will be recorded by using \begin{align*}C\end{align*} for correctly identifying the percolated coffee and \begin{align*}I\end{align*} for incorrectly identifying it. A list of the 16 possible outcomes, all of which are equally likely if none of the four can tell the difference and are merely guessing, is shown below:
Number Who Correctly Identify Percolated Coffee | Outcomes, \begin{align*}C\end{align*} (correct), \begin{align*}I\end{align*} (incorrect) | Number of Outcomes |
---|---|---|
0 | \begin{align*}IIII\end{align*} | 1 |
1 | \begin{align*}ICII \ IIIC \ IICI \ CIII\end{align*} | 4 |
2 | \begin{align*}ICCI \ IICC \ ICIC \ CIIC \ CICI \ CCII\end{align*} | 6 |
3 | \begin{align*}CICC \ ICCC \ CCCI \ CCIC\end{align*} | 4 |
4 | \begin{align*}CCCC\end{align*} | 1 |
Using the Multiplication Rule for Independent Events, you know that the probability of getting a certain outcome when two people guess correctly, such as \begin{align*}CICI\end{align*}, is \begin{align*}(\frac{1}{2}) \left(\frac{1}{2}\right) \left(\frac{1}{2}\right) \left(\frac{1}{2}\right)=\left(\frac{1}{16}\right)\end{align*}. The table shows six outcomes where two people guessed correctly, so the probability of getting two people who correctly identified the percolated coffee is \begin{align*}\frac{6}{16}\end{align*}. Another way to determine the number of ways that exactly two people out of four people can identify the percolated coffee is simply to count how many ways two people can be selected from four people:
\begin{align*}_4C_2 = \frac{4 !}{2!2!}=\frac{24}{4}=6\end{align*}
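The counting argument above can also be checked by brute force. The following is a minimal Python sketch, not part of the original lesson; the variable names `outcomes` and `counts` are our own. It lists all 16 outcomes of the coffee experiment and compares the tallies with the combination formula.

```python
from itertools import product
from math import comb
from collections import Counter

# Each of the four tasters is either correct ("C") or incorrect ("I").
outcomes = ["".join(o) for o in product("CI", repeat=4)]

# Tally how many outcomes contain 0, 1, 2, 3, or 4 correct guesses.
counts = Counter(o.count("C") for o in outcomes)

for k in range(5):
    # Columns: k, enumerated count, nCk from the formula, probability.
    print(k, counts[k], comb(4, k), counts[k] / len(outcomes))
```

The row for 2 correct guesses shows a count of 6 out of 16 outcomes, matching the table.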
In addition, a graphing calculator can also be used to calculate binomial probabilities.
By pressing [2ND][DISTR], you can enter 'binompdf (4,0.5,2)'. This command calculates the binomial probability for \begin{align*}k\end{align*} (in this example, \begin{align*}k = 2\end{align*}) successes out of \begin{align*}n\end{align*} (in this example, \begin{align*}n = 4\end{align*}) trials, when the probability of success on any one trial is \begin{align*}p\end{align*} (in this example, \begin{align*}p = 0.5\end{align*}).
A binomial experiment is a probability experiment that satisfies the following conditions:
- Each trial can have only two outcomes: one known as a success, and the other known as a failure.
- There must be a fixed number, \begin{align*}n\end{align*}, of trials.
- The outcomes of the trials must be independent of each other. The probability of each success doesn’t change, regardless of what occurred previously.
- The probability, \begin{align*}p\end{align*}, of a success must remain the same for each trial.
The distribution of the random variable \begin{align*}X\end{align*}, where \begin{align*}x\end{align*} is the number of successes, is called a binomial probability distribution. The probability that you get exactly \begin{align*}x = k\end{align*} successes is as follows:
\begin{align*}P(x=k) = \binom{n}{k} p^k (1-p)^{n-k}\end{align*}
where:
\begin{align*}\binom{n}{k} = \frac{n!}{k!(n-k)!}\end{align*}
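For readers who prefer software to a calculator, the formula translates almost directly into code. This is a sketch under our own naming (the function `binomial_pmf` is hypothetical, not a library routine); it relies only on `math.comb` from the Python standard library.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials,
    each with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Coffee experiment: two correct guesses out of four when p = 0.5.
print(binomial_pmf(2, 4, 0.5))  # 0.375, that is, 6/16
```

The argument order here is (k, n, p), which differs from the calculator's 'binompdf(' command, where the order is n, p, k.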
Let’s return to the coffee experiment and look at the distribution of \begin{align*}X\end{align*} (correct guesses):
\begin{align*}k\end{align*} | \begin{align*}P(x=k)\end{align*} |
---|---|
0 | \begin{align*}\frac{1}{16}\end{align*} |
1 | \begin{align*}\frac{4}{16}\end{align*} |
2 | \begin{align*}\frac{6}{16}\end{align*} |
3 | \begin{align*}\frac{4}{16}\end{align*} |
4 | \begin{align*}\frac{1}{16}\end{align*} |
The expected value for the above distribution can be calculated as follows:
\begin{align*}E(x) &= (0)\left(\frac{1}{16}\right) + (1)\left(\frac{4}{16}\right)+(2)\left(\frac{6}{16}\right) +(3)\left(\frac{4}{16}\right) + (4)\left(\frac{1}{16}\right)\\ E(x) &= 2\end{align*}
In other words, you would expect half of the four to guess correctly when given two equally-likely choices. \begin{align*}E(x)\end{align*} can be written as \begin{align*}(4)\left(\frac{1}{2}\right)\end{align*}, which is equivalent to \begin{align*}np\end{align*}.
For a random variable \begin{align*}X\end{align*} having a binomial distribution with \begin{align*}n\end{align*} trials and a probability of success of \begin{align*}p\end{align*}, the expected value (mean) and standard deviation for the distribution can be determined by the following formulas:
\begin{align*}E(x)=\mu_X=np\end{align*} and \begin{align*}\sigma_X=\sqrt{np(1-p)}\end{align*}
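As a quick check of these formulas, the mean and standard deviation can be computed directly from the probability distribution and compared with \begin{align*}np\end{align*} and \begin{align*}\sqrt{np(1-p)}\end{align*}. The sketch below uses the coffee experiment (\begin{align*}n = 4, p = 0.5\end{align*}); the variable names are our own.

```python
from math import comb, sqrt

n, p = 4, 0.5

# Probability of each possible number of successes, 0 through n.
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

# Mean and variance computed from the definition of expected value.
mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

print(mean, n * p)                            # definition vs. shortcut
print(sqrt(variance), sqrt(n * p * (1 - p)))  # definition vs. shortcut
```

Both lines print matching pairs, which is what the shortcut formulas promise.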
To apply the binomial formula to a specific problem, it is useful to have an organized strategy. Such a strategy is presented in the following steps:
- Identify a success.
- Determine \begin{align*}p\end{align*}, the probability of success.
- Determine \begin{align*}n\end{align*}, the number of experiments or trials.
- Use the binomial formula to write the probability distribution of \begin{align*}X\end{align*}.
Example: According to a study conducted by a telephone company, the probability is 25% that a randomly selected phone call will last longer than the mean value of 3.8 minutes. What is the probability that out of three randomly selected calls:
a. Exactly two last longer than 3.8 minutes?
b. None last longer than 3.8 minutes?
Using the first three steps listed above:
- A success is any call that is longer than 3.8 minutes.
- The probability of success is \begin{align*}p = 0.25\end{align*}.
- The number of trials is \begin{align*}n = 3\end{align*}.
Thus, we can now use the binomial probability formula:
\begin{align*}p(x)=\binom{n}{x} p^x (1-p)^{n-x}\end{align*}
Substituting, we have: \begin{align*}p(x)=\binom{3}{x} (0.25)^x (1-0.25)^{3-x}\end{align*}
a. For \begin{align*}x = 2\end{align*}:
\begin{align*}p(2)&=\binom{3}{2} (0.25)^2 (1-0.25)^{3-2}\\ &= (3)(0.25)^2 (0.75)^1\\ &= 0.14\end{align*}
Thus, the probability is 0.14 that exactly two out of three randomly selected calls will last longer than 3.8 minutes.
b. Here, \begin{align*}x = 0\end{align*}. Again, we use the binomial probability formula:
\begin{align*}p(x=0) &= \binom{3}{0} (0.25)^0 (1-0.25)^{3-0}\\ &= \frac{3!}{0!(3-0)!}(0.25)^0(0.75)^3\\ &= 0.422\end{align*}
Thus, the probability is 0.422 that none of the three randomly selected calls will last longer than 3.8 minutes.
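The two answers can be reproduced in a few lines of Python. This is only an illustrative sketch; the helper name `call_pmf` is our own.

```python
from math import comb

def call_pmf(x: int, n: int = 3, p: float = 0.25) -> float:
    """Probability that exactly x of n calls last longer than 3.8 minutes."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

exactly_two = call_pmf(2)  # part a
none_longer = call_pmf(0)  # part b

print(round(exactly_two, 2))  # 0.14
print(round(none_longer, 3))  # 0.422
```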
Example: A car dealer knows from past experience that he can make a sale to 20% of the customers with whom he interacts. What is the probability that, in five randomly selected interactions, he will make a sale to:
a. Exactly three customers?
b. At most one customer?
c. At least one customer?
Also, determine the probability distribution for the number of sales.
A success here is making a sale to a customer. The probability that the car dealer makes a sale to any customer is \begin{align*}p = 0.20\end{align*}, and the number of trials is \begin{align*}n = 5\end{align*}. Therefore, the binomial probability formula for this case is:
\begin{align*}p(x)=\binom{5}{x} (0.2)^x(0.8)^{5-x}\end{align*}
a. Here we want the probability of exactly 3 sales, so \begin{align*}x =3\end{align*}.
\begin{align*}p(3)=\binom{5}{3} (0.2)^3 (0.8)^{5-3}=0.051\end{align*}
This means that the probability that the car dealer makes exactly three sales in five attempts is 0.051.
b. The probability that the car dealer makes a sale to at most one customer can be calculated as follows:
\begin{align*}p(x \le 1) &= p(0)+p(1)\\ &= \binom{5}{0} (0.2)^0 (0.8)^{5-0} + \binom{5}{1} (0.2)^1 (0.8)^{5-1}\\ &= 0.328 + 0.410 = 0.738\end{align*}
c. The probability that the car dealer makes at least one sale is the sum of the probabilities of him making 1, 2, 3, 4, or 5 sales, as is shown below:
\begin{align*}p(x \ge 1)=p(1)+p(2)+p(3)+p(4)+p(5)\end{align*}
We can now apply the binomial probability formula to calculate the five probabilities. However, we can save time by calculating the complement of the probability we're looking for and subtracting it from 1 as follows:
\begin{align*}p(x \ge 1) &= 1-p(x < 1) = 1-p(0)\\ &= 1- \binom{5}{0} (0.2)^0 (0.8)^{5-0}\\ &= 1-0.328 = 0.672\end{align*}
This tells us that the salesperson has a probability of 0.672 of making at least one sale in five attempts.
We are also asked to determine the probability distribution for the number of sales, \begin{align*}X\end{align*}, in five attempts. Therefore, we need to compute \begin{align*}p(x)\end{align*} for \begin{align*}x = 0, 1, 2, 3, 4\end{align*}, and 5. We can use the binomial probability formula for each value of \begin{align*}X\end{align*}. The table below shows the probabilities.
\begin{align*}x\end{align*} | \begin{align*}p(x)\end{align*} |
---|---|
0 | 0.328 |
1 | 0.410 |
2 | 0.205 |
3 | 0.051 |
4 | 0.006 |
5 | 0.00032 |
Figure: The probability distribution for the number of sales.
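The whole distribution for the car dealer, along with the "at most one" and "at least one" probabilities, can be generated in one pass. The following Python sketch is our own illustration; the names `dist`, `at_most_one`, and `at_least_one` are assumptions, not part of the lesson.

```python
from math import comb

n, p = 5, 0.2

# Probability of exactly x sales, for x = 0, 1, ..., 5.
dist = {x: comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)}

at_most_one = dist[0] + dist[1]  # p(x <= 1)
at_least_one = 1 - dist[0]       # complement of "no sales"

for x, prob in dist.items():
    print(x, round(prob, 5))
print(round(at_most_one, 3), round(at_least_one, 3))
```

Note that adding the unrounded probabilities gives about 0.737 for part b, while adding the rounded terms 0.328 and 0.410 gives 0.738. The small difference is due only to rounding.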
Example: A poll of twenty voters is taken to determine the number in favor of a certain candidate for mayor. Suppose that 60% of all the city’s voters favor this candidate.
a. Find the mean and the standard deviation of \begin{align*}X\end{align*}.
b. Find the probability of \begin{align*}x \le 10\end{align*}.
c. Find the probability of \begin{align*}x>12\end{align*}.
d. Find the probability of \begin{align*}x=11\end{align*}.
a. Since the sample of twenty was randomly selected and the city's voting population is large relative to the sample, we can treat \begin{align*}X\end{align*}, the number of the twenty who favor the candidate, as a binomial random variable. The probability of success is 0.60, the proportion of all voters who favor the candidate. Therefore, the mean and the standard deviation can be calculated as shown:
\begin{align*}\mu &= np = (20)(0.6)=12\\ \sigma^2 &= np (1-p)=(20)(0.6)(0.4)=4.8\\ \sigma &=\sqrt{4.8}=2.2\end{align*}
b. To calculate the probability that 10 or fewer of the voters favor the candidate, it's possible to add the probabilities that 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 of the voters favor the candidate as follows:
\begin{align*}p(x \le 10) = p(0)+p(1)+p(2)+ \ldots + p(10)\end{align*}
or
\begin{align*}p(x \le 10) = \sum^{10}_{x=0} p(x) = \sum^{10}_{x=0} \binom{20}{x} (0.6)^x (0.4)^{20-x}\end{align*}
As you can see, this would be a very tedious calculation, and it is best to resort to your calculator. See the technology note at the end of the section for more information.
c. To find the probability that \begin{align*}x > 12\end{align*}, it's possible to add the probabilities that 13, 14, 15, 16, 17, 18, 19, or 20 of the voters favor the candidate as shown:
\begin{align*}p(x > 12)=p(13)+p(14)+ \ldots + p(20)=\sum^{20}_{x=13} p(x)\end{align*}
Alternatively, using the Complement Rule, \begin{align*}p(x > 12)=1-p(x \le 12)\end{align*}.
Using a calculator (see the technology note below) with \begin{align*}n = 20, p = 0.6,\end{align*} and \begin{align*}k = 12\end{align*}, we get a probability of 0.584 that \begin{align*}x \le 12\end{align*}. Thus, \begin{align*}p(x > 12) = 1 - 0.584 = 0.416.\end{align*}
d. To find the probability that exactly 11 voters favor the candidate, it's possible to subtract the probability that at most 10 voters favor the candidate from the probability that at most 11 voters favor the candidate. These probabilities can be found using a calculator. Thus, the probability that exactly 11 voters favor the candidate can be calculated as follows:
\begin{align*}p(x=11)=p(x \le 11) - p(x \le 10)=0.404 - 0.245=0.159\end{align*}
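The cumulative sums in this example are tedious by hand but short in code. Here is a minimal Python sketch of the same calculations; the helper names `pmf` and `cdf` are our own, and `cdf` plays the role of the calculator's 'binomcdf(' command.

```python
from math import comb, sqrt

n, p = 20, 0.6

def pmf(x: int) -> float:
    """Probability that exactly x of the n voters favor the candidate."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def cdf(k: int) -> float:
    """Probability that at most k of the n voters favor the candidate."""
    return sum(pmf(x) for x in range(k + 1))

mean = n * p                     # part a
sd = sqrt(n * p * (1 - p))       # part a
p_at_most_10 = cdf(10)           # part b
p_more_than_12 = 1 - cdf(12)     # part c, by the Complement Rule
p_exactly_11 = cdf(11) - cdf(10) # part d, same as pmf(11)

print(mean, round(sd, 1))
print(round(p_at_most_10, 3), round(p_more_than_12, 3), round(p_exactly_11, 4))
```

Evaluating part d without intermediate rounding gives roughly 0.160; the 0.159 above comes from subtracting values that were already rounded to three decimal places.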
A graphing calculator will now be used to graph and compare different versions of a binomial distribution. Each binomial distribution will be entered into two lists and then displayed as a histogram. First, we will use the calculator to generate a sequence of integers, and next, we will use it to generate a corresponding list of binomial probabilities.
To generate a sequence of integers, press [2ND][LIST], go to OPS, select '5:seq', enter '(X, X, 0, \begin{align*}n\end{align*}, 1)', where \begin{align*}n\end{align*} is the number of independent binomial trials, and press [STO][2ND][L1].
To enter the binomial probabilities associated with this sequence of integers, press [STAT] and select '1:EDIT'.
Clear out L2 and position the cursor on the L2 list name.
Press [2ND][DISTR] to bring up the list of distributions.
Select 'A:binompdf(' and enter '\begin{align*}n, p)\end{align*}', where \begin{align*}n\end{align*} is the number of independent binomial trials and \begin{align*}p\end{align*} is the probability of success.
To graph the histogram, make sure your window is set correctly, press [2ND][STAT PLOT], turn a plot on, select the histogram plot, choose L1 for Xlist and L2 for Freq, and press [GRAPH]. This will display the binomial histogram.
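The same two lists can be built outside the calculator. The Python sketch below is an assumption-laden stand-in for the calculator steps, not a reproduction of them: `binomial_lists` is a hypothetical helper that returns the equivalents of L1 and L2, and the histogram is drawn as rows of '#' characters rather than as a graph.

```python
from math import comb

def binomial_lists(n: int, p: float):
    """Return (xs, probs): the integers 0..n and their binomial probabilities."""
    xs = list(range(n + 1))
    probs = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in xs]
    return xs, probs

xs, probs = binomial_lists(10, 0.1)

# Crude text histogram: one '#' for every 0.02 of probability (rounded).
for x, pr in zip(xs, probs):
    print(f"{x:2d} {pr:.3f} " + "#" * round(pr * 50))
```

Changing the arguments to `binomial_lists` lets you compare shapes for other values of \begin{align*}n\end{align*} and \begin{align*}p\end{align*}.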
Horizontally, the following are examples of binomial distributions where \begin{align*}n\end{align*} increases and \begin{align*}p\end{align*} remains constant. Vertically, the examples display the results where \begin{align*}n\end{align*} remains fixed and \begin{align*}p\end{align*} increases.
\begin{align*}n=5 \ \text{and} \ p=0.1 && n=10 \ \text{and} \ p=0.1 && n=20 \ \text{and} \ p=0.1\end{align*}
For a small value of \begin{align*}p\end{align*}, the binomial distributions are skewed right: most of the probability is concentrated at the low values of \begin{align*}X\end{align*}, with a tail extending toward the higher values of \begin{align*}X\end{align*}. As \begin{align*}n\end{align*} increases, the skewness decreases and the distributions gradually become more nearly normal in shape.
\begin{align*}n=5 \ \text{and} \ p=0.5 && n=10 \ \text{and} \ p=0.5 && n=20 \ \text{and} \ p=0.5\end{align*}
As \begin{align*}p\end{align*} increases to 0.5, the skewness disappears and the distributions achieve perfect symmetry. The distribution stays symmetrical and mound-shaped for all values of \begin{align*}n\end{align*}, although its center and spread change with \begin{align*}n\end{align*}.
\begin{align*}n=5 \ \text{and} \ p=0.75 && n=10 \ \text{and} \ p=0.75 && n=20 \ \text{and} \ p=0.75\end{align*}
For a larger value of \begin{align*}p\end{align*}, the binomial distributions are skewed left: most of the probability is concentrated at the high values of \begin{align*}X\end{align*}, with a tail extending toward the lower values of \begin{align*}X\end{align*}. As \begin{align*}n\end{align*} increases, the skewness decreases and the distributions gradually become more nearly normal in shape.
Because \begin{align*}E(x)=np=\mu_X\end{align*}, the expected value increases with both \begin{align*}n\end{align*} and \begin{align*}p\end{align*}. As \begin{align*}n\end{align*} increases, so does the standard deviation, but for a fixed value of \begin{align*}n\end{align*}, the standard deviation is largest around \begin{align*}p = 0.5\end{align*} and reduces as \begin{align*}p\end{align*} approaches 0 or 1.
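The behavior of the standard deviation described above can be seen numerically. This short Python sketch (the function name `binomial_sd` is our own) tabulates \begin{align*}\sqrt{np(1-p)}\end{align*} for the same values of \begin{align*}n\end{align*} and \begin{align*}p\end{align*} used in the histograms.

```python
from math import sqrt

def binomial_sd(n: int, p: float) -> float:
    """Standard deviation of a binomial random variable."""
    return sqrt(n * p * (1 - p))

p_values = (0.1, 0.5, 0.75)
for n in (5, 10, 20):
    # One row per n; one column per p.
    row = [round(binomial_sd(n, p), 2) for p in p_values]
    print(n, row)
```

Reading across a row, the middle entry (\begin{align*}p = 0.5\end{align*}) is the largest; reading down a column, the values grow with \begin{align*}n\end{align*}.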
Technology Note: Calculating Binomial Probabilities on the TI-83/84 Calculator
Press [2ND][DISTR] and scroll down to 'A:binompdf('. Press [ENTER] to place 'binompdf(' on your home screen. Type values of \begin{align*}n, p\end{align*}, and \begin{align*}k\end{align*}, separated by commas, and press [ENTER].
Use the 'binomcdf(' command to calculate the probability of at most \begin{align*}k\end{align*} successes. The format is 'binomcdf\begin{align*}(n, p, k)\end{align*}', which returns the probability that \begin{align*}x \le k\end{align*}. (Note: It is not necessary to close the parentheses.)
Technology Note: Using Excel
In a cell, enter the function =binomdist(\begin{align*}x,n,p\end{align*},false). Press [ENTER], and the probability of \begin{align*}x\end{align*} successes will appear in the cell.
For the probability of at most \begin{align*}x\end{align*} successes, replace 'false' with 'true'.
Lesson Summary
Characteristics of a Binomial Experiment:
- A binomial experiment consists of \begin{align*}n\end{align*} identical trials.
- There are only two possible outcomes on each trial: \begin{align*}S\end{align*} (for success) or \begin{align*}F\end{align*} (for failure).
- The probability of \begin{align*}S\end{align*} remains constant from trial to trial. We denote it by \begin{align*}p\end{align*}. We denote the probability of \begin{align*}F\end{align*} by \begin{align*}q\end{align*}. Thus, \begin{align*}q = 1 - p\end{align*}.
- The trials are independent of each other.
- The binomial random variable \begin{align*}X\end{align*} is the number of successes in \begin{align*}n\end{align*} trials.
The binomial probability distribution is: \begin{align*}p(x)=\binom{n}{x} p^x (1-p)^{n-x}=\binom{n}{x} p^x q^{n-x}\end{align*}.
For a binomial random variable, the mean is \begin{align*}\mu = np\end{align*}.
The variance is \begin{align*}\sigma^2 = npq = np(1-p)\end{align*}.
The standard deviation is \begin{align*}\sigma=\sqrt{npq}=\sqrt{np(1-p)}\end{align*}.
On the Web
http://tinyurl.com/268m56r Simulation of a binomial experiment. Explore what happens as you increase the number of trials.
http://tinyurl.com/299hsjo Explore a binomial distribution as you change \begin{align*}n\end{align*} and \begin{align*}p\end{align*}.
Multimedia Links
For an explanation of binomial distribution and notation used for it (4.0)(7.0), see ExamSolutions, A-Level Statistics: Binomial Distribution (Introduction) (10:30).
For an explanation on using tree diagrams and the formula for finding binomial probabilities (4.0)(7.0), see ExamSolutions, A-Level Statistics: Binomial Distribution (Formula) (14:19).
For an explanation of using the binomial probability distribution to find probabilities (4.0), see patrickJMT, The Binomial Distribution and Binomial Probability Function (6:45).
Review Questions
- Suppose \begin{align*}X\end{align*} is a binomial random variable with \begin{align*}n = 4\end{align*} and \begin{align*}p = 0.2\end{align*}. Calculate \begin{align*}p(x)\end{align*} for each of the following values of \begin{align*}X\end{align*}: \begin{align*}0, 1, 2, 3, 4\end{align*}. Give the probability distribution in tabular form.
- Suppose \begin{align*}X\end{align*} is a binomial random variable with \begin{align*}n = 5\end{align*} and \begin{align*}p = 0.2\end{align*}. Display \begin{align*}p(x)\end{align*} in tabular form. Compute the mean and the variance of \begin{align*}X\end{align*}.
- Over the years, a medical researcher has found that one out of every ten diabetic patients receiving insulin develops antibodies against the hormone, thus requiring a more costly form of medication.
- Find the probability that in the next five patients the researcher treats, none will develop antibodies against insulin.
- Find the probability that at least one will develop antibodies.
- According to the Canadian census of 2006, the median annual family income for families in Nova Scotia is $56,400. [Source: Stats Canada. www.statcan.ca ] Consider a random sample of 24 Nova Scotia households.
- What is the expected number of households with annual incomes less than $56,400?
- What is the standard deviation of the number of households with incomes less than $56,400?
- What is the probability of getting at least 18 out of the 24 households with annual incomes under $56,400?