A probability distribution lists the values that a random variable can take on, together with the probability of each value. There are three common ways to construct a probability distribution. Sometimes previously collected data on the random variable that you are studying can be used to create one. A simulation is also a good way to create an approximate probability distribution. Finally, a probability distribution can be constructed from the basic principles, assumptions, and rules of theoretical probability. The examples in this lesson will lead you to a better understanding of these rules of theoretical probability.
Rules of Theoretical Probability
Sums and Differences of Independent Random Variables
1. Create a table that shows all the possible outcomes when two dice are rolled simultaneously. (Hint: There are 36 possible outcomes.)
\begin{align*}1^{st}\end{align*} Die (rows), \begin{align*}2^{nd}\end{align*} Die (columns) | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
1 | 1, 1 | 1, 2 | 1, 3 | 1, 4 | 1, 5 | 1, 6 |
2 | 2, 1 | 2, 2 | 2, 3 | 2, 4 | 2, 5 | 2, 6 |
3 | 3, 1 | 3, 2 | 3, 3 | 3, 4 | 3, 5 | 3, 6 |
4 | 4, 1 | 4, 2 | 4, 3 | 4, 4 | 4, 5 | 4, 6 |
5 | 5, 1 | 5, 2 | 5, 3 | 5, 4 | 5, 5 | 5, 6 |
6 | 6, 1 | 6, 2 | 6, 3 | 6, 4 | 6, 5 | 6, 6 |
The table of possible outcomes above can now be used to construct various probability distributions. The first table below displays the probabilities for all the possible sums of the two dice, and the second table shows the probabilities for each of the possible results for the larger of the two numbers produced by the dice.
Sum of Two Dice, \begin{align*}x\end{align*} | Probability, \begin{align*}p(x)\end{align*} |
---|---|
2 | \begin{align*}\frac{1}{36}\end{align*} |
3 | \begin{align*}\frac{2}{36}\end{align*} |
4 | \begin{align*}\frac{3}{36}\end{align*} |
5 | \begin{align*}\frac{4}{36}\end{align*} |
6 | \begin{align*}\frac{5}{36}\end{align*} |
7 | \begin{align*}\frac{6}{36}\end{align*} |
8 | \begin{align*}\frac{5}{36}\end{align*} |
9 | \begin{align*}\frac{4}{36}\end{align*} |
10 | \begin{align*}\frac{3}{36}\end{align*} |
11 | \begin{align*}\frac{2}{36}\end{align*} |
12 | \begin{align*}\frac{1}{36}\end{align*} |
Total | 1 |
Larger Number, \begin{align*}x\end{align*} | Probability, \begin{align*}p(x)\end{align*} |
---|---|
1 | \begin{align*}\frac{1}{36}\end{align*} |
2 | \begin{align*}\frac{3}{36}\end{align*} |
3 | \begin{align*}\frac{5}{36}\end{align*} |
4 | \begin{align*}\frac{7}{36}\end{align*} |
5 | \begin{align*}\frac{9}{36}\end{align*} |
6 | \begin{align*}\frac{11}{36}\end{align*} |
Total | 1 |
When you roll the two dice, what is the probability that the sum is 4? By looking at the first table above, you can see that the probability is \begin{align*}\frac{3}{36}\end{align*}.
What is the probability that the larger number is 4? By looking at the second table above, you can see that the probability is \begin{align*}\frac{7}{36}\end{align*}.
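The two tables can also be rebuilt by counting outcomes directly. The following Python sketch is one way to do this; the variable names are chosen here for illustration, and exact fractions are used so that nothing is rounded.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes.
outcomes = list(product(range(1, 7), repeat=2))

# Count how many outcomes give each sum and each larger number.
sum_counts = Counter(a + b for a, b in outcomes)
max_counts = Counter(max(a, b) for a, b in outcomes)

# Turn the counts into exact probabilities (Fraction reduces automatically).
p_sum = {x: Fraction(n, len(outcomes)) for x, n in sum_counts.items()}
p_max = {x: Fraction(n, len(outcomes)) for x, n in max_counts.items()}

# Print the counts over 36 so the output lines up with the tables above.
for x in sorted(sum_counts):
    print(f"sum {x:2d}: {sum_counts[x]}/36")
for x in sorted(max_counts):
    print(f"larger number {x}: {max_counts[x]}/36")
```

Each probability is simply the number of cells in the 36-outcome table that produce the value, divided by 36.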
2. The Regional Hospital has recently opened a new pulmonary unit and has released the following data on the proportion of silicosis cases caused by working in the coal mines. Suppose two silicosis patients are randomly selected from a large population with the disease.
Silicosis Cases | Proportion |
---|---|
Worked in the mine | 0.80 |
Did not work in the mine | 0.20 |
There are four possible outcomes for the two patients. With ‘yes’ representing “worked in the mines” and ‘no’ representing “did not work in the mines”, the possibilities are as follows:
Outcome | First Patient | Second Patient |
---|---|---|
1 | No | No |
2 | Yes | No |
3 | No | Yes |
4 | Yes | Yes |
As stated previously, the patients for this survey have been randomly selected from a large population, and therefore, the outcomes are independent. The probability for each outcome can be calculated by multiplying the appropriate proportions as shown:
\begin{align*}P(\text{no for} \ 1^{\text{st}}) \bullet P(\text{no for} \ 2^{\text{nd}}) &= (0.2)(0.2)=0.04\\ P(\text{yes for} \ 1^{\text{st}}) \bullet P(\text{no for} \ 2^{\text{nd}}) &= (0.8)(0.2)=0.16\\ P(\text{no for} \ 1^{\text{st}}) \bullet P(\text{yes for} \ 2^{\text{nd}}) &= (0.2)(0.8)=0.16\\ P(\text{yes for} \ 1^{\text{st}}) \bullet P(\text{yes for} \ 2^{\text{nd}}) &= (0.8)(0.8)=0.64\end{align*}
If \begin{align*}X\end{align*} represents the number of silicosis patients who worked in the mines in this random sample, then the first of these outcomes results in \begin{align*}x = 0\end{align*}, the second and third each result in \begin{align*}x = 1\end{align*}, and the fourth results in \begin{align*}x = 2\end{align*}. Because the second and third outcomes are disjoint, their probabilities can be added. The probability distribution for \begin{align*}X\end{align*} is given in the table below:
\begin{align*}x\end{align*} | Probability, \begin{align*}p(x)\end{align*} |
---|---|
0 | 0.04 |
1 | \begin{align*}0.16 + 0.16 = 0.32\end{align*} |
2 | 0.64 |
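The same distribution can be produced by listing the four outcomes, multiplying the proportions (which relies on the independence assumption stated above), and adding the probabilities of outcomes that give the same value of \begin{align*}X\end{align*}. The Python sketch below is illustrative only; the names `p` and `dist` are not part of the lesson.

```python
from itertools import product

# Proportions from the hospital's table.
p = {"yes": 0.80, "no": 0.20}

# dist[x] will hold P(X = x), where X counts patients who worked in the mine.
dist = {0: 0.0, 1: 0.0, 2: 0.0}

for first, second in product(p, repeat=2):
    prob = p[first] * p[second]        # independent selections: multiply
    x = [first, second].count("yes")   # value of X for this outcome
    dist[x] += prob                    # disjoint outcomes: add

for x, prob in dist.items():
    print(x, round(prob, 2))
```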
Calculating the Expected Value and the Variance
Suppose an individual plays a gambling game where it is possible to lose $2.00, break even, win $6.00, or win $20.00 each time he plays. The probability distribution for each outcome is provided by the following table:
Winnings, \begin{align*}x\end{align*} | Probability, \begin{align*}p(x)\end{align*} |
---|---|
\begin{align*}-\$2\end{align*} | 0.30 |
$0 | 0.40 |
$6 | 0.20 |
$20 | 0.10 |
The table can be used to calculate the expected value and the variance of this distribution:
\begin{align*}\mu &= \sum x \, p(x)\\ \mu &= (-2 \cdot 0.30)+(0 \cdot 0.40)+(6 \cdot 0.20)+(20 \cdot 0.10)\\ \mu &= 2.6\end{align*}
Thus, over many plays, the player can expect to win an average of $2.60 per game.
The variance of this distribution can be calculated as shown:
\begin{align*}\sigma^2 &= \sum (x-\mu)^2 p(x)\\ \sigma^2 &= (-2-2.6)^2 (0.30)+(0-2.6)^2 (0.40)+(6-2.6)^2 (0.20)+(20-2.6)^2 (0.10)\\ \sigma^2 & \approx 41.64\\ \sigma & \approx \sqrt{41.64} \approx \$ 6.45\end{align*}
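The two formulas translate almost word for word into code. The helper `mean_and_variance` below is a hypothetical name introduced for this sketch; it assumes the distribution is given as a dictionary mapping each value to its probability.

```python
from math import sqrt

def mean_and_variance(dist):
    """Return (mean, variance) of a discrete distribution {value: probability}."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, var

# Winnings and probabilities from the table above.
winnings = {-2: 0.30, 0: 0.40, 6: 0.20, 20: 0.10}

mu, var = mean_and_variance(winnings)
sigma = sqrt(var)
print(round(mu, 2), round(var, 2), round(sigma, 2))
```

Rounded to two decimal places, the results agree with the hand calculation: a mean of 2.60, a variance of 41.64, and a standard deviation of about 6.45.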
Calculating the Expected Value and the Standard Deviation
The following probability distribution was constructed from the results of a survey at the local university. The random variable is the number of fast food meals purchased by a student during the preceding year (12 months). For this distribution, calculate the expected value and the standard deviation.
Number of Meals Purchased Within 12 Months, \begin{align*}x\end{align*} | Probability, \begin{align*}p(x)\end{align*} |
---|---|
0 | 0.04 |
\begin{align*}[1 - 6)\end{align*} | 0.30 |
\begin{align*}[6 - 11)\end{align*} | 0.29 |
\begin{align*}[11 - 21)\end{align*} | 0.17 |
\begin{align*}[21 - 51)\end{align*} | 0.15 |
\begin{align*}[51 - 60)\end{align*} | 0.05 |
Total | 1.00 |
You must begin by estimating a mean for each interval, and this can be done by finding the center of each interval. For the first interval of \begin{align*}[1 - 6)\end{align*}, 6 is not included in the interval, so a value of 3 would be the center. This same procedure can be used to estimate the mean of all the intervals. Therefore, the expected value can be calculated as follows:
\begin{align*}\mu &= \sum x \, p(x)\\ \mu &= (0)(0.04)+(3)(0.30)+(8)(0.29)+(15.5)(0.17)+(35.5)(0.15)+(55)(0.05)\\ \mu &= 13.93\end{align*}
Likewise, the standard deviation can be calculated:
\begin{align*}\sigma^2 &= \sum (x-\mu)^2 p(x)\\ &= (0-13.93)^2 (0.04)+(3-13.93)^2 (0.30)\\ & \quad +(8-13.93)^2(0.29)+(15.5-13.93)^2(0.17)\\ & \quad +(35.5-13.93)^2(0.15)+(55-13.93)^2(0.05)\\ & \approx 208.3451\end{align*}
\begin{align*}\sigma \approx \sqrt{208.3451} \approx 14.43\end{align*}
Thus, the expected number of fast food meals purchased by a student at the local university is 13.93, and the standard deviation is 14.43. Note that the mean should not be rounded to a whole number, since it does not have to be one of the values in the distribution. You should also notice that the standard deviation is very close to the expected value. Because the number of meals cannot be negative, this suggests that the distribution is skewed to the right, with a long tail toward the larger numbers.
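The midpoint method can be checked with a short Python sketch. It assumes, as the lesson does, that each interval \begin{align*}[a - b)\end{align*} contains the whole numbers from \begin{align*}a\end{align*} up to \begin{align*}b - 1\end{align*}, so its center is \begin{align*}(a + b - 1)/2\end{align*}; the single value 0 is written here as the interval from 0 up to (but not including) 1.

```python
from math import sqrt

# (low, high, probability) for each class [low, high) of whole numbers of meals.
classes = [
    (0, 1, 0.04),
    (1, 6, 0.30),
    (6, 11, 0.29),
    (11, 21, 0.17),
    (21, 51, 0.15),
    (51, 60, 0.05),
]

# Center of the whole numbers low, ..., high - 1 in each class.
midpoints = [(low + high - 1) / 2 for low, high, _ in classes]
probs = [p for _, _, p in classes]

mu = sum(m * p for m, p in zip(midpoints, probs))
var = sum((m - mu) ** 2 * p for m, p in zip(midpoints, probs))
sigma = sqrt(var)

print(midpoints)
print(round(mu, 2), round(sigma, 2))
```

Keep in mind that these are estimates: replacing every value in a class by the class center discards the variation inside each class.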
Technology Note: Calculating the Mean and Variance of a Probability Distribution on the TI-83/84 Calculator
Notice that the mean, which is denoted by \begin{align*}\overline{x}\end{align*} in this case, is 13.93, and the standard deviation, which is denoted by \begin{align*}\sigma_x\end{align*}, is approximately 14.43.
Effect of Linear Transformations of \begin{align*}X\end{align*} on the Mean and Standard Deviation of \begin{align*}X\end{align*}
If you add the same value to all the numbers of a data set, the shape and standard deviation of the data set remain the same, but the value is added to the mean. This is referred to as re-centering the data set. Likewise, if you rescale the data, or multiply all the data values by the same nonzero number, the basic shape will not change, but the mean and the standard deviation will each be a multiple of this number. (Note that the standard deviation must actually be multiplied by the absolute value of the number.) If you multiply the numbers of a data set by a constant \begin{align*}d\end{align*} and then add a constant \begin{align*}c\end{align*}, the mean and the standard deviation of the transformed values are expressed as follows:
\begin{align*}\mu_{c+dX} &= c+d \mu_{X}\\ \sigma_{c+dX} &= |d| \sigma_{X}\end{align*}
These are called linear transformations, and the implications of this can be better understood if you return to the casino example.
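For readers who want to see where the two rules come from, here is a short derivation. It is a sketch that assumes \begin{align*}X\end{align*} is a discrete random variable whose probabilities \begin{align*}p(x)\end{align*} sum to 1:

\begin{align*}\mu_{c+dX} &= \sum (c+dx) \, p(x) = c \sum p(x) + d \sum x \, p(x) = c + d \mu_X\\ \sigma^2_{c+dX} &= \sum \left[(c+dx)-(c+d\mu_X)\right]^2 p(x) = d^2 \sum (x-\mu_X)^2 p(x) = d^2 \sigma^2_X\\ \sigma_{c+dX} &= \sqrt{d^2 \sigma^2_X} = |d| \sigma_X\end{align*}

The constant \begin{align*}c\end{align*} cancels inside the squared deviation, which is why re-centering leaves the standard deviation unchanged, and the square root of \begin{align*}d^2\end{align*} is what produces the absolute value.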
Using the Addition and Subtraction Rule
The casino has decided to triple the prizes for the game being played. What are the expected winnings for a person who plays one game? What is the standard deviation? Recall that the expected value was $2.60, and the standard deviation was $6.45.
Solution:
The simplest way to calculate the expected value of the tripled prize is (3)($2.60), or $7.80, with a standard deviation of (3)($6.45), or $19.35. Here, \begin{align*}c = 0\end{align*} and \begin{align*}d = 3\end{align*}. Another method of calculating the expected value and standard deviation would be to create a new table for the tripled prize:
Winnings, \begin{align*}x\end{align*} | Probability, \begin{align*}p\end{align*} |
---|---|
\begin{align*}-\$6\end{align*} | 0.30 |
$0 | 0.40 |
$18 | 0.20 |
$60 | 0.10 |
The calculations can be done using the formulas or by using a graphing calculator. Notice that the results are the same either way.
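As a check on the claim that both methods agree, the Python sketch below computes the mean and standard deviation from the tripled table and compares them with three times the original values. The helper name `mean_sd` is introduced here only for illustration.

```python
from math import sqrt

def mean_sd(dist):
    """Return (mean, standard deviation) of a distribution {value: probability}."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, sqrt(var)

original = {-2: 0.30, 0: 0.40, 6: 0.20, 20: 0.10}

# Tripling every prize is the linear transformation c + dX with c = 0, d = 3.
tripled = {3 * x: p for x, p in original.items()}

mu1, sd1 = mean_sd(original)
mu3, sd3 = mean_sd(tripled)

print(round(mu3, 2), round(3 * mu1, 2))
print(round(sd3, 2), round(3 * sd1, 2))
```

Working from the table gives a standard deviation of about $19.36, while multiplying the already rounded $6.45 by 3 gives $19.35. The one-cent gap is a rounding effect, not a disagreement between the methods.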
This same problem can be changed again in order to introduce the Addition Rule and the Subtraction Rule for random variables. Suppose the casino wants to encourage customers to play more, so it begins demanding that customers play the game in sets of three. What are the expected value (total winnings) and standard deviation now?
Let \begin{align*}X, Y\end{align*} and \begin{align*}Z\end{align*} represent the winnings on the first, second, and third games played. Then \begin{align*}\mu_{X+Y+Z}\end{align*} is the expected value of the total winnings when three games are played. The expected value of the winnings for playing one game was $2.60, so for three games the expected value is:
\begin{align*}\mu_{X+Y+Z} &= \mu_X+\mu_Y +\mu_Z\\ \mu_{X+Y+Z} &= \$ 2.60 + \$ 2.60 + \$ 2.60\\ \mu_{X+Y+Z} &= \$ 7.80\end{align*}
Thus, the expected value is the same as that for the tripled prize.
Since the winnings on the three games played are independent, the variance and standard deviation of the total, \begin{align*}X+Y+Z\end{align*}, can be calculated as shown below:
\begin{align*}\sigma{^2}_{X+Y+Z} &= \sigma{^2}_X + \sigma{^2}_Y + \sigma{^2}_Z\\ \sigma{^2}_{X+Y+Z} &= 6.45^2 + 6.45^2 + 6.45^2\\ \sigma{^2}_{X+Y+Z} &\approx 124.8075\\ \sigma_{X+Y+Z} &\approx \sqrt{124.8075}\\ \sigma_{X+Y+Z} &\approx 11.17\end{align*}
This means that the person playing the three games can expect to win $7.80 with a standard deviation of $11.17. Note that when the prize was tripled, there was a greater standard deviation ($19.35) than when the person played three games ($11.17).
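The three-game result can be verified without the Addition Rule by listing all \begin{align*}4 \times 4 \times 4 = 64\end{align*} combinations of outcomes and building the distribution of the total directly. The Python sketch below does this under the same independence assumption; the variable names are illustrative.

```python
from itertools import product
from math import sqrt

game = {-2: 0.30, 0: 0.40, 6: 0.20, 20: 0.10}

# Build the distribution of the total winnings over three independent games.
total = {}
for outcome in product(game, repeat=3):
    prob = 1.0
    for x in outcome:
        prob *= game[x]                # independence: multiply probabilities
    s = sum(outcome)
    total[s] = total.get(s, 0.0) + prob

mu = sum(s * p for s, p in total.items())
var = sum((s - mu) ** 2 * p for s, p in total.items())
sd = sqrt(var)

print(round(mu, 2), round(var, 2), round(sd, 2))
```

The variance of the total comes out as three times the single-game variance of 41.64, as the rule predicts. The standard deviation is about $11.18 when no intermediate rounding is done; the $11.17 in the text comes from squaring the rounded value $6.45.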
The Addition and Subtraction Rules for random variables are as follows:
If \begin{align*}X\end{align*} and \begin{align*}Y\end{align*} are random variables, then:
\begin{align*}\mu_{X+Y} &= \mu_X + \mu_Y\\ \mu_{X-Y} &= \mu_X - \mu_Y\end{align*}
If \begin{align*}X\end{align*} and \begin{align*}Y\end{align*} are independent, then:
\begin{align*}\sigma{^2}_{X+Y} &= \sigma{^2}_X+\sigma{^2}_Y\\ \sigma{^2}_{X-Y} &= \sigma{^2}_X+\sigma{^2}_Y\end{align*}
Variances are added for both the sum and difference of two independent random variables, because the variation in each variable contributes to the overall variation in both cases. (Subtracting is the same as adding the opposite.) Suppose you have two dice, one die, \begin{align*}X\end{align*}, with the usual positive numbers 1 through 6, and another, \begin{align*}Y\end{align*}, with the negative numbers \begin{align*}-1\end{align*} through \begin{align*}-6\end{align*}. Next, suppose you perform two experiments. In the first, you roll the first die, \begin{align*}X\end{align*}, and then the second die, \begin{align*}Y\end{align*}, and you compute the difference of the two rolls. In the second experiment, you roll the first die and then the second die, and you calculate the sum of the two rolls.
\begin{align*}\mu_X &= \sum x \, p(x) && \mu_Y = \sum y \, p(y)\\ \mu_X &= 3.5 && \mu_Y=-3.5\\ \sigma_X^2 &= \sum (x-\mu_X)^2 p(x) && \sigma_Y^2 = \sum (y-\mu_Y)^2 p(y)\\ \sigma_X^2 & \approx 2.917 && \sigma_Y^2 \approx 2.917\\ \mu_{X-Y}&=\mu_X - \mu_Y && \mu_{X+Y} = \mu_X+\mu_Y\\ \mu_{X-Y} &= 3.5 - (-3.5)=7 && \mu_{X+Y} = 3.5 + (-3.5)=0\\ \sigma_{X-Y}^2&=\sigma_X^2+\sigma_Y^2 && \sigma_{X+Y}^2 = \sigma_X^2+\sigma_Y^2\\ \sigma_{X-Y}^2 &\approx 2.917 + 2.917 = 5.834 && \sigma_{X+Y}^2 \approx 2.917 + 2.917 = 5.834\end{align*}
Notice how the expected values and the variances for the two dice combine in these two experiments.
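Both experiments can be checked by enumerating the 36 pairs of rolls. The Python sketch below uses exact fractions; `combine` is a hypothetical helper written for this illustration that builds the distribution of any function of two independent rolls.

```python
from fractions import Fraction
from itertools import product

# X: the usual die; Y: the die numbered -1 through -6.
X = {x: Fraction(1, 6) for x in range(1, 7)}
Y = {-y: Fraction(1, 6) for y in range(1, 7)}

def mean(dist):
    return sum(v * p for v, p in dist.items())

def variance(dist):
    m = mean(dist)
    return sum((v - m) ** 2 * p for v, p in dist.items())

def combine(dx, dy, op):
    """Distribution of op(x, y) when x and y are rolled independently."""
    out = {}
    for (x, px), (y, py) in product(dx.items(), dy.items()):
        v = op(x, y)
        out[v] = out.get(v, 0) + px * py
    return out

diff = combine(X, Y, lambda x, y: x - y)    # first experiment
total = combine(X, Y, lambda x, y: x + y)   # second experiment

print(mean(diff), variance(diff))
print(mean(total), variance(total))
```

The means differ (7 and 0), but the two variances are identical: exactly 35/6, or about 5.833. The value 5.834 above comes from adding the rounded 2.917 twice.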
Example
Beth earns $25.00 an hour for tutoring but spends $20.00 an hour for piano lessons. She saves the difference between her earnings for tutoring and the cost of the piano lessons. The numbers of hours she spends on each activity in one week vary independently according to the probability distributions shown below.
Hours of Piano Lessons, \begin{align*}x\end{align*} | Probability, \begin{align*}p(x)\end{align*} |
---|---|
0 | 0.3 |
1 | 0.3 |
2 | 0.4 |
Hours of Tutoring, \begin{align*}y\end{align*} | Probability, \begin{align*}p(y)\end{align*} |
---|---|
1 | 0.2 |
2 | 0.3 |
3 | 0.2 |
4 | 0.3 |
Example 1
Determine her expected weekly savings and the standard deviation of these savings.
\begin{align*}X\end{align*} represents the number of hours per week taking piano lessons, and \begin{align*}Y\end{align*} represents the number of hours tutoring per week. The mean and standard deviation for each can be calculated as follows:
\begin{align*}E(X) &= \mu_X = \sum x \, p(x) && \sigma_X^2 = \sum (x-\mu_X)^2 p(x)\\ \mu_X &= (0)(0.3)+(1)(0.3)+(2)(0.4) && \sigma_X^2 = (0-1.1)^2 (0.3)+(1-1.1)^2(0.3)+(2-1.1)^2(0.4)\\ \mu_X &= 1.1 && \sigma_X^2 = 0.69\\ && \sigma_X \approx 0.831\end{align*}
\begin{align*}E(Y) &= \mu_Y = \sum y \, p(y) && \sigma_Y^2 = \sum (y-\mu_Y)^2 p(y)\\ \mu_Y &= (1)(0.2)+(2)(0.3)+(3)(0.2)+(4)(0.3) && \sigma_Y^2 = (1-2.6)^2 (0.2)+(2-2.6)^2(0.3)+(3-2.6)^2(0.2)+(4-2.6)^2(0.3)\\ \mu_Y &= 2.6 && \sigma_Y^2 = 1.24\\ && \sigma_Y \approx 1.11\end{align*}
The expected number of hours Beth spends on piano lessons is 1.1 with a standard deviation of 0.831 hours. Likewise, the expected number of hours Beth spends tutoring is 2.6 with a standard deviation of 1.11 hours.
Beth spends $20 for each hour of piano lessons, so her mean weekly cost for piano lessons can be calculated with the Linear Transformation Rule as shown:
\begin{align*}\mu_{20 X}=(20)(\mu_X)=(20)(1.1)=\$ 22\end{align*} by the Linear Transformation Rule.
Beth earns $25 for each hour of tutoring, so her mean weekly earnings from tutoring are as follows:
\begin{align*}\mu_{25 Y}=(25)(\mu_Y)=(25)(2.6)=\$ 65\end{align*} by the Linear Transformation Rule.
Thus, Beth's expected weekly savings are:
\begin{align*}\mu_{25 Y}-\mu_{20 X}=\$ 65 - \$ 22 = \$ 43\end{align*} by the Subtraction Rule.
The standard deviation of the cost of her piano lessons is:
\begin{align*}\sigma_{20 X}=(20)(0.831)=\$ 16.62\end{align*} by the Linear Transformation Rule.
The standard deviation of her earnings from tutoring is:
\begin{align*}\sigma_{25 Y}=(25)(1.11)=\$ 27.75\end{align*} by the Linear Transformation Rule.
Finally, because the hours spent on the two activities are independent, the variance and standard deviation of her weekly savings are:
\begin{align*}\sigma^2_{25Y-20X} &= \sigma^2_{25 Y}+\sigma^2_{20 X}=(27.75)^2+(16.62)^2 \approx 1046.2869\\ \sigma_{25Y-20X} &\approx \$ 32.35\end{align*}
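Beth's savings can also be treated as a single random variable, \begin{align*}25Y - 20X\end{align*}, whose distribution is built from the 12 combinations of piano hours and tutoring hours. The Python sketch below does this under the independence assumption stated in the problem; the dictionary names are chosen for illustration.

```python
from itertools import product
from math import sqrt

piano = {0: 0.3, 1: 0.3, 2: 0.4}             # hours of piano lessons, X
tutoring = {1: 0.2, 2: 0.3, 3: 0.2, 4: 0.3}  # hours of tutoring, Y

# Distribution of weekly savings 25Y - 20X over all independent combinations.
savings = {}
for (x, px), (y, py) in product(piano.items(), tutoring.items()):
    s = 25 * y - 20 * x
    savings[s] = savings.get(s, 0.0) + px * py

mu = sum(s * p for s, p in savings.items())
var = sum((s - mu) ** 2 * p for s, p in savings.items())
sd = sqrt(var)

print(round(mu, 2), round(var, 2), round(sd, 2))
```

The mean matches the $43 found above. Without intermediate rounding, the variance is \begin{align*}(25)^2(1.24)+(20)^2(0.69)=1051\end{align*} and the standard deviation is about $32.42; the slightly smaller $32.35 in the worked solution results from using the rounded standard deviations 1.11 and 0.831.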
Review
- Find the expected value for the sum of two fair dice.
- Find the standard deviation for the sum of two fair dice.
- It is estimated that 70% of the students attending a school in a rural area take the bus to school. Suppose you randomly select three students from the population. Construct the probability distribution of the random variable, \begin{align*}X\end{align*}, defined as the number of students who take the bus to school. (Hint: Begin by listing all of the possible outcomes.)
- The Safe Grad Committee at a high school is selling raffle tickets on a Christmas Basket filled with gifts and gift cards. The prize is valued at $1200, and the committee has decided to sell only 500 tickets. What is the expected value of a ticket? If the students decide to sell tickets on three monetary prizes, one valued at $1500 and two valued at $500 each, what is the expected value of the ticket now?
- A recent law has been passed banning the use of hand-held cell phones while driving, and a survey has revealed that 76% of drivers now refrain from using their cell phones while driving. Three drivers were randomly selected, and a probability distribution table was constructed to record the outcomes. Let \begin{align*}N\end{align*} represent those drivers who never use their cell phones while driving and \begin{align*}S\end{align*} represent those who do use their cell phones while driving. Calculate the expected value and the variance using your calculator.
- True or False? If \begin{align*}X\end{align*} and \begin{align*}Y\end{align*} are random variables then \begin{align*}X^2+Y^3\end{align*} is a random variable.
- Are these concepts applicable to real-life situations?
- Will knowing these concepts allow you to estimate information about a population?
- Suppose you have a fair six-sided die. Let the random variable \begin{align*}X\end{align*} be the number that shows when you roll the die one time. Suppose in addition you have a fair four-sided die with the numbers 1, 2, 3, 3. Let the random variable \begin{align*}Y\end{align*} be the number that appears when you roll this die one time. Define a third random variable \begin{align*}Z = X + Y\end{align*}.
  - Write the probability distribution for \begin{align*}X\end{align*}.
  - Write the probability distribution for \begin{align*}Y\end{align*}.
  - Write the probability distribution for \begin{align*}Z\end{align*}.
- Suppose in a box there are 4 tickets. Each ticket has two numbers on it. Ticket one has the numbers 1 and 2; ticket two has the numbers 1 and 3; ticket three has the numbers 5 and 7; and ticket four has the numbers 4 and 2. Define the following two random variables: \begin{align*}X\end{align*} is the first number on the ticket and \begin{align*}Y\end{align*} is the second number on the ticket.
  - If you draw a ticket at random from the box, what is the probability that the first number will be a 1?
  - Define a new random variable \begin{align*}2X+3Y\end{align*}. What is the probability that this new random variable will have a value of 11?
- Suppose there are six numbers in a box: 1, 2, 3, 4, 5, 6. You draw two numbers out of the box, without replacement. Find the distribution table for the random variable \begin{align*}S=X_1+X_2\end{align*} where \begin{align*}X_1\end{align*} represents the first number drawn and \begin{align*}X_2\end{align*} represents the second number drawn.
Review (Answers)
To view the Review answers, open this PDF file and look for section 4.4.