7.5: Sums and Differences of Independent Random Variables
Learning Objectives
- Construct probability distributions of random variables.
- Calculate the mean and standard deviation for sums and differences of independent random variables.
Introduction
In previous lessons, you learned that sampling is a way of estimating a parameter of a population by selecting data from that population and a way of computing the chance of obtaining a specified outcome from a sample given a specific population. In Fort McMurray, Alberta, better known as “Fort McMoney” because of the oil industry, housing is becoming a rare commodity. Suppose you are a recent winner of a lottery and you decide to invest your winnings in a housing project for this city. Your plan is to build single-family homes. Before you begin, there are some facts that you need to know so that the houses will be a quick sell and you can turn a profit for your investment. One bit of information that you would like to ascertain is how many televisions each family will have so you will know how many cable hook-ups to install in each household. Short of conducting a survey, how will you determine a solution to this problem?
Probability Distributions from Data
To begin, you should contact the local cable provider in the city and ask them to provide you with a record of the distribution of cable hook-ups per household that are currently on their books. Suppose they release this data:
Hook-ups per Household | Proportion of Households |
---|---|
From the above table and histogram, you now have some estimates with which to work. The represents ' or more’ but there are only or of the houses that fall in this category. As well, if you add , you will see that a little more than of the households will have or more cable hook-ups. The above distribution is skewed toward the larger numbers and has a mean of approximately . By using probability from samples with known distributions, you now have some data for the unknown population.
The data above can also be shown in another way to display numbers that represent the possible number of hook-ups in each household. This display is made by using a list of random digits to select a household at random from this distribution. The random numbers must have three digits because the proportions are all given to three decimal places.
Number of Hook-ups per Household | Proportions of Households | Random Numbers Representing this Category |
---|---|---|
A three-digit random number does not represent an individual household, but with other three-digit numbers in its category, it represents the many households in the category.
From this display, you can see that households out of require no cable hook-up while out of require one cable hook-up. This way of displaying the data allows you to see actual numbers for each household.
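The random-number assignment described above can be sketched in a few lines of Python. This is only an illustration: the proportions are hypothetical placeholders (not the cable provider's figures), and `assign_ranges` is a helper name invented for this sketch.

```python
def assign_ranges(proportions):
    """Give each category a block of the three-digit numbers 000-999.

    `proportions` maps category -> proportion (three decimal places);
    the proportions are assumed to sum to 1.
    """
    ranges = {}
    start = 0
    for category, p in proportions.items():
        count = round(p * 1000)  # how many three-digit numbers this category receives
        ranges[category] = (start, start + count - 1)
        start += count
    return ranges

# Hypothetical proportions, for illustration only.
hypothetical = {0: 0.100, 1: 0.250, 2: 0.300, 3: 0.250, 4: 0.100}
ranges = assign_ranges(hypothetical)
for category, (low, high) in ranges.items():
    print(f"{category} hook-ups: {low:03d}-{high:03d}")
```

A randomly drawn three-digit number then selects whichever category's block contains it, so each category is chosen with its stated proportion.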
A probability distribution gives the set of values that a random variable takes on, together with the probability of each value. The value of the random variable depends upon the result of a trial. The random variable, $X$, will represent the number of cable hook-ups in a randomly selected household. Therefore, $P(X = 2)$ is the proportion of households in the table that require two cable hook-ups.
At this time, there are three ways that you can create probability distributions. Sometimes previously collected data, relative to the random variable that you are studying, can serve as a probability distribution. This was the case with the data received from the local cable company in Fort McMurray. In addition to this method, a simulation is also a good way to create an approximate probability distribution. Finally, a probability distribution can be constructed from basic principles and assumptions by using the rules of theoretical probability. The following examples will lead to the understanding of these rules of theoretical probability.
Example:
Create a table that shows all the possible outcomes when two dice are rolled simultaneously. (Hint: There are 36 possible outcomes.)
This table of possible outcomes when two dice are rolled simultaneously can now be used to construct other probability distributions. The first table will display the sum of the two dice and the second will represent the larger of the two numbers.
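Sum of Two Dice, $x$ | Probability, $P(x)$ |
---|---|
2 | 1/36 |
3 | 2/36 |
4 | 3/36 |
5 | 4/36 |
6 | 5/36 |
7 | 6/36 |
8 | 5/36 |
9 | 4/36 |
10 | 3/36 |
11 | 2/36 |
12 | 1/36 |
Total | 1 |
Larger Number, $x$ | Probability, $P(x)$ |
---|---|
1 | 1/36 |
2 | 3/36 |
3 | 5/36 |
4 | 7/36 |
5 | 9/36 |
6 | 11/36 |
Total | 1 |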
When you roll the two dice, what is the probability that the sum of the two dice is 4? Three of the 36 outcomes, (1, 3), (2, 2) and (3, 1), give a sum of four, so the probability is $\frac{3}{36} = \frac{1}{12}$.
What is the probability that the larger number is 4? Seven of the 36 outcomes have four as the larger number, so the probability is $\frac{7}{36}$.
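Both dice distributions can be built by listing the 36 outcomes and counting. The following is a minimal sketch using exact fractions; the name `distribution` is chosen here for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered outcomes (first die, second die).
outcomes = list(product(range(1, 7), repeat=2))

def distribution(statistic):
    """Return {value: probability} for a statistic of the two dice."""
    counts = Counter(statistic(a, b) for a, b in outcomes)
    return {value: Fraction(n, len(outcomes)) for value, n in sorted(counts.items())}

sum_dist = distribution(lambda a, b: a + b)  # sum of the two dice
max_dist = distribution(max)                 # larger of the two numbers

print(sum_dist[4])  # 1/12, which is 3/36 in lowest terms
print(max_dist[4])  # 7/36
```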
Example:
The Regional Hospital has recently opened a new pulmonary unit and has released the following data on the proportion of silicosis cases caused by working in the coal mines. Suppose two silicosis patients are randomly selected from the large population with the disease.
Silicosis Cases | Proportion |
---|---|
Worked in the mine | |
Did not work in the mine |
There are four possible outcomes for the two patients. With ‘yes’ representing “worked in the mines” and ‘no’ representing “did not work in the mines”, the possibilities are
Outcome | First Patient | Second Patient |
---|---|---|
1 | No | No |
2 | Yes | No |
3 | No | Yes |
4 | Yes | Yes |
The patients for this survey have been randomly selected from a large population and therefore the outcomes are independent. The probability for each outcome can be calculated by applying the multiplication rule for independent events:
$$P(A \text{ and } B) = P(A) \cdot P(B)$$
If $X$ represents the number of mine workers in this random sample, then the first of these outcomes results in $X = 0$, the second and third each result in $X = 1$, and the fourth results in $X = 2$. Because the second and third outcomes are disjoint, their probabilities can be added. The probability distribution of $X$ is given in the table below:
Probability of | |
---|---|
These probabilities are added because the outcomes are disjoint.
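The two steps used here, multiplying probabilities for independent patients and adding probabilities for disjoint outcomes, can be sketched as follows. The value `p_yes = 0.8` is a hypothetical placeholder and not the hospital's reported proportion; `count_distribution` is an illustrative name.

```python
from itertools import product

def count_distribution(p_yes):
    """Distribution of X = number of 'Yes' patients among two independent patients."""
    probs = {"Yes": p_yes, "No": 1 - p_yes}
    dist = {0: 0.0, 1: 0.0, 2: 0.0}
    for first, second in product(probs, repeat=2):
        joint = probs[first] * probs[second]      # independent patients: multiply
        x = (first == "Yes") + (second == "Yes")  # number who worked in the mine
        dist[x] += joint                          # disjoint outcomes: add
    return dist

# Hypothetical proportion, for illustration only.
dist = count_distribution(0.8)
for x, p in dist.items():
    print(x, round(p, 4))
```

The two mixed outcomes (Yes/No and No/Yes) both land in the `X = 1` entry, which is why that probability is twice the product of the two proportions.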
Example:
The Quebec Major Junior Hockey League has five teams from the Maritime Provinces. These teams are the Cape Breton Screaming Eagles, Halifax Mooseheads, PEI Rockets, Moncton Wildcats and Saint John Sea Dogs. Each team has its own hometown arena, and the seating capacity of each arena is listed below:
Team | Seating Capacity (Thousands) |
---|---|
Screaming Eagles | |
Mooseheads | |
Rockets | |
Wildcats | |
Sea Dogs |
A schedule can now be drawn up for the teams to play pre-season exhibition games. One game will be played in each home arena and the possible capacity attendance will also be calculated. In addition, the probability of the total possible attendance being at least people will also be calculated.
The number of possible combinations of two teams from these five is $\binom{5}{2} = \frac{5!}{2!\,3!} = 10$. The following table shows the possible attendance for each of the pre-season exhibition games.
Teams | Combined Attendance Capacity for Both Games (Thousands) |
---|---|
Eagles/Mooseheads | |
Eagles/Rockets | |
Eagles/Wildcats | |
Eagles/Sea Dogs | |
Mooseheads/Rockets | |
Mooseheads/Wildcats | |
Mooseheads/Sea Dogs | |
Rockets/Wildcats | |
Rockets/Sea Dogs | |
Sea Dogs/Wildcats |
The last calculation is to determine the probability distribution of the capacity attendance.
Capacity Attendance, | Probability, |
---|---|
The probability that the capacity attendance will be at least is
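The pairing-and-counting procedure in this example can be sketched in Python. Every number below is a hypothetical placeholder: the capacities are not the real arena figures, the threshold is invented, and each of the ten pairings is assumed to be equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Hypothetical seating capacities in thousands (placeholders only).
capacity = {"Eagles": 5, "Mooseheads": 10, "Rockets": 4, "Wildcats": 7, "Sea Dogs": 6}

pairs = list(combinations(capacity, 2))  # every pair of teams
totals = [capacity[a] + capacity[b] for a, b in pairs]

# Probability distribution of combined capacity, each pairing equally likely.
attendance_dist = {
    total: Fraction(n, len(pairs)) for total, n in sorted(Counter(totals).items())
}

threshold = 12  # hypothetical cut-off, in thousands
p_at_least = sum(p for total, p in attendance_dist.items() if total >= threshold)
print(len(pairs), p_at_least)
```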
Expected Values and Standard Deviation
Returning to the original problem of the number of cable hook-ups for the single-family homes, take another look at figures one and two. From these displays, you can find the mean number of hook-ups per household. You expect of households to have no hook-up, to have one hook-up, to have two hook-ups, to have three hook-ups and to have four hook-ups. To calculate the mean number of hook-ups per household, use the previous figure and add another column.
Hook-ups per Household, $x$ | Proportion of Households, $P(x)$ | Contribution to Mean, $x \cdot P(x)$ |
---|---|---|
Sum |
The mean of a probability distribution for the random variable $X$ is denoted by $\mu_X$ or $E(X)$, which represents the expected value. Since you now know the expected number of cable hook-ups for each household, you can also calculate how much each household will differ from this mean. In other words, you can calculate the expected standard deviation. To do this, determine the expected value of the square of the deviations from the mean and then take its square root. As you recall from chapter 1, the expected squared deviation is called the variance of the probability distribution, and its square root, the standard deviation, represents how far an actual value will typically stray from the mean.
The standard deviation () is . This indicates that each household will have cable hook-ups and differ from this mean by an average of about hook-up. These calculations yield the following formulas for calculating the expected value and the standard deviation (and its square, the variance) for a probability distribution.
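$$E(X) = \mu_X = \sum x_i \, p_i$$
$$\sigma_X^2 = \sum (x_i - \mu_X)^2 \, p_i$$
$$\sigma_X = \sqrt{\sigma_X^2}$$
where $p_i$ is the probability that the random variable $X$ takes on a specific value $x_i$.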
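The expected value is the probability-weighted sum of the values, and the variance is the probability-weighted sum of the squared deviations from that mean. A minimal sketch follows; the function names are chosen for illustration, and a fair six-sided die is used as the example distribution because its probabilities are known exactly.

```python
from math import sqrt

def expected_value(dist):
    """Mean of a discrete distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Expected squared deviation from the mean."""
    mu = expected_value(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def std_dev(dist):
    """Square root of the variance."""
    return sqrt(variance(dist))

fair_die = {face: 1 / 6 for face in range(1, 7)}
print(expected_value(fair_die))  # approximately 3.5
print(variance(fair_die))        # approximately 2.9167 (exactly 35/12)
print(std_dev(fair_die))         # approximately 1.7078
```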
Example:
Suppose an individual plays a gambling game where it is possible to lose , break even, win , or win each time he plays. The probability distribution for each outcome is provided by the following table:
Winnings, | Probability, |
---|---|
Solution:
Now use the table to calculate the expected value and the variance of this distribution.
The player can expect to win playing this game.
The variance of this distribution is:
So the standard deviation, , is approximately
Example:
The following probability distribution was constructed from the results of a survey at the local university. The random variable is the number of fast food meals purchased by a student during the preceding year (12 months). For this distribution, calculate the expected value and the standard deviation.
Number of Meals Purchased Within 12 Months, | Probability, |
---|---|
Total |
The mean for each interval is in the center of each interval, so you must begin by estimating a mean for each interval. For the first interval of , six is not included in this interval so a value of would be the center. This same procedure will be used to estimate the mean of all the intervals. Therefore the expected value is:
Solution:
And
The expected number of fast food meals purchased by a student at the local university is . This number should not be rounded since the mean does not have to be one of the values in the distribution. You should also notice that the standard deviation is very close to the expected value. This means that the distribution will be skewed to the right and have long tails toward the larger numbers.
Notice that and .
Effect of Linear Transformations of X on the Mean and Standard Deviation of X
If you add the same number to all values of a data set, the shape and the standard deviation of the data remain the same, but the number is added to the mean. This is referred to as recentering the data set. Likewise, if you rescale the data (multiply all data values by the same nonzero number), the basic shape will not change, but the mean is multiplied by this number and the standard deviation is multiplied by the absolute value of the number. If you multiply the values by a constant $d$ and then add a constant $c$, the mean and the standard deviation of the transformed values are expressed as:
$$\mu_{c + dX} = c + d\,\mu_X$$
$$\sigma_{c + dX} = |d|\,\sigma_X$$
The implications of these can be better understood if you return to the gambling game example above.
Example:
The casino has decided to ‘triple’ the prizes for the game being played. What are the expected winnings for a person who plays one game? What is the standard deviation?
Solution:
Recall the expected value, $\mu_X$, and the standard deviation, $\sigma_X$, that were found for the original game. The simplest way to calculate the expected value of the tripled prize is to multiply by three, giving $3\mu_X$, with a standard deviation of $3\sigma_X$. Here the added constant is $c = 0$ and the multiplier is $d = 3$. Another method of calculating the expected value would be to create a new table for the tripled prize:
Winnings, | Probability, |
---|---|
New Table
Original Winnings, | New Winnings, | Probability, |
---|---|---|
The calculations can be done using the formulas or by using the graphing calculator.
Using the graphing calculator:
Notice that the same results are obtained.
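The same check can be made in code by transforming every value of a distribution and recomputing the mean and standard deviation directly. The winnings distribution below is a hypothetical placeholder, not the game's actual table, and `transform` and `mean_sd` are illustrative names.

```python
from math import isclose, sqrt

def mean_sd(dist):
    """Mean and standard deviation of a discrete distribution {value: probability}."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, sqrt(var)

def transform(dist, c, d):
    """Distribution of c + d*X, assuming d != 0 so distinct values stay distinct."""
    return {c + d * x: p for x, p in dist.items()}

# Hypothetical winnings distribution (placeholder values only).
game = {-2: 0.5, 0: 0.2, 3: 0.2, 10: 0.1}

mu, sd = mean_sd(game)
mu3, sd3 = mean_sd(transform(game, c=0, d=3))  # tripled prizes

# Recomputing from the new table agrees with multiplying the old results by 3.
assert isclose(mu3, 3 * mu) and isclose(sd3, 3 * sd)
print(mu, mu3)
```

With a negative multiplier the mean changes sign accordingly, while the standard deviation is still scaled by the absolute value of the multiplier.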
This same problem can be changed again in order to introduce the addition and subtraction rules for random variables. Suppose the casino wants to encourage customers to play more, so it begins requiring customers to play the game in sets of three. What are the expected value (total winnings) and standard deviation now?
Solution:
Let $X$, $Y$ and $Z$ represent the winnings on each game played. Then $\mu_{X+Y+Z}$ is the expected value of the total winnings when three games are played. Each game has the same expected value, $\mu_X$, as a single play, so for three games the expected value is:
$$\mu_{X+Y+Z} = \mu_X + \mu_Y + \mu_Z = 3\mu_X$$
The expected value is the same as that for the tripled prize.
Since the winnings on the three games played are independent, the standard deviation of $X + Y + Z$ is:
$$\sigma_{X+Y+Z} = \sqrt{\sigma_X^2 + \sigma_Y^2 + \sigma_Z^2} = \sqrt{3}\,\sigma_X$$
The person playing the three games expects to win with a standard deviation of . When the prize was tripled, there was a greater standard deviation than when the person played three games .
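The rules for addition and subtraction for random variables are:
If $X$ and $Y$ are random variables, then:
$$\mu_{X+Y} = \mu_X + \mu_Y$$
$$\mu_{X-Y} = \mu_X - \mu_Y$$
If $X$ and $Y$ are independent, then:
$$\sigma_{X+Y}^2 = \sigma_X^2 + \sigma_Y^2$$
$$\sigma_{X-Y}^2 = \sigma_X^2 + \sigma_Y^2$$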
Variances are added for both the sum and the difference of two independent random variables because the variation in each variable contributes to the variation in each case. Subtracting is the same as adding the opposite. Suppose you have two dice: one die, $X$, with the normal positive numbers 1 through 6, and another, $Y$, with the negative numbers $-1$ through $-6$. Then suppose you perform two experiments. In the first, you roll the first die, $X$, and then the second die, $Y$, and you compute the difference of the two rolls, $X - Y$. In the second experiment, you roll the first die and then the second die, and you calculate the sum of the two rolls, $X + Y$.
Notice how the expected values and the variances combine for these two experiments.
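The two experiments can be enumerated exactly. The sketch below assumes the first die shows 1 through 6 and the second shows $-1$ through $-6$, with all 36 pairs equally likely; `mean_var` is an illustrative helper name.

```python
from fractions import Fraction
from itertools import product

def mean_var(values):
    """Mean and variance of equally likely values, as exact fractions."""
    n = len(values)
    mu = Fraction(sum(values), n)
    var = sum((v - mu) ** 2 for v in values) / n
    return mu, var

x_faces = list(range(1, 7))   # die X: 1 through 6
y_faces = list(range(-6, 0))  # die Y: -6 through -1

diffs = [x - y for x, y in product(x_faces, y_faces)]  # first experiment
sums = [x + y for x, y in product(x_faces, y_faces)]   # second experiment

mean_x, var_x = mean_var(x_faces)
mean_y, var_y = mean_var(y_faces)
mean_diff, var_diff = mean_var(diffs)
mean_sum, var_sum = mean_var(sums)

print(mean_diff, mean_sum)  # 7 0
print(var_diff, var_sum)    # 35/6 35/6
```

The means subtract or add as expected (3.5 minus $-3.5$ is 7; 3.5 plus $-3.5$ is 0), while the variance is 35/12 + 35/12 = 35/6 in both experiments.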
Example:
I earn an hour for tutoring but spend an hour for piano lessons. I save the difference between my earnings for tutoring and the cost of the piano lessons. The number of hours I spend on each activity in one week varies independently according to the probability distributions shown below. Determine my expected weekly savings and the standard deviation of these savings.
Hours of Piano Lessons, | Probability, |
---|---|
Hours of Tutoring, | Probability, |
---|---|
Solution:
$X$ will represent the number of hours per week taking piano lessons and $Y$ will represent the number of hours tutoring per week.
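One way to set up this kind of calculation is sketched below. All of the numbers are hypothetical placeholders (the hourly rates and both probability tables are invented for illustration), so the output is not the answer to this example. The sketch combines the rescaling rule with the rule that variances add for the difference of independent variables.

```python
from math import sqrt

def mean_var(dist):
    """Mean and variance of a discrete distribution {value: probability}."""
    mu = sum(v * p for v, p in dist.items())
    var = sum((v - mu) ** 2 * p for v, p in dist.items())
    return mu, var

# Hypothetical placeholders, not the values from the example.
tutoring_rate = 20.0                        # earned per hour of tutoring
piano_rate = 30.0                           # spent per hour of piano lessons
piano_hours = {0: 0.3, 1: 0.5, 2: 0.2}      # distribution of X
tutoring_hours = {1: 0.2, 2: 0.4, 3: 0.4}   # distribution of Y

mu_x, var_x = mean_var(piano_hours)
mu_y, var_y = mean_var(tutoring_hours)

# Savings S = tutoring_rate * Y - piano_rate * X, with X and Y independent.
expected_savings = tutoring_rate * mu_y - piano_rate * mu_x
sd_savings = sqrt(tutoring_rate ** 2 * var_y + piano_rate ** 2 * var_x)
print(expected_savings, sd_savings)
```

Each rate is squared inside the square root because multiplying a variable by a constant multiplies its variance by the square of that constant, and the two variances are added even though the savings are a difference.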