
# 7.5: Sums and Differences of Independent Random Variables

Created by: CK-12

## Learning Objectives

• Construct probability distributions of random variables.
• Calculate the mean and standard deviation for sums and differences of independent random variables.

## Introduction

In previous lessons, you learned that sampling is a way of estimating a parameter of a population by selecting data from that population and a way of computing the chance of obtaining a specified outcome from a sample given a specific population. In Fort McMurray, Alberta, better known as “Fort McMoney” because of the oil industry, housing is becoming a rare commodity. Suppose you are a recent winner of a lottery and you decide to invest your winnings in a housing project for this city. Your plan is to build $500$ single-family homes. Before you begin, there are some facts that you need to know so that the houses will be a quick sell and you can turn a profit for your investment. One bit of information that you would like to ascertain is how many televisions each family will have so you will know how many cable hook-ups to install in each household. Short of conducting a survey, how will you determine a solution to this problem?

## Probability Distributions from Data

To begin, you should contact the local cable provider in the city and ask them to provide you with a record of the distribution of cable hook-ups per household that are currently on their books. Suppose they release this data:

| Hook-ups per Household | Proportion of Households |
|---|---|
| $0$ | $0.092$ |
| $1$ | $0.328$ |
| $2$ | $0.380$ |
| $3$ | $0.142$ |
| $4$ | $0.058$ |

From the table above, you now have some estimates with which to work. The $'4'$ represents '$4$ or more', but only $0.058$, or $5.8 \%$, of the households fall in this category. As well, if you add $0.380 + 0.142 + 0.058$, you will see that a little more than half ($0.580$, or $58.0 \%$) of the households have $2$ or more cable hook-ups. The distribution is slightly skewed to the right, with its tail toward the larger numbers, and has a mean of approximately $1.746$. By treating these recorded proportions as probabilities, you now have an estimate of the distribution for the unknown population of new households.

The data above can also be shown in another way to display numbers that represent the possible number of hook-ups in each household. This display is done by using a list of random digits to select a household at random from this distribution. The numbers must be three-digit numbers because the proportions all have three decimal places.

| Number of Hook-ups per Household | Proportion of Households | Random Numbers Representing this Category |
|---|---|---|
| $0$ | $0.092$ | $001-092$ |
| $1$ | $0.328$ | $093-420$ |
| $2$ | $0.380$ | $421-800$ |
| $3$ | $0.142$ | $801-942$ |
| $4$ | $0.058$ | $943-999$ and $000$ |

A three-digit random number does not represent an individual household, but with other three-digit numbers in its category, it represents the many households in the category.

From this display, you can see that $92$ households out of $1000$ require no cable hook-up while $328$ out of $1000$ require one cable hook-up. This way of displaying the data allows you to see actual numbers for each household.
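The random-number display can also be built and used programmatically. The following is a minimal sketch, not part of the original lesson: the function names `build_upper_bounds` and `category_for` are hypothetical, and the random numbers are taken as $1$ to $1000$, with $1000$ standing in for the digits $000$.

```python
import random

# Proportions of households by number of hook-ups (from the cable company data).
proportions = {0: 0.092, 1: 0.328, 2: 0.380, 3: 0.142, 4: 0.058}

def build_upper_bounds(props, scale=1000):
    """Return (category, largest random number assigned to it) pairs."""
    bounds, running = [], 0
    for category, p in props.items():
        running += round(p * scale)  # how many of the 1000 numbers this category gets
        bounds.append((category, running))
    return bounds

def category_for(number, bounds):
    """Map a random number in 1..1000 to a hook-up category."""
    for category, upper in bounds:
        if number <= upper:
            return category
    raise ValueError("number must be between 1 and 1000")

bounds = build_upper_bounds(proportions)
print(bounds)

# Simulate 500 households; the seed is fixed only so the run is repeatable.
rng = random.Random(0)
sample = [category_for(rng.randint(1, 1000), bounds) for _ in range(500)]
print(sum(sample) / len(sample))  # a simulated estimate of the mean hook-ups
```

Each category receives as many random numbers as its proportion of $1000$, so drawing a number at random selects a category with the intended probability.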

A random variable is a variable whose value depends upon the result of a trial. A probability distribution lists the values that a random variable can take on, together with the probability of each value. The random variable, $X$, will represent the number of cable hook-ups in a randomly selected household. Therefore, $P(X = 2) \approx 0.380$, because approximately $38.0 \%$ of households have two cable hook-ups.

At this time, there are three ways that you can create probability distributions. First, previously collected data, relative to the random variable that you are studying, can sometimes serve as a probability distribution. This was the case with the data received from the local cable company in Fort McMurray. Second, a simulation is a good way to create an approximate probability distribution. Third, a probability distribution can be constructed from basic principles and assumptions by using the rules of theoretical probability. The following examples will lead to the understanding of these rules of theoretical probability.

Example:

Create a table that shows all the possible outcomes when two dice are rolled simultaneously. (Hint: There are $36$ possible outcomes.)

| 1st Die \ 2nd Die | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 | 1,1 | 1,2 | 1,3 | 1,4 | 1,5 | 1,6 |
| 2 | 2,1 | 2,2 | 2,3 | 2,4 | 2,5 | 2,6 |
| 3 | 3,1 | 3,2 | 3,3 | 3,4 | 3,5 | 3,6 |
| 4 | 4,1 | 4,2 | 4,3 | 4,4 | 4,5 | 4,6 |
| 5 | 5,1 | 5,2 | 5,3 | 5,4 | 5,5 | 5,6 |
| 6 | 6,1 | 6,2 | 6,3 | 6,4 | 6,5 | 6,6 |

This table of possible outcomes when two dice are rolled simultaneously can now be used to construct other probability distributions. The first table will display the sum of the two dice and the second will represent the larger of the two numbers.

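| Sum of Two Dice, $x$ | Probability, $p$ |
|---|---|
| $2$ | $1/36$ |
| $3$ | $2/36$ |
| $4$ | $3/36$ |
| $5$ | $4/36$ |
| $6$ | $5/36$ |
| $7$ | $6/36$ |
| $8$ | $5/36$ |
| $9$ | $4/36$ |
| $10$ | $3/36$ |
| $11$ | $2/36$ |
| $12$ | $1/36$ |
| Total | $1$ |

| Larger Number, $x$ | Probability, $p$ |
|---|---|
| $1$ | $1/36$ |
| $2$ | $3/36$ |
| $3$ | $5/36$ |
| $4$ | $7/36$ |
| $5$ | $9/36$ |
| $6$ | $11/36$ |
| Total | $1$ |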

When you roll the two dice, what is the probability that the sum of the two dice is $4$? The probability that the sum of the two dice is four is $\frac{3}{36}$.

What is the probability that the larger number is $4$? The probability that the larger number is four is $\frac{7}{36}$.
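Both distributions can be checked by listing the $36$ outcomes and counting. The following is a small sketch under the usual assumption that the dice are fair, so every ordered pair is equally likely; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 ordered outcomes (first die, second die), each equally likely.
outcomes = list(product(range(1, 7), repeat=2))
p_each = Fraction(1, len(outcomes))

sum_dist = Counter()  # distribution of the sum of the two dice
max_dist = Counter()  # distribution of the larger of the two numbers
for first, second in outcomes:
    sum_dist[first + second] += p_each
    max_dist[max(first, second)] += p_each

print(sum_dist[4])  # 1/12, which equals 3/36
print(max_dist[4])  # 7/36
```

Exact fractions are used so the results can be compared directly with the tables; note that `Fraction` prints $3/36$ in lowest terms.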

Example:

The Regional Hospital has recently opened a new pulmonary unit and has released the following data on the proportion of silicosis cases caused by working in the coal mines. Suppose two silicosis patients are randomly selected from the large population with the disease.

| Silicosis Cases | Proportion |
|---|---|
| Worked in the mine | $0.80$ |
| Did not work in the mine | $0.20$ |

There are four possible outcomes for the two patients. With ‘yes’ representing “worked in the mines” and ‘no’ representing “did not work in the mines”, the possibilities are

| Outcome | First Patient | Second Patient |
|---|---|---|
| 1 | No | No |
| 2 | Yes | No |
| 3 | No | Yes |
| 4 | Yes | Yes |

The patients for this survey have been randomly selected from a large population and therefore the outcomes are independent. The probability of each outcome can be calculated by applying the multiplication rule for independent events, $P(A \text{ and } B) = P(A) \cdot P(B)$:

$\begin{aligned} P (\text{no for } 1^{\text{st}}) \cdot P (\text{no for } 2^{\text{nd}}) & = (0.2) (0.2) = 0.04 \\ P (\text{yes for } 1^{\text{st}}) \cdot P (\text{no for } 2^{\text{nd}}) & = (0.8) (0.2) = 0.16 \\ P (\text{no for } 1^{\text{st}}) \cdot P (\text{yes for } 2^{\text{nd}}) & = (0.2) (0.8) = 0.16 \\ P (\text{yes for } 1^{\text{st}}) \cdot P (\text{yes for } 2^{\text{nd}}) & = (0.8) (0.8) = 0.64 \end{aligned}$

If $X$ represents the number of mine workers in this random sample, then the first of these outcomes results in $X = 0$, the second and third each result in $X = 1$ and the fourth results in $X = 2$. Because the second and third outcomes are disjoint, their probabilities can be added. The probability distribution of $X$ is given in the table below:

| $x$ | Probability of $x$ |
|---|---|
| $0$ | $0.04$ |
| $1$ | $0.16 + 0.16 = 0.32$ |
| $2$ | $0.64$ |

These probabilities are added because the outcomes are disjoint.
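The same two steps, multiplying along each outcome and then adding the disjoint outcomes that give the same value of $X$, can be sketched in a few lines. The dictionary names below are illustrative only.

```python
from itertools import product

# Probability that a randomly selected silicosis patient did / did not work in the mine.
p = {"yes": 0.80, "no": 0.20}

dist = {0: 0.0, 1: 0.0, 2: 0.0}  # X = number of mine workers among the two patients
for first, second in product(p, repeat=2):
    prob = p[first] * p[second]              # multiplication rule (independent patients)
    workers = [first, second].count("yes")
    dist[workers] += prob                    # addition rule (disjoint outcomes)

for x, prob in dist.items():
    print(x, round(prob, 2))
```

The values are rounded only for display, since floating-point products such as $0.8 \times 0.2$ are not stored exactly.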

Example:

The Quebec Major Junior Hockey League has five teams from the Maritime Provinces. These teams are the Cape Breton Screaming Eagles, Halifax Mooseheads, PEI Rockets, Moncton Wildcats and Saint John Sea Dogs. Each team has its own hometown arena and each arena has a seating capacity that is listed below:

| Team | Seating Capacity (Thousands) |
|---|---|
| Screaming Eagles | $5$ |
| Mooseheads | $10$ |
| Rockets | $4$ |
| Wildcats | $7$ |
| Sea Dogs | $6$ |

Suppose two of these five teams are selected to play pre-season exhibition games, with one game played in each team's home arena. For each possible pair of teams, the combined capacity attendance for the two games can be calculated. From these totals, the probability that the combined capacity attendance is at least $12,000$ people can also be calculated.

The number of possible combinations of two teams from these five is $_{5}C_{2} = 10$. The following table shows the combined capacity attendance for each pair of teams.

| Teams | Combined Attendance Capacity for Both Games (Thousands) |
|---|---|
| Eagles/Mooseheads | $5 + 10 = 15$ |
| Eagles/Rockets | $5 + 4 = 9$ |
| Eagles/Wildcats | $5 + 7 = 12$ |
| Eagles/Sea Dogs | $5 + 6 = 11$ |
| Mooseheads/Rockets | $10 + 4 = 14$ |
| Mooseheads/Wildcats | $10 + 7 = 17$ |
| Mooseheads/Sea Dogs | $10 + 6 = 16$ |
| Rockets/Wildcats | $4 + 7 = 11$ |
| Rockets/Sea Dogs | $4 + 6 = 10$ |
| Sea Dogs/Wildcats | $6 + 7 = 13$ |

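The last calculation is to determine the probability distribution of the capacity attendance. Assuming each of the $10$ pairs is equally likely, each pair has probability $0.1$, and a total that occurs for two pairs has probability $0.2$.

| Capacity Attendance (Thousands), $x$ | Probability, $p$ |
|---|---|
| $9$ | $0.1$ |
| $10$ | $0.1$ |
| $11$ | $0.2$ |
| $12$ | $0.1$ |
| $13$ | $0.1$ |
| $14$ | $0.1$ |
| $15$ | $0.1$ |
| $16$ | $0.1$ |
| $17$ | $0.1$ |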

The probability that the capacity attendance will be at least $12,000$ is $0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 = 0.6$.
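The pairing and counting can be reproduced with a short enumeration. This sketch assumes, as the distribution above does, that each of the $10$ pairs of teams is equally likely; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Seating capacity of each home arena, in thousands.
capacity = {
    "Screaming Eagles": 5,
    "Mooseheads": 10,
    "Rockets": 4,
    "Wildcats": 7,
    "Sea Dogs": 6,
}

pairs = list(combinations(capacity, 2))  # the 5C2 = 10 possible pairs of teams
p_each = Fraction(1, len(pairs))         # assumption: every pair is equally likely

dist = Counter()
for team_a, team_b in pairs:
    dist[capacity[team_a] + capacity[team_b]] += p_each

p_at_least_12 = sum(p for total, p in dist.items() if total >= 12)
print(len(pairs), dist[11], p_at_least_12)  # 10 1/5 3/5
```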

## Expected Values and Standard Deviation

Returning to the original problem of the number of cable hook-ups for the $500$ single-family homes, take another look at figures one and two. From these displays, you can find the mean number of hook-ups per household. You expect $9.2\%$ of households to have no hook-up, $32.8\%$ to have one hook-up, $38.0\%$ to have two hook-ups, $14.2\%$ to have three hook-ups and $5.8\%$ to have four hook-ups. To calculate the mean number of hook-ups per household, use the previous figure and add another column.

| Hook-ups per Household, $x$ | Proportion of Households, $p$ | Contribution to Mean, $x \cdot p$ |
|---|---|---|
| $0$ | $0.092$ | $0$ |
| $1$ | $0.328$ | $0.328$ |
| $2$ | $0.380$ | $0.760$ |
| $3$ | $0.142$ | $0.426$ |
| $4$ | $0.058$ | $0.232$ |
| | Sum $\longrightarrow$ | $1.746$ |

The mean of a probability distribution for the random variable $X$ is denoted by $\mu_x$ or $E(X)$ which represents expected value. Since you now know the expected number of cable hook-ups for each household, you can also calculate how much each household will differ from this mean. In other words, you can calculate the expected standard deviation. To do this, simply determine the expected value of the square of the deviations from the mean. As you recall from chapter 1, this value is called the variance of the probability distribution, and gives a representation of how far an actual value will in general stray from this mean.

$\begin{aligned} \sigma^2_x = \; &(0 - 1.746)^2(0.092) + (1 - 1.746)^2(0.328) + (2 - 1.746)^2(0.380) \\ & + (3 - 1.746)^2(0.142) + (4 - 1.746)^2(0.058) \\ \approx \; & 1.0055 \end{aligned}$

The standard deviation ($\sigma_x$) is $\sqrt{1.0055} \approx 1.003$. This indicates that households have $1.746$ cable hook-ups on average and that a typical household differs from this mean by about $1$ hook-up. These calculations yield the following formulas for calculating the expected value and the standard deviation (and its square, the variance) for a probability distribution.

$\begin{aligned} E(X) & = \mu_x = \sum x_i p_i \\ \text{Var}(X) & = \sigma^2_x = \sum (x_i - \mu_x)^2 p_i \\ \sigma_x & = \sqrt{\text{Var}(X)} \end{aligned}$

where $p_i$ is the probability that the random variable $X$ takes on the specific value $x_i$.
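These formulas translate directly into code. The helper below is a minimal sketch (the name `mean_var_sd` is made up for illustration), applied to the cable hook-up distribution.

```python
from math import sqrt

def mean_var_sd(dist):
    """Return (mean, variance, standard deviation) of a discrete distribution.

    dist maps each value x_i to its probability p_i.
    """
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, var, sqrt(var)

hookups = {0: 0.092, 1: 0.328, 2: 0.380, 3: 0.142, 4: 0.058}
mu, var, sd = mean_var_sd(hookups)
print(round(mu, 3), round(var, 4), round(sd, 3))
```

The last digit of a hand calculation may differ slightly from the printed values, depending on where intermediate results are rounded.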

Example:

Suppose an individual plays a gambling game where it is possible to lose $\$2.00$, break even, win $\$6.00$, or win $\$20.00$ each time he plays. The probability distribution for each outcome is provided by the following table:

| Winnings, $x$ | Probability, $p$ |
|---|---|
| $-\$2.00$ | $0.30$ |
| $\$0.00$ | $0.40$ |
| $\$6.00$ | $0.20$ |
| $\$20.00$ | $0.10$ |

Solution:

Now use the table to calculate the expected value and the variance of this distribution.

$\begin{aligned} \mu_x & = \sum x_i p_i \\ \mu_x & = (-2 \cdot 0.30) + (0 \cdot 0.40) + (6 \cdot 0.20) + (20 \cdot 0.10) \\ \mu_x & = 2.6 \end{aligned}$

On average, the player can expect to win $\$2.60$ per game.

The variance of this distribution is:

$\begin{aligned} \sigma^2_x & = \sum (x_i - \mu_x)^2 p_i\\ \sigma^2_x & = (-2 - 2.6)^2 (0.30) + (0 - 2.6)^2(0.40) + (6 - 2.6)^2(0.20) + (20 - 2.6)^2( 0.10)\\ \sigma^2_x & = 41.64 \end{aligned}$

So the standard deviation, $\sigma_x$, is $\sqrt{41.64} \approx \$6.45$.

Example:

The following probability distribution was constructed from the results of a survey at the local university. The random variable is the number of fast food meals purchased by a student during the preceding year ($12$ months). For this distribution, calculate the expected value and the standard deviation.

| Number of Meals Purchased Within $12$ Months, $x$ | Probability, $p$ |
|---|---|
| $0$ | $0.04$ |
| $[1, 6)$ | $0.30$ |
| $[6, 11)$ | $0.29$ |
| $[11, 21)$ | $0.17$ |
| $[21, 51)$ | $0.15$ |
| $>50$ | $0.05$ |
| Total | $1.00$ |

Each interval must be represented by a single value, so you begin by taking the center of each interval as its estimated mean. The first interval, $[1, 6)$, does not include six, so it covers $1$ through $5$ meals and its center is $3$. The same procedure gives $8$, $15.5$ and $35.5$ for the next three intervals. The last category has no upper limit, so a representative value of $55$ is used for it. Therefore the expected value is:

Solution:

$\begin{aligned} \mu_x & = \sum x_i p_i \\ \mu_x & = 0(0.04) + 3(0.30) + 8(0.29) + 15.5(0.17) + 35.5 (0.15) + 55(0.05)\\ \mu_x & = 13.93 \end{aligned}$

And

$\begin{aligned} \sigma^2_x = & \;\sum (x_i- \mu_x)^2 p_i\\ = &\;(0 - 13.93)^2 (0.04) + (3 - 13.93)^2 (0.30) \\ & \;+ (8 - 13.93)^2 (0.29) + (15.5 - 13.93)^2 (0.17)\\ & \;+ (35.5 - 13.93)^2 (0.15) + (55 - 13.93)^2 (0.05)\\ \approx &\; 208.3451\ \text{and} \ \sigma_x \approx 14.43 \end{aligned}$

The expected number of fast food meals purchased by a student at the local university is $13.93$. This number should not be rounded since the mean does not have to be one of the values in the distribution. You should also notice that the standard deviation is very close to the expected value. Because the number of meals cannot be negative, a standard deviation this large relative to the mean suggests that the distribution is skewed to the right, with a long tail toward the larger numbers.

Notice that $\mu_x = 13.93$ and $\sigma_x \approx 14.43$.
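The grouped calculation can be sketched as follows. The midpoint rule `(low + high - 1) / 2` assumes whole numbers of meals in each half-open interval, and the value $55$ for the open-ended last category is simply the representative value used in the calculation above, not something the data determine.

```python
from math import sqrt

# Half-open intervals [low, high) of meal counts with their probabilities.
intervals = [((1, 6), 0.30), ((6, 11), 0.29), ((11, 21), 0.17), ((21, 51), 0.15)]

# Start with the single value 0, then add one representative value per interval.
groups = [(0, 0.04)]
for (low, high), p in intervals:
    midpoint = (low + (high - 1)) / 2  # center of the whole numbers low..high-1
    groups.append((midpoint, p))
groups.append((55, 0.05))  # assumed representative value for "more than 50"

mu = sum(x * p for x, p in groups)
var = sum((x - mu) ** 2 * p for x, p in groups)
sd = sqrt(var)
print(round(mu, 2), round(var, 4), round(sd, 2))
```

Choosing a different representative value for the last category would change both the mean and the standard deviation, which is why that choice is flagged as an assumption.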

## Effect of Linear Transformations of X on the Mean and Standard Deviation

If you add the same number to all values of a data set, the shape and the standard deviation of the data remain the same, but the number is added to the mean. This is referred to as recentering the data set. Likewise, if you rescale the data, that is, multiply all data values by the same nonzero number, the basic shape will not change, but the mean will be multiplied by this number and the standard deviation will be multiplied by its absolute value. If you multiply every value of $X$ by a constant $d$ and then add a constant $c$, then the mean and the standard deviation of the transformed values are expressed as:

$\begin{aligned} \mu_{c + dX} & = c + d\mu_x\\ \sigma_{c + dX} & = |d| \sigma_x \end{aligned}$

The implications of these can be better understood if you return to the gambling game example above.

Example:

The casino has decided to ‘triple’ the prizes for the game being played. What are the expected winnings for a person who plays one game? What is the standard deviation?

Solution:

Recall that the expected value was $\$2.60$ and the standard deviation was $\sqrt{41.64} \approx \$6.45$. The simplest way to calculate the expected value of the tripled prize is $3(\$2.60)$, or $\$7.80$, with a standard deviation of $3\sqrt{41.64}$, or about $\$19.36$. Here $c = 0$ and $d = 3$. Another method of calculating the expected value would be to create a new table for the tripled prize:

| Winnings, $x$ | Probability, $p$ |
|---|---|
| $-\$2.00$ | $0.30$ |
| $\$0.00$ | $0.40$ |
| $\$6.00$ | $0.20$ |
| $\$20.00$ | $0.10$ |

New Table

| Original Winnings, $x$ | New Winnings, $3x$ | Probability, $p$ |
|---|---|---|
| $-\$2.00$ | $-\$6.00$ | $0.30$ |
| $\$0.00$ | $\$0.00$ | $0.40$ |
| $\$6.00$ | $\$18.00$ | $0.20$ |
| $\$20.00$ | $\$60.00$ | $0.10$ |

The calculations can be done using the formulas or by using the graphing calculator.

Using the graphing calculator: (calculator screens not reproduced here)

Notice that the same results are obtained.
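In place of the calculator, the two methods can be compared with a short script. This is a sketch only; `mean_sd` is a hypothetical helper that applies the expected value and standard deviation formulas from the previous section.

```python
from math import sqrt

def mean_sd(dist):
    """Return (mean, standard deviation) for a dict mapping value -> probability."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, sqrt(var)

original = {-2: 0.30, 0: 0.40, 6: 0.20, 20: 0.10}
tripled = {3 * x: p for x, p in original.items()}  # the "New Winnings" column

mu, sd = mean_sd(original)
mu3, sd3 = mean_sd(tripled)

# Method 1: apply the formulas to the new table.
# Method 2: multiply the original mean and standard deviation by d = 3.
print(round(mu3, 2), round(3 * mu, 2))
print(round(sd3, 2), round(3 * sd, 2))
```

Both methods agree. Any difference in the last digit of a hand calculation comes from rounding the standard deviation before multiplying by $3$.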

This same problem can be changed again in order to introduce the addition and subtraction rules for random variables. Suppose the casino wants to encourage customers to play more, so begins demanding that customers play the game in sets of three. What are the expected value (total winnings) and standard deviation now?

Solution:

Let $X, Y$ and $Z$ represent the winnings on the first, second and third games played. Then $\mu_{X + Y + Z}$ is the expected value of the total winnings when three games are played. The expected value of the winnings for playing one game was $\$2.60$, so for three games the expected value is:

$\begin{aligned} \mu_{X + Y + Z} & = \mu_X + \mu_Y + \mu_Z\\ \mu_{X + Y + Z} & = \$2.60 + \$2.60 + \$2.60\\ \mu_{X + Y + Z} & = \$7.80 \end{aligned}$

The expected value is the same as that for the tripled prize.

Since the winnings on the three games played are independent, the standard deviation of $X + Y + Z$ is:

$\begin{aligned} \sigma^2_{X + Y + Z} & = \sigma^2_X + \sigma^2_Y + \sigma^2_Z\\ \sigma^2_{X + Y + Z} & = 41.64 + 41.64 + 41.64\\ \sigma^2_{X + Y + Z} & = 124.92 \ \ \ \text{and} \ \ \ \sigma_{X + Y + Z} = \sqrt{124.92} \approx 11.18 \end{aligned}$

The person playing the three games expects to win $\$7.80$ with a standard deviation of about $\$11.18$. When the prize was tripled, there was a greater standard deviation $(\$19.36)$ than when the person played three games $(\$11.18)$.
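One way to check the addition rule is to build the full distribution of the total winnings over three independent games and apply the definitions directly. The sketch below enumerates all $4^3 = 64$ sequences of outcomes; the variable names are illustrative.

```python
from itertools import product
from math import sqrt

game = {-2: 0.30, 0: 0.40, 6: 0.20, 20: 0.10}  # winnings on a single game

# Distribution of X + Y + Z for three independent games.
total = {}
for outcome in product(game, repeat=3):
    prob = game[outcome[0]] * game[outcome[1]] * game[outcome[2]]  # independence
    winnings = sum(outcome)
    total[winnings] = total.get(winnings, 0.0) + prob

mu = sum(x * p for x, p in total.items())
var = sum((x - mu) ** 2 * p for x, p in total.items())
print(round(mu, 2), round(var, 2), round(sqrt(var), 2))
```

The variance computed from the enumerated distribution matches three times the single-game variance, which is what the addition rule predicts for independent games.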

The rules for addition and subtraction for random variables are:

If $X$ and $Y$ are random variables then:

$\begin{aligned} \mu_{X + Y} & = \mu_X + \mu_Y \\ \mu_{X - Y} & = \mu_X - \mu_Y \end{aligned}$

If $X$ and $Y$ are independent then:

$\begin{aligned} \sigma^2_{X + Y} & = \sigma^2_X + \sigma^2_Y \\ \sigma^2_{X - Y} & = \sigma^2_X + \sigma^2_Y \end{aligned}$

Variances are added for both the sum and difference of two independent random variables because the variation in each variable contributes to the variation in each case. Subtracting is the same as adding the opposite. Suppose you have two dice, one die $(X)$ with the normal positive numbers $1$ through $6$, and another $(Y)$ with the negative numbers $-1$ through $-6$. Then suppose you perform two experiments. In the first, you roll the first die $(X)$ and then the second die $(Y)$, and you compute the difference of the two rolls. In the second experiment you roll the first die $(X)$ and then the second die $(Y)$ and you calculate the sum of the two rolls.

$\begin{aligned} \mu_X & = \sum x_i p_i && \mu_Y = \sum y_i p_i\\ \mu_X & = 3.5 && \mu_Y = -3.5 \end{aligned}$

$\begin{aligned} \sigma^2_X & = \sum (x_i - \mu_X)^2 p_i && \sigma^2_Y = \sum (y_i - \mu_Y)^2 p_i\\ \sigma^2_X & \approx 2.917 && \sigma^2_Y \approx 2.917 \end{aligned}$

$\begin{aligned} \mu_{X + Y} & = \mu_X + \mu_Y && \mu_{X - Y} = \mu_X - \mu_Y\\ \mu_{X + Y} & = 3.5 + (-3.5) = 0 && \mu_{X - Y} = 3.5 - (-3.5) = 7\\ \sigma^2_{X + Y} & = \sigma^2_X + \sigma^2_Y && \sigma^2_{X - Y} = \sigma^2_X + \sigma^2_Y\\ \sigma^2_{X + Y} & \approx 2.917 + 2.917 = 5.834 && \sigma^2_{X - Y} \approx 2.917 + 2.917 = 5.834 \end{aligned}$

Notice how the expected values and the variances combine for these two experiments.
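The two experiments can be verified by enumerating all $36$ equally likely pairs of rolls. This sketch uses exact fractions and assumes both dice are fair; `mean_var` is an illustrative helper name.

```python
from fractions import Fraction
from itertools import product

x_faces = range(1, 7)    # die X: 1 through 6
y_faces = range(-6, 0)   # die Y: -6 through -1
p = Fraction(1, 36)      # each (x, y) pair is equally likely

def mean_var(values):
    """Mean and variance of a list of equally likely values, each with probability p."""
    mu = sum(v * p for v in values)
    var = sum((v - mu) ** 2 * p for v in values)
    return mu, var

sums = [x + y for x, y in product(x_faces, y_faces)]
diffs = [x - y for x, y in product(x_faces, y_faces)]

print(mean_var(sums))   # (Fraction(0, 1), Fraction(35, 6))
print(mean_var(diffs))  # (Fraction(7, 1), Fraction(35, 6))
```

The exact variance is $35/6 \approx 5.833$ in both experiments. The value $5.834$ arises when each die's variance is first rounded to $2.917$.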

Example:

I earn $\$25.00$ an hour for tutoring but spend $\$20.00$ an hour for piano lessons. I save the difference between my earnings for tutoring and the cost of the piano lessons. The number of hours I spend on each activity in one week varies independently according to the probability distributions shown below. Determine my expected weekly savings and the standard deviation of these savings.

| Hours of Piano Lessons, $x$ | Probability, $p$ |
|---|---|
| $0$ | $0.3$ |
| $1$ | $0.3$ |
| $2$ | $0.4$ |

| Hours of Tutoring, $x$ | Probability, $p$ |
|---|---|
| $1$ | $0.2$ |
| $2$ | $0.3$ |
| $3$ | $0.2$ |
| $4$ | $0.3$ |

Solution:

$X$ will represent the number of hours per week taking piano lessons and $Y$ will represent the number of hours tutoring per week.
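The worked solution breaks off at this point in this copy of the lesson. As a sketch of how it could be completed with the rules above (not the original solution text), write the weekly savings as $S = 25Y - 20X$. The linear transformation rule gives $\sigma^2_{25Y} = 25^2 \sigma^2_Y$ and $\sigma^2_{20X} = 20^2 \sigma^2_X$, and because $X$ and $Y$ are independent these variances add. The symbol $S$ and the helper `mean_var` are introduced here for illustration only.

```python
from math import sqrt

def mean_var(dist):
    """Return (mean, variance) for a dict mapping value -> probability."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, var

piano = {0: 0.3, 1: 0.3, 2: 0.4}             # X: hours of piano lessons per week
tutoring = {1: 0.2, 2: 0.3, 3: 0.2, 4: 0.3}  # Y: hours of tutoring per week

mu_x, var_x = mean_var(piano)
mu_y, var_y = mean_var(tutoring)

# Savings S = 25Y - 20X: earn $25 per tutoring hour, spend $20 per lesson hour.
mu_s = 25 * mu_y - 20 * mu_x                # means subtract
var_s = 25 ** 2 * var_y + 20 ** 2 * var_x   # variances add (independence)
sd_s = sqrt(var_s)
print(round(mu_s, 2), round(var_s, 2), round(sd_s, 2))
```

Under these rules the expected weekly savings come to $\$43.00$, with a standard deviation of about $\$32.42$.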

Feb 23, 2012

Jul 03, 2014