If you were told that the mean income at a certain company was $35,000, you wouldn’t really know much about the actual income of the majority of the employees, since there could be a few upper-level managers or owners whose income might **skew** the mean badly. However, if you were also given the **variance** of the incomes, how would that help?

### Calculating Variance

Variance (commonly denoted \begin{align*}\sigma ^2\end{align*}) is a very useful measure of the relative amount of ‘scattering’ of a given set. In other words, knowing the variance can give you an idea of how closely the values in a set cluster around the mean. The greater the variance, the more the data values in the set are spread out away from the mean.

Variance is an important calculation to become familiar with because, like the arithmetic mean, variance is used in many other more complex statistical evaluations. The calculation of variance is slightly different depending on whether you are working with a population (you do not intend to generalize the results back to a larger group) or a sample (you do intend to use the sample results to predict the results of a larger population). The difference is really only at the end of the process, so let’s start with the calculation of the population variance.

To calculate the variance of a population:

- First, identify the arithmetic mean of your data by finding the sum of the values and dividing it by the number of values.
- Next, subtract the mean from each value and record the result. This value is called the **deviation** of each score from the mean.
- For each value, square the **deviation**.
- Finally, divide the sum of the squared deviations by the number of values in the set. The resulting quotient is the **variance** \begin{align*}(\sigma^2)\end{align*} of the set.
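The four steps can be sketched in a few lines of Python. This is only an illustration: the function name `population_variance` and the sample data are hypothetical and not part of the lesson.

```python
def population_variance(values):
    """Return the population variance of a non-empty sequence of numbers."""
    n = len(values)
    if n == 0:
        raise ValueError("variance requires at least one value")
    mean = sum(values) / n                      # step 1: arithmetic mean
    deviations = [v - mean for v in values]     # step 2: value minus mean
    squared = [d ** 2 for d in deviations]      # step 3: square each deviation
    return sum(squared) / n                     # step 4: divide the sum by the count


# Hypothetical data chosen so that the arithmetic is easy to follow by hand
print(population_variance([2, 4, 4, 4, 5, 5, 7, 9]))  # 4.0
```

Here the mean is 5, the squared deviations add up to 32, and 32 divided by 8 gives 4.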

To calculate the variance of a sample, the only difference is that in step 4, you divide the sum of squared deviations by the number of values in the sample **minus 1**. The values in a sample tend to sit closer to the sample’s own mean than to the true mean of the population, so dividing by the number of values would tend to underestimate the population variance. Dividing by one less than the number of values increases the calculated variance of the sample by a small amount to allow more ‘room’ for the unknown values in the population.
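Python’s standard `statistics` module provides both versions, which makes the difference easy to see. The sketch below uses hypothetical data (not from the lesson) and treats the same list first as a whole population and then as a sample.

```python
import statistics

# Hypothetical data, used only to contrast the two divisors
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

pop_var = statistics.pvariance(data)    # sum of squared deviations divided by n
sample_var = statistics.variance(data)  # sum of squared deviations divided by n - 1

print(pop_var)
print(sample_var)

# The sample variance is always the population-style result scaled by n / (n - 1)
print(abs(sample_var - pop_var * n / (n - 1)) < 1e-12)
```

The sample variance comes out slightly larger than the population variance, and the gap shrinks as the number of values grows.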

**Calculating the Variance**

1. Calculate the variance of set \begin{align*}x\end{align*}:

\begin{align*}x=\left \{12, 7, 6, 3, 10, 5, 18, 15\right \}\end{align*}

Follow the steps from above to calculate the variance:

- First, calculate the arithmetic mean:

\begin{align*}\mu =\frac{12+7+6+3+10+5+18+15}{8}=9.5\end{align*}

- Subtract the mean from each value to get the deviation of each value, then square each deviation:

| \begin{align*}\text{Value} - \text{Mean} = \text{Deviation}\end{align*} | \begin{align*}\text{Deviation}^2\end{align*} |
|---|---|
| \begin{align*}12-9.5=2.5\end{align*} | 6.25 |
| \begin{align*}7-9.5=-2.5\end{align*} | 6.25 |
| \begin{align*}6-9.5=-3.5\end{align*} | 12.25 |
| \begin{align*}3-9.5=-6.5\end{align*} | 42.25 |
| \begin{align*}10-9.5=0.5\end{align*} | 0.25 |
| \begin{align*}5-9.5=-4.5\end{align*} | 20.25 |
| \begin{align*}18-9.5=8.5\end{align*} | 72.25 |
| \begin{align*}15-9.5=5.5\end{align*} | 30.25 |
| TOTAL (sum of \begin{align*}\text{Deviation}^2\end{align*}) | 190.00 |

- Finally, divide the sum of the squared deviations by the count of values in the data set:

\begin{align*}\sigma^2 = \frac{190}{8} = 23.75\end{align*}

Therefore, the variance of set \begin{align*}x\end{align*} is 23.75.

2. Find the variance of set \begin{align*}z\end{align*}:

\begin{align*}z=\left \{1, 2, 3, 4, 5, 6, 7, 9\right \}\end{align*}

Find the mean, then divide the sum of the squared deviations from the mean by the total number of values in the set:

\begin{align*}\mu &=\frac{1+2+3+4+5+6+7+9}{8}=4.625 \\ \text{Sum of squared deviations} &= (1-4.625)^2+(2-4.625)^2+(3-4.625)^2+(4-4.625)^2\\ &\quad +(5-4.625)^2+(6-4.625)^2+(7-4.625)^2+(9-4.625)^2 =49.875 \\ \sigma^2 &= \frac{49.875}{8} \approx 6.234\end{align*}

Therefore, the variance \begin{align*}(\sigma^2)\end{align*} of set \begin{align*}z\end{align*} is approximately 6.234.

3. Find \begin{align*}\sigma^2\end{align*} of set \begin{align*}y\end{align*}:

\begin{align*}y=\left \{13, 14, 15, 16, 17, 18, 19, 20, 21\right \}\end{align*}

Let’s do this one differently, using a nifty trick known as the “mean of the squares minus the square of the mean.” Start, as before, by finding the arithmetic mean:

\begin{align*}\mu =\frac{13+14+15+16+17+18+19+20+21}{9}=17\end{align*}

Then, to find the variance, divide the sum of the squares of each value by the number of values (this is the “mean of the squares”), then square the mean we calculated above, 17 (the “square of the mean”), and subtract it from the mean of the squares:

\begin{align*}\sigma ^2 = \frac{13^2+14^2+15^2+16^2+17^2+18^2+19^2+20^2+21^2}{9}-17^2=\frac{2661}{9}-289=6.6\overline{6}\end{align*}

Therefore, the variance \begin{align*}(\sigma^2)\end{align*} of set \begin{align*}y\end{align*} is \begin{align*}6.6\overline{6}\end{align*}.
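To see why the shortcut gives the same answer as the step-by-step method, expand the squared deviations. The notation here is an assumption, not the lesson’s own: \begin{align*}N\end{align*} stands for the number of values and \begin{align*}x_i\end{align*} for the individual values.

\begin{align*}\sigma^2 &= \frac{1}{N}\sum_{i=1}^{N}(x_i-\mu)^2 \\ &= \frac{1}{N}\sum_{i=1}^{N}x_i^2 - 2\mu \cdot \frac{1}{N}\sum_{i=1}^{N}x_i + \mu^2 \\ &= \frac{1}{N}\sum_{i=1}^{N}x_i^2 - 2\mu^2 + \mu^2 \\ &= \frac{1}{N}\sum_{i=1}^{N}x_i^2 - \mu^2\end{align*}

The third line uses the fact that \begin{align*}\frac{1}{N}\sum x_i\end{align*} is the mean itself, so the result is the mean of the squares minus the square of the mean.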

**Earlier Problem Revisited**

If you were told that the mean income at a certain company was $35,000, you wouldn’t really know much about the actual income of the majority of the employees, since there could be a few upper-level managers or owners whose income might **skew** the mean badly. However, if you were also given the variance of the incomes, how would that help?

By learning the variance of the set of incomes, you could get a feel for how representative the $35,000 figure was of the likely salary of a common employee.

### Examples

#### Example 1

Find \begin{align*}\mu\end{align*} and \begin{align*}\sigma ^2\end{align*} of set \begin{align*}z=\left \{3.25, 3.5, 2.85, 3.4, 2.95, 3.02, 3.17\right \}\end{align*}.

Let’s use the “mean of the squares minus the square of the mean” method:

First find the mean of the set: \begin{align*}\frac{3.25+3.5+2.85+3.4+2.95+3.02+3.17}{7}=3.16286\end{align*}

Now divide the sum of the squared values by the number of values, and subtract the square of the mean (\begin{align*}3.16286^2 \approx 10.0037\end{align*}):

\begin{align*}\frac{3.25^2+3.5^2+2.85^2+3.4^2+2.95^2+3.02^2+3.17^2}{7}-10.0037=10.0524-10.0037\approx 0.049\end{align*} **is the variance.**

#### Example 2

If all values of set \begin{align*}z\end{align*}, above, were increased by 5, what would the new mean and variance be?

Find the mean of the new set: \begin{align*}\frac{8.25+8.5+7.85+8.4+7.95+8.02+8.17}{7}=8.16286\end{align*}

Divide the sum of the values squared by the number of values: \begin{align*}\frac{466.7668}{7}=66.681\end{align*}

Subtract the squared mean from the mean of the squares: \begin{align*}66.681-66.632=0.049\end{align*} **is the variance.**

The variance is the same as before! Does that surprise you? It shouldn’t. Adding the same amount to every value shifts the mean by that amount, so the deviation of each value from the mean does not change, and neither does the variance. Both sets have a variance of about 0.04873469. If a hand calculation gives slightly different results for the two sets, the difference comes only from rounding the intermediate values.
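One way to check the effect of the shift without any rounding is to do the arithmetic in exact fractions. The sketch below assumes Python’s standard `fractions` module; the helper name `population_variance` is hypothetical, and the values are the ones that appear in the numerator of the mean in Example 1.

```python
from fractions import Fraction


def population_variance(values):
    """Population variance as the mean of the squares minus the square of the mean."""
    n = len(values)
    mean = sum(values) / n
    mean_of_squares = sum(v * v for v in values) / n
    return mean_of_squares - mean ** 2


# Store the values as exact fractions so that no rounding can creep in
z = [Fraction(s) for s in ("3.25", "3.5", "2.85", "3.4", "2.95", "3.02", "3.17")]
shifted = [v + 5 for v in z]

print(float(population_variance(z)))
print(float(population_variance(shifted)))
print(population_variance(z) == population_variance(shifted))  # True
```

Because every intermediate result is an exact fraction, the final comparison tests the two variances themselves rather than rounded approximations of them.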

#### Example 3

If all values of set \begin{align*}z\end{align*} from Example 1 were doubled, how would that affect \begin{align*}\mu\end{align*} and \begin{align*}\sigma ^2\end{align*}?

Do the mean and variance also double? Let’s see:

The mean of the new set is \begin{align*}\frac{6.5+7+5.7+6.8+5.9+6.04+6.34}{7}=\frac{44.28}{7}=6.326\end{align*}, which is twice the mean of the original set. So far so good.

The “mean of the squares” is \begin{align*}\frac{6.5^2+7^2+5.7^2+6.8^2+5.9^2+6.04^2+6.34^2}{7}=\frac{281.47}{7}=40.21\end{align*}, which is *four times* the original mean of the squares, not double after all (which makes sense, given that each doubled value was squared).

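Finally, subtract the square of the mean from the mean of the squares: \begin{align*}40.21-6.326^2 = 0.192\end{align*} **is the variance.** If we compare this to the original: \begin{align*}\frac{0.192}{0.049}\approx 4\end{align*}, we can see that doubling the original values quadruples the variance. (Carrying more decimal places through the calculation gives 0.19494, which is exactly four times 0.048735.)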
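The same result holds for any multiplier, not just 2. As a sketch in notation that is assumed here rather than taken from the lesson (\begin{align*}a\end{align*} for the multiplier, \begin{align*}N\end{align*} for the number of values, \begin{align*}x_i\end{align*} for the individual values): multiplying every value by \begin{align*}a\end{align*} also multiplies the mean by \begin{align*}a\end{align*}, so each deviation is multiplied by \begin{align*}a\end{align*} and each squared deviation by \begin{align*}a^2\end{align*}.

\begin{align*}\sigma_{\text{new}}^2 = \frac{1}{N}\sum_{i=1}^{N}(a x_i - a\mu)^2 = \frac{a^2}{N}\sum_{i=1}^{N}(x_i-\mu)^2 = a^2\sigma^2\end{align*}

With \begin{align*}a=2\end{align*}, the variance is multiplied by \begin{align*}2^2=4\end{align*}.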

### Review

Questions 1-12: find \begin{align*}\sigma ^2\end{align*}

1. \begin{align*}y=\left \{4, 50, 63, 2, 82, 99\right \}\end{align*}
2. Set \begin{align*}x\end{align*} is a random sample from a population with 38 members: \begin{align*}x=\left \{8, 13, 5, 10\right \}\end{align*}
3. Set \begin{align*}z\end{align*} is a random sample from a larger population: \begin{align*}z=\left \{4,3,5,15,5\right \}\end{align*}
4. \begin{align*}y=\left \{3,26,5,1,1\right \}\end{align*}
5. 22, 21, 13, 19, 16, 18
6. Sample: 1, 2, 5, 1
7. Sample: 10, 6, 3, 4
8. 8, 11, 17, 7, 19
9. 15, 17, 19, 21, 23, 25, 27, 29
10. Sample: 15, 17, 19, 21, 23, 25, 27, 29
11. 0.25, 0.35, 0.45, 0.55, 0.26, 0.75
12. Find the variance of the data in the table:

| HEIGHTS (rounded to the nearest inch) | FREQUENCY OF STUDENTS |
|---|---|
| 60 | 35 |
| 61 | 33 |
| 62 | 45 |
| 63 | 4 |
| 64 | 3 |
| 65 | 4 |
| 66 | 7 |
| 67 | 4 |

### Review (Answers)

To view the Review answers, open this PDF file and look for section 5.6.