# Variance of a Data Set

## The mean of the squares of the deviation of data values

If you were told that the mean income at a certain company was \$35,000, you wouldn’t really know much about the actual income of the majority of the employees, since there could be a few upper-level managers or owners whose income might skew the mean badly. However, if you were also given the variance of the incomes, how would that help?

### Calculating Variance

Variance (commonly denoted \begin{align*}\sigma ^2\end{align*}) is a very useful measure of the relative amount of ‘scattering’ of a given set. In other words, knowing the variance can give you an idea of how closely the values in a set cluster around the mean. The greater the variance, the more the data values in the set are spread out away from the mean. Variance is an important calculation to become familiar with because, like the arithmetic mean, variance is used in many other more complex statistical evaluations.

The calculation of variance is slightly different depending on whether you are working with a population (you do not intend to generalize the results back to a larger group) or a sample (you do intend to use the sample results to predict the results of a larger population). The difference is really only at the end of the process, so let’s start with the calculation for a population.

To calculate the variance of a population:

1. First, identify the arithmetic mean of your data by finding the sum of the values and dividing it by the number of values.
2. Next, subtract the mean from each value and record the result. This difference is called the deviation of that value from the mean.
3. For each value, square the deviation.
4. Finally, add up the squared deviations and divide the sum by the number of values in the set. The resulting quotient is the variance \begin{align*}(\sigma^2)\end{align*} of the set.

To calculate the variance of a sample, the only difference is that in step 4, you divide the sum of squared deviations by the number of values in the sample minus 1.
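The four steps above, and the sample variant, can be sketched in a few lines of Python. This is only an illustration: the function names and the data set are hypothetical, chosen so that the arithmetic comes out evenly.

```python
def population_variance(values):
    """Variance of a complete population: divide by the number of values."""
    n = len(values)
    mean = sum(values) / n                    # step 1: arithmetic mean
    deviations = [v - mean for v in values]   # step 2: value minus mean
    squared = [d ** 2 for d in deviations]    # step 3: square each deviation
    return sum(squared) / n                   # step 4: divide the sum by n


def sample_variance(values):
    """Variance estimated from a sample: divide by one less than the count."""
    n = len(values)
    mean = sum(values) / n
    squared = [(v - mean) ** 2 for v in values]
    return sum(squared) / (n - 1)


# Hypothetical data (not from this lesson): the mean is 5 and the
# squared deviations add up to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data))  # 4.0
print(sample_variance(data))      # 32/7, about 4.571
```

The two functions differ only in the final divisor, which mirrors the point that the population and sample calculations part ways only at the last step.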
Dividing by one less than the number of values increases the calculated variance of the sample by a small amount. This compensates for the fact that the deviations are measured from the sample mean rather than from the unknown population mean, which tends to understate the spread, and so allows more ‘room’ for the unknown values in the population.

#### Calculating the Variance

1. Calculate the variance of set \begin{align*}x\end{align*}: \begin{align*}x=\left \{12, 7, 6, 3, 10, 5, 18, 15\right \}\end{align*}

Follow the steps from above to calculate the variance:

- First, calculate the arithmetic mean: \begin{align*}\mu =\frac{12+7+6+3+10+5+18+15}{8}=9.5\end{align*}
- Subtract the mean from each value to get the deviation of each value, then square the deviation of each value:

| \begin{align*}\text{Value} - \text{Mean} = \text{Deviation}\end{align*} | \begin{align*}\text{Deviation}^2\end{align*} |
|---|---|
| \begin{align*}12-9.5=2.5\end{align*} | 6.25 |
| \begin{align*}7-9.5=-2.5\end{align*} | 6.25 |
| \begin{align*}6-9.5=-3.5\end{align*} | 12.25 |
| \begin{align*}3-9.5=-6.5\end{align*} | 42.25 |
| \begin{align*}10-9.5=0.5\end{align*} | 0.25 |
| \begin{align*}5-9.5=-4.5\end{align*} | 20.25 |
| \begin{align*}18-9.5=8.5\end{align*} | 72.25 |
| \begin{align*}15-9.5=5.5\end{align*} | 30.25 |
| TOTAL (sum of squared deviations) | 190.00 |

- Finally, divide the sum of the squared deviations by the count of values in the data set: \begin{align*}\frac{190}{8}=23.75\end{align*}

Therefore, the variance of set \begin{align*}x\end{align*} is 23.75.

2. Find the variance of set \begin{align*}z\end{align*}: \begin{align*}z=\left \{1, 2, 3, 4, 5, 6, 7, 9\right \}\end{align*}

Find the mean, add up the squared deviations of the values from the mean, and divide that sum by the total number of values in the set:

\begin{align*}\mu &=\frac{1+2+3+4+5+6+7+9}{8}=4.625 \\ &(1-4.625)^2+(2-4.625)^2+(3-4.625)^2+(4-4.625)^2 \\ &\quad +(5-4.625)^2+(6-4.625)^2+(7-4.625)^2+(9-4.625)^2 =49.875 \\ \sigma^2 &=\frac{49.875}{8} \approx 6.234\end{align*}

Therefore, the variance \begin{align*}(\sigma^2)\end{align*} of set \begin{align*}z\end{align*} is approximately 6.234.

3. Find \begin{align*}\sigma^2\end{align*} of set \begin{align*}y\end{align*}: \begin{align*}y=\left \{13, 14, 15, 16, 17, 18, 19, 20, 21\right \}\end{align*}

Let’s do this one differently, using a nifty trick known as the “mean of the squares minus the square of the mean.” Start, as before, by finding the arithmetic mean: \begin{align*}\mu =\frac{13+14+15+16+17+18+19+20+21}{9}=17\end{align*}

Then, to find the variance, divide the sum of the squares of each value by the number of values (this is the “mean of the squares”), then square the mean we calculated above, 17 (the “square of the mean”), and subtract it from the mean of the squares:

\begin{align*}\sigma ^2 = \frac{13^2+14^2+15^2+16^2+17^2+18^2+19^2+20^2+21^2}{9}-17^2=6.\overline{6}\end{align*}

Therefore, \begin{align*}\sigma ^2\end{align*} of set \begin{align*}y\end{align*} is \begin{align*}6.\overline{6}\end{align*}.

#### Earlier Problem Revisited

If you were told that the mean income at a certain company was \$35,000, you wouldn’t really know much about the actual income of the majority of the employees, since there could be a few upper-level managers or owners whose income might skew the mean badly. However, if you were also given the variance of the incomes, how would that help?

By learning the variance of the set of incomes, you could get a feel for how representative the \$35,000 figure was of the likely salary of a common employee.
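The “mean of the squares minus the square of the mean” shortcut used for set \begin{align*}y\end{align*} above is not a separate definition; it follows from expanding the square in the definition of population variance. A short sketch, writing \begin{align*}N\end{align*} for the number of values and \begin{align*}x_i\end{align*} for the individual values (notation introduced here for illustration only):

\begin{align*}
\sigma^2 &= \frac{1}{N}\sum_{i=1}^{N}(x_i-\mu)^2 \\
&= \frac{1}{N}\sum_{i=1}^{N}x_i^2 - 2\mu\cdot\frac{1}{N}\sum_{i=1}^{N}x_i + \mu^2 \\
&= \frac{1}{N}\sum_{i=1}^{N}x_i^2 - 2\mu^2+\mu^2 \\
&= \frac{1}{N}\sum_{i=1}^{N}x_i^2 - \mu^2
\end{align*}

The third line uses the fact that \begin{align*}\frac{1}{N}\sum x_i\end{align*} is the mean \begin{align*}\mu\end{align*} itself. Note that the shortcut yields the population variance; for a sample, the result would still need to be multiplied by \begin{align*}\frac{N}{N-1}\end{align*}.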

### Examples

#### Example 1

Find \begin{align*}\mu\end{align*} and \begin{align*}\sigma ^2\end{align*} of set \begin{align*}z\end{align*}: \begin{align*}z=\left \{3.25, 3.5, 2.85, 3.4, 2.95, 3.02, 3.17\right \}\end{align*}

Let’s use the “mean of the squares minus the square of the mean” method:

First find the mean of the set: \begin{align*}\frac{3.25+3.5+2.85+3.4+2.95+3.02+3.17}{7}=3.16286\end{align*}

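Now divide the sum of the squares of the values by the number of values (the “mean of the squares”), and subtract the square of the mean, \begin{align*}3.16286^2 \approx 10.0037\end{align*}:

\begin{align*}\frac{3.25^2+3.5^2+2.85^2+3.4^2+2.95^2+3.02^2+3.17^2}{7}-10.0037=10.0524-10.0037=0.0487\approx 0.049\end{align*} is the variance.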
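The calculation above can be checked with a small Python sketch that applies the shortcut to the seven values of this example and compares it with the step-by-step definition. The function names are illustrative, and both treat the set as a population.

```python
def variance_by_shortcut(values):
    """Population variance as mean of the squares minus square of the mean."""
    n = len(values)
    mean = sum(values) / n
    mean_of_squares = sum(v ** 2 for v in values) / n
    return mean_of_squares - mean ** 2


def variance_by_definition(values):
    """Population variance as the mean of the squared deviations."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


z = [3.25, 3.5, 2.85, 3.4, 2.95, 3.02, 3.17]
print(round(variance_by_shortcut(z), 4))    # 0.0487
print(round(variance_by_definition(z), 4))  # 0.0487
```

When working by hand, the shortcut is sensitive to rounding because it subtracts two numbers of similar size, so it helps to carry extra decimal places in the intermediate values.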

#### Example 2

If all values of set \begin{align*}z\end{align*}, above, were increased by 5, what would the new mean and variance be?

Find the mean of the new set: \begin{align*}\frac{8.25+8.5+7.85+8.4+7.95+8.02+8.17}{7}=8.16286\end{align*}

Divide the sum of the values squared by the number of values: \begin{align*}\frac{466.7668}{7}=66.681\end{align*}

Subtract the squared mean from the mean of the squares: \begin{align*}66.681-66.632=0.049\end{align*} is the variance.

The variance is the same as before! Does that surprise you? It shouldn’t: adding 5 to every value also adds 5 to the mean, so each deviation from the mean, and therefore the variance, is unchanged. Carried to more decimal places, both sets have a variance of 0.04873469. Any small discrepancy you see when working by hand comes from rounding the intermediate values, which matters here because the variance is a small difference between two much larger numbers.
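An exact-arithmetic check of this example is sketched below. It assumes Python's standard `fractions.Fraction` type so that no rounding enters the comparison. The helper name `population_variance` is illustrative.

```python
from fractions import Fraction


def population_variance(values):
    """Mean of the squared deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


# Build the values from strings so each one is stored as an exact fraction.
z = [Fraction(s) for s in ["3.25", "3.5", "2.85", "3.4", "2.95", "3.02", "3.17"]]
shifted = [v + 5 for v in z]

print(population_variance(z) == population_variance(shifted))  # True
print(float(population_variance(z)))                           # about 0.048735
```

Adding 5 to every value also adds 5 to the mean, so every deviation is unchanged and the two variances agree exactly.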

#### Example 3

If all values of set \begin{align*}z\end{align*} from Example 1 were doubled, how would that affect \begin{align*}\mu\end{align*} and \begin{align*}\sigma ^2\end{align*}?

The question is what would happen if all of the values were doubled. Do the mean and variance also double? Let’s see:

The mean of the new set is \begin{align*}\frac{6.5+7+5.7+6.8+5.9+6.04+6.34}{7}=\frac{44.28}{7}=6.326\end{align*}, which is twice the mean of the original set. So far so good.

The “mean of the squares” is \begin{align*}\frac{6.5^2+7^2+5.7^2+6.8^2+5.9^2+6.04^2+6.34^2}{7}=\frac{281.47}{7}=40.21\end{align*}, which is four times the original mean of the squares, not double after all (which makes sense, given that each doubled value was squared).

Finally, subtract the square of the mean from the mean of the squares: \begin{align*}40.21-6.326^2 \approx 0.192\end{align*} is the variance. If we compare this to the original: \begin{align*}\frac{0.192}{0.049}\approx 4\end{align*}, we can see that doubling the original values quadruples the variance. (The ratio is exactly 4 when the intermediate values are not rounded.)
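The factor of 4 is a general pattern rather than a coincidence of this data set. As a sketch, suppose every value \begin{align*}x_i\end{align*} is multiplied by a constant \begin{align*}k\end{align*} (here \begin{align*}k=2\end{align*}), with \begin{align*}N\end{align*} the number of values; these symbols are introduced only for this illustration. The mean becomes \begin{align*}k\mu\end{align*}, so each deviation is also multiplied by \begin{align*}k\end{align*}:

\begin{align*}
\sigma_{\text{new}}^2 &= \frac{1}{N}\sum_{i=1}^{N}(kx_i-k\mu)^2 \\
&= k^2\cdot\frac{1}{N}\sum_{i=1}^{N}(x_i-\mu)^2 \\
&= k^2\sigma^2
\end{align*}

With \begin{align*}k=2\end{align*} this gives \begin{align*}\sigma_{\text{new}}^2=4\sigma^2\end{align*}, matching the result above.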

### Review

Questions 1-12: find \begin{align*}\sigma ^2\end{align*}

1. \begin{align*}y=\left \{4, 50, 63, 2, 82, 99\right \}\end{align*}
2. Set \begin{align*}x\end{align*} is a random sample from a population with 38 members: \begin{align*}x=\left \{8, 13, 5, 10\right \}\end{align*}
3. Set \begin{align*}z\end{align*} is a random sample from a larger population: \begin{align*}z=\left \{4,3,5,15,5\right \}\end{align*}
4. \begin{align*}y=\left \{3,26,5,1,1\right \}\end{align*}
5. 22, 21, 13, 19, 16, 18
6. Sample: 1, 2, 5, 1
7. Sample: 10, 6, 3, 4
8. 8, 11, 17, 7, 19
9. 15, 17, 19, 21, 23, 25, 27, 29
10. Sample: 15, 17, 19, 21, 23, 25, 27, 29
11. .25, .35, .45, .55, .26, .75
12. Find the variance of the data in the table:

| Heights (rounded to the nearest inch) | Frequency of students |
|---|---|
| 60 | 35 |
| 61 | 33 |
| 62 | 45 |
| 63 | 4 |
| 64 | 3 |
| 65 | 4 |
| 66 | 7 |
| 67 | 4 |
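For data presented as a frequency table, one possible approach is to count each value as many times as its frequency, both when finding the mean and when adding up the squared deviations. The sketch below assumes the table describes a whole population. It uses a small hypothetical table rather than the one in question 12, and the function name is illustrative.

```python
def variance_from_frequencies(table):
    """Population variance of data given as (value, frequency) pairs."""
    total = sum(freq for _, freq in table)  # number of data values
    mean = sum(value * freq for value, freq in table) / total
    # Each squared deviation is counted once per occurrence of its value.
    return sum(freq * (value - mean) ** 2 for value, freq in table) / total


# Hypothetical table: the value 1 occurs twice, 2 occurs once, 3 occurs twice.
example = [(1, 2), (2, 1), (3, 2)]
print(variance_from_frequencies(example))  # 0.8
```

This is equivalent to writing out the full list 1, 1, 2, 3, 3 and following the usual four steps.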

### Vocabulary

absolute deviation

The absolute deviation of a value is the distance between that value and the mean, ignoring whether the value lies above or below the mean.

deviation

Deviation is a measure of the difference between a given value and the mean.

Mean

The mean of a data set is the average of the data set. The mean is found by calculating the sum of the values in the data set and then dividing by the number of values in the data set.

mean absolute deviation

The mean absolute deviation is an alternate measure of how spread out the data is. It involves finding the mean of the distance between each data value and the mean. While this method might seem more intuitive, in statistics it has been found to be too limited and is not commonly used.

Population

In statistics, the population is the entire group of interest from which the sample is drawn.

Sample

A sample is a specified part of a population, intended to represent the population as a whole.

Skew

To skew a given set means to cause the trend of data to favor one end or the other.

standard deviation

The square root of the variance is the standard deviation. Standard deviation is one way to measure the spread of a set of data.

variance

A measure of the spread of the data set equal to the mean of the squared deviations of each data value from the mean of the data set.