Variance of a Data Set | Statistics | CK-12 Foundation

Variance of a Data Set

Calculating Variance

Objective

Here you will learn about variance, a measure of how tightly the values in a data set or population cluster around the mean, or how widely they spread away from it.

Concept

If you were told that the mean income at a certain company was $35,000, you wouldn’t really know much about the actual income of the majority of the employees, since there could be a few upper-level managers or owners whose income might skew the mean badly. However, if you were also given the variance of the incomes, how would that help?

Watch This

http://youtu.be/6JFzI1DDyyk Khan Academy – Variance of a population

Guidance

Variance (commonly denoted \sigma ^2 ) is a very useful measure of the relative amount of ‘scattering’ of a given set. In other words, knowing the variance can give you an idea of how closely the values in a set cluster around the mean. The greater the variance, the more the data values in the set are spread out away from the mean.

Variance is an important calculation to become familiar with because, like the arithmetic mean, variance is used in many other more complex statistical evaluations. The calculation of variance is slightly different depending on whether you are working with a population (you do not intend to generalize the results back to a larger group) or a sample (you do intend to use the sample results to predict the results of a larger population). The difference is really only at the end of the process, so let’s start with the calculation of the population.

To calculate the variance of a population:

  1. First, identify the arithmetic mean of your data by finding the sum of the values and dividing it by the number of values.
  2. Next, subtract the mean from each value and record the result. This result is called the deviation of that value from the mean.
  3. For each value, square the deviation.
  4. Finally, divide the sum of the squared deviations by the number of values in the set. The resulting quotient is the variance (\sigma^2) of the set.
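In symbols, the four steps amount to \sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i-\mu)^2, where N is the number of values. The steps above can be sketched in Python as follows. This is only a minimal illustration: the function name `population_variance` and the sample data are made up for the example and do not come from the lesson.

```python
def population_variance(values):
    """Return the population variance of a non-empty list of numbers."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mean = sum(values) / n                    # step 1: arithmetic mean
    deviations = [v - mean for v in values]   # step 2: value minus mean
    squared = [d ** 2 for d in deviations]    # step 3: square each deviation
    return sum(squared) / n                   # step 4: average the squares


data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, not from the lesson
print(population_variance(data))  # 4.0
```

Here the mean is 5, the squared deviations add up to 32, and 32 divided by 8 values gives 4.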

To calculate the variance of a sample, the only difference is in step 4: divide the sum of squared deviations by the number of values in the sample minus 1. Deviations measured from the sample's own mean tend to be slightly smaller than deviations from the true population mean, so dividing by the number of values would underestimate the population variance on average. Dividing by one less than the number of values increases the calculated variance by a small amount to compensate, which allows more ‘room’ for the unknown values in the population.
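The difference between the two divisors can be seen side by side in a short Python sketch. The function name `variance`, its `sample` flag, and the data are assumptions made for this illustration, not part of the lesson.

```python
def variance(values, sample=False):
    """Variance of `values`: divide by n - 1 if `sample` is True, else by n."""
    n = len(values)
    divisor = n - 1 if sample else n
    if divisor < 1:
        raise ValueError("not enough values")
    mean = sum(values) / n
    sum_sq = sum((v - mean) ** 2 for v in values)  # sum of squared deviations
    return sum_sq / divisor


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, not from the lesson
print(variance(data))               # 4.0 (population: 32 / 8)
print(variance(data, sample=True))  # about 4.571 (sample: 32 / 7)
```

The sample result is the larger of the two whenever the values are not all identical. Python's standard library offers the same pair of calculations as `statistics.pvariance` (population) and `statistics.variance` (sample).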

Example A

Calculate the variance of set x :

x=\left \{12, 7, 6, 3, 10, 5, 18, 15\right \}

Solution : Follow the steps from above to calculate the variance:

  • First, calculate the arithmetic mean:

\mu =\frac{12+7+6+3+10+5+18+15}{8}=9.5

  • Subtract the mean from each value to get the deviation of each value, then square the deviation of each value:

Value - Mean = Deviation      Deviation^2
12 - 9.5 =  2.5                6.25
 7 - 9.5 = -2.5                6.25
 6 - 9.5 = -3.5               12.25
 3 - 9.5 = -6.5               42.25
10 - 9.5 =  0.5                0.25
 5 - 9.5 = -4.5               20.25
18 - 9.5 =  8.5               72.25
15 - 9.5 =  5.5               30.25
TOTAL (sum of Deviation^2):  190.00

  • Finally, divide the sum of the squared deviations by the count of values in the data set:

\sigma^2 = \frac{190}{8} = 23.75

Therefore, the variance of set x is 23.75.

Example B

Find the variance of set z :

z=\left \{1, 2, 3, 4, 5, 6, 7, 9\right \}

Solution: Find the mean, add up the squared deviations of the values from the mean, and divide that sum by the total number of values in the set:

\mu = \frac{1+2+3+4+5+6+7+9}{8} = 4.625

(1-4.625)^2+(2-4.625)^2+(3-4.625)^2+(4-4.625)^2+(5-4.625)^2+(6-4.625)^2+(7-4.625)^2+(9-4.625)^2 = 49.875

\sigma^2 = \frac{49.875}{8} \approx 6.234

Therefore, the variance (\sigma^2) of set z is approximately 6.234.

Example C

Find \sigma^2 of set y:

y=\left \{13, 14, 15, 16, 17, 18, 19, 20, 21\right \}

Solution: Let’s do this one differently, using a nifty trick known as the “mean of the squares minus the square of the mean.” Start, as before, by finding the arithmetic mean:

\mu =\frac{13+14+15+16+17+18+19+20+21}{9}=17

Then, to find the variance, divide the sum of the squares of each value by the number of values (this is the “mean of the squares”), then square the mean we calculated above, 17 (the “square of the mean”), and subtract it from the mean of the squares:

\sigma^2 = \frac{13^2+14^2+15^2+16^2+17^2+18^2+19^2+20^2+21^2}{9} - 17^2 = 6.6\overline{6}

Therefore, \sigma^2 of set y is 6.6\overline{6}.
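One way to convince yourself that the shortcut agrees with the four-step method is to compute both with exact fractions, so rounding cannot hide a difference. The sketch below does that for set y. The function names are made up for this example, and the code assumes the values are integers.

```python
from fractions import Fraction


def variance_by_definition(values):
    """Mean of the squared deviations from the mean (integer inputs)."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((v - mean) ** 2 for v in values) / n


def variance_by_shortcut(values):
    """Mean of the squares minus the square of the mean (integer inputs)."""
    n = len(values)
    mean = Fraction(sum(values), n)
    mean_of_squares = Fraction(sum(v * v for v in values), n)
    return mean_of_squares - mean ** 2


y = [13, 14, 15, 16, 17, 18, 19, 20, 21]  # set y from Example C
print(variance_by_definition(y))  # 20/3
print(variance_by_shortcut(y))    # 20/3
```

Both methods return 20/3, which is the repeating decimal 6.666... found above.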

Concept Problem Revisited

If you were told that the mean income at a certain company was $35,000, you wouldn’t really know much about the actual income of the majority of the employees, since there could be a few upper-level managers or owners whose income might skew the mean badly. However, if you were also given the variance of the incomes, how would that help?

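By learning the variance of the set of incomes, you could get a feel for how representative the $35,000 figure is of the likely salary of a typical employee. A small variance would mean most incomes sit close to $35,000, while a large variance would mean the incomes are widely spread and the mean alone says little about what any one employee earns.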

Vocabulary

To skew a given set means to cause the trend of data to favor one end or the other.

The variance (symbolized by \sigma^2) of a set is the average of the squared deviations from the mean. It measures how tightly the data points cluster around the mean.

Deviation is a measure of the difference between a given value and the mean.

Guided Practice

1. Find \mu and \sigma^2 of set z.

z=\left \{3.25, 3.5, 2.85, 3.4, 2.95, 3.02, 3.17\right \}

2. If all values of set z, above, were increased by 5, what would the new mean and variance be?

3. If all values of set z from question #1 were doubled, how would that affect \mu and \sigma^2?

Solutions:

1. Let’s use the “mean of the squares minus the square of the mean” method:

First find the mean of the set: \frac{3.25+3.5+2.85+3.4+2.95+3.02+3.17}{7}=3.16286

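Now divide the sum of the squared values by the number of values to get the mean of the squares, then subtract the square of the mean (3.16286^2 \approx 10.0037):

\frac{3.25^2+3.5^2+2.85^2+3.4^2+2.95^2+3.02^2+3.17^2}{7}-10.0037=10.0524-10.0037=0.0487 \approx 0.049

So the variance is approximately 0.049.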

2. Find the mean of the new set: \frac{8.25+8.5+7.85+8.4+7.95+8.02+8.17}{7}=8.16286

Divide the sum of the values squared by the number of values: \frac{466.7668}{7}=66.681

Subtract the squared mean from the mean of the squares: 66.681-66.632=0.049   is the variance.

The variance is the same as before! Does that surprise you? It shouldn't: adding 5 to every value also adds 5 to the mean, so each value's deviation from the mean is unchanged, and so is the variance. Carried to more decimal places, both sets have a variance of about 0.04873469. If your results differ slightly in the later decimal places, the difference comes from rounding the intermediate values, not from the data.

3. The question is what would happen if all of the values were doubled. Do the mean and variance also double? Let’s see:

The mean of the new set is \frac{6.5+7+5.7+6.8+5.9+6.04+6.34}{7}=\frac{44.28}{7}=6.326 , which is twice the mean of the original set. So far so good.

The “mean of the squares” is \frac{6.5^2+7^2+5.7^2+6.8^2+5.9^2+6.04^2+6.34^2}{7}=\frac{281.47}{7}=40.21 , which is four times the original mean of the squares, not double after all (which makes sense, given that each doubled value was squared).

Finally, subtract the square of the mean from the mean of the squares: 40.21-6.326^2 \approx 0.192, which is the variance. If we compare this to the original, \frac{0.192}{0.049}\approx 4, we can see that doubling the original values quadruples the variance.
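To take rounding out of the comparison in questions 2 and 3, the three variances can be recomputed with exact fractions. The sketch below is one way to do that in Python. The function name `exact_variance` is made up for this example, and the values of set z are entered as decimal strings so that no floating-point error creeps in.

```python
from fractions import Fraction


def exact_variance(values):
    """Population variance computed with exact fractions (no rounding)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


# Set z from Guided Practice question 1, entered as exact decimals.
z = [Fraction(s) for s in ("3.25", "3.5", "2.85", "3.4", "2.95", "3.02", "3.17")]

original = exact_variance(z)
shifted = exact_variance([v + 5 for v in z])  # question 2: add 5 to every value
doubled = exact_variance([2 * v for v in z])  # question 3: double every value

print(float(original))          # about 0.0487347
print(shifted == original)      # True
print(doubled == 4 * original)  # True
```

With exact arithmetic, adding 5 leaves the variance unchanged and doubling multiplies it by exactly 4.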

Practice

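Questions 1-12: find the variance. Where the data are described as a sample, divide by the number of values minus 1; otherwise treat the data as a population and find \sigma^2.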

  1. y=\left \{4, 50, 63, 2, 82, 99\right \}
  2. Set  x is a random sample from a population with 38 members: x=\left \{8, 13, 5, 10\right \}
  3. Set  z is a random sample from a larger population: z=\left \{4,3,5,15,5\right \}
  4. y=\left \{3,26,5,1,1\right \}
  5. 22, 21, 13, 19, 16, 18
  6. Sample: 1, 2, 5, 1
  7. Sample: 10, 6, 3, 4
  8. 8, 11, 17, 7, 19
  9. 15, 17, 19, 21, 23, 25, 27, 29
  10. Sample: 15, 17, 19, 21, 23, 25, 27, 29
  11. .25, .35, .45, .55, .26, .75
  12. Find the variance of the data in the table:
HEIGHTS (rounded to the nearest inch)    FREQUENCY OF STUDENTS
60                                       35
61                                       33
62                                       45
63                                        4
64                                        3
65                                        4
66                                        7
67                                        4
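
When data come as a frequency table, each value counts once for every time it occurs. The sketch below shows one way to apply the population steps to such a table in Python. The function name and the small table are hypothetical and chosen only for illustration; they are not the heights data from question 12.

```python
def variance_from_frequencies(table):
    """Population variance of data given as (value, frequency) pairs."""
    n = sum(freq for _, freq in table)                      # total count
    mean = sum(value * freq for value, freq in table) / n   # weighted mean
    # Each squared deviation counts once per occurrence of its value.
    sum_sq = sum(freq * (value - mean) ** 2 for value, freq in table)
    return sum_sq / n


# Hypothetical table: the value 1 once, 2 twice, 3 once.
table = [(1, 1), (2, 2), (3, 1)]
print(variance_from_frequencies(table))  # 0.5
```

This gives the same result as writing the data out in full as 1, 2, 2, 3 and following the four steps.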
