6.2: Variance of a Data Set
Suppose you are a toy maker testing the efficiency of a new factory process. In the past, the factory produced an average of 5,500 toys per day. You take a sample of the production runs over one week. From Monday through Friday, the production numbers are: 6,500; 2,500; 5,700; 8,000; 3,800. How can you express the difference in efficiency between this process and the previous one? How can you describe the variance?
Watch This
First watch this video to learn about the variance of normally distributed data.
CK12 Foundation: Chapter6VarianceofNormallyDistributedDataA
Then watch this video to see some examples.
CK12 Foundation: Chapter6VarianceofNormallyDistributedDataB
Watch this video for more help. This video from Khan Academy shows how to calculate the variance and the standard deviation of a population and a sample.
Khan Academy Statistics: Standard Deviation
Guidance
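To calculate the variance of a data set, follow these steps:
Step 1: Determine the mean of the data values.
Step 2: Subtract the mean of the data from each value in the data set to determine the difference between the data value and the mean: \begin{align*}(x - \mu)\end{align*}
Step 3: Square each of these differences and determine the total of these nonnegative, squared results.
Step 4: Divide this sum by the number of values in the data set.
These steps for calculating the variance of a data set for a population can be summarized in the following formula:
\begin{align*}\sigma^2 = \frac{\sum (x - \mu)^2}{n}\end{align*}
where:
\begin{align*}x\end{align*} is a data value.
\begin{align*}\mu\end{align*} is the population mean.
\begin{align*}n\end{align*} is the number of data values.
These steps for calculating the variance of a data set for a sample can be summarized in the following formula:
\begin{align*}s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}\end{align*}
where:
\begin{align*}x\end{align*} is a data value.
\begin{align*}\bar{x}\end{align*} is the sample mean.
\begin{align*}n\end{align*} is the number of data values.
The only difference in the formulas is the number by which the sum is divided. For a population, it is divided by \begin{align*}n\end{align*}, and for a sample, it is divided by \begin{align*}n - 1\end{align*}.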
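The four steps can also be sketched in code. The following Python is a minimal illustration rather than part of the original lesson; the function names `squared_deviations`, `population_variance`, and `sample_variance`, as well as the small data set, are assumptions chosen for this example.

```python
def squared_deviations(values):
    """Steps 1-3: find the mean, subtract it from each value, square each difference."""
    mean = sum(values) / len(values)            # Step 1
    return [(x - mean) ** 2 for x in values]    # Steps 2 and 3


def population_variance(values):
    """Step 4 for a population: divide the sum of squares by n."""
    return sum(squared_deviations(values)) / len(values)


def sample_variance(values):
    """Step 4 for a sample: divide the sum of squares by n - 1."""
    if len(values) < 2:
        raise ValueError("a sample variance needs at least two values")
    return sum(squared_deviations(values)) / (len(values) - 1)


# Hypothetical data, chosen only to illustrate the two divisors.
data = [2, 4, 6, 8]
print(population_variance(data))  # sum of squares (20) divided by n = 4
print(sample_variance(data))      # sum of squares (20) divided by n - 1 = 3
```

Both functions share the first three steps and differ only in the divisor, which mirrors the two formulas above.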
Example A
A company wants to test its exterior house paint to determine how long it will retain its original color before fading. The company mixes 2 brands of paint by adding different chemicals to each brand. Six one-gallon cans are made for each paint brand, and the results are recorded for every gallon of each brand of paint. The following are the results obtained in the laboratory. Calculate the variance of the 2 brands of paint. These are both small populations.
Brand A (Time in months)  Brand B (Time in months) 

15  40 
65  50 
55  35 
35  40 
45  45 
25  30 
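Brand A
\begin{align*}x\end{align*}  \begin{align*}(x - \mu)\end{align*}  \begin{align*}(x - \mu)^2\end{align*} 

15  \begin{align*}-25\end{align*}  625 
65  25  625 
55  15  225 
35  \begin{align*}-5\end{align*}  25 
45  5  25 
25  \begin{align*}-15\end{align*}  225 
\begin{align*}\mu&= \frac{15+65+55+35+45+25}{6} = \frac{240}{6} = 40\\ \sigma^2 & = \frac{\sum (x - \mu)^2}{n}\\ \sigma^2 & = \frac{625+625+225+25+25+225}{6} = \frac{1,750}{6} \approx 291.67\end{align*}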
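Brand B
\begin{align*}x\end{align*}  \begin{align*}(x - \mu)\end{align*}  \begin{align*}(x - \mu)^2\end{align*} 

40  0  0 
50  10  100 
35  \begin{align*}-5\end{align*}  25 
40  0  0 
45  5  25 
30  \begin{align*}-10\end{align*}  100 
\begin{align*}\mu&= \frac{40+50+35+40+45+30}{6} = \frac{240}{6} = 40\\ \sigma^2 & = \frac{\sum (x - \mu)^2}{n}\\ \sigma^2 & = \frac{0+100+25+0+25+100}{6} = \frac{250}{6} \approx 41.67\end{align*}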
The variance is simply the average of the squares of the distance of each data value from the mean. If these data values are close to the value of the mean, the variance will be small. This was the case for Brand B. If these data values are far from the mean, the variance will be large, as was the case for Brand A.
The variance of a data set is never negative, because it is an average of squared differences. It equals zero only when every data value is the same.
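As a check on the arithmetic, the population variances of the two paint brands can be recomputed with Python's standard `statistics` module. This is an illustrative sketch; the variable names are assumptions, and `statistics.pvariance` is used because both data sets are treated as populations.

```python
import statistics

# Fading times in months, copied from the table in Example A.
brand_a = [15, 65, 55, 35, 45, 25]
brand_b = [40, 50, 35, 40, 45, 30]

# pvariance divides the sum of squared deviations by n (population formula).
var_a = statistics.pvariance(brand_a)
var_b = statistics.pvariance(brand_b)

print(f"Brand A: mean = {statistics.mean(brand_a)}, variance = {var_a:.2f}")
print(f"Brand B: mean = {statistics.mean(brand_b)}, variance = {var_b:.2f}")
```

Both brands have the same mean, so the difference in variance reflects only how widely the values are spread around that mean.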
Example B
What would the variances of the 2 data sets in Example A have been had they been samples instead of small populations?
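First, let's calculate the variance of the data set for Brand A had it been a sample:
\begin{align*}s^2 & = \frac{\sum (x - \bar{x})^2}{n - 1}\\ s^2 & = \frac{625+625+225+25+25+225}{5} = \frac{1,750}{5} = 350\end{align*}
Next, let's calculate the variance of the data set for Brand B had it been a sample:
\begin{align*}s^2 & = \frac{\sum (x - \bar{x})^2}{n - 1}\\ s^2 & = \frac{0+100+25+0+25+100}{5} = \frac{250}{5} = 50\end{align*}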
Notice that, as in Example A, the variance of the data set for Brand A is much larger than the variance of the data set for Brand B.
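The same comparison can be sketched in Python. In the standard `statistics` module, `variance` uses the sample divisor \begin{align*}n - 1\end{align*}, while `pvariance` uses the population divisor \begin{align*}n\end{align*}. The variable names below are assumptions made for this illustration.

```python
import statistics

# Fading times in months, copied from the table in Example A.
brand_a = [15, 65, 55, 35, 45, 25]
brand_b = [40, 50, 35, 40, 45, 30]

# Sample variance: sum of squared deviations divided by n - 1.
sample_a = statistics.variance(brand_a)
sample_b = statistics.variance(brand_b)
print(sample_a, sample_b)

# For the same data, the sample variance is larger than the population
# variance by the factor n / (n - 1).
n = len(brand_a)
ratio = sample_a / statistics.pvariance(brand_a)
print(ratio, n / (n - 1))
```

Because the divisor is smaller, the sample variance of a data set is always at least as large as its population variance.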
Example C
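The following data represent morning temperatures and precipitation amounts (in mm):
Temperature: 11.7, 13.7, 10.5, 14.2, 13.9, 14.2, 10.4, 16.1, 16.4, 4.8, 15.2, 13.0, 14.4, 12.7, 8.6, 12.9, 11.5, 14.6
Precipitation (mm): 18.6, 37.1, 70.9, 102.0, 59.9, 58.0, 73.0, 77.6, 89.1, 86.6, 40.3, 119.5, 36.2, 85.5, 59.2, 97.8, 122.2, 82.6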
Calculate the variance for each data set. Which data set is more variable? Both are small populations.
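Temperature
\begin{align*}x\end{align*}  \begin{align*}(x - \mu)\end{align*}  \begin{align*}(x - \mu)^2\end{align*} 

11.7  \begin{align*}-1\end{align*}  1 
13.7  1  1 
10.5  \begin{align*}-2.2\end{align*}  4.84 
14.2  1.5  2.25 
13.9  1.2  1.44 
14.2  1.5  2.25 
10.4  \begin{align*}-2.3\end{align*}  5.29 
16.1  3.4  11.56 
16.4  3.7  13.69 
4.8  \begin{align*}-7.9\end{align*}  62.41 
15.2  2.5  6.25 
13.0  0.3  0.09 
14.4  1.7  2.89 
12.7  0  0 
8.6  \begin{align*}-4.1\end{align*}  16.81 
12.9  0.2  0.04 
11.5  \begin{align*}-1.2\end{align*}  1.44 
14.6  1.9  3.61 
\begin{align*}\mu&= \frac{\sum x}{n} = \frac{228.8}{18} \approx 12.7\\ \sigma^2 & = \frac{\sum (x - \mu)^2}{n}\\ \sigma^2 & = \frac{136.86}{18} \approx 7.6\end{align*}
The variance of the data set is approximately 7.6.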
Precipitation (mm)
\begin{align*}x\end{align*}  \begin{align*}(x - \mu)\end{align*}  \begin{align*}(x - \mu)^2\end{align*} 

18.6  \begin{align*}-54.5\end{align*}  2970.3 
37.1  \begin{align*}-36\end{align*}  1296 
70.9  \begin{align*}-2.2\end{align*}  4.84 
102.0  28.9  835.21 
59.9  \begin{align*}-13.2\end{align*}  174.24 
58.0  \begin{align*}-15.1\end{align*}  228.01 
73.0  \begin{align*}-0.1\end{align*}  0.01 
77.6  4.5  20.25 
89.1  16.0  256 
86.6  13.5  182.25 
40.3  \begin{align*}-32.8\end{align*}  1075.8 
119.5  46.4  2153 
36.2  \begin{align*}-36.9\end{align*}  1361.6 
85.5  12.4  153.76 
59.2  \begin{align*}-13.9\end{align*}  193.21 
97.8  24.7  610.09 
122.2  49.1  2410.8 
82.6  9.5  90.25 
\begin{align*}\mu&= \frac{\sum x}{n} = \frac{1,316.1}{18} \approx 73.1\\ \sigma^2 & = \frac{\sum (x - \mu)^2}{n}\\ \sigma^2 & = \frac{14,016}{18} \approx 778.67\end{align*}
The variance of the data set is approximately 778.67 mm².
Therefore, the data values for the precipitation are more variable. This is indicated by the large variance of the data set.
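The comparison in Example C can be reproduced with a short Python sketch. The variable names are assumptions, and the data are copied from the two tables above. The results differ slightly from the hand calculations because the code uses the unrounded means rather than 12.7 and 73.1.

```python
import statistics

# Data values copied from the Example C tables.
temperature = [11.7, 13.7, 10.5, 14.2, 13.9, 14.2, 10.4, 16.1, 16.4,
               4.8, 15.2, 13.0, 14.4, 12.7, 8.6, 12.9, 11.5, 14.6]
precipitation = [18.6, 37.1, 70.9, 102.0, 59.9, 58.0, 73.0, 77.6, 89.1,
                 86.6, 40.3, 119.5, 36.2, 85.5, 59.2, 97.8, 122.2, 82.6]

# Both data sets are treated as small populations, so divide by n.
temp_var = statistics.pvariance(temperature)
precip_var = statistics.pvariance(precipitation)

more_variable = "precipitation" if precip_var > temp_var else "temperature"
print(f"Temperature variance:   {temp_var:.2f}")
print(f"Precipitation variance: {precip_var:.2f}")
print(f"More variable data set: {more_variable}")
```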
Guided Practice
A consumer advocacy magazine wants to compare 2 brands of incandescent lamps. The magazine took samples of each brand, with each sample consisting of 10 lamps. All of the lamps in both of the samples were allowed to burn as long as they could, and the times were recorded in hours. The following are the results obtained from the magazine. Calculate the variance of the samples of the 2 brands of incandescent lamps. Which brand has the more variable burning times?
Brand A (Time in hours)  Brand B (Time in hours) 

760  820 
790  900 
800  810 
780  790 
850  810 
790  800 
750  850 
820  820 
810  920 
800  890 
Answer:
Brand A
\begin{align*}x\end{align*}  \begin{align*}(x - \bar{x})\end{align*}  \begin{align*}(x - \bar{x})^2\end{align*} 

760  \begin{align*}-35\end{align*}  1,225 
790  \begin{align*}-5\end{align*}  25 
800  5  25 
780  \begin{align*}-15\end{align*}  225 
850  55  3,025 
790  \begin{align*}-5\end{align*}  25 
750  \begin{align*}-45\end{align*}  2,025 
820  25  625 
810  15  225 
800  5  25 
\begin{align*}\bar{x}&=\frac{760+790+800+780+850+790+750+820+810+800}{10}=\frac{7,950}{10}=795\\ s^2 & = \frac{\sum (x - \bar{x})^2}{n - 1}\\ s^2 & = \frac{1,225+25+25+225+3,025+25+2,025+625+225+25}{9} = \frac{7,450}{9} \approx 827.78\end{align*}
The variance of the burning times for Brand A is approximately 827.78 hours².
Brand B
\begin{align*}x\end{align*}  \begin{align*}(x - \bar{x})\end{align*}  \begin{align*}(x - \bar{x})^2\end{align*} 

820  \begin{align*}-21\end{align*}  441 
900  59  3,481 
810  \begin{align*}-31\end{align*}  961 
790  \begin{align*}-51\end{align*}  2,601 
810  \begin{align*}-31\end{align*}  961 
800  \begin{align*}-41\end{align*}  1,681 
850  9  81 
820  \begin{align*}-21\end{align*}  441 
920  79  6,241 
890  49  2,401 
\begin{align*}\bar{x}&=\frac{820+900+810+790+810+800+850+820+920+890}{10}=\frac{8,410}{10}=841\\ s^2 & = \frac{\sum (x - \bar{x})^2}{n - 1}\\ s^2 & = \frac{441+3,481+961+2,601+961+1,681+81+441+6,241+2,401}{9} = \frac{19,290}{9} \approx 2,143.33\end{align*}
The variance of the burning times for Brand B is approximately 2,143.33 hours². Therefore, Brand B has the more variable burning times.
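The deviation tables in this answer can also be generated programmatically. The sketch below is one possible way to do so in Python; the helper name `deviation_table` and the `results` dictionary are assumptions made for this illustration, not part of the lesson.

```python
def deviation_table(values):
    """Return the mean and one row (x, x - mean, (x - mean)^2) per data value."""
    mean = sum(values) / len(values)
    rows = [(x, x - mean, (x - mean) ** 2) for x in values]
    return mean, rows


# Burning times in hours, copied from the Guided Practice table.
lamps = {
    "Brand A": [760, 790, 800, 780, 850, 790, 750, 820, 810, 800],
    "Brand B": [820, 900, 810, 790, 810, 800, 850, 820, 920, 890],
}

results = {}
for brand, times in lamps.items():
    mean, rows = deviation_table(times)
    sum_of_squares = sum(squared for _, _, squared in rows)
    # The lamps are samples, so divide by n - 1 rather than n.
    s_squared = sum_of_squares / (len(times) - 1)
    results[brand] = (mean, sum_of_squares, s_squared)
    print(f"{brand}: mean = {mean}, sum of squares = {sum_of_squares}, "
          f"sample variance = {s_squared:.2f}")
```

Keeping the intermediate rows makes it easy to compare each computed deviation against the corresponding row of the hand-built tables.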
Practice
1. The following data was collected: \begin{align*}5 \qquad 8 \qquad 9 \qquad 10 \qquad 4 \qquad 3 \qquad 7 \qquad 5\end{align*} Fill in the chart below and calculate the variance. The data represents a small population.
Data \begin{align*}(x)\end{align*}  Mean \begin{align*}(\mu)\end{align*}  Data \begin{align*}-\end{align*} Mean \begin{align*}(x - \mu)\end{align*}  Square of Data \begin{align*}-\end{align*} Mean \begin{align*}(x - \mu)^2\end{align*}  

\begin{align*}\sum\end{align*} 
2. What would the variance have been for question 1 had the data set represented a sample instead of a small population?
3. The following data was collected: \begin{align*}11 \qquad 15 \qquad 16 \qquad 12 \qquad 19 \qquad 17 \qquad 14 \qquad 18 \qquad 15 \qquad 10\end{align*} Fill in the chart below and calculate the variance. The data represents a small population.
Data \begin{align*}(x)\end{align*}  Mean \begin{align*}(\mu)\end{align*}  Data \begin{align*}-\end{align*} Mean \begin{align*}(x - \mu)\end{align*}  Square of Data \begin{align*}-\end{align*} Mean \begin{align*}(x - \mu)^2\end{align*}  

\begin{align*}\sum\end{align*} 
4. What would the variance have been for question 3 had the data set represented a sample instead of a small population?
5. The following data was collected: \begin{align*}55 \qquad 54 \qquad 48 \qquad 52 \qquad 69 \qquad 60 \qquad 47 \qquad 66 \qquad 50 \qquad 61\end{align*} Fill in the chart below and calculate the variance. The data represents a small population.
Data \begin{align*}(x)\end{align*}  Mean \begin{align*}(\mu)\end{align*}  Data \begin{align*}-\end{align*} Mean \begin{align*}(x - \mu)\end{align*}  Square of Data \begin{align*}-\end{align*} Mean \begin{align*}(x - \mu)^2\end{align*}  

\begin{align*}\sum\end{align*} 
6. What would the variance have been for question 5 had the data set represented a sample instead of a small population?
7. The following data was collected: \begin{align*}26 \qquad 30 \qquad 20 \qquad 27 \qquad 23 \qquad 33 \qquad 19 \qquad 26\end{align*} Fill in the chart below and calculate the variance. The data represents a sample.
Data \begin{align*}(x)\end{align*}  Mean \begin{align*}(\bar{x})\end{align*}  Data \begin{align*}-\end{align*} Mean \begin{align*}(x - \bar{x})\end{align*}  Square of Data \begin{align*}-\end{align*} Mean \begin{align*}(x - \bar{x})^2\end{align*}  

\begin{align*}\sum\end{align*} 
8. What would the variance have been for question 7 had the data set represented a small population instead of a sample?
9. The following data was collected: \begin{align*}85 \qquad 99 \qquad 89 \qquad 90 \qquad 104 \qquad 82 \qquad 95 \qquad 110\end{align*} Fill in the chart below and calculate the variance. The data represents a sample.
Data \begin{align*}(x)\end{align*}  Mean \begin{align*}(\bar{x})\end{align*}  Data \begin{align*}-\end{align*} Mean \begin{align*}(x - \bar{x})\end{align*}  Square of Data \begin{align*}-\end{align*} Mean \begin{align*}(x - \bar{x})^2\end{align*}  

\begin{align*}\sum\end{align*} 
10. What would the variance have been for question 9 had the data set represented a small population instead of a sample?
Term  Definition 

variance  A measure of the spread of the data set equal to the mean of the squared variations of each data value from the mean of the data set. 
absolute deviation  The absolute deviation is the sum total of how different each number is from the mean. 
deviation  Deviation is a measure of the difference between a given value and the mean. 
Mean  The mean of a data set is the average of the data set. The mean is found by calculating the sum of the values in the data set and then dividing by the number of values in the data set. 
mean absolute deviation  The mean absolute deviation is an alternate measure of how spread out the data is. It involves finding the mean of the distance between each data value and the mean. While this method might seem more intuitive, in statistics it has been found to be too limited and is not commonly used. 
Population  In statistics, the population is the entire group of interest from which the sample is drawn. 
Sample  A sample is a specified part of a population, intended to represent the population as a whole. 
Skew  To skew a given set means to cause the trend of the data to favor one end or the other. 
standard deviation  The square root of the variance is the standard deviation. Standard deviation is one way to measure the spread of a set of data. 
Here you'll learn the meaning of variance and calculate the variance for populations and for samples.