Variance of a Data Set
To calculate the variance \begin{align*}(\sigma^2)\end{align*} of a population:
Step 1: Determine the mean of the data values.
Step 2: Subtract the mean of the data from each value in the data set to determine the difference between the data value and the mean: \begin{align*}(x-\mu)\end{align*}.
Step 3: Square each of these differences and add up the squared results, each of which is zero or positive.
Step 4: Divide this sum by the number of values in the data set.
These steps for calculating the variance of a data set for a population can be summarized in the following formula:
\begin{align*}\sigma^2 = \frac{\sum(x-\mu)^2}{n}\end{align*}
where:
\begin{align*}x\end{align*} is a data value.
\begin{align*}\mu\end{align*} is the population mean.
\begin{align*}n\end{align*} is the number of data values (population size).
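The four steps above can be sketched in Python. This is a minimal illustration rather than a definitive implementation; the function name `population_variance` and the small data set are hypothetical and were chosen only so that the arithmetic is easy to follow.

```python
def population_variance(values):
    """Return the population variance of a list of numbers (Steps 1-4)."""
    n = len(values)
    mean = sum(values) / n                        # Step 1: mean of the data
    differences = [x - mean for x in values]      # Step 2: x - mu for each value
    total = sum(d ** 2 for d in differences)      # Step 3: sum of squared differences
    return total / n                              # Step 4: divide by the number of values


# Hypothetical data set (not from this lesson): the mean is 5 and the
# squared differences add up to 32, so the variance is 32 / 8.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data))  # 4.0
```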
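For a sample, the steps are the same, except that the sample mean is used in Step 1 and, in Step 4, the sum is divided by one less than the number of values. The calculation can be summarized in the following formula: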
\begin{align*}s^2 = \frac{\sum(x-\overline{x})^2}{n-1}\end{align*}
where:
\begin{align*}x\end{align*} is a data value.
\begin{align*}\overline{x}\end{align*} is the sample mean.
\begin{align*}n\end{align*} is the number of data values (sample size).
The only difference in the formulas is the number by which the sum is divided. For a population, it is divided by \begin{align*}n\end{align*}, and for a sample, it is divided by \begin{align*}n - 1\end{align*}.
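Python's standard `statistics` module provides both versions: `pvariance` divides by \begin{align*}n\end{align*}, and `variance` divides by \begin{align*}n - 1\end{align*}. The sketch below applies them to a hypothetical data set (not taken from this lesson) to show that the two results differ only in the divisor.

```python
import math
import statistics

# Hypothetical data set, used only for illustration.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)

population_var = statistics.pvariance(data)  # divides by n
sample_var = statistics.variance(data)       # divides by n - 1

print(population_var)
print(sample_var)

# Same sum of squared differences, different divisor: multiplying each
# variance by its own divisor gives back the same sum.
print(math.isclose(population_var * n, sample_var * (n - 1)))  # True
```

Since the divisor is smaller for a sample, the sample variance is never smaller than the population variance computed from the same values.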
Calculating the Variance of a Data Set
1. A company wants to test its exterior house paint to determine how long it will retain its original color before fading. The company mixes 2 brands of paint by adding different chemicals to each brand. Six one-gallon cans are made for each paint brand, and the results are recorded for every gallon of each brand of paint. The following are the results obtained in the laboratory. Calculate the variance for each of the 2 brands of paint. Treat both data sets as small populations.
Brand A (Time in months) | Brand B (Time in months) |
---|---|
15 | 40 |
65 | 50 |
55 | 35 |
35 | 40 |
45 | 45 |
25 | 30 |
Brand A
\begin{align*}x\end{align*} | \begin{align*}(x-\mu)\end{align*} | \begin{align*}(x-\mu)^2\end{align*} |
---|---|---|
15 | \begin{align*}-25\end{align*} | 625 |
65 | 25 | 625 |
55 | 15 | 225 |
35 | \begin{align*}-5\end{align*} | 25 |
45 | 5 | 25 |
25 | \begin{align*}-15\end{align*} | 225 |
\begin{align*}\mu&=\frac{15+65+55+35+45+25}{6}=\frac{240}{6}=40\\ \sigma^2 & = \frac{\sum (x- \mu)^2}{n}\\ \sigma^2 & = \frac{625+625+225+25+25+225}{6} = \frac{1,750}{6} \approx 291.\overline{66}\end{align*}
Brand B
\begin{align*}x\end{align*} | \begin{align*}(x-\mu)\end{align*} | \begin{align*}(x-\mu)^2\end{align*} |
---|---|---|
40 | 0 | 0 |
50 | 10 | 100 |
35 | \begin{align*}-5\end{align*} | 25 |
40 | 0 | 0 |
45 | 5 | 25 |
30 | \begin{align*}-10\end{align*} | 100 |
\begin{align*}\mu&=\frac{40+50+35+40+45+30}{6}=\frac{240}{6}=40\\ \sigma^2 & = \frac{\sum (x-\mu)^2}{n}\\ \sigma^2 & = \frac{0+100+25+0+25+100}{6} = \frac{250}{6} \approx 41.\overline{66}\end{align*}
The variance is simply the average of the squared distances of the data values from the mean. If the data values are close to the mean, the variance will be small. This was the case for Brand B. If the data values are far from the mean, the variance will be large, as was the case for Brand A.
The variance of a data set is never negative, because it is built from squared differences. It is zero only when every data value equals the mean.
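The worked tables for the two paint brands can be reproduced with a short Python sketch. The helper name `deviation_table` is hypothetical; the data values are the ones given in the example, and both brands are treated as populations.

```python
def deviation_table(values):
    """Return the mean and one row (x, x - mean, (x - mean)^2) per value."""
    mean = sum(values) / len(values)
    rows = [(x, x - mean, (x - mean) ** 2) for x in values]
    return mean, rows


brand_a = [15, 65, 55, 35, 45, 25]  # months, from the example
brand_b = [40, 50, 35, 40, 45, 30]  # months, from the example

for name, values in [("Brand A", brand_a), ("Brand B", brand_b)]:
    mean, rows = deviation_table(values)
    total = sum(squared for _, _, squared in rows)
    variance = total / len(values)  # population formula: divide by n
    print(name, "mean:", mean, "sum of squares:", total,
          "variance:", round(variance, 2))
```

Both brands have the same mean of 40 months, but the sums of squares (1,750 and 250) differ, which is why the variances differ.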
2. What would the variances of the 2 data sets in the previous example have been had they been samples instead of small populations?
First, let's calculate the variance of the data set for Brand A had it been a sample:
\begin{align*}\bar{x}&=\frac{15+65+55+35+45+25}{6}=\frac{240}{6}=40\\ s^2 & = \frac{\sum (x- \bar{x})^2}{n-1}\\ s^2 & = \frac{625+625+225+25+25+225}{5} = \frac{1,750}{5}=350\end{align*}
Next, let's calculate the variance of the data set for Brand B had it been a sample:
\begin{align*}\bar{x}&=\frac{40+50+35+40+45+30}{6}=\frac{240}{6}=40\\ s^2 & = \frac{\sum (x-\bar{x})^2}{n-1}\\ s^2 & = \frac{0+100+25+0+25+100}{5} = \frac{250}{5}=50\end{align*}
Notice that, as in the previous example, the variance of the data set for Brand A is much larger than the variance of the data set for Brand B.
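Because the population and sample formulas share the same numerator when they are applied to the same data values, a sample variance can also be obtained from the corresponding population variance by multiplying by \begin{align*}\frac{n}{n-1}\end{align*}. As a check on the values above, with \begin{align*}n = 6\end{align*}:
\begin{align*}s^2 &= \sigma^2 \cdot \frac{n}{n-1}\\ \text{Brand A: } s^2 &= \frac{1,750}{6} \cdot \frac{6}{5} = \frac{1,750}{5} = 350\\ \text{Brand B: } s^2 &= \frac{250}{6} \cdot \frac{6}{5} = \frac{250}{5} = 50\end{align*}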
3. The following data represents the morning temperatures \begin{align*}(^\circ \text{C})\end{align*} and the monthly rainfall (mm) in July for all the Canadian cities east of Toronto:
Temperature \begin{align*}(^\circ \text{C})\end{align*}
\begin{align*}& 11.7 \quad 13.7 \quad 10.5 \quad \ 14.2 \quad 13.9 \quad 14.2 \quad 10.4 \quad 16.1 \quad 16.4\\ & 4.8 \quad \ \ 15.2 \quad 13.0 \quad \ 14.4 \quad 12.7 \quad 8.6 \quad \ 12.9 \quad 11.5 \quad 14.6\end{align*}
Precipitation (mm)
\begin{align*}& 18.6 \quad 37.1 \quad 70.9 \quad \ 102 \quad \ 59.9 \quad 58.0 \quad 73.0 \quad 77.6 \quad \ 89.1\\ & 86.6 \quad 40.3 \quad 119.5 \quad 36.2 \quad 85.5 \quad 59.2 \quad 97.8 \quad 122.2 \quad 82.6\end{align*}
Calculate the variance for each data set. Which data set is more variable? Both are small populations.
\begin{align*}x\end{align*} | \begin{align*}(x-\mu)\end{align*} | \begin{align*}(x-\mu)^2\end{align*} |
---|---|---|
11.7 | \begin{align*}-1\end{align*} | 1 |
13.7 | 1 | 1 |
10.5 | \begin{align*}-2.2\end{align*} | 4.84 |
14.2 | 1.5 | 2.25 |
13.9 | 1.2 | 1.44 |
14.2 | 1.5 | 2.25 |
10.4 | \begin{align*}-2.3\end{align*} | 5.29 |
16.1 | 3.4 | 11.56 |
16.4 | 3.7 | 13.69 |
4.8 | \begin{align*}-7.9\end{align*} | 62.41 |
15.2 | 2.5 | 6.25 |
13.0 | 0.3 | 0.09 |
14.4 | 1.7 | 2.89 |
12.7 | 0 | 0 |
8.6 | \begin{align*}-4.1\end{align*} | 16.81 |
12.9 | 0.2 | 0.04 |
11.5 | \begin{align*}-1.2\end{align*} | 1.44 |
14.6 | 1.9 | 3.61 |
\begin{align*}\mu&= \frac{\sum x}{n} = \frac{228.8}{18} \approx 12.7\\ \sigma^2 & = \frac{\sum (x-\mu)^2}{n}\\ \sigma^2 & = \frac{136.86}{18} \approx 7.6\end{align*}
The variance of the temperature data is approximately \begin{align*}7.6 \ (^\circ \text{C})^2\end{align*}. Variance is expressed in squared units because each difference is squared.
\begin{align*}x\end{align*} | \begin{align*}(x-\mu)\end{align*} | \begin{align*}(x-\mu)^2\end{align*} |
---|---|---|
18.6 | \begin{align*}-54.5\end{align*} | 2970.3 |
37.1 | \begin{align*}-36.0\end{align*} | 1296 |
70.9 | \begin{align*}-2.2\end{align*} | 4.84 |
102.0 | 28.9 | 835.21 |
59.9 | \begin{align*}-13.2\end{align*} | 174.24 |
58.0 | \begin{align*}-15.1\end{align*} | 228.01 |
73.0 | \begin{align*}-0.1\end{align*} | 0.01 |
77.6 | 4.5 | 20.25 |
89.1 | 16.0 | 256 |
86.6 | 13.5 | 182.25 |
40.3 | \begin{align*}-32.8\end{align*} | 1075.8 |
119.5 | 46.4 | 2153 |
36.2 | \begin{align*}-36.9\end{align*} | 1361.6 |
85.5 | 12.4 | 153.76 |
59.2 | \begin{align*}-13.9\end{align*} | 193.21 |
97.8 | 24.7 | 610.09 |
122.2 | 49.1 | 2410.8 |
82.6 | 9.5 | 90.25 |
\begin{align*}\mu&= \frac{\sum x}{n} = \frac{1,316.1}{18} \approx 73.1\\ \sigma^2 & = \frac{\sum (x-\mu)^2}{n}\\ \sigma^2 & = \frac{14,016}{18} \approx 778.\overline{66}\end{align*}
The variance of the precipitation data is approximately \begin{align*}778.67 \ \text{mm}^2\end{align*}.
Therefore, the data values for the precipitation are more variable. This is indicated by the much larger variance of that data set.
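With 18 values per data set, a calculation like this is easier to check by computer. The sketch below uses `statistics.pvariance` from Python's standard library with the data values listed in the example. Because the hand calculation rounds each mean to one decimal place (12.7 and 73.1) before subtracting, and rounds some of the squares, the computed values should agree closely with the results above but not necessarily to every decimal place.

```python
import statistics

temperature = [11.7, 13.7, 10.5, 14.2, 13.9, 14.2, 10.4, 16.1, 16.4,
               4.8, 15.2, 13.0, 14.4, 12.7, 8.6, 12.9, 11.5, 14.6]        # degrees C
precipitation = [18.6, 37.1, 70.9, 102.0, 59.9, 58.0, 73.0, 77.6, 89.1,
                 86.6, 40.3, 119.5, 36.2, 85.5, 59.2, 97.8, 122.2, 82.6]  # mm

for name, values in [("temperature", temperature),
                     ("precipitation", precipitation)]:
    mean = statistics.mean(values)
    variance = statistics.pvariance(values)  # population formula, unrounded mean
    print(f"{name}: mean = {mean:.2f}, variance = {variance:.2f}")
```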
Example
Example 1
A consumer advocacy magazine wants to compare 2 brands of incandescent lamps. The magazine took samples of each brand, with each sample consisting of 10 lamps. All of the lamps in both of the samples were allowed to burn as long as they could, and the times were recorded in hours. The following are the results obtained from the magazine. Calculate the variance of the samples of the 2 brands of incandescent lamps. Which brand has the more variable burning times?
Brand A (Time in hours) | Brand B (Time in hours) |
---|---|
760 | 820 |
790 | 900 |
800 | 810 |
780 | 790 |
850 | 810 |
790 | 800 |
750 | 850 |
820 | 820 |
810 | 920 |
800 | 890 |
Brand A
\begin{align*}x\end{align*} | \begin{align*}(x-\bar{x})\end{align*} | \begin{align*}(x-\bar{x})^2\end{align*} |
---|---|---|
760 | \begin{align*}-35\end{align*} | 1,225 |
790 | \begin{align*}-5\end{align*} | 25 |
800 | 5 | 25 |
780 | \begin{align*}-15\end{align*} | 225 |
850 | 55 | 3,025 |
790 | \begin{align*}-5\end{align*} | 25 |
750 | \begin{align*}-45\end{align*} | 2,025 |
820 | 25 | 625 |
810 | 15 | 225 |
800 | 5 | 25 |
\begin{align*}\bar{x}&=\frac{760+790+800+780+850+790+750+820+810+800}{10}=\frac{7,950}{10}=795\\ s^2 & = \frac{\sum (x- \bar{x})^2}{n-1}\\ s^2 & = \frac{1,225+25+25+225+3,025+25+2,025+625+225+25}{9} = \frac{7,450}{9} \approx 827.\overline{77}\end{align*}
The variance of the burning times for Brand A is approximately \begin{align*}827.78 \ \text{hours}^2\end{align*}.
Brand B
\begin{align*}x\end{align*} | \begin{align*}(x-\bar{x})\end{align*} | \begin{align*}(x-\bar{x})^2\end{align*} |
---|---|---|
820 | \begin{align*}-21\end{align*} | 441 |
900 | 59 | 3,481 |
810 | \begin{align*}-31\end{align*} | 961 |
790 | \begin{align*}-51\end{align*} | 2,601 |
810 | \begin{align*}-31\end{align*} | 961 |
800 | \begin{align*}-41\end{align*} | 1,681 |
850 | 9 | 81 |
820 | \begin{align*}-21\end{align*} | 441 |
920 | 79 | 6,241 |
890 | 49 | 2,401 |
\begin{align*}\bar{x}&=\frac{820+900+810+790+810+800+850+820+920+890}{10}=\frac{8,410}{10}=841\\ s^2 & = \frac{\sum (x-\bar{x})^2}{n-1}\\ s^2 & = \frac{441+3,481+961+2,601+961+1,681+81+441+6,241+2,401}{9} = \frac{19,290}{9} \approx 2,143.\overline{33}\end{align*}
The variance of the burning times for Brand B is approximately \begin{align*}2,143.33 \ \text{hours}^2\end{align*}. Therefore, Brand B has the more variable burning times.
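As a check on this example, the sample variances can be computed exactly in Python. This is a sketch, assuming the standard `statistics` and `fractions` modules; converting the data to `Fraction` keeps the arithmetic exact, so the repeating decimals in the solution appear as the fractions 7,450/9 and 19,290/9 (the latter in lowest terms).

```python
from fractions import Fraction
import statistics

brand_a = [760, 790, 800, 780, 850, 790, 750, 820, 810, 800]  # hours
brand_b = [820, 900, 810, 790, 810, 800, 850, 820, 920, 890]  # hours

# statistics.variance uses the sample formula (divides by n - 1).
var_a = statistics.variance([Fraction(x) for x in brand_a])
var_b = statistics.variance([Fraction(x) for x in brand_b])

print(var_a, float(var_a))
print(var_b, float(var_b))
print("More variable:", "Brand B" if var_b > var_a else "Brand A")
```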
Review
1. The following data was collected: \begin{align*}5 \qquad 8 \qquad 9 \qquad 10 \qquad 4 \qquad 3 \qquad 7 \qquad 5\end{align*} Fill in the chart below and calculate the variance. The data represents a small population.
Data \begin{align*}(x)\end{align*} | Mean \begin{align*}(\mu)\end{align*} | Data \begin{align*}-\end{align*} Mean \begin{align*}(x - \mu)\end{align*} | Square of Data \begin{align*}-\end{align*} Mean \begin{align*}(x-\mu)^2\end{align*} | |
---|---|---|---|---|
\begin{align*}\sum\end{align*} |
2. What would the variance have been for question 1 had the data set represented a sample instead of a small population?
3. The following data was collected: \begin{align*}11 \qquad 15 \qquad 16 \qquad 12 \qquad 19 \qquad 17 \qquad 14 \qquad 18 \qquad 15 \qquad 10\end{align*} Fill in the chart below and calculate the variance. The data represents a small population.
Data \begin{align*}(x)\end{align*} | Mean \begin{align*}(\mu)\end{align*} | Data \begin{align*}-\end{align*} Mean \begin{align*}(x - \mu)\end{align*} | Square of Data \begin{align*}-\end{align*} Mean \begin{align*}(x-\mu)^2\end{align*} | |
---|---|---|---|---|
\begin{align*}\sum\end{align*} |
4. What would the variance have been for question 3 had the data set represented a sample instead of a small population?
5. The following data was collected: \begin{align*}55 \qquad 54 \qquad 48 \qquad 52 \qquad 69 \qquad 60 \qquad 47 \qquad 66 \qquad 50 \qquad 61\end{align*} Fill in the chart below and calculate the variance. The data represents a small population.
Data \begin{align*}(x)\end{align*} | Mean \begin{align*}(\mu)\end{align*} | Data \begin{align*}-\end{align*} Mean \begin{align*}(x - \mu)\end{align*} | Square of Data \begin{align*}-\end{align*} Mean \begin{align*}(x-\mu)^2\end{align*} | |
---|---|---|---|---|
\begin{align*}\sum\end{align*} |
6. What would the variance have been for question 5 had the data set represented a sample instead of a small population?
7. The following data was collected: \begin{align*}26 \qquad 30 \qquad 20 \qquad 27 \qquad 23 \qquad 33 \qquad 19 \qquad 26\end{align*} Fill in the chart below and calculate the variance. The data represents a sample.
Data \begin{align*}(x)\end{align*} | Mean \begin{align*}(\bar{x})\end{align*} | Data \begin{align*}-\end{align*} Mean \begin{align*}(x - \bar{x})\end{align*} | Square of Data \begin{align*}-\end{align*} Mean \begin{align*}(x-\bar{x})^2\end{align*} | |
---|---|---|---|---|
\begin{align*}\sum\end{align*} |
8. What would the variance have been for question 7 had the data set represented a small population instead of a sample?
9. The following data was collected: \begin{align*}85 \qquad 99 \qquad 89 \qquad 90 \qquad 104 \qquad 82 \qquad 95 \qquad 110\end{align*} Fill in the chart below and calculate the variance. The data represents a sample.
Data \begin{align*}(x)\end{align*} | Mean \begin{align*}(\bar{x})\end{align*} | Data \begin{align*}-\end{align*} Mean \begin{align*}(x - \bar{x})\end{align*} | Square of Data \begin{align*}-\end{align*} Mean \begin{align*}(x-\bar{x})^2\end{align*} | |
---|---|---|---|---|
\begin{align*}\sum\end{align*} |
10. What would the variance have been for question 9 had the data set represented a small population instead of a sample?
Review (Answers)
To view the Review answers, open this PDF file and look for section 6.2.