Suppose you were given a histogram and asked to find the variance of the data it illustrates. Would you know how?
After this lesson, you will understand how to compare visualized data with variance.
Comparing Visualized Data with Variance
Knowing how to calculate the variance of a set when it is given to you as a list of values is great, but statistical data is often shared and disseminated in visual form rather than as raw data. Because of this, it is important to practice evaluating the variance of graphed data as well as tabular or raw data so you can actually apply your understanding of variance to real-world statistics.
In general, you will need to:
- Identify the values of the dependent variable, as these are the values you will be finding the variance of.
- Sum the values and calculate the arithmetic mean.
- Subtract the mean from each value to find its deviation, then square the deviation.
- Sum the squared deviations and divide the total by the count of values in the data set. The result is the variance.
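The steps above can be sketched as a short Python function. This is a minimal illustration rather than part of the lesson: the name `population_variance` and the three-value data set are assumptions chosen only to show the call.

```python
def population_variance(values):
    """Population variance, following the steps listed above."""
    n = len(values)
    mean = sum(values) / n                              # sum the values, find the mean
    squared_devs = [(v - mean) ** 2 for v in values]    # deviation of each value, squared
    return sum(squared_devs) / n                        # divide by the count of values


# Made-up data, used only to demonstrate the function.
print(population_variance([2, 4, 6]))
```

Once the dependent-variable values have been read off a table or graph, they can be passed to the function as a plain list.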
Finding the Mean and Variance
Find the \begin{align*}\mu\end{align*} and \begin{align*}\sigma^2\end{align*} of the number of students in each classroom at Toni’s school:
Classroom | Number of Students |
A | 6 |
B | 5 |
C | 9 |
D | 13 |
E | 12 |
F | 16 |
G | 14 |
Follow the steps from above to find the mean and variance of the number of students:
1. The number of students in each classroom is the dependent variable.
2. There are 7 values. Listed in ascending order, they are: 5, 6, 9, 12, 13, 14, and 16.
3. The sum of the values is \begin{align*}5+6+9+12+13+14+16=75\end{align*}, so the mean is \begin{align*}\frac{75}{7}\approx 10.714\end{align*}.
4. The deviations and squared deviations are:
\begin{align*}\text{Value} - \text{Mean} = \text{Deviation}\end{align*} | \begin{align*}\text{Deviation}^2\end{align*} |
\begin{align*}5-10.714=-5.714\end{align*} | 32.65 |
\begin{align*}6-10.714=-4.714\end{align*} | 22.22 |
\begin{align*}9-10.714=-1.714\end{align*} | 2.94 |
\begin{align*}12-10.714=1.286\end{align*} | 1.654 |
\begin{align*}13-10.714=2.286\end{align*} | 5.226 |
\begin{align*}14-10.714=3.286\end{align*} | 10.798 |
\begin{align*}16-10.714=5.286\end{align*} | 27.942 |
5. The sum of the squared deviations is 103.43. The variance is \begin{align*}\frac{103.43}{7}\approx 14.776\end{align*}.
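To check the hand arithmetic, the classroom table can be run through a few lines of Python. This is a sketch, not part of the original lesson; the variable names are illustrative, and the data are the seven counts from the table.

```python
# Number of students per classroom, copied from the table above.
students = {"A": 6, "B": 5, "C": 9, "D": 13, "E": 12, "F": 16, "G": 14}

values = sorted(students.values())
mean = sum(values) / len(values)

total = 0.0
for v in values:
    dev = v - mean              # deviation from the mean
    total += dev ** 2           # accumulate the squared deviation
    print(f"{v:>2} - {mean:.3f} = {dev:>6.3f} | {dev ** 2:.3f}")

variance = total / len(values)  # population variance: divide by N
print(f"mean = {mean:.3f}, sum of squares = {total:.2f}, variance = {variance:.3f}")
```

Small differences in the last decimal place relative to a hand-built table come from rounding the mean before subtracting.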
Finding the Mean and Variance of Graphed Data
Find the \begin{align*}\mu\end{align*} and \begin{align*}\sigma^2\end{align*} of the graphed data.
Follow the steps outlined above:
1. Most often, the dependent variable is represented by the vertical axis, and this histogram is no exception. The number of 4.0’s each year is the dependent variable, while the year is the independent variable.
2. In ascending order, the dependent variable values are:
\begin{align*}39, 45, 47, 51, 51, 54, 54, 56\end{align*}
3. The sum of the values is: \begin{align*}39+45+47+51+51+54+54+56=397\end{align*}.
The mean (μ) is: \begin{align*}\frac{397}{8}=49.625\end{align*}, which suggests that a year with 50 or more 4.0 GPAs would be considered an above-average year.
4. The deviation and squared deviation of each value are:
Deviation | \begin{align*}\text{Deviation}^2\end{align*} |
\begin{align*}39-49.625=-10.625\end{align*} | \begin{align*}(-10.625)^2=112.89\end{align*} |
\begin{align*}45-49.625=-4.625\end{align*} | \begin{align*}(-4.625)^2=21.39\end{align*} |
\begin{align*}47-49.625=-2.625\end{align*} | \begin{align*}(-2.625)^2=6.89\end{align*} |
\begin{align*}51-49.625=1.375\end{align*} | \begin{align*}(1.375)^2=1.89\end{align*} |
\begin{align*}51-49.625=1.375\end{align*} | \begin{align*}(1.375)^2=1.89\end{align*} |
\begin{align*}54-49.625=4.375\end{align*} | \begin{align*}(4.375)^2=19.14\end{align*} |
\begin{align*}54-49.625=4.375\end{align*} | \begin{align*}(4.375)^2=19.14\end{align*} |
\begin{align*}56-49.625=6.375\end{align*} | \begin{align*}(6.375)^2=40.64\end{align*} |
5. The sum of the squared deviations is 223.87, making the variance \begin{align*}\frac{223.87}{8}\approx 27.98\end{align*}.
\begin{align*}\therefore \sigma^2\approx 27.98 \end{align*}
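A table with this many rows is easy to get wrong by one entry, so it can help to check the result independently. The sketch below assumes the eight bar heights have been read off the histogram as listed in step 2, and uses `fmean` and `pvariance` from Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import fmean, pvariance

# Bar heights read from the histogram (the values listed in step 2).
counts = [39, 45, 47, 51, 51, 54, 54, 56]

mu = fmean(counts)            # arithmetic mean
sigma_sq = pvariance(counts)  # population variance (divides by N)

print(f"mean = {mu}, variance = {sigma_sq:.2f}")
```

`pvariance` treats the list as the whole population, which matches the divide-by-count step used in this lesson.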
Interpreting Frequency Polygons
Based on the data in the frequency polygon, which year had the greatest variance in number of shoe brands at various prices, and which had the least variance?
Each of the three data sets contains 6 values, and the mean of each set is:
- 2008: Sum: \begin{align*}19+25+21+25+9+6=105\end{align*} Mean: \begin{align*}\frac{105}{6}=17.5\end{align*}
- 2007: Sum: \begin{align*}16+19+17+19+7+3=81\end{align*} Mean: \begin{align*}\frac{81}{6}=13.5\end{align*}
- 2006: Sum: \begin{align*}14+17+16+15+6+3=71\end{align*} Mean: \begin{align*}\frac{71}{6}=11.83\end{align*}
The sum of the squared deviations for each year is:
- 2008: \begin{align*}(19-17.5)^2+(25-17.5)^2+(21-17.5)^2+(25-17.5)^2+(9-17.5)^2+(6-17.5)^2=331.5\end{align*}
- 2007: \begin{align*}(16-13.5)^2+(19-13.5)^2+(17-13.5)^2+(19-13.5)^2+(7-13.5)^2+(3-13.5)^2=231.5\end{align*}
- 2006: \begin{align*}(14-11.83)^2+(17-11.83)^2+(16-11.83)^2+(15-11.83)^2+(6-11.83)^2+(3-11.83)^2=170.833\end{align*}
The variance of each set is:
- 2008: \begin{align*}\frac{331.5}{6}=55.25\end{align*}
- 2007: \begin{align*} \frac{231.5}{6}=38.583\end{align*}
- 2006: \begin{align*}\frac{170.833}{6}=28.472 \end{align*}
\begin{align*}\therefore\end{align*} 2008 has the greatest variance and 2006 has the least variance.
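When several data sets are compared, as with the three years here, it is convenient to compute all the variances in one pass. A minimal sketch using the values from the sums above (the dictionary layout and names are assumptions made for illustration):

```python
from statistics import pvariance

# Number of shoe brands in each price band, one list per year,
# using the values from the sums above.
brands = {
    2006: [14, 17, 16, 15, 6, 3],
    2007: [16, 19, 17, 19, 7, 3],
    2008: [19, 25, 21, 25, 9, 6],
}

variances = {year: pvariance(vals) for year, vals in brands.items()}
greatest = max(variances, key=variances.get)
least = min(variances, key=variances.get)

for year, var in sorted(variances.items()):
    print(f"{year}: variance = {var:.3f}")
print(f"greatest: {greatest}, least: {least}")
```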
Earlier Problem Revisited
Could you find the variance of a data set presented as a histogram?
After your practice above, this should no longer be a problem!
Examples
The number of cars of various colors in a parking lot with 5 levels is summarized by the table below. Use the data to answer questions 1-4.
Level | Red | Yellow | Blue | White |
Level 1 | 11 | 4 | 9 | 14 |
Level 2 | 9 | 3 | 8 | 11 |
Level 3 | 13 | 5 | 10 | 12 |
Level 4 | 14 | 4 | 7 | 9 |
Level 5 | 12 | 6 | 13 | 7 |
Example 1
What is the variance of red cars among the 5 levels?
The population of red cars across the 5 levels is: 11, 9, 13, 14, and 12.
- Add the values and divide by five to get the mean of 11.8.
- Square each of the values and sum the squares: \begin{align*}11^2+9^2+13^2+14^2+12^2=711\end{align*}
- Divide the sum of the squares by the number of values in the set (since this is the whole population of red cars), getting \begin{align*}\frac{711}{5}=142.2\end{align*}, and subtract the mean squared \begin{align*}(11.8^2=139.24)\end{align*}
- The variance of the population of red cars is \begin{align*}142.2-139.24=2.96\end{align*}
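The steps in this example rely on the identity \begin{align*}\sigma^2=\frac{\sum x^2}{N}-\mu^2\end{align*}, which gives the same result as averaging the squared deviations. The Python sketch below compares the two routes on the red-car counts; both function names are illustrative and not taken from the lesson.

```python
def variance_by_deviations(values):
    """Average of the squared deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def variance_by_shortcut(values):
    """Mean of the squares minus the square of the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(v * v for v in values) / n - mean ** 2


red = [11, 9, 13, 14, 12]
print(round(variance_by_deviations(red), 2), round(variance_by_shortcut(red), 2))
```

The shortcut saves work by hand, but in floating-point arithmetic it can lose precision when the values are large compared with their spread, so the deviation form is usually the safer choice in code.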
Example 2
What is the variance of the number of cars of each color on the levels above level 3?
The levels above level 3 include only levels 4 and 5. The total number of red, yellow, blue, and white cars is 26, 10, 20, and 16, respectively.
- The mean number of cars of each color is \begin{align*}\frac{26+10+20+16}{4}=18\end{align*}
- Square the values and find the sum: \begin{align*}26^2+10^2+20^2+16^2=1432\end{align*}
- Divide the sum of the squares by the number of values: \begin{align*}\frac{1432}{4}=358\end{align*}. Subtract the squared mean \begin{align*}(18^2=324)\end{align*} to get the variance: \begin{align*}358-324=34\end{align*}
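The per-color totals for levels 4 and 5 can also be built directly from the table before taking the variance. A sketch under the assumption that the table is stored as a nested dictionary keyed by level (the layout and names are illustrative):

```python
from statistics import pvariance

# Car counts by level and color, copied from the table above.
cars = {
    1: {"Red": 11, "Yellow": 4, "Blue": 9, "White": 14},
    2: {"Red": 9, "Yellow": 3, "Blue": 8, "White": 11},
    3: {"Red": 13, "Yellow": 5, "Blue": 10, "White": 12},
    4: {"Red": 14, "Yellow": 4, "Blue": 7, "White": 9},
    5: {"Red": 12, "Yellow": 6, "Blue": 13, "White": 7},
}

upper_levels = [level for level in cars if level > 3]  # levels 4 and 5
totals = {
    color: sum(cars[level][color] for level in upper_levels)
    for color in cars[1]
}

color_variance = pvariance(list(totals.values()))
print(totals, color_variance)
```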
Example 3
What is the variance of blue cars across the 5 levels?
The blue car counts are: 9, 8, 10, 7, and 13
- The mean number of blue cars is \begin{align*}\frac{47}{5}=9.4\end{align*}
- The sum of the squared values is \begin{align*}9^2+8^2+10^2+7^2 +13^2=463\end{align*}, divided by the number of levels (5), gives us 92.6
- Subtract the squared mean \begin{align*}(9.4^2=88.36)\end{align*} to get the variance
- The variance is \begin{align*}92.6-88.36=4.24\end{align*}
Example 4
If we take a sample of levels by rolling a die and end up with levels 1, 3, and 5, what is the variance of white cars in the sample?
The number of white cars on levels 1, 3, and 5 is 14, 12, and 7.
- The mean number of white cars in this sample is \begin{align*}\frac{33}{3} = 11\end{align*}
- Since this is a sample, work from the individual deviations: subtract the mean from each value, square each result, then find the sum: \begin{align*}(14-11)^2+(12-11)^2+(7-11)^2=26\end{align*}
- Divide the sum of the squared deviations by the number of values minus 1 (remember, this is a sample!): \begin{align*}\frac{26}{2}=13\end{align*}
- The sample variance is 13.
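The only change from the population case is the divisor, which the following Python sketch makes explicit. It computes the sample variance by hand and compares it with `variance` from the standard `statistics` module, which divides by n - 1, and with `pvariance`, which divides by n. The variable names are illustrative.

```python
from statistics import pvariance, variance

white_sample = [14, 12, 7]  # white cars on levels 1, 3, and 5

n = len(white_sample)
mean = sum(white_sample) / n
sum_sq_dev = sum((v - mean) ** 2 for v in white_sample)

sample_var = sum_sq_dev / (n - 1)   # sample variance: divide by n - 1
population_var = sum_sq_dev / n     # what dividing by n would give instead

print(sample_var, variance(white_sample))
print(round(population_var, 3), round(pvariance(white_sample), 3))
```

Dividing by n - 1 gives a larger value than dividing by n, and the gap is most noticeable for small samples like this one.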
Review
Find the variance:
1. 365, 400.7, 303, 479, 514.2, 500, 489
2. 7200, 7020, 7165.9, 7000, 7796, 7012, 7016.1
3. 17, 10.3, 30.7, 70, 66, 76, 40, 53
4. 3607, 3600, 3600, 3631, 3600.6
5. 700, 700, 712, 756, 741, 716, 782
6. 3370, 3300.5, 3366, 3306.6, 3310, 3336, 3301.3
Calculate the sample variance:
7. 34.4, 34, 34.7, 34.6, 34, 34.1, 31, 31.3
8. 989.22, 990.6, 992, 996.9, 981.1, 986, 975
9. 10, 16, 10.33, 10.63, 18, 17, 16.36, 10.46
10. 3240, 3260, 3250, 3280, 3280, 3300, 3310, 3270
Review (Answers)
To view the Review answers, open this PDF file and look for section 5.7.