5.2: Calculating the Standard Deviation
Learning Objectives
- Understand the meaning of standard deviation.
- Understand the percentages associated with standard deviation.
- Calculate the standard deviation for a normally distributed random variable.
Introduction
You have just received your mark on a recent Math test. Your mark is 71, and you are curious to find out how it compares with the marks of the rest of the class. Your teacher has decided to let you figure this out for yourself. She tells you that the marks were normally distributed and provides you with a list of the marks. These marks are in no particular order; they are random.
\begin{align*}& 32 \quad 88 \quad 44 \quad 40 \quad 92 \quad 72 \quad 36 \quad 48 \quad 76\\ & 92 \quad 44 \quad 48 \quad 96 \quad 80 \quad 72 \quad 36 \quad 64 \quad 64\\ & 60 \quad 56 \quad 48 \quad 52 \quad 56 \quad 60 \quad 64 \quad 68 \quad 68\\ & 64 \quad 60 \quad 56 \quad 52 \quad 56 \quad 60 \quad 60 \quad 64 \quad 68\end{align*}
We will discover how your grade compares to the others in your class later in the lesson.
Standard Deviation
In the previous lesson you learned that the standard deviation measures the spread of the data away from the mean of a set of data. You also learned that 68% of the data lies between the two inflection points. In other words, 68% of the data is within one step to the right and one step to the left of the mean of the data. What does it mean if your mark is not within one step? Let’s investigate this further. Below is a picture that represents the mean of the data and six steps: three to the left and three to the right.
These rectangles represent tiles on a floor and you are standing on the middle tile – the blue one. You are then asked to move off your tile and onto the next tile. You could move to the green tile on the left or to the green tile on the right. Whichever way you move, you have to take one step. The same would occur if you were asked to move to the second tile. You would have to take two steps to the right or two steps to the left to stand on the red tile. Finally, to stand on the purple tile would require you to take three steps to the right or three steps to the left.
If this process is applied to standard deviation, then one step to the right or one step to the left is considered one standard deviation away from the mean. Two steps to the left or two steps to the right are considered two standard deviations away from the mean. Likewise, three steps to the left or three steps to the right are considered three standard deviations from the mean. There is a value for the standard deviation that tells you how big your steps must be to move from one tile to the other. This value can be calculated for a given set of data and it is added three times to the mean for moving to the right and subtracted three times from the mean for moving to the left. If the mean of the tiles was 65 and the standard deviation was 4, then you could put numbers on all the tiles.
For a normal distribution, 68% of the data would be located between 61 and 69. This is within one standard deviation of the mean. Within two standard deviations of the mean, 95% of the data would be located between 57 and 73. Finally, within three standard deviations of the mean, 99.7% of the data would be located between 53 and 77. Now let’s see what this entire explanation means on a normal distribution curve.
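The tile numbers can also be reproduced with a short calculation. The following is a minimal Python sketch, not part of the original lesson; the function name `interval` and the dictionary `percent_within` are illustrative names chosen here, and the mean of 65 and standard deviation of 4 are the tile values used above.

```python
# Tile values from the lesson: mean 65, standard deviation 4.
mean = 65
sd = 4

# Percentages the lesson associates with 1, 2 and 3 standard deviations.
percent_within = {1: 68, 2: 95, 3: 99.7}


def interval(mean, sd, steps):
    """Return (low, high) for `steps` standard deviations around the mean."""
    return (mean - steps * sd, mean + steps * sd)


for steps, percent in percent_within.items():
    low, high = interval(mean, sd, steps)
    print(f"{percent}% of the data lies between {low} and {high}")
```

Each step simply adds or subtracts one more standard deviation, which is why the intervals widen by 4 on each side.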
Now it is time to actually calculate the standard deviation of a set of numbers. To make the process more organized, it is best to use a table to record your work. The table will consist of three columns. The first column will contain the data and will be labelled \begin{align*}x\end{align*}. The second column will contain the differences between each data value and the mean of the data. This column will be labelled \begin{align*}(x-\bar{x})\end{align*}. The final column will contain the square of each of the values in the second column and will be labelled \begin{align*}(x-\bar{x})^2\end{align*}.
To find the standard deviation you subtract the mean from each data score to determine how much the data varies from the mean. This will result in positive values when the data point is greater than the mean and in negative values when the data point is less than the mean.
If we stopped here and simply summed the variations in the (Data – Mean) column, \begin{align*}(x-\bar{x})\end{align*}, the negative and positive variations would cancel and give a total of zero. A sum of zero would suggest that there is no variation between the data and the mean. In other words, if we were conducting a survey of the number of hours that students watch television in one day, and we relied upon the sum of the variations to give us some pertinent information, the only thing that we would learn is that all students watch television for exactly the same number of hours each day. We know that this is not true because we did not receive the same answer from every student. To ensure that these variations do not lose their significance when added, the variation values are squared before they are added together.
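A short derivation, added here as an illustration, shows why the variations always total zero. It uses \begin{align*}n\end{align*} for the number of data values and the fact that the mean is the sum of the data divided by \begin{align*}n\end{align*}:

\begin{align*}\sum(x-\bar{x}) &= \sum x - n\bar{x}\\ &= \sum x - n \cdot \frac{\sum x}{n}\\ &= \sum x - \sum x\\ &= 0\end{align*}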
What we need for this normal distribution is a measure of spread that is proportional to the scatter of the data, independent of the number of values in the data set and independent of the mean. The spread will be small when the data values are close together but large when the data values are scattered. Increasing the number of values in a data set will increase the sum of the squared variations even if the spread of the values is not increasing, which is why this sum is divided by the number of values. The measure should be independent of the mean because we are not interested in this measure of central tendency but rather in the spread of the data. For a normal distribution, both the variance and the standard deviation fit the above profile and both values can be calculated for the set of data.
To calculate the variance \begin{align*}(\sigma^2)\end{align*} for a set of normally distributed data:
- To determine the distance of each value from the mean, subtract the mean of the data from each value in the data set. \begin{align*}(x-\bar{x})\end{align*}
- Square each of these differences and add the positive, squared results.
- Divide this sum by the number of values in the data set.
These steps for calculating the variance of a data set can be summarized in the following formula:
\begin{align*}\sigma^2=\frac{\sum(x-\bar{x})^2}{n}\end{align*}
where:
\begin{align*}x\end{align*} represents the data value; \begin{align*}\bar{x}\end{align*} represents the mean of the data set; \begin{align*}n\end{align*} represents the number of data values. Remember that the symbol \begin{align*}\sum\end{align*} stands for summation.
Example 1:
Given the following weights (in pounds) of children attending a day camp, calculate the variance of the weights.
\begin{align*}52, 57, 66, 61, 69, 58, 81, 69, 74\end{align*}
\begin{align*}x\end{align*} | \begin{align*}(x-\bar{x})\end{align*} | \begin{align*}(x-\bar{x})^2\end{align*} |
---|---|---|
52 | -13.2 | 174.24 |
57 | -8.2 | 67.24 |
66 | 0.8 | 0.64 |
61 | -4.2 | 17.64 |
69 | 3.8 | 14.44 |
58 | -7.2 | 51.84 |
81 | 15.8 | 249.64 |
69 | 3.8 | 14.44 |
74 | 8.8 | 77.44 |
\begin{align*}\bar{x} &= \frac{\sum x}{n} && \sigma^2 = \frac{\sum(x-\bar{x})^2}{n}\\ \bar{x} &= \frac{587}{9} && \sigma^2 = \frac{667.56}{9}\\ \bar{x} &\approx 65.2 && \sigma^2 \approx 74.17\end{align*}
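The three steps for the variance can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than part of the lesson: the function name `population_variance` is chosen here, and the sketch uses the unrounded mean (587 ÷ 9) instead of 65.2, which still gives the same variance to two decimal places.

```python
def population_variance(values):
    """Variance with n in the denominator, as in the lesson's formula."""
    n = len(values)
    mean = sum(values) / n
    # Step 1 and 2: subtract the mean from each value, then square.
    squared_differences = [(x - mean) ** 2 for x in values]
    # Step 3: divide the sum of the squares by the number of values.
    return sum(squared_differences) / n


weights = [52, 57, 66, 61, 69, 58, 81, 69, 74]  # pounds, from Example 1
variance = population_variance(weights)
print(round(variance, 2))
```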
Remember that the variance is the mean of the squares of the differences between the data values and the mean of the data. The resulting value will take on the square of the units of the data. This means that for the variance of the data above, the units would be square pounds.
The standard deviation is simply the square root of the variance for the data set. When the standard deviation is calculated for the above data, the resulting value will be in pounds. This table could be extended to include a frequency column for values that are repeated, but doing so adds three additional columns to the table and often leads to errors in calculations. Since simple is often best, values that are repeated can just be written in the table as many times as they appear in the data.
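As a worked illustration, taking the square root of the variance found in Example 1 (rounded to 74.17 square pounds) gives the standard deviation of the weights:

\begin{align*}\sigma &= \sqrt{\sigma^2}\\ \sigma &= \sqrt{74.17}\\ \sigma &\approx 8.6 \ \text{pounds}\end{align*}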
Example 2:
Calculate the variance and the standard deviation of the following values:
\begin{align*}5, 14, 16, 17, 18\end{align*}
Solution:
\begin{align*}x\end{align*} | \begin{align*}(x-\bar{x})\end{align*} | \begin{align*}(x-\bar{x})^2\end{align*} |
---|---|---|
5 | -9 | 81 |
14 | 0 | 0 |
16 | 2 | 4 |
17 | 3 | 9 |
18 | 4 | 16 |
Work space for completing the table
\begin{align*}\sum x& =70 && (x-\bar{x}) \rightarrow 5-14=-9; \ 14-14=0; \ 16-14=2; \ 17-14=3; \ 18-14=4\\ \bar{x}& =\frac{70}{5} && (x-\bar{x})^2 \rightarrow (-9)^2=81; \ (0)^2=0; \ (2)^2=4; \ (3)^2=9; \ (4)^2=16\\ \bar{x}& =14\end{align*}
Variance: \begin{align*}\sum(x-\bar{x})^2=110\end{align*}
\begin{align*}\sigma^2&=\frac{\sum(x-\bar{x})^2}{n}\\ \sigma^2&=\frac{110}{5}\\ \sigma^2&=22\end{align*}
Standard Deviation: \begin{align*}\sum(x-\bar{x})^2=110\end{align*}
\begin{align*}\sigma^2&=\frac{110}{5}\\ \sigma^2&=22\\ SD&=\sqrt{22}\\ SD&\approx 4.7\end{align*}
The symbol \begin{align*}(\sigma)\end{align*} is used to represent standard deviation. Using this symbol and the steps that were followed to calculate the standard deviation, we can write the following formula:
\begin{align*}\sigma=\sqrt{\frac{\sum(x-\bar{x})^2}{n}}\end{align*}
HINT: If you are wondering if your calculations are correct, a quick way to check is to add the values in the \begin{align*}(x-\bar{x})\end{align*} column. The total is always zero, or very close to zero if the mean was rounded.
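The table for Example 2 and the zero-sum check from the hint can be reproduced with a short Python sketch. It is offered only as an illustration; the variable names are chosen here, and a small tolerance is used in the check because computer arithmetic with decimals is not always exact.

```python
import math

values = [5, 14, 16, 17, 18]  # data from Example 2
mean = sum(values) / len(values)

# Build the three columns of the table: x, (x - mean), (x - mean)^2.
rows = [(x, x - mean, (x - mean) ** 2) for x in values]
for x, diff, squared in rows:
    print(f"{x:>3} | {diff:>5.1f} | {squared:>6.1f}")

# The hint: the middle column should total zero.
sum_of_differences = sum(diff for _, diff, _ in rows)
print("Check passes:", math.isclose(sum_of_differences, 0, abs_tol=1e-9))

# Variance and standard deviation from the last column.
sum_of_squares = sum(squared for _, _, squared in rows)
variance = sum_of_squares / len(values)
sd = math.sqrt(variance)
print(variance, round(sd, 1))
```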
Example 3:
Calculate the standard deviation of the following numbers:
\begin{align*}1, 5, 3, 5, 4, 2, 1, 1, 6, 2\end{align*}
Solution:
\begin{align*}x\end{align*} | \begin{align*}(x-\bar{x})\end{align*} | \begin{align*}(x-\bar{x})^2\end{align*} |
---|---|---|
1 | -2 | 4 |
5 | 2 | 4 |
3 | 0 | 0 |
5 | 2 | 4 |
4 | 1 | 1 |
2 | -1 | 1 |
1 | -2 | 4 |
1 | -2 | 4 |
6 | 3 | 9 |
2 | -1 | 1 |
\begin{align*}\sum x & =30 && \sigma=\sqrt{\frac{\sum(x-\bar{x})^2}{n}}\\ \bar{x} & =\frac{30}{10} && \sigma=\sqrt{\frac{32}{10}}\\ \bar{x}& =3 && \sigma=\sqrt{3.2}\\ &&& \sigma \approx 1.8\end{align*}
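If Python is available, the result of Example 3 can be cross-checked with the standard library. This sketch assumes the `statistics` module, whose `pvariance` and `pstdev` functions divide by \begin{align*}n\end{align*} as this lesson does; the similarly named `variance` and `stdev` functions divide by \begin{align*}n-1\end{align*} and would give slightly larger answers.

```python
import statistics

example_3 = [1, 5, 3, 5, 4, 2, 1, 1, 6, 2]

# pvariance and pstdev divide by n, matching the formula in this lesson.
variance = statistics.pvariance(example_3)
sd = statistics.pstdev(example_3)

print(variance)
print(round(sd, 1))
```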
Now that you know how to calculate the variance and the standard deviation of a set of data, let’s apply this to the normal distribution by determining how your Math mark compares with the marks achieved by your classmates. This time technology will be used to determine both the variance and the standard deviation of the data.
Solution:
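From the list, you can see that the mean of the marks is 61 and the standard deviation is 15.6. Your mark of 71 is 10 marks above the mean. Since 10 is less than one standard deviation (15.6), your mark lies within one standard deviation of the mean, in the interval from 45.4 to 76.6 where about 68% of the marks are found.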
Using technology to calculate the variance involves naming the lists according to the operations that you need to do to determine the correct values. As well, you can use the \begin{align*}2^{nd}\end{align*} catalogue function of the calculator to determine the sum of the squared variations. The same steps that were used to calculate the standard deviation of the data also give the mean of the data set. You could use the \begin{align*}2^{nd}\end{align*} catalogue function to find the mean of the data, but since you are now familiar with 1-Var Stats, you may as well use this method.
The mean of the data is 61. \begin{align*}L_2\end{align*} will now be renamed \begin{align*}L_1\end{align*}-61 to compute the values for \begin{align*}(x-\bar{x})\end{align*}.
Likewise, \begin{align*}L_3\end{align*} will be renamed \begin{align*}(L_2)^2\end{align*}.
The sum of the third list divided by the number of data values (36) is the variance of the marks.
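The same list operations can be mirrored outside the calculator. The sketch below is a hedged Python illustration, not the calculator procedure itself; the names `L1`, `L2` and `L3` are borrowed from the calculator lists, and the marks are the 36 values given in the Introduction.

```python
import math

# The 36 marks from the Introduction, entered as the first list.
L1 = [
    32, 88, 44, 40, 92, 72, 36, 48, 76,
    92, 44, 48, 96, 80, 72, 36, 64, 64,
    60, 56, 48, 52, 56, 60, 64, 68, 68,
    64, 60, 56, 52, 56, 60, 60, 64, 68,
]

mean = sum(L1) / len(L1)        # mean of the marks
L2 = [x - mean for x in L1]     # second list: L1 - mean
L3 = [d ** 2 for d in L2]       # third list: (L2)^2

variance = sum(L3) / len(L1)    # sum of third list divided by 36
sd = math.sqrt(variance)        # standard deviation

print(mean, round(variance, 2), round(sd, 1))
```

The square root of the variance agrees with the standard deviation of 15.6 reported by the calculator.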
Lesson Summary
In this lesson you learned that the standard deviation of a set of data is a value that represents the spread of the data from the mean of the data. You also learned that the variance is the mean of the squared differences between the data values and the mean, and that the differences are squared because their sum is always zero. You also learned how to calculate the standard deviation manually and by using technology.
Points to Consider
- Does the value of standard deviation stand alone or can it be displayed with a normal distribution?
- Are there defined increments for how the data spreads away from the mean?
- Can the standard deviation of a set of data be applied to real world problems?