6.2: Calculating the Standard Deviation
Learning Objectives
- Understand the meaning of standard deviation.
- Understand the percentages associated with standard deviation.
- Calculate the standard deviation for a normally distributed random variable.
This semester you decided to join the school’s bowling league for the first time. Having never bowled previously, you are very anxious to find out what your bowling average is for the semester. If your average is comparable to that of the other students, you will join the league again next semester. Your coach has told you that your mean score is 70, and now you want to find out how your results compare to those of the other members of the league.
Your coach has decided to let you figure this out for yourself. He tells you that the scores were normally distributed and provides you with a list of the other mean scores. These average scores are in no particular order. In other words, they are random.
\begin{align*}& 54 \quad 88 \quad 49 \quad 44 \quad \ 96 \quad 72 \quad 46 \quad 58 \quad 79\\ & 92 \quad 44 \quad 50 \quad 102 \quad 80 \quad 72 \quad 66 \quad 64 \quad 61\\ & 60 \quad 56 \quad 48 \quad 52 \quad \ 54 \quad 60 \quad 64 \quad 72 \quad 68\\ & 64 \quad 60 \quad 56 \quad 52 \quad \ 55 \quad 60 \quad 62 \quad 64 \quad 68\end{align*}
We will discover how your mean bowling score compares to that of the other bowlers later in the lesson.
Standard Deviation
In the previous lesson, you learned that standard deviation is a measure of the spread of a set of data away from the mean of the data.
In a normal distribution, on either side of the line of symmetry, the curve appears to change its shape from being concave down (looking like an upside-down bowl) to being concave up (looking like a right-side-up bowl). Where this happens is called an inflection point of the curve. If a vertical line is drawn from an inflection point to the \begin{align*}x\end{align*}-axis, the difference between where the line of symmetry goes through the \begin{align*}x\end{align*}-axis and where this line goes through the \begin{align*}x\end{align*}-axis represents 1 standard deviation away from the mean. Approximately 68% of all the data is located within 1 standard deviation of the mean.
To emphasize this fact and the fact that the mean is the middle of the distribution, let’s play a game of Simon Says. Using colored paper and 2 types of shapes, arrange the pattern of the shapes on the floor as shown below. Randomly select 7 students from your class to play the game. You will be Simon, and you are to give orders to the selected students. The students are to obey an order only when it begins with “Simon Says.” The orders can be given in many ways, but 1 suggestion is to deliver the following orders:
- “Simon Says for Frank to stand on the rectangle.”
- “Simon Says for Joey to stand on the closest oval to the right of Frank.”
- “Simon Says for Liam to stand on the closest oval to the left of Frank.”
- “Simon Says for Mark to stand on the farthest oval to the right of Frank.”
- “Simon Says for Juan to stand on the farthest oval to the left of Frank.”
- “Simon Says for Jacob to stand on the middle oval to the right of Frank.”
- “Simon Says for Sean to stand on the middle oval to the left of Frank.”
Once the students are standing in the correct places, pose questions about their positions with respect to Frank. The members of the class who are not playing the game should be asked to respond to these questions about the position of their classmates. Some questions that should be asked are the following:
- “Which 2 students are standing closest to Frank?”
- “Are Joey and Liam both the same distance away from Frank?”
- “Which 2 students are farthest away from Frank?”
- “Are Mark and Juan both the same distance away from Frank?”
When the students have completed playing Simon Says, they should have an understanding of the concept that the mean is the middle of the distribution and the remainder of the distribution is evenly spread out on either side of the mean.
The picture below is a simplified form of the game you have just played. The yellow rectangle is the mean, and the remaining rectangles represent 3 steps to the right of the mean and 3 steps to the left of the mean.
If we consider the spread of the data away from the mean, which is measured using standard deviation, as being a stepping process, then 1 step to the right or 1 step to the left is considered 1 standard deviation away from the mean. Likewise, 2 steps in either direction are considered 2 standard deviations away from the mean, and 3 steps in either direction are considered 3 standard deviations away from the mean. The standard deviation of a data set is simply a value, and in relation to the stepping process, this value represents the size of your footstep as you move away from the mean. Once the value of the standard deviation has been calculated, it is added to the mean for each step to the right and subtracted from the mean for each step to the left. If the value of the yellow mean tile were 58 and the value of the standard deviation were 5, the tiles to the right would be labeled 63, 68, and 73, and the tiles to the left would be labeled 53, 48, and 43.
For a normal distribution, 68% of the data values would be located within 1 standard deviation of the mean, which is between 53 and 63. Also, 95% of the data values would be located within 2 standard deviations of the mean, which is between 48 and 68. Finally, 99.7% of the data values would be located within 3 standard deviations of the mean, which is between 43 and 73. The percentages mentioned here make up what statisticians refer to as the 68-95-99.7 Rule. These percentages remain the same for all data that can be assumed to be normally distributed. The following diagram represents the location of these values on a normal distribution curve.
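The intervals quoted above can be reproduced with a few lines of Python. This is only a sketch: the variable names (`mean`, `sd`, `rule`, `intervals`) are chosen here for illustration and are not part of the lesson.

```python
# Mean and standard deviation from the tile example in the text.
mean = 58
sd = 5

# Approximate percentage of normally distributed data that lies
# within k standard deviations of the mean (the 68-95-99.7 Rule).
rule = {1: 68.0, 2: 95.0, 3: 99.7}

# For each k, take k steps of size sd to the left and to the right of the mean.
intervals = {k: (mean - k * sd, mean + k * sd) for k in rule}

for k, percent in rule.items():
    low, high = intervals[k]
    print(f"About {percent}% of the data lies between {low} and {high} ({k} SD)")
```

Changing `mean` and `sd` gives the corresponding intervals for any other normally distributed data set.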
Now that you understand the distribution of the data and exactly how it moves away from the mean, you are ready to calculate the standard deviation of a data set. For the calculation steps to be organized, a table is used to record the results for each step. The table will consist of 3 columns. The first column will contain the data and will be labeled \begin{align*}x\end{align*}. The second column will contain the differences between the data values and the mean of the data set. This column will be labeled \begin{align*}(x-\overline{x})\end{align*} for a sample and \begin{align*}(x-\mu)\end{align*} for a population. The final column will be labeled \begin{align*}(x-\overline{x})^2\end{align*} for a sample and \begin{align*}(x-\mu)^2\end{align*} for a population, and it will contain the square of each of the values recorded in the second column.
Example 2
Calculate the standard deviation of the following numbers, which represent a small population:
\begin{align*}2, 7, 5, 6, 4, 2, 6, 3, 6, 9\end{align*}
Solution:
Step 1: It is not necessary to organize the data. Create a table and label each of the columns appropriately. Write the data values in column \begin{align*}x\end{align*}.
Step 2: Calculate the mean of the data values.
\begin{align*}\mu=\frac{2+7+5+6+4+2+6+3+6+9}{10} = \frac{50}{10}=5.0\end{align*}
Step 3: Calculate the differences between the data values and the mean. Enter the results in the second column.
\begin{align*}x\end{align*} | \begin{align*}(x-\mu)\end{align*} |
---|---|
2 | \begin{align*}-3\end{align*} |
7 | 2 |
5 | 0 |
6 | 1 |
4 | \begin{align*}-1\end{align*} |
2 | \begin{align*}-3\end{align*} |
6 | 1 |
3 | \begin{align*}-2\end{align*} |
6 | 1 |
9 | 4 |
Step 4: Calculate the values for column 3 by squaring each result in the second column.
\begin{align*}(x-\mu)^2\end{align*} |
---|
9 |
4 |
0 |
1 |
1 |
9 |
1 |
4 |
1 |
16 |
Step 5: Calculate the mean of the third column and then take the square root of the answer. This value is the standard deviation \begin{align*}(\sigma)\end{align*} of the data set.
\begin{align*}\sigma^2 & = \frac{9+4+0+1+1+9+1+4+1+16}{10}=\frac{46}{10}=4.6\\ \sigma & = \sqrt{4.6} \approx 2.1\end{align*}
Step 5 can be written using the formula \begin{align*}\sigma = \sqrt{\frac{\sum (x-\mu)^2}{n}}\end{align*}.
The standard deviation of the data set is approximately 2.1.
Now that you have completed all the steps, here is the table that was used to record the results. The table was separated as the steps were completed. Now that you know the process involved in calculating the standard deviation, there is no need to work with individual columns; you can work with the entire table.
\begin{align*}x\end{align*} | \begin{align*}(x-\mu)\end{align*} | \begin{align*}(x-\mu)^2\end{align*} |
---|---|---|
2 | \begin{align*}-3\end{align*} | 9 |
7 | 2 | 4 |
5 | 0 | 0 |
6 | 1 | 1 |
4 | \begin{align*}-1\end{align*} | 1 |
2 | \begin{align*}-3\end{align*} | 9 |
6 | 1 | 1 |
3 | \begin{align*}-2\end{align*} | 4 |
6 | 1 | 1 |
9 | 4 | 16 |
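The 5 steps of Example 2 can also be sketched in Python. The names used here (`data`, `table`, `sum_of_squares`, and so on) are illustrative choices, not notation from the lesson.

```python
import math

# Population data from Example 2.
data = [2, 7, 5, 6, 4, 2, 6, 3, 6, 9]
n = len(data)

# Step 2: calculate the population mean.
mu = sum(data) / n

# Steps 3 and 4: build one table row per value: x, (x - mu), (x - mu)^2.
table = [(x, x - mu, (x - mu) ** 2) for x in data]

# Step 5: take the mean of the third column, then its square root.
sum_of_squares = sum(row[2] for row in table)
variance = sum_of_squares / n
sigma = math.sqrt(variance)

for x, diff, squared in table:
    print(f"{x:>2} | {diff:>5.1f} | {squared:>5.1f}")
print(f"mean = {mu}, variance = {variance}, sigma = {sigma:.1f}")
```

Each printed row should correspond to one row of the completed table above.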
Example 3
A company wants to test its exterior house paint to determine how long it will retain its original color before fading. The company mixes 2 brands of paint by adding different chemicals to each brand. Six one-gallon cans are made for each brand, and the time until fading is recorded for each can. The following are the results obtained in the laboratory:
Brand A (Time in months) | Brand B (Time in months) |
---|---|
15 | 40 |
65 | 50 |
55 | 35 |
35 | 40 |
45 | 45 |
25 | 30 |
Calculate the standard deviation for each brand of paint. These are both small populations.
Solution:
Brand A:
\begin{align*}x\end{align*} | \begin{align*}(x-\mu)\end{align*} | \begin{align*}(x-\mu)^2\end{align*} |
---|---|---|
15 | \begin{align*}-25\end{align*} | 625 |
65 | 25 | 625 |
55 | 15 | 225 |
35 | \begin{align*}-5\end{align*} | 25 |
45 | 5 | 25 |
25 | \begin{align*}-15\end{align*} | 225 |
\begin{align*}\mu &=\frac{15+65+55+35+45+25}{6}=\frac{240}{6}=40\\ \sigma & = \sqrt{\frac{\sum (x-\mu)^2}{n}}\\ \sigma & = \sqrt{\frac{625+625+225+25+25+225}{6}}\\ \sigma & = \sqrt{\frac{1{,}750}{6}} \approx \sqrt{291.67} \approx 17.1\end{align*}
The standard deviation for Brand A is approximately 17.1.
Brand B:
\begin{align*}x\end{align*} | \begin{align*}(x-\mu)\end{align*} | \begin{align*}(x-\mu)^2\end{align*} |
---|---|---|
40 | 0 | 0 |
50 | 10 | 100 |
35 | \begin{align*}-5\end{align*} | 25 |
40 | 0 | 0 |
45 | 5 | 25 |
30 | \begin{align*}-10\end{align*} | 100 |
\begin{align*}\mu&=\frac{40+50+35+40+45+30}{6}=\frac{240}{6}=40\\ \sigma & = \sqrt{\frac{\sum (x-\mu)^2}{n}}\\ \sigma & = \sqrt{\frac{0+100+25+0+25+100}{6}}\\ \sigma & = \sqrt{\frac{250}{6}} \approx \sqrt{41.67} \approx 6.5\end{align*}
The standard deviation for Brand B is approximately 6.5.
Note: The standard deviation for Brand A (17.1) was much larger than that for Brand B (6.5), even though the means of both brands were the same. When the means are equal, the larger the standard deviation is, the more variable the data are.
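As a cross-check on Example 3, the same results can be obtained with Python's standard `statistics` module, where `pstdev` computes the population standard deviation (dividing by n). The list names `brand_a` and `brand_b` are illustrative.

```python
import statistics

# Fading times in months from Example 3.
brand_a = [15, 65, 55, 35, 45, 25]
brand_b = [40, 50, 35, 40, 45, 30]

# Both brands have the same mean.
mean_a = statistics.mean(brand_a)
mean_b = statistics.mean(brand_b)

# pstdev treats the data as a whole population (divides by n).
sigma_a = statistics.pstdev(brand_a)
sigma_b = statistics.pstdev(brand_b)

print(f"Brand A: mean = {mean_a}, sigma = {sigma_a:.1f}")
print(f"Brand B: mean = {mean_b}, sigma = {sigma_b:.1f}")
```

With equal means, the brand with the larger value of `sigma` is the one whose fading times are more spread out.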
To find the standard deviation, you subtract the mean from each data value to determine how much each data value varies from the mean. The result is a positive value when the data value is greater than the mean, a negative value when the data value is less than the mean, and 0 when the data value is equal to the mean.
If we were to add the variations found in the second column of the table, the total would be 0, because the positive and negative differences cancel each other. Taken at face value, a total of 0 would suggest that there is no variation between the data values and the mean. In other words, if we were conducting a survey of the number of hours that students use a cell phone in 1 day, and we relied upon the sum of the variations to give us some pertinent information, the only thing that we would conclude is that all the students who participated in the survey use a cell phone for the exact same number of hours each day. We know that this is not true, because the survey does not show all the responses as being the same. In order to ensure that these variations do not lose their significance when added, the variation values are squared prior to calculating their sum.
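A short Python check, using the data from Example 2, illustrates why the differences are squared before they are added. The names `deviations`, `sum_of_deviations`, and `sum_of_squares` are illustrative.

```python
# Population data from Example 2.
data = [2, 7, 5, 6, 4, 2, 6, 3, 6, 9]
mu = sum(data) / len(data)

# Differences between each data value and the mean.
deviations = [x - mu for x in data]

# Positive and negative differences cancel, so their sum carries no information.
sum_of_deviations = sum(deviations)

# Squaring makes every term non-negative, so the spread is preserved.
sum_of_squares = sum(d ** 2 for d in deviations)

print(f"sum of deviations = {sum_of_deviations}")
print(f"sum of squared deviations = {sum_of_squares}")
```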
What we need for a normal distribution is a measure of spread that is proportional to the scatter of the data, independent of the number of values in the data set and independent of the mean. The spread will be small when the data values are consistent, but large when the data values are inconsistent. The reason that the measure of spread should be independent of the mean is because we are not interested in this measure of central tendency, but rather, only in the spread of the data. For a normal distribution, both the variance and the standard deviation fit the above profile for an appropriate measure of spread, and both values can be calculated for the set of data.
To calculate the variance \begin{align*}(\sigma^2)\end{align*} for a population of normally distributed data:
Step 1: Determine the mean of the data values.
Step 2: Subtract the mean of the data from each value in the data set to determine the difference between the data value and the mean: \begin{align*}(x-\mu)\end{align*}.
Step 3: Square each of these differences and determine the total of these positive, squared results.
Step 4: Divide this sum by the number of values in the data set.
These steps for calculating the variance of a data set for a population can be summarized in the following formula:
\begin{align*}\sigma^2 = \frac{\sum(x-\mu)^2}{n}\end{align*}
where:
\begin{align*}x\end{align*} is a data value.
\begin{align*}\mu\end{align*} is the population mean.
\begin{align*}n\end{align*} is the number of data values (population size).
These steps for calculating the variance of a data set for a sample can be summarized in the following formula:
\begin{align*}s^2 = \frac{\sum(x-\overline{x})^2}{n-1}\end{align*}
where:
\begin{align*}x\end{align*} is a data value.
\begin{align*}\overline{x}\end{align*} is the sample mean.
\begin{align*}n\end{align*} is the number of data values (sample size).
The only difference in the formulas is the number by which the sum is divided. For a population, it is divided by \begin{align*}n\end{align*}, and for a sample, it is divided by \begin{align*}n - 1\end{align*}.
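The two formulas can be written as a pair of small Python functions. This is a sketch only: the function names are hypothetical, and the Brand B values from Example 3 are treated as a sample in the second call purely to show the effect of the divisor (the lesson itself treats them as a population).

```python
def population_variance(values):
    """Sum of squared differences from the mean, divided by n."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """Sum of squared differences from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


# Brand B fading times from Example 3, used here only as illustrative input.
values = [40, 50, 35, 40, 45, 30]

pop_var = population_variance(values)
samp_var = sample_variance(values)

print(f"divide by n:     {pop_var:.2f}")
print(f"divide by n - 1: {samp_var:.2f}")
```

Because the sum of squares is the same in both cases, dividing by n - 1 always gives a result at least as large as dividing by n.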
Example 4
Calculate the variance of the 2 brands of paint in Example 3. These are both small populations.
Brand A (Time in months) | Brand B (Time in months) |
---|---|
15 | 40 |
65 | 50 |
55 | 35 |
35 | 40 |
45 | 45 |
25 | 30 |
Solution:
Brand A
\begin{align*}x\end{align*} | \begin{align*}(x-\mu)\end{align*} | \begin{align*}(x-\mu)^2\end{align*} |
---|---|---|
15 | \begin{align*}-25\end{align*} | 625 |
65 | 25 | 625 |
55 | 15 | 225 |
35 | \begin{align*}-5\end{align*} | 25 |
45 | 5 | 25 |
25 | \begin{align*}-15\end{align*} | 225 |
\begin{align*}\mu&=\frac{15+65+55+35+45+25}{6}=\frac{240}{6}=40\\ \sigma^2 & = \frac{\sum (x- \mu)^2}{n}\\ \sigma^2 & = \frac{625+625+225+25+25+225}{6} = \frac{1750}{6} \approx 291.67\end{align*}
Brand B
\begin{align*}x\end{align*} | \begin{align*}(x-\mu)\end{align*} | \begin{align*}(x-\mu)^2\end{align*} |
---|---|---|
40 | 0 | 0 |
50 | 10 | 100 |
35 | \begin{align*}-5\end{align*} | 25 |
40 | 0 | 0 |
45 | 5 | 25 |
30 | \begin{align*}-10\end{align*} | 100 |
\begin{align*}\mu&=\frac{40+50+35+40+45+30}{6}=\frac{240}{6}=40\\ \sigma^2 & = \frac{\sum (x-\mu)^2}{n}\\ \sigma^2 & = \frac{0+100+25+0+25+100}{6} = \frac{250}{6} \approx 41.67\end{align*}
From the calculations done in Example 3 and in Example 4, you should have noticed that the square root of the variance is the standard deviation, and the square of the standard deviation is the variance. Taking the square root of the variance will put the standard deviation in the same units as the given data. The variance is simply the average of the squares of the distance of each data value from the mean. If these data values are close to the value of the mean, the variance will be small. This was the case for Brand B. If these data values are far from the mean, the variance will be large, as was the case for Brand A.
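The relationship between the two measures can be confirmed numerically. The sketch below assumes Python's standard `statistics` module, where `pvariance` and `pstdev` are the population versions; the dictionary names `brands` and `results` are illustrative.

```python
import math
import statistics

# Fading times in months for the 2 brands of paint.
brands = {
    "A": [15, 65, 55, 35, 45, 25],
    "B": [40, 50, 35, 40, 45, 30],
}

results = {}
for name, months in brands.items():
    variance = statistics.pvariance(months)  # population variance (divide by n)
    sigma = statistics.pstdev(months)        # population standard deviation
    results[name] = (variance, sigma)
    print(f"Brand {name}: variance = {variance:.2f}, "
          f"sqrt(variance) = {math.sqrt(variance):.2f}, sigma = {sigma:.2f}")
```

For each brand, the printed square root of the variance and the standard deviation should agree.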
The variance and the standard deviation of a data set are never negative. Both equal 0 only when every data value is the same; otherwise, both are positive.
Example 5
The following data represents the morning temperatures \begin{align*}(^\circ \text{C})\end{align*} and the monthly rainfall (mm) in July for all the Canadian cities east of Toronto:
Temperature \begin{align*}(^\circ \text{C})\end{align*}
\begin{align*}& 11.7 \quad 13.7 \quad 10.5 \quad \ 14.2 \quad 13.9 \quad 14.2 \quad 10.4 \quad 16.1 \quad 16.4\\ & 4.8 \quad \ \ 15.2 \quad 13.0 \quad \ 14.4 \quad 12.7 \quad 8.6 \quad \ 12.9 \quad 11.5 \quad 14.6\end{align*}
Precipitation (mm)
\begin{align*}& 18.6 \quad 37.1 \quad 70.9 \quad \ 102 \quad \ 59.9 \quad 58.0 \quad 73.0 \quad 77.6 \quad \ 89.1\\ & 86.6 \quad 40.3 \quad 119.5 \quad 36.2 \quad 85.5 \quad 59.2 \quad 97.8 \quad 122.2 \quad 82.6\end{align*}
Which data set is more variable? Calculate the standard deviation for each data set. Both are small populations.
Solution:
Temperature \begin{align*}(^\circ \text{C})\end{align*}
\begin{align*}x\end{align*} | \begin{align*}(x-\mu)\end{align*} | \begin{align*}(x-\mu)^2\end{align*} |
---|---|---|
11.7 | \begin{align*}-1\end{align*} | 1 |
13.7 | 1 | 1 |
10.5 | \begin{align*}-2.2\end{align*} | 4.84 |
14.2 | 1.5 | 2.25 |
13.9 | 1.2 | 1.44 |
14.2 | 1.5 | 2.25 |
10.4 | \begin{align*}-2.3\end{align*} | 5.29 |
16.1 | 3.4 | 11.56 |
16.4 | 3.7 | 13.69 |
4.8 | \begin{align*}-7.9\end{align*} | 62.41 |
15.2 | 2.5 | 6.25 |
13.0 | 0.3 | 0.09 |
14.4 | 1.7 | 2.89 |
12.7 | 0 | 0 |
8.6 | \begin{align*}-4.1\end{align*} | 16.81 |
12.9 | 0.2 | 0.04 |
11.5 | \begin{align*}-1.2\end{align*} | 1.44 |
14.6 | 1.9 | 3.61 |
\begin{align*}\mu&= \frac{\sum x}{n} = \frac{228.8}{18} \approx 12.7\\ \sigma^2 & = \frac{\sum (x-\mu)^2}{n} && \sigma = \sqrt{\frac{\sum (x-\mu)^2}{n}}\\ \sigma^2 & = \frac{136.86}{18} \approx 7.6 && \sigma = \sqrt{\frac{136.86}{18}} \approx 2.8\end{align*}
The variance of the data set is approximately \begin{align*}7.6 \ (^\circ \text{C})^2\end{align*}, and the standard deviation of the data set is approximately \begin{align*}2.8 \ ^\circ \text{C}\end{align*}. The variance is expressed in squared units because it is an average of squared differences.
Precipitation (mm)
\begin{align*}x\end{align*} | \begin{align*}(x-\mu)\end{align*} | \begin{align*}(x-\mu)^2\end{align*} |
---|---|---|
18.6 | \begin{align*}-54.5\end{align*} | 2970.3 |
37.1 | \begin{align*}-36.0\end{align*} | 1296 |
70.9 | \begin{align*}-2.2\end{align*} | 4.84 |
102.0 | 28.9 | 835.21 |
59.9 | \begin{align*}-13.2\end{align*} | 174.24 |
58.0 | \begin{align*}-15.1\end{align*} | 228.01 |
73.0 | \begin{align*}-0.1\end{align*} | 0.01 |
77.6 | 4.5 | 20.25 |
89.1 | 16.0 | 256 |
86.6 | 13.5 | 182.25 |
40.3 | \begin{align*}-32.8\end{align*} | 1075.8 |
119.5 | 46.4 | 2153 |
36.2 | \begin{align*}-36.9\end{align*} | 1361.6 |
85.5 | 12.4 | 153.76 |
59.2 | \begin{align*}-13.9\end{align*} | 193.21 |
97.8 | 24.7 | 610.09 |
122.2 | 49.1 | 2410.8 |
82.6 | 9.5 | 90.25 |
\begin{align*}\mu&= \frac{\sum x}{n} = \frac{1{,}316.1}{18} \approx 73.1\\ \sigma^2 & = \frac{\sum (x-\mu)^2}{n} && \sigma = \sqrt{\frac{\sum (x-\mu)^2}{n}}\\ \sigma^2 & = \frac{14{,}016}{18} \approx 778.67 && \sigma = \sqrt{\frac{14{,}016}{18}} \approx 27.9\end{align*}
The variance of the data set is approximately \begin{align*}778.67 \ \text{mm}^2\end{align*}, and the standard deviation of the data set is approximately 27.9 mm.
Therefore, the data values for the precipitation are more variable. This is indicated by the larger variance and standard deviation of the data set.
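The calculations in Example 5 can be checked with Python's `statistics` module. Note one assumption: the code uses the unrounded mean, whereas the tables above subtract a mean rounded to 1 decimal place, so intermediate values may differ slightly even though the rounded results should agree. The list names are illustrative.

```python
import statistics

# July morning temperatures (degrees C) from Example 5.
temperatures = [11.7, 13.7, 10.5, 14.2, 13.9, 14.2, 10.4, 16.1, 16.4,
                4.8, 15.2, 13.0, 14.4, 12.7, 8.6, 12.9, 11.5, 14.6]

# July precipitation (mm) from Example 5.
precipitation = [18.6, 37.1, 70.9, 102.0, 59.9, 58.0, 73.0, 77.6, 89.1,
                 86.6, 40.3, 119.5, 36.2, 85.5, 59.2, 97.8, 122.2, 82.6]

# Both data sets are treated as populations, so the divisor is n.
temp_var = statistics.pvariance(temperatures)
temp_sigma = statistics.pstdev(temperatures)
precip_var = statistics.pvariance(precipitation)
precip_sigma = statistics.pstdev(precipitation)

print(f"Temperature:   variance = {temp_var:.1f}, sigma = {temp_sigma:.1f}")
print(f"Precipitation: variance = {precip_var:.1f}, sigma = {precip_sigma:.1f}")
```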
Example 6
Now that you know how to calculate the variance and the standard deviation of a set of data, let’s apply this to a normal distribution by determining how your bowling average compared to those of the other bowlers in your league. This time technology will be used to determine both the variance and the standard deviation of the data, which represents a small population.
\begin{align*}& 54 \quad 88 \quad 49 \quad 44 \quad \ 96 \quad 72 \quad 46 \quad 58 \quad 79\\ & 92 \quad 44 \quad 50 \quad 102 \ \ \ 80 \quad 72 \quad 66 \quad 64 \quad 61\\ & 60 \quad 56 \quad 48 \quad 52 \quad \ 54 \quad 60 \quad 64 \quad 72 \quad 68\\ & 64 \quad 60 \quad 56 \quad 52 \quad \ 55 \quad 60 \quad 62 \quad 64 \quad 68\end{align*}
Solution:
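From the 1-Var Stats output for the list, you can see that the mean of the bowling averages is approximately 63.7 and that the standard deviation is approximately 14.1. Your mean score of 70 is about 6.3 above the league mean, which is less than 1 standard deviation, so your score is comparable to those of the other bowlers.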
Using technology to calculate the variance involves defining the lists according to the operations needed to determine the correct values. In addition, you can use the CATALOG menu of the calculator to determine the sum of the squared variations. You could also use the CATALOG menu to find the mean of the data, but since you are already familiar with 1-Var Stats, you can use that method instead.
The mean of the data is approximately 63.7. L2 will now be defined as \begin{align*}\text{L1} - 63.7\end{align*} to compute the values for \begin{align*}(x-\mu)\end{align*}.
Likewise, L3 will be defined as \begin{align*}\text{L2}^2\end{align*}.
The sum of the values in L3 divided by the number of data values (36) is the variance of the bowling averages. You could have also just squared the standard deviation of the bowling averages that you found earlier.
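The list operations described above can be mirrored in Python as a rough stand-in for the calculator. The names `L1`, `L2`, and `L3` follow the calculator lists in the text; `my_score` and `z` are illustrative additions used to compare the score of 70 with the league.

```python
import math

# L1: the 36 bowling averages from the list in Example 6.
L1 = [54, 88, 49, 44, 96, 72, 46, 58, 79,
      92, 44, 50, 102, 80, 72, 66, 64, 61,
      60, 56, 48, 52, 54, 60, 64, 72, 68,
      64, 60, 56, 52, 55, 60, 62, 64, 68]

mean = sum(L1) / len(L1)

# L2 = L1 - 63.7, using the mean rounded to 1 decimal place as in the text.
L2 = [x - round(mean, 1) for x in L1]

# L3 = L2 squared.
L3 = [d ** 2 for d in L2]

# The variance is the sum of L3 divided by the number of data values (36).
variance = sum(L3) / len(L1)
sd = math.sqrt(variance)

# Number of standard deviations between your score and the league mean.
my_score = 70
z = (my_score - mean) / sd

print(f"mean = {mean:.1f}, variance = {variance:.1f}, sd = {sd:.1f}")
print(f"A score of {my_score} is {z:.2f} standard deviations above the mean")
```

A value of `z` between -1 and 1 means the score lies within 1 standard deviation of the league mean.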
Lesson Summary
In this lesson, you learned that the standard deviation of a set of data is a value that represents a measure of the spread of the data from the mean. You also learned that the variance of the data from the mean is the square of the standard deviation. Calculating the standard deviation manually and calculating it by using technology were additional topics you learned in this lesson.
Points to Consider
- Does the value of standard deviation stand alone, or can it be displayed with a normal distribution?
- Are there defined increments for how data spreads away from the mean?
- Can the standard deviation of a set of data be applied to real-world problems?