1.4: Measures of Spread
Learning Objectives
 Calculate the range and interquartile range.
 Calculate the standard deviation for a population and a sample, and understand its meaning.
 Distinguish between the variance and the standard deviation.
 Calculate and apply Chebyshev’s Theorem to any set of data.
Introduction
In the last lesson, we studied measures of central tendency. Another important feature that can help us understand more about a data set is the manner in which the data are distributed, or spread. Variation and dispersion are words that are also commonly used to describe this feature. There are several commonly used statistical measures of spread that we will investigate in this lesson.
Range
One measure of spread is the range. The range is simply the difference between the largest value (maximum) and the smallest value (minimum) in the data.
Example: Return to the data set used in the previous lesson, which is shown below:
75, 80, 90, 94, 96
The range of this data set is $96 - 75 = 21$.
The range is useful because it requires very little calculation, and therefore, gives a quick and easy snapshot of how the data are spread. However, it is limited, because it only involves two values in the data set, and it is not resistant to outliers.
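The definition above can be checked with a minimal Python sketch. The helper name `data_range` and the extra value 20 are illustrative assumptions, not part of the lesson; the extra value is there only to show how a single outlier moves the range.

```python
def data_range(values):
    """Range = maximum value minus minimum value."""
    return max(values) - min(values)

scores = [75, 80, 90, 94, 96]
print(data_range(scores))  # 96 - 75 = 21

# Hypothetical outlier, added for illustration only.
scores_with_outlier = scores + [20]
print(data_range(scores_with_outlier))  # 96 - 20 = 76
```

Adding one low score more than triples the range, which is what "not resistant to outliers" means in practice.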
Interquartile Range
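The interquartile range (IQR) is the difference between the upper quartile $Q_3$ and the lower quartile $Q_1$: $IQR = Q_3 - Q_1$. It describes how the middle 50% of the data are spread.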
Example: A recent study proclaimed Mobile, Alabama the wettest city in America. Source: http://www.livescience.com/environment/070518_rainy_cities.html. The following table lists measurements of the approximate annual rainfall in Mobile over a 10-year period. Find the range and the IQR for this data set.
Year  Rainfall (inches) 

1998  90 
1999  56 
2000  60 
2001  59 
2002  74 
2003  76 
2004  81 
2005  91 
2006  47 
2007  59 
Figure: Approximate Total Annual Rainfall, Mobile, Alabama. Source: http://www.cwop1353.com/CoopGaugeData.htm
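First, place the data in order from smallest to largest:
47, 56, 59, 59, 60, 74, 76, 81, 90, 91
The range is the difference between the minimum and maximum rainfall amounts: $91 - 47 = 44$ inches.
To find the IQR, split the ordered data into a lower half (47, 56, 59, 59, 60) and an upper half (74, 76, 81, 90, 91), and take the median of each half. This gives $Q_1 = 59$ and $Q_3 = 81$, so $IQR = 81 - 59 = 22$ inches.
In this example, the range tells us that there is a difference of 44 inches of rainfall between the wettest and driest years in Mobile. The IQR tells us that the middle 50% of the yearly totals span 22 inches.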
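The range and IQR for the Mobile rainfall data can be reproduced with a short Python sketch. The helper `quartiles` is a hypothetical name, and it assumes the median-of-halves convention (the overall median is left out of both halves when the count is odd). Statistical software often uses other quartile conventions and may give slightly different values.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: with an odd number of values, the overall median is
    excluded from both halves. Other conventions exist.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[len(ordered) - half:]
    return median(lower), median(upper)

rainfall = [90, 56, 60, 59, 74, 76, 81, 91, 47, 59]

data_range = max(rainfall) - min(rainfall)
q1, q3 = quartiles(rainfall)
iqr = q3 - q1

print(data_range)   # 44
print(q1, q3, iqr)  # 59 81 22
```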
Standard Deviation
The standard deviation is an extremely important measure of spread that is based on the mean. Recall that the mean is the numerical balancing point of the data. One way to measure how the data are spread is to look at how far away each of the values is from the mean. The difference between a data value and the mean is called the deviation. Written symbolically, it would be as follows:
$\text{deviation} = x - \bar{x}$
Let’s take the simple data set of three randomly selected individuals’ shoe sizes shown below:
9.5, 11.5, 12
The mean of this data set is 11. The deviations are as follows:
Observed Data  Deviation 

9.5  $9.5 - 11 = -1.5$ 
11.5  $11.5 - 11 = 0.5$ 
12  $12 - 11 = 1$ 
Notice that if a data value is less than the mean, the deviation of that value is negative. Points that are above the mean have positive deviations.
The standard deviation is a measure of the typical, or average, deviation for all of the data points from the mean. However, the very property that makes the mean so special also makes it tricky to calculate a standard deviation. Because the mean is the balancing point of the data, when you add the deviations, they always sum to 0.
Observed Data  Deviations 

9.5  -1.5 
11.5  0.5 
12  1 
Sum of deviations  0 
Therefore, we need to keep the positive and negative deviations from cancelling each other before we add them up. One way to do this would be to take their absolute values. This is a technique we use for a similar measure called the mean absolute deviation. For the standard deviation, though, we square all the deviations. The square of any real number is never negative.
Observed Data  Deviation  Squared Deviation 

9.5  -1.5  2.25 
11.5  0.5  0.25 
12  1  1 
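The deviation tables for the shoe-size data can be reproduced with a few lines of Python. This is only a sketch; the variable names are illustrative, and the mean absolute deviation is included because the text mentions it as the alternative to squaring.

```python
shoe_sizes = [9.5, 11.5, 12]
mean = sum(shoe_sizes) / len(shoe_sizes)             # 11.0

deviations = [x - mean for x in shoe_sizes]          # [-1.5, 0.5, 1.0]
squared = [d ** 2 for d in deviations]               # [2.25, 0.25, 1.0]

# Alternative mentioned in the text: average the absolute deviations.
mean_abs_dev = sum(abs(d) for d in deviations) / len(deviations)

print(sum(deviations))   # 0.0  (deviations from the mean always cancel)
print(sum(squared))      # 3.5
print(mean_abs_dev)      # 1.0
```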
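We want to find the average of the squared deviations. Usually, to find an average, you divide by the number of terms in your sum. In finding the standard deviation of a sample, however, we divide by $n - 1$, one less than the number of data values. In this example, the squared deviations sum to $2.25 + 0.25 + 1 = 3.5$ and $n = 3$, so we divide by 2. The result, $3.5 / 2 = 1.75$, is called the variance. Taking the square root of the variance gives the standard deviation: $\sqrt{1.75} \approx 1.32$.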
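As a sketch of the remaining steps for the shoe-size data, the code below divides the sum of squared deviations by $n - 1$ (the sample convention this lesson uses) and takes the square root. The cross-check uses Python's standard `statistics` module, whose `variance` and `stdev` functions also divide by $n - 1$.

```python
import math
import statistics

shoe_sizes = [9.5, 11.5, 12]
n = len(shoe_sizes)
mean = sum(shoe_sizes) / n

sum_sq = sum((x - mean) ** 2 for x in shoe_sizes)   # 3.5
variance = sum_sq / (n - 1)                          # divide by n - 1, not n
std_dev = math.sqrt(variance)

print(variance)            # 1.75
print(round(std_dev, 2))   # 1.32

# Cross-check against the standard library's sample statistics.
assert math.isclose(variance, statistics.variance(shoe_sizes))
assert math.isclose(std_dev, statistics.stdev(shoe_sizes))
```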
Example: The following are scores for two different students on two quizzes:
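Student 1: 100, 0
Student 2: 50, 50
Note that the mean score for each of these students is 50.
Student 1: Deviations: $100 - 50 = 50$; $0 - 50 = -50$
Squared deviations: 2500; 2500
Variance $= \frac{2500 + 2500}{2 - 1} = 5000$
Standard Deviation $= \sqrt{5000} \approx 70.7$
Student 2: Deviations: $50 - 50 = 0$; $50 - 50 = 0$
Squared Deviations: 0; 0
Variance $= 0$
Standard Deviation $= 0$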
Student 2 has scores that are tightly clustered around the mean. In fact, the standard deviation of zero indicates that there is no variability. The student is absolutely consistent.
So, while the average of each of these students is the same (50), one of them is consistent in the work he/she does, and the other is not. This raises questions: Why did student 1 get a zero on the second quiz when he/she had a perfect paper on the first quiz? Was the student sick? Did the student forget about the quiz and not study? Or was the second quiz indicative of the work the student can do, and was the first quiz the one that was questionable? Did the student cheat on the first quiz?
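The comparison of the two students can be sketched in Python. The scores are an assumption inferred from the description: a perfect paper and a zero for Student 1, and two identical scores for Student 2, each averaging 50.

```python
import statistics

# Assumed scores, inferred from the description in the text.
quiz_scores = {
    "Student 1": [100, 0],
    "Student 2": [50, 50],
}

results = {}
for name, scores in quiz_scores.items():
    results[name] = {
        "mean": statistics.mean(scores),
        "variance": statistics.variance(scores),   # divides by n - 1
        "std_dev": statistics.stdev(scores),
    }
    print(name, results[name])

# Student 1: variance 5000, standard deviation about 70.7
# Student 2: variance 0, standard deviation 0
```

The means are identical, so only the measure of spread separates the two students.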
There is one more question that we haven't answered regarding standard deviation, and that is, "Why divide by $n - 1$?"
When we claim to have the standard deviation, we are making the following statement:
“The typical distance of a point from the mean is ...”
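But we might be off by a little from using a sample, so it would be better to overestimate the typical distance slightly than to underestimate it. Dividing by $n - 1$ instead of $n$ makes the result a little larger.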
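One way to see the effect of the divisor is to list every possible sample from a tiny population and average the results. The sketch below is an illustration under stated assumptions: it treats the three shoe sizes as a complete population and draws all ordered samples of size 2 with replacement. It compares variances rather than standard deviations, because that is where the $n - 1$ divisor matches the population value exactly on average.

```python
from itertools import product

# Assumption: treat the three shoe sizes as an entire population.
population = [9.5, 11.5, 12]
N = len(population)
mu = sum(population) / N
pop_variance = sum((x - mu) ** 2 for x in population) / N

def variance(sample, divisor):
    """Sum of squared deviations from the sample mean, over a chosen divisor."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample) / divisor

# Every possible ordered sample of size 2, drawn with replacement.
n = 2
samples = list(product(population, repeat=n))

avg_div_n = sum(variance(s, n) for s in samples) / len(samples)
avg_div_n_minus_1 = sum(variance(s, n - 1) for s in samples) / len(samples)

print(round(pop_variance, 4))        # 1.1667
print(round(avg_div_n, 4))           # 0.5833  (too small on average)
print(round(avg_div_n_minus_1, 4))   # 1.1667  (matches the population)
```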
Formulas
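Sample Standard Deviation:
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
where:
$x_i$ = the $i$th data value
$\bar{x}$ = the mean of the sample
$n$ = the sample size
Variance of a sample:
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
where:
$x_i$ = the $i$th data value
$\bar{x}$ = the mean of the sample
$n$ = the sample size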
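The sample formulas translate directly into code. The following is a minimal sketch, not a library implementation; the function names and the `sample` flag are illustrative. The flag switches between dividing by $n - 1$ (sample) and by $n$ (population), which makes the difference between the two visible on the rainfall data.

```python
import math

def variance(values, sample=True):
    """Sum of squared deviations divided by n - 1 (sample) or n (population)."""
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum((x - mean) ** 2 for x in values)
    return sum_sq / (n - 1 if sample else n)

def std_dev(values, sample=True):
    """The standard deviation is the square root of the variance."""
    return math.sqrt(variance(values, sample))

rainfall = [90, 56, 60, 59, 74, 76, 81, 91, 47, 59]

print(round(std_dev(rainfall), 1))                # 15.2 (divide by n - 1)
print(round(std_dev(rainfall, sample=False), 1))  # 14.4 (divide by n)
```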
Chebyshev’s Theorem
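Pafnuty Chebyshev was a 19th-century Russian mathematician. The theorem named for him gives a lower bound on the proportion of a data set that lies within a given number of standard deviations of the mean.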
The formal statement for Chebyshev’s Theorem is as follows:
The proportion of data points that lie within $k$ standard deviations of the mean is at least $1 - \frac{1}{k^2}$, for any $k > 1$.
Example: Given a group of data with mean 60 and standard deviation 15, at least what percent of the data will fall between 15 and 105?
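15 is 3 standard deviations below the mean of 60, and 105 is 3 standard deviations above the mean of 60. Chebyshev’s Theorem tells us that at least $1 - \frac{1}{3^2} = \frac{8}{9} \approx 0.89$, or about 89%, of the data will fall between 15 and 105.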
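The bound in this example can be computed with a small helper. This is a sketch; the function name `chebyshev_bound` is an assumption made for illustration.

```python
def chebyshev_bound(k):
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's Theorem is only informative for k > 1")
    return 1 - 1 / k ** 2

mean, sd = 60, 15
low, high = 15, 105

# Both endpoints must be the same number of standard deviations from the mean.
k = (high - mean) / sd
assert k == (mean - low) / sd

print(k)                              # 3.0
print(round(chebyshev_bound(k), 3))   # 0.889
```

The result is a guaranteed minimum, so the actual proportion in a particular data set can be higher.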
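Example: Return to the rainfall data from Mobile. The mean yearly rainfall amount is 69.3 inches. Treating these 10 years as the whole population, the standard deviation is about 14.4 inches (dividing by $n - 1$ instead gives a sample standard deviation of about 15.2).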
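Chebyshev’s Theorem tells us about the proportion of data within $k$ standard deviations of the mean. If we replace $k$ with 2, the result is $1 - \frac{1}{2^2} = \frac{3}{4}$.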
So the theorem predicts that at least 75% of the data is within 2 standard deviations of the mean.
Two standard deviations below and above the mean are $69.3 - 2(14.4) = 40.5$ and $69.3 + 2(14.4) = 98.1$, so Chebyshev’s Theorem states that at least 75% of the data are between 40.5 and 98.1. This doesn’t seem too significant in this example, because all of the data fall within that range. The advantage of Chebyshev’s Theorem is that it applies to any sample or population, no matter how it is distributed.
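The claim about the rainfall data can be checked directly. This sketch uses the mean and standard deviation quoted in the example (69.3 and 14.4) rather than recomputing them, and counts how many yearly totals fall within 2 standard deviations.

```python
rainfall = [90, 56, 60, 59, 74, 76, 81, 91, 47, 59]
mean, sd, k = 69.3, 14.4, 2     # values quoted in the example

low, high = mean - k * sd, mean + k * sd
inside = [x for x in rainfall if low <= x <= high]
proportion = len(inside) / len(rainfall)

print(round(low, 1), round(high, 1))   # 40.5 98.1
print(proportion)                      # 1.0
print(proportion >= 1 - 1 / k ** 2)    # True: at least 75%, as guaranteed
```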
Lesson Summary
When examining a set of data, we use descriptive statistics to provide information about how the data are spread out. The range is the difference between the largest and smallest values in a data set. The interquartile range is the difference between the upper and lower quartiles. A more informative measure of spread is based on the mean. We can look at how individual points vary from the mean by subtracting the mean from the data value. This is called the deviation. The standard deviation is a measure of the typical deviation for the entire data set. Because the deviations always sum to zero, we square them before adding them. When we have the entire population, the sum of the squared deviations is divided by the population size. This value is called the variance. Taking the square root of the variance gives the standard deviation. For a population, the standard deviation is denoted by $\sigma$. For a sample, the sum of the squared deviations is divided by $n - 1$, and the standard deviation is denoted by $s$. Chebyshev’s Theorem states that, for any data set, at least $1 - \frac{1}{k^2}$ of the data lie within $k$ standard deviations of the mean.
Points to Consider
 How do you determine which measure of spread best describes a particular data set?
 What information does the standard deviation tell us about the specific, real data being observed?
 What are the effects of outliers on the various measures of spread?
 How does altering the spread of a data set affect its visual representation(s)?
Review Questions
 Use the rainfall data from figure 1 to answer this question.
 Calculate and record the sample mean:
 Complete the chart to calculate the variance and the standard deviation.
Year  Rainfall (inches)  Deviation  Squared Deviations 

1998  90  
1999  56  
2000  60  
2001  59  
2002  74  
2003  76  
2004  81  
2005  91  
2006  47  
2007  59 
Variance:
Standard Deviation:
Use the Galapagos Tortoise data below to answer questions 2 and 3.
Island or Volcano  Number of Individuals Repatriated 

Wolf  40 
Darwin  0 
Alcedo  0 
Sierra Negra  286 
Cerro Azul  357 
Santa Cruz  210 
Española  1293 
San Cristóbal  55 
Santiago  498 
Pinzón  552 
Pinta  0 
 Calculate the range and the IQR for this data.
 Calculate the sample standard deviation for this data.
 If $\sigma^2 = 9$, then the population standard deviation is:
 3
 8
 9
 81
 Which data set has the largest standard deviation?
 10 10 10 10 10
 0 0 10 10 10
 0 9 10 11 20
 20 20 20 20 20
On the Web
The following links discuss various issues related to measures of spread, including 1) why the population standard deviation is calculated by dividing by the population size N, while the standard deviation of a sample is calculated by dividing by the sample size n minus 1; and 2) why the standard deviation is calculated by summing the squares of the differences between the data points and the mean, averaging these squared differences, and then taking the square root of the average, rather than simply averaging the absolute differences.
http://mathcentral.uregina.ca/QQ/database/QQ.09.99/freeman2.html
http://mathforum.org/library/drmath/view/52722.html
http://edhelper.com/statistics.htm
http://www.newton.dep.anl.gov/newton/askasci/1993/math/MATH014.HTM
Technology Notes: Calculating Standard Deviation on the TI83/84 Graphing Calculator
Enter the data 9.5, 11.5, 12 in list L1 (see first screen below).
Then choose '1-Var Stats' from the CALC submenu of the STAT menu (second screen).
Enter L1 (third screen) and press [ENTER] to see the fourth screen.
In the fourth screen, the symbol Sx is the sample standard deviation, and σx is the population standard deviation.
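For readers without a graphing calculator, roughly the same summary can be produced with Python's standard `statistics` module. This is a sketch of an equivalent, not the calculator's own procedure; the printed labels are chosen only to mirror the calculator's symbols.

```python
import statistics

L1 = [9.5, 11.5, 12]   # same data as calculator list L1

mean = statistics.mean(L1)
s_x = statistics.stdev(L1)        # sample standard deviation (divides by n - 1)
sigma_x = statistics.pstdev(L1)   # population standard deviation (divides by n)

print("mean:", mean)                   # 11.0
print("Sx:", round(s_x, 4))            # 1.3229
print("sigma_x:", round(sigma_x, 4))   # 1.0801
print("n:", len(L1))                   # 3
```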
Date Created: Feb 23, 2012
Last Modified: Aug 11, 2015