11.6: Measures of Central Tendency and Dispersion
Learning Objectives
 Compare measures of central tendency.
 Measure the dispersion of a collection of data.
 Calculate and interpret measures of central tendency and dispersion for real-world situations.
Comparing Measures of Central Tendency
The word “average” is often used to describe something that is used to represent the general characteristics of a larger group of unequal objects. Mathematically, an average is a single number which can be used to summarize a collection of numerical values. In mathematics, there are several types of “averages” with the most common being the mean, the median and the mode.
Mean
The arithmetic mean of a group of numbers is found by dividing the sum of the numbers by the number of values in the group. In other words, we add all the numbers together and divide by the number of numbers.
Example 1
Find the mean of the numbers 11, 16, 9, 15, 5, 18.
Solution
There are six separate numbers, so we find the mean with the following.
\begin{align*}\text{mean}=\frac{11+ 16 + 9 + 15 + 5 + 18}{6}=\frac{74}{6}=12 \frac{1}{3}.\end{align*}
The arithmetic mean is what most people automatically think of when the word average is used with numbers. It is generally a good way to take an average, but it suffers when a small number of the values lie far away from the majority of the rest. A classic example is the calculation of average income. If one person (such as former Microsoft Corporation chairman Bill Gates) earns a great deal more than everyone else who is surveyed, then that one value can sway the mean significantly away from what the majority of people earn.
Example 2
The annual incomes for 8 professions are shown below. From the data, calculate the mean annual income of the 8 professions.
Professional Realm  Annual income

Farming, Fishing, and Forestry  $19,630
Sales and Related  $28,920
Architecture and Engineering  $56,330
Healthcare Practitioners  $49,930
Legal  $69,030
Teaching & Education  $39,130
Construction  $35,460
Professional Baseball Player*  $2,476,590
(Source: Bureau of Labor Statistics, except (*): The Baseball Players' Association (playbpa.com).)
Solution
There are 8 values listed so we find the mean as follows.
\begin{align*}\text{mean }&=\frac{\$(19630 + 28920 + 56330 + 49930 + 69030 + 39130 + 35460 + 2476590)}{8}\\ &=\$346,877.50\end{align*}
As you can see, the mean annual income is substantially larger than the income of 7 out of the 8 professions. The single outlier (the baseball player) has a dramatic effect on the mean, so the mean is not a good way to represent the ‘average’ salary in this case.
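To see how strongly one value can pull the mean, the short Python sketch below recomputes the mean of the Example 2 incomes with and without the baseball player. The dictionary layout and variable names are illustrative choices, not part of the original example.

```python
from statistics import mean

# Annual incomes from the table in Example 2 (dollars).
incomes = {
    "Farming, Fishing, and Forestry": 19630,
    "Sales and Related": 28920,
    "Architecture and Engineering": 56330,
    "Healthcare Practitioners": 49930,
    "Legal": 69030,
    "Teaching & Education": 39130,
    "Construction": 35460,
    "Professional Baseball Player": 2476590,
}

# Mean of all eight values, as in the worked solution.
mean_all = mean(list(incomes.values()))

# For comparison only: the mean of the seven remaining professions
# once the single outlier is set aside.
rest = [v for k, v in incomes.items() if k != "Professional Baseball Player"]
mean_rest = mean(rest)

print(f"Mean of all 8 incomes:    ${mean_all:,.2f}")   # $346,877.50
print(f"Mean without the outlier: ${mean_rest:,.2f}")
```

Dropping the outlier here is only a way to illustrate its influence; it is not a recommendation to discard data.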
Algebraic Formula for the Mean
If we have a number of values such as 11, 16, 9, 15, 5, 18 we may label them as follows.
Position in Sequence  Label  Value 

\begin{align*}1^{st}\end{align*}  \begin{align*}x_1\end{align*}  11 
\begin{align*}2^{nd}\end{align*}  \begin{align*}x_2\end{align*}  16 
\begin{align*}3^{rd}\end{align*}  \begin{align*}x_3\end{align*}  9 
\begin{align*}4^{th}\end{align*}  \begin{align*}x_4\end{align*}  15 
\begin{align*}5^{th}\end{align*}  \begin{align*}x_5\end{align*}  5 
\begin{align*}6^{th}\end{align*}  \begin{align*}x_6\end{align*}  18 
We can see from the table that \begin{align*}x_1=11, x_2=16, x_3=9,\end{align*} etc... If we also say that the number of terms \begin{align*}= n\end{align*}, then just as \begin{align*}x_1\end{align*} is the first term, \begin{align*}x_n\end{align*} is the last term. We can now define the mean (given the symbol \begin{align*} \bar{x}\end{align*}) as
Arithmetic mean
\begin{align*} \bar{x}=\frac{x_1 + x_2 + x_3 + \ldots + x_n}{n}\end{align*}
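The formula translates almost directly into code. The following is a minimal Python sketch, with a hypothetical helper name `arithmetic_mean`, that uses exact fractions so the result for the six values above can be compared with the hand calculation in Example 1.

```python
from fractions import Fraction

def arithmetic_mean(values):
    """Return (x_1 + x_2 + ... + x_n) / n as an exact fraction."""
    n = len(values)
    if n == 0:
        raise ValueError("the mean of an empty list is undefined")
    return Fraction(sum(values), n)

x = [11, 16, 9, 15, 5, 18]
x_bar = arithmetic_mean(x)
print(x_bar)         # 37/3, which is 12 1/3
print(float(x_bar))  # decimal approximation
```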
Median
The median is another type of average. It is defined as the value in the middle of a group of numbers. To find the median, we must first list all numbers in order from least to greatest.
Example 3
Find the median of the numbers 11, 21, 6, 17, 9.
Solution:
We first list the numbers in ascending order.
6, 9, 11, 17, 21
The median is the value in the middle of the ordered list, which here is the third of the five values.
The median is 11. There are two values higher than 11 and two values lower than 11.
If there is an even number of values then the median is taken as the arithmetic mean of the two numbers in the middle.
Example 4
Find the median of the numbers 2, 17, 1, 3, 12, 8, 12, 16
Solution:
We first list the numbers in ascending order.
1, 2, 3, 8, 12, 12, 16, 17
The median is the value in the middle of the set, and lies between 8 and 12:
\begin{align*}\text{median}=\frac{8 + 12}{2}=\frac{20}{2}= 10.\end{align*}
The median is 10. Four values are lower than 10, four values are higher than 10.
If you look again at the two previous examples, you will see that when we had 5 values, the median was the \begin{align*}3^{rd}\end{align*} term. With 8 values, the median was half way between the \begin{align*}4^{th}\end{align*} and \begin{align*}5^{th}\end{align*} values. In general, with a total of \begin{align*}n\end{align*} values, the median is the \begin{align*} \left (\frac{n + 1}{2}\right)^{th}\end{align*} value. When the quantity \begin{align*} \left (\frac{n + 1}{2}\right)\end{align*} is fractional, it indicates that the median is the mean of two data points. For example with 15 ordered data points, the median would be the \begin{align*} \left (\frac{15 + 1}{2}\right)=8^{th}\end{align*} value. For 50 data points the quantity \begin{align*} \left (\frac{n + 1}{2}\right)= 25.5\end{align*} indicating that the median is given by taking the arithmetic mean of the \begin{align*}25^{th}\end{align*} and \begin{align*}26^{th}\end{align*} values.
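The position rule can be sketched in Python as follows. The function name `median_by_position` is an illustrative choice; it sorts the data, computes the position (n + 1)/2 counting from 1, and averages the two neighboring values when that position is fractional.

```python
def median_by_position(values):
    """Median found with the (n + 1) / 2 position rule (positions start at 1)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty list is undefined")
    position = (n + 1) / 2
    lower = int(position)  # whole-number part of the position
    if position == lower:
        # Odd n: the position is a whole number, so there is one middle value.
        return ordered[lower - 1]
    # Even n: the position ends in .5, so average the two middle values.
    return (ordered[lower - 1] + ordered[lower]) / 2

print(median_by_position([11, 21, 6, 17, 9]))            # 11
print(median_by_position([2, 17, 1, 3, 12, 8, 12, 16]))  # 10.0
```

The two calls reproduce Examples 3 and 4.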
The median is a useful measure of average when the data set is highly skewed by a small number of points that are extremely large or extremely small. Such outliers will have a large effect on the mean, but will leave the median relatively unchanged.
Mode
The mode can be a useful measure of data when that data falls into a small number of categories. It is simply a measure of the most common number, or sometimes the most popular choice. The mode is an especially useful concept for data sets that contain non-numerical information, such as surveys of eye color or favorite ice cream flavor.
Example 5
Jim is helping to raise money at his church bake sale by doing face painting for children. He collects the ages of his customers, and displays the data in the histogram shown right. Find the mean, median and mode for the ages represented.
Solution
By reading the graph we can see that there was one 2-year-old, three 3-year-olds, four 4-year-olds, etc. In total, there were:
\begin{align*}1 + 3 + 4 + 5 + 6 + 7 + 3 + 1=30\ \text{customers}.\end{align*}
The mean age is found by summing all the products of age and frequency, and dividing by 30:
\begin{align*}\text{Mean}&=\frac{(2 \cdot 1) + (3 \cdot 3) + (4\cdot 4) + (5 \cdot 5) + (6 \cdot 6) + (7 \cdot 7) + (8 \cdot 3) + (9 \cdot 1)}{30}\\ &=\frac{2 + 9 + 16 + 25 + 36 + 49 + 24 + 9}{30}=\frac{170}{30}=5 \frac{2}{3}\end{align*}
Since there are 30 children, the median is halfway between the \begin{align*}15^{th}\end{align*} and \begin{align*}16^{th}\end{align*} oldest (that way there will be 15 younger and 15 older). Both the \begin{align*}15^{th}\end{align*} and \begin{align*}16^{th}\end{align*} oldest fall in the 6-year-old range, therefore
\begin{align*}\text{Median} =6\end{align*}
The mode is given by the age group with the highest frequency. Reading directly from the graph, we see:
\begin{align*}\text{Mode}=7\end{align*}
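The same three results can be checked from the frequency table alone. The Python sketch below assumes the frequencies read from the histogram in the solution (ages 2 through 9); the variable names are illustrative.

```python
# Frequencies read from the histogram: age -> number of customers.
frequency = {2: 1, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 3, 9: 1}

total = sum(frequency.values())

# Mean: sum of (age * frequency) divided by the number of customers.
mean_age = sum(age * count for age, count in frequency.items()) / total

# Median: expand the table into one ordered list of ages, then average
# the two middle entries (the total is even in this example).
ages = sorted(age for age, count in frequency.items() for _ in range(count))
median_age = (ages[total // 2 - 1] + ages[total // 2]) / 2

# Mode: the age with the highest frequency.
mode_age = max(frequency, key=frequency.get)

print(total)       # 30
print(mean_age)    # 5.666..., i.e. 5 2/3
print(median_age)  # 6.0
print(mode_age)    # 7
```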
Multimedia Link: The following video is an introduction to three measures of central tendency: mean, median, and mode. Khan Academy Statistics: The Average (12:34). The narrator models finding the mean, median, and mode of a set of numbers. While this is similar to the content above, some students may find it a helpful comparison of what the three measures of central tendency show.
Measures of Dispersion
Look at the graphs below. Each represents a collection of many data points and shows how the individual values (solid line) compare to the mean of the data set (dashed line). You can see that even though all three graphs have a common mean, the spread of the data differs from graph to graph. In statistics we use the word dispersion as a measure of how spread out the data is.
Range
Range is the simplest measure of dispersion. It is simply the total spread in the data, calculated by subtracting the smallest number in the group from the largest number.
Example 6
Find the range and the median of the following data.
\begin{align*}223, 121, 227, 433, 122, 193, 397, 276, 303, 199, 197, 265, 366, 401, 222\end{align*}
Solution
The first thing to do in this case is to order the data, listing all values in ascending order.
\begin{align*}121, 122, 193, 197, 199, 222, 223, 227, 265, 276, 303, 366, 397, 401, 433\end{align*}
Note: It is extremely important that all values are transferred to the second list. Two ways to ensure that you do this are (i) cross out the numbers in the original list as you order them in the second list, and (ii) count the number of values in both lists. In this example, both lists contain 15 values.
The range is found by subtracting the lowest value from the highest.
\begin{align*}\text{Range}=\underline{433 - 121=312}\end{align*}
Once the list is ordered, we can find the median from the 8th value.
\begin{align*}\text{Median}= 227\end{align*}
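As a quick check of the steps in this example, the Python sketch below orders the data, confirms that no value was lost in the ordering, and then computes the range and the median. It uses the standard library's `statistics.median`; the variable names are illustrative.

```python
from statistics import median

data = [223, 121, 227, 433, 122, 193, 397, 276,
        303, 199, 197, 265, 366, 401, 222]

# Order the data and confirm that every value was carried over.
ordered = sorted(data)
assert len(ordered) == len(data) == 15

# Range: largest value minus smallest value.
data_range = ordered[-1] - ordered[0]

# Median: with 15 values this is the 8th value in the ordered list.
data_median = median(data)

print(ordered)
print(data_range)   # 312
print(data_median)  # 227
```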
Variance
The range is not a particularly good measure of dispersion as it does not eliminate points that have unusually high or low values when compared to the rest of the data (the outliers). A better method involves measuring the distance each data point lies from a central average.
Look at the following data values.
\begin{align*}11, 13, 14, 15, 19, 22, 24, 26\end{align*}
We can see that the mean of these values is
\begin{align*} \frac{11 + 13 + 14 + 15 + 19 + 22 + 24 + 26}{8}=\frac{144}{8}=18 \end{align*}
The values all differ from the mean, but the amount they differ by varies. The difference between each number in the list and the mean (18) is in the following list.
\begin{align*}-7, -5, -4, -3, 1, 4, 6, 8\end{align*}
This list shows the deviations from the mean. If we find the mean of these deviations, we find that it is zero.
\begin{align*} \frac{(-7) + (-5) + (-4) + (-3) + 1 + 4 + 6 + 8}{8}=\frac{0}{8}=0\end{align*}
This comes as no surprise. You can see that some of the values are positive and some are negative, as the mean lies somewhere near the middle of the range. You can use algebra to prove (try it!) that the sum of the deviations will always be zero, no matter what numbers are in the list. So, the sum of the deviations is not a useful tool for measuring variance.
We can, however, square the differences, thereby turning the negative differences into positive values. In that case we get the following list.
\begin{align*}49, 25, 16, 9, 1, 16, 36, 64\end{align*}
We can now proceed to find a mean of the squares of the deviations.
\begin{align*} \frac{49 + 25 + 16 + 9 + 1 + 16 + 36 + 64}{8}=\frac{216}{8}=27\end{align*}
We call this averaging of the square of the differences from the mean (the mean squared deviation) the variance. The variance is a measure of the dispersion and its value is lower for tightly grouped data than for widely spread data. In the example above, the variance is 27.
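The steps that lead to the variance can be sketched in a few lines of Python. This is only an illustration of the calculation above for the eight data values; the variable names are arbitrary.

```python
data = [11, 13, 14, 15, 19, 22, 24, 26]
n = len(data)

# Step 1: the mean of the data.
x_bar = sum(data) / n

# Step 2: the deviation of each value from the mean.
deviations = [x - x_bar for x in data]

# Step 3: the squared deviations (all non-negative).
squared = [d ** 2 for d in deviations]

# Step 4: the variance is the mean of the squared deviations.
variance = sum(squared) / n

print(x_bar)            # 18.0
print(deviations)       # [-7.0, -5.0, -4.0, -3.0, 1.0, 4.0, 6.0, 8.0]
print(sum(deviations))  # 0.0
print(variance)         # 27.0
```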
The population variance (symbol, \begin{align*}\sigma^{2}\end{align*}) can be calculated from the formula.
Variance
\begin{align*} \sigma^{2}=\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + ... + (x_n - \bar{x})^2}{n}\end{align*}
What does it mean to say that tightly grouped data will have a low variance? You can probably already imagine that the size of the variance also depends on the size of the data itself. Below we see ways that mathematicians have tried to standardize the variance.
Standard Deviation
One of the most common measures of spread in statistical data is the standard deviation. The previous example does give us a measure of the spread of the data (tightly grouped data have a smaller mean squared deviation and so a smaller variance), but it is not immediately clear what the number 27 refers to. Since the variance is the mean of the squared deviations, a logical step is to take its square root, which returns the measure to the same units as the data. The square root of the variance is called the standard deviation, and for a population it is given the symbol \begin{align*}\sigma\end{align*}.
Standard Deviation
The standard deviation of the set of \begin{align*}n\end{align*} numbers, \begin{align*}x_{1}, x_{2}\ldots x_{n}\end{align*} with a mean of \begin{align*}\bar{x}\end{align*} is given by the following.
\begin{align*} \sigma=\sqrt{\sigma^2}=\sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + ... + (x_n - \bar{x})^2}{n}}\end{align*}
Note: This formula is used for finding the standard deviation of a population, that is, the whole group of data you are interested in. There is an alternative formula for computing the standard deviation of a sample, or a smaller subset of the population.
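To make the distinction in the note concrete, the Python sketch below computes both versions for the eight values used in the variance discussion. It relies on the standard library functions `statistics.pstdev` (population formula, dividing by n) and `statistics.stdev` (sample formula, which divides by n − 1 instead of n). Treating the same list as both a population and a sample is purely for comparison.

```python
from statistics import pstdev, stdev

data = [11, 13, 14, 15, 19, 22, 24, 26]

# Population standard deviation: square root of (sum of squared deviations / n).
population_sd = pstdev(data)

# Sample standard deviation: square root of (sum of squared deviations / (n - 1)).
sample_sd = stdev(data)

print(round(population_sd, 3))  # square root of 27
print(round(sample_sd, 3))      # slightly larger, since the divisor is smaller
```

The sample value is always at least as large as the population value for the same list, and the gap shrinks as the number of values grows.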
Example 7
Find the mean, the variance and the standard deviation of the following values.
\begin{align*}121, 122, 193, 197, 199, 222, 223, 227, 265, 276, 303, 366, 397, 401, 433\end{align*}
Solution
The mean will be needed to find the variance, and from the variance we can determine the standard deviation. The mean is given by the following.
\begin{align*}\text{mean}&=\frac{121 + 122 + 193 + 197 + 199 + 222 + 223 + 227 + 265 + 276 + 303 + 366 + 397 + 401 + 433}{15}\\ \text{mean}&=\frac{3945}{15}=263.\end{align*}
The variance and standard deviation are often best calculated by constructing a table. Using this method, we enter the deviation and the square of the deviation for each separate data point (datum).
Datum  Value  \begin{align*}(x_i - \bar{x})\end{align*}  \begin{align*}(x_i - \bar{x})^2\end{align*}

\begin{align*}x_1\end{align*}  121  -142  20,164
\begin{align*}x_2\end{align*}  122  -141  19,881
\begin{align*}x_3\end{align*}  193  -70  4,900
\begin{align*}x_4\end{align*}  197  -66  4,356
\begin{align*}x_5\end{align*}  199  -64  4,096
\begin{align*}x_6\end{align*}  222  -41  1,681
\begin{align*}x_7\end{align*}  223  -40  1,600
\begin{align*}x_8\end{align*}  227  -36  1,296
\begin{align*}x_9\end{align*}  265  2  4
\begin{align*}x_{10}\end{align*}  276  13  169
\begin{align*}x_{11}\end{align*}  303  40  1,600
\begin{align*}x_{12}\end{align*}  366  103  10,609
\begin{align*}x_{13}\end{align*}  397  134  17,956
\begin{align*}x_{14}\end{align*}  401  138  19,044
\begin{align*}x_{15}\end{align*}  433  170  28,900
Sum  3,945  0  136,256
The variance is thus given by
\begin{align*} \sigma^2=\frac{136,256}{15}\approx \underline{9083.733}.\end{align*}
The standard deviation is given by
\begin{align*}\sigma=\sqrt{\sigma^2}\approx 95.31.\end{align*}
If you look at the table, you will see that the standard deviation is a good measure of the spread. It looks to be a reasonable estimate of the average distance that each point lies from the mean.
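The table method lends itself to a short program. The Python sketch below, with illustrative variable names, prints one row per data value and then computes the variance and standard deviation with the population formula used in this lesson.

```python
import math

data = [121, 122, 193, 197, 199, 222, 223, 227,
        265, 276, 303, 366, 397, 401, 433]
n = len(data)
x_bar = sum(data) / n

# One row per datum: value, deviation from the mean, squared deviation.
print(f"{'Value':>6} {'Deviation':>10} {'Squared':>9}")
for x in data:
    d = x - x_bar
    print(f"{x:>6} {d:>10.0f} {d ** 2:>9.0f}")

# Column totals and the resulting measures of dispersion.
sum_sq = sum((x - x_bar) ** 2 for x in data)
variance = sum_sq / n
sigma = math.sqrt(variance)

print(f"mean = {x_bar}, sum of squares = {sum_sq}")
print(f"variance = {variance:.3f}, standard deviation = {sigma:.2f}")
```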
Calculate and Interpret Measures of Central Tendency and Dispersion for Real-World Situations
Example 8
A number of house sales in a town in Arizona are listed below. Calculate the mean and median house price. Also calculate the standard deviation in sale price.
Mesa, Arizona
Address  Sale Price  Date Of Sale

518 CLEVELAND AVE  $117,424  12/28/2006
1808 MARKESE AVE  $128,000  1/10/2007
1770 WHITE AVE  $132,485  12/28/2006
1459 LINCOLN AVE  $77,900  1/4/2007
1462 ANNE AVE  $60,000  1/24/2007
2414 DIX HWY  $250,000  1/12/2007
1523 ANNE AVE  $110,205  1/8/2007
1763 MARKESE AVE  $70,000  12/19/2006
1460 CLEVELAND AVE  $111,710  12/11/2006
1478 MILL ST  $102,646  12/6/2006
(Source: www.google.com)
Solution
We will first make a table, rewriting all sale prices in order. At the bottom, we will leave space to sum up not just the differences, but also the values. This will help to determine the mean.
Datum  Value ($)  \begin{align*}(x_i - \bar{x})\end{align*}  \begin{align*}(x_i - \bar{x})^2\end{align*}

\begin{align*}x_1\end{align*}  60,000
\begin{align*}x_2\end{align*}  70,000
\begin{align*}x_3\end{align*}  77,900
\begin{align*}x_4\end{align*}  102,646
\begin{align*}x_5\end{align*}  110,205
\begin{align*}x_6\end{align*}  111,710
\begin{align*}x_7\end{align*}  117,424
\begin{align*}x_8\end{align*}  128,000
\begin{align*}x_9\end{align*}  132,485
\begin{align*}x_{10}\end{align*}  250,000
SUM:  10  1,160,370
The mean can now be quickly calculated by dividing the sum of all sales values ($1,160,370) by the number of values (10).
\begin{align*}\text{mean}=\frac{\$ 1,160,370}{10}=\$ 116,037\end{align*}
Remember that the median is the \begin{align*} \left (\frac{n + 1}{2}\right)^{th}\end{align*} value. Since \begin{align*} \left (\frac{n + 1}{2}\right)= 5.5\end{align*}, the median is the mean of \begin{align*}x_5\end{align*} and \begin{align*}x_6\end{align*}.
\begin{align*}\text{median}= \frac{\$ 110,205 + \$ 111,710}{2}=\$ 110,957.50\end{align*}
Since we found the mean, we can now proceed to fill in the remainder of the table.
Datum  Value ($)  \begin{align*} (x_i - \bar{x})\end{align*}  \begin{align*} (x_i - \bar{x})^2\end{align*}

\begin{align*}x_1\end{align*}  60,000  -56,037  3,140,145,369
\begin{align*}x_2\end{align*}  70,000  -46,037  2,119,405,369
\begin{align*}x_3\end{align*}  77,900  -38,137  1,454,430,769
\begin{align*}x_4\end{align*}  102,646  -13,391  179,318,881
\begin{align*}x_5\end{align*}  110,205  -5,832  34,012,224
\begin{align*}x_6\end{align*}  111,710  -4,327  18,722,929
\begin{align*}x_7\end{align*}  117,424  1,387  1,923,769
\begin{align*}x_8\end{align*}  128,000  11,963  143,113,369
\begin{align*}x_9\end{align*}  132,485  16,448  270,536,704
\begin{align*}x_{10}\end{align*}  250,000  133,963  17,946,085,369
SUM  10  1,160,370  0  25,307,694,752
So the standard deviation is given by
\begin{align*}\sigma=\sqrt{\frac{25,307,694,752}{10}}\approx \$ 50,307\end{align*}
In this case, the mean and the median are close to each other, indicating that the house prices in this area of Mesa are spread fairly symmetrically about the mean. Although there is one house that is significantly more expensive than the others, there are also a number that are cheaper to balance out the spread.
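With ten-digit squared deviations, arithmetic slips are easy to make by hand, so it can help to recompute the three statistics directly from the ten sale prices. The Python sketch below uses the standard library's `statistics` module; `pstdev` applies the population formula from this lesson, and the variable names are illustrative.

```python
from statistics import mean, median, pstdev

# Sale prices from the Mesa, Arizona table (dollars).
prices = [117424, 128000, 132485, 77900, 60000,
          250000, 110205, 70000, 111710, 102646]

mean_price = mean(prices)
median_price = median(prices)
sd_price = pstdev(prices)  # population standard deviation (divides by n)

print(f"mean   = ${mean_price:,.2f}")
print(f"median = ${median_price:,.2f}")
print(f"standard deviation = ${sd_price:,.0f}")
```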
Example 9
James and John both own fields in which they plant cabbages. James plants cabbages by hand, while John uses a machine to carefully control the distance between the cabbages. The diameters of each grower’s cabbages are measured, and the results are shown in the table.
James  John  

Mean Diameter (inches)  7.10  6.85 
Standard Deviation (inches)  2.75  0.60 
John claims his method of machine planting is better. James insists it is better to plant by hand. Use the data to provide a reason to justify both sides of the argument.
Solution
 James’s cabbages have a larger mean diameter, and therefore on average they are larger than John’s. The larger standard deviation means that there will be a number of cabbages which are significantly bigger than the majority of John’s.
 John’s cabbages are, on average, smaller but only by a relatively small amount (one quarter inch). The smaller standard deviation means that the sizes of his cabbages are much more predictable. The spread of sizes is much less, so they all end up being closer to the mean. While he may not have many extra large cabbages, he will not have any that are excessively small either, which may be better for any stores to which he sells his cabbage.
Review Questions
 Find the median of the salaries given in Example 2.
 Find the mean, median and standard deviation of the following numbers. Which of the two, the mean or the median, gives the better average? \begin{align*}15, 19, 15, 16, 11, 11, 18, 21, 165, 9, 11, 20, 16, 8, 17, 10, 12, 11, 16, 14\end{align*}
 Ten house sales in Encinitas, California are shown in the table below. Find the mean, median and standard deviation for the sale prices. Explain, using the data, why the median house price is most often used as a measure of the house prices in an area.
Address  Sale Price  Date Of Sale

643 3RD ST  $1,137,000  6/5/2007
911 CORNISH DR  $879,000  6/5/2007
911 ARDEN DR  $950,000  6/13/2007
715 S VULCAN AVE  $875,000  4/30/2007
510 4TH ST  $1,499,000  4/26/2007
415 ARDEN DR  $875,000  5/11/2007
226 5TH ST  $4,000,000  5/3/2007
710 3RD ST  $975,000  3/13/2007
68 LA VETA AVE  $796,793  2/8/2007
207 WEST D ST  $2,100,000  3/15/2007
 Determine which average (mean, median or mode) would be most appropriate for the following.
 The life expectancy of store-bought goldfish.
 The age in years of the audience for a kids' TV program.
 The weight of potato sacks that a store labels as “5 pound bag.”
 Two bus companies run services between Los Angeles and San Francisco. The mean journey times and standard deviation in the times are given below. If Samantha needs to travel between the cities which company should she choose if:
 She needs to catch a plane in San Francisco.
 She travels weekly to visit friends who live in San Francisco and wishes to minimize the time she spends on a bus over the entire year.
InterCal Express  Fastdog Travel  

Mean Time (hours)  9.5  8.75 
Standard Deviation (hours)  0.25  2.5 
Review Answers
 $44,530
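 \begin{align*}\text{Mean}=21.75, \text{Median}=15\end{align*}, and Standard \begin{align*}\text{Deviation}\approx 33.1\end{align*} using the population formula from this lesson (about 33.9 if the sample formula, which divides by \begin{align*}n - 1\end{align*}, is used instead). Because of the outlier (165) the median gives the better average.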
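 \begin{align*}\text{Mean}=\$1,408,679.30, \text{Median}=\$962,500,\end{align*} and Standard \begin{align*}\text{Deviation}\approx \$943,286\end{align*} using the population formula from this lesson (about $994,311 if the sample formula, which divides by \begin{align*}n - 1\end{align*}, is used instead). Because there will often be a few very expensive houses (for example $4 million), the median is better.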

Answers will vary, these are sample answers.
 Median  Some goldfish may live for many years, a few may die in a matter of days.
 Mode  The target audience may be, for example, 4 year olds but parents and older siblings may swing other averages.
 Mean  This has the added advantage of predicting what a large number of bags would weigh. The median (or even mode) would also be useful if the student could justify the answer.
 Since she wants to catch a plane, the most predictable company would be best. The smaller standard deviation for InterCal means the chance of unexpected delays is smaller.
 For a large number of journeys, total time on the bus is approximately the average journey time multiplied by the number of journeys. Fastdog would minimize overall journey time.