
# 11.6: Measures of Central Tendency and Dispersion

Created by: CK-12

## Learning Objectives

• Compare measures of central tendency.
• Measure the dispersion of a collection of data.
• Calculate and interpret measures of central tendency and dispersion for real-world situations.

## Comparing Measures of Central Tendency

The word “average” is often used to describe something that is used to represent the general characteristics of a larger group of unequal objects. Mathematically, an average is a single number which can be used to summarize a collection of numerical values. In mathematics, there are several types of “averages” with the most common being the mean, the median and the mode.

Mean

The arithmetic mean of a group of numbers is found by dividing the sum of the numbers by the number of values in the group. In other words, we add all the numbers together and divide by the number of numbers.

Example 1

Find the mean of the numbers 11, 16, 9, 15, 5, 18

Solution

There are six separate numbers, so we find the mean with the following.

$\text{mean}=\frac{11+ 16 + 9 + 15 + 5 + 18}{6}=\frac{74}{6}=12 \frac{1}{3}.$

The arithmetic mean is what most people think of when the word average is used with numbers. It is generally a good way to take an average, but it becomes misleading when a small number of the values lie far away from the rest. A classic example is average income. If one person (such as former Microsoft Corporation chairman Bill Gates) earns a great deal more than everyone else who is surveyed, then that one value can pull the mean well away from what the majority of people earn.

Example 2

The annual incomes for 8 professions are shown below. From the data, calculate the mean annual income of the 8 professions.

| Professional Realm | Annual income |
|---|---|
| Farming, Fishing, and Forestry | \$19,630 |
| Sales and Related | \$28,920 |
| Architecture and Engineering | \$56,330 |
| Healthcare Practitioners | \$49,930 |
| Legal | \$69,030 |
| Teaching & Education | \$39,130 |
| Construction | \$35,460 |
| Professional Baseball Player* | \$2,476,590 |

(Source: Bureau of Labor Statistics, except (*) - The Baseball Players' Association (playbpa.com)).

Solution

There are 8 values listed so we find the mean as follows.

$\text{mean}=\frac{\$(19630 + 28920 + 56330 + 49930 + 69030 + 39130 + 35460 + 2476590)}{8}=\$346,877.50$

As you can see, the mean annual income is substantially larger than the income of 7 of the 8 professions. The single outlier (the baseball player) has a dramatic effect on the mean, so the mean is not a good way to represent the ‘average’ salary in this case.
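To see how strongly one value pulls the mean, the short Python sketch below recomputes the mean of the Example 2 incomes with and without the largest value. Dropping the outlier is a hypothetical comparison added for illustration; it is not part of the original example, and the variable names are assumptions.

```python
# Annual incomes from Example 2 (US dollars); the last entry is the outlier.
incomes = [19630, 28920, 56330, 49930, 69030, 39130, 35460, 2476590]

# Mean of all eight values: sum divided by the number of values.
mean_all = sum(incomes) / len(incomes)

# Hypothetical comparison: remove the largest value and average the rest.
remaining = sorted(incomes)[:-1]
mean_without_outlier = sum(remaining) / len(remaining)

print(f"mean of all 8 incomes:    {mean_all:,.2f}")
print(f"mean without the outlier: {mean_without_outlier:,.2f}")
```

The second figure sits among the seven ordinary incomes, while the first is larger than all of them.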

Algebraic Formula for the Mean.

If we have a number of values such as 11, 16, 9, 15, 5, 18 we may label them as follows.

| Position in Sequence | Label | Value |
|---|---|---|
| $1^{st}$ | $x_1$ | 11 |
| $2^{nd}$ | $x_2$ | 16 |
| $3^{rd}$ | $x_3$ | 9 |
| $4^{th}$ | $x_4$ | 15 |
| $5^{th}$ | $x_5$ | 5 |
| $6^{th}$ | $x_6$ | 18 |

We can see from the table that $x_1=11, x_2=16, x_3=9,$ etc... If we also say that the number of terms $= n$, then just as $x_1$ is the first term, $x_n$ is the last term. We can now define the mean (given the symbol $\bar{x}$) as

Arithmetic mean

$\bar{x}=\frac{x_1 + x_2 + x_3 + \ldots + x_n}{n}$
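As a minimal sketch, the formula translates almost directly into Python. The function name `arithmetic_mean` is an assumption chosen for illustration, and `Fraction` is used only so that the result for Example 1 stays exact instead of becoming a rounded decimal.

```python
from fractions import Fraction

def arithmetic_mean(xs):
    """Return (x_1 + x_2 + ... + x_n) / n as an exact fraction (integer inputs)."""
    n = len(xs)
    if n == 0:
        raise ValueError("the mean of an empty list is undefined")
    return Fraction(sum(xs), n)

x = [11, 16, 9, 15, 5, 18]          # the values from Example 1
x_bar = arithmetic_mean(x)

# Split the improper fraction into a whole part and a remainder.
whole, remainder = divmod(x_bar.numerator, x_bar.denominator)
print(x_bar)                                        # 37/3
print(f"{whole} {remainder}/{x_bar.denominator}")   # 12 1/3
```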

Median

The median is another type of average. It is defined as the value in the middle of a group of numbers. To find the median, we must first list all numbers in order from least to greatest.

Example 3

Find the median of the numbers 11, 21, 6, 17, 9.

Solution:

We first list the numbers in ascending order.

6, 9, **11**, 17, 21

The median is the value in the middle of the set (in bold).

The median is 11. There are two values higher than 11 and two values lower than 11.

If there is an even number of values then the median is taken as the arithmetic mean of the two numbers in the middle.

Example 4

Find the median of the numbers 2, 17, 1, -3, 12, 8, 12, 16

Solution:

We first list the numbers in ascending order.

-3, 1, 2, 8, 12, 12, 16, 17

The median is the value in the middle of the set, and lies between 8 and 12:

$\text{median}=\frac{8 + 12}{2}=\frac{20}{2}= 10.$

The median is 10. Four values are lower than 10, four values are higher than 10.

If you look again at the two previous examples, you will see that when we had 5 values, the median was the $3^{rd}$ term. With 8 values, the median was half way between the $4^{th}$ and $5^{th}$ values. In general, with a total of $n$ values, the median is the $\left (\frac{n + 1}{2}\right)^{th}$ value. When the quantity $\left (\frac{n + 1}{2}\right)$ is fractional, it indicates that the median is the mean of two data points. For example with 15 ordered data points, the median would be the $\left (\frac{15 + 1}{2}\right)=8^{th}$ value. For 50 data points the quantity $\left (\frac{n + 1}{2}\right)= 25.5$ indicating that the median is given by taking the arithmetic mean of the $25^{th}$ and $26^{th}$ values.
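The position rule can be sketched in Python as follows. The function name `median_by_position` is hypothetical; the logic only restates the $\left(\frac{n + 1}{2}\right)^{th}$ rule described above, converting the 1-based position into Python's 0-based list indices.

```python
def median_by_position(values):
    """Median via the (n + 1) / 2 position rule applied to the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty list is undefined")
    position = (n + 1) / 2          # 1-based position; ends in .5 when n is even
    lower = int(position)           # e.g. 4 when the position is 4.5
    if position == lower:
        return ordered[lower - 1]   # odd n: a single middle value
    # even n: mean of the two values on either side of the position
    return (ordered[lower - 1] + ordered[lower]) / 2

print(median_by_position([11, 21, 6, 17, 9]))              # Example 3: 11
print(median_by_position([2, 17, 1, -3, 12, 8, 12, 16]))   # Example 4: 10.0
```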

The median is a useful measure of average when the data set is highly skewed by a small number of points that are extremely large or extremely small. Such outliers will have a large effect on the mean, but will leave the median relatively unchanged.

Mode

The mode can be a useful measure of data when that data falls into a small number of categories. It is simply the most common number, or sometimes the most popular choice. The mode is an especially useful concept for data sets that contain non-numerical information such as surveys of eye color or favorite ice-cream flavor.

Example 5

Jim is helping to raise money at his church bake sale by doing face painting for children. He collects the ages of his customers, and displays the data in the histogram shown right. Find the mean, median and mode for the ages represented.

Solution

By reading the graph we can see that there was one 2-year-old, three 3-year-olds, four 4-year-olds, etc... In total, there were:

$1 + 3 + 4 + 5 + 6 + 7 + 3 + 1=30\ \text{customers}.$

The mean age is found by summing all the products of age and frequency, and dividing by 30:

$\begin{aligned}\text{Mean}&=\frac{(2 \cdot 1) + (3 \cdot 3) + (4\cdot 4) + (5 \cdot 5) + (6 \cdot 6) + (7 \cdot 7) + (8 \cdot 3) + (9 \cdot 1)}{30}\\&=\frac{2 + 9 + 16 + 25 + 36 + 49 + 24 + 9}{30}=\frac{170}{30}=5 \frac{2}{3}\end{aligned}$

Since there are 30 children, the median is half-way between the $15^{th}$ and $16^{th}$ oldest (that way there will be 15 younger and 15 older). Both the $15^{th}$ and $16^{th}$ oldest fall in the 6-year-old range, therefore

$\text{Median} =6$

The mode is given by the age group with the highest frequency. Reading directly from the graph, we see:

$\text{Mode}=7$
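The three results for Example 5 can be cross-checked with a short Python sketch that works directly from the frequency table. The dictionary `age_counts` is an assumed way of recording the bar heights read from the histogram (age mapped to number of children).

```python
# Ages and frequencies as read from the histogram in Example 5.
age_counts = {2: 1, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 3, 9: 1}

total = sum(age_counts.values())

# Mean: sum of (age * frequency) divided by the number of customers.
mean_age = sum(age * count for age, count in age_counts.items()) / total

# Median: expand the table into one sorted list and take the middle value(s).
ages = [age for age, count in sorted(age_counts.items()) for _ in range(count)]
middle = total // 2
if total % 2 == 0:
    median_age = (ages[middle - 1] + ages[middle]) / 2
else:
    median_age = ages[middle]

# Mode: the age with the highest frequency.
mode_age = max(age_counts, key=age_counts.get)

print(total, round(mean_age, 2), median_age, mode_age)
```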

Multimedia Link: The following video is an introduction to three measures of central tendency: mean, median, and mode. Khan Academy Statistics: The Average (12:34). The narrator models finding the mean, median, and mode of a set of numbers. While this is similar to the content above, some students may find this to be a helpful comparison of what the three measures of central tendency show.

## Measures of Dispersion

Look at the graphs below. Each represents a collection of many data points and shows how the individual values (solid line) compare to the mean of the data set (dashed line). You can see that even though all three graphs have a common mean, the spread of the data differs from graph to graph. In statistics we use the word dispersion as a measure of how spread out the data is.

Range

Range is the simplest measure of dispersion. It is simply the total spread in the data, calculated by subtracting the smallest number in the group from the largest number.

Example 6

Find the range and the median of the following data.

$223, 121, 227, 433, 122, 193, 397, 276, 303, 199, 197, 265, 366, 401, 222$

Solution

The first thing to do in this case is to order the data, listing all values in ascending order.

$121, 122, 193, 197, 199, 222, 223, 227, 265, 276, 303, 366, 397, 401, 433$

Note: It is extremely important that all values are transferred to the second list. Two ways to ensure that you do this are (i) cross out the numbers in the original list as you order them in the second list, and (ii) count the number of values in both lists. In this example, both lists contain 15 values.

The range is found by subtracting the lowest value from the highest.

$\text{Range}=\underline{433 - 121=312}$

Once the list is ordered, we can find the median, which is the 8th of the 15 values.

$\text{Median}= 227$
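A small Python sketch of the same procedure is shown here: order the data, confirm that no value was lost, then take the range and the median. It uses `statistics.median` from the Python standard library; the variable names are assumptions.

```python
import statistics

# The 15 values from Example 6, in their original order.
data = [223, 121, 227, 433, 122, 193, 397, 276, 303, 199, 197, 265, 366, 401, 222]

ordered = sorted(data)
# Same check as in the note: both lists must contain the same number of values.
assert len(ordered) == len(data)

data_range = ordered[-1] - ordered[0]   # largest value minus smallest value
median = statistics.median(data)        # middle value of the ordered list

print("smallest:", ordered[0], "largest:", ordered[-1])
print("range:", data_range)
print("median:", median)
```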

Variance

The range is not a particularly good measure of dispersion as it does not eliminate points that have unusually high or low values when compared to the rest of the data (the outliers). A better method involves measuring the distance each data point lies from a central average.

Look at the following data values.

$11, 13, 14, 15, 19, 22, 24, 26$

We can see that the mean of these values is

$\frac{11 + 13 + 14 + 15 + 19 + 22 + 24 + 26}{8}=\frac{144}{8}=18$

The values all differ from the mean, but the amount they differ by varies. The difference between each number in the list and the mean (18) is in the following list.

$-7, -5, -4, -3, 1, 4, 6, 8$

This list shows the deviations from the mean. If we find the mean of these deviations, we find that it is zero.

$\frac{-7 + (-5) + (-4) + (-3) + 1 + 4 + 6 + 8}{8}=\frac{0}{8}=0$

This comes as no surprise. You can see that some of the values are positive and some are negative, as the mean lies somewhere near the middle of the range. You can use algebra to prove (try it!) that the sum of the deviations will always be zero, no matter what numbers are in the list. So, the sum of the deviations is not a useful tool for measuring variance.
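One way to carry out the algebra suggested above is sketched here, using the labels $x_1, x_2, \ldots, x_n$ and the symbol $\bar{x}$ from the formula for the arithmetic mean. Grouping the $n$ copies of $\bar{x}$ together gives

$(x_1 - \bar{x}) + (x_2 - \bar{x}) + \ldots + (x_n - \bar{x}) = (x_1 + x_2 + \ldots + x_n) - n\bar{x} = n\bar{x} - n\bar{x} = 0$

The middle step uses the definition of the mean, $\bar{x}=\frac{x_1 + x_2 + \ldots + x_n}{n}$, which rearranges to $x_1 + x_2 + \ldots + x_n = n\bar{x}$.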

We can, however, square the differences - thereby turning the negative differences into positive values. In that case we get the following list.

$49, 25, 16, 9, 1, 16, 36, 64$

We can now proceed to find a mean of the squares of the deviations.

$\frac{49 + 25 + 16 + 9 + 1 + 16 + 36 + 64}{8}=\frac{216}{8}=27$

We call this averaging of the square of the differences from the mean (the mean squared deviation) the variance. The variance is a measure of the dispersion and its value is lower for tightly grouped data than for widely spread data. In the example above, the variance is 27.

The population variance (symbol, $\sigma^{2}$) can be calculated from the formula.

Variance

$\sigma^{2}=\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + ... + (x_n - \bar{x})^2}{n}$
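The variance formula can be sketched in a few lines of Python. The function name `population_variance` is an assumption for illustration; it follows the formula term by term and divides by $n$.

```python
def population_variance(xs):
    """Mean of the squared deviations from the mean (divides by n)."""
    n = len(xs)
    if n == 0:
        raise ValueError("the variance of an empty list is undefined")
    x_bar = sum(xs) / n
    squared_deviations = [(x - x_bar) ** 2 for x in xs]
    return sum(squared_deviations) / n

values = [11, 13, 14, 15, 19, 22, 24, 26]   # the data used above
print(population_variance(values))          # 27.0
```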

What does it mean to say that tightly grouped data will have a low variance? You can probably already imagine that the size of the variance also depends on the size of the data itself. Below we see ways that mathematicians have tried to standardize the variance.

Standard Deviation

One of the most common measures of spread in statistical data is the standard deviation. The previous example shows that the variance does measure the spread of the data (tightly grouped data has a smaller mean squared deviation and so a smaller variance), but it is not immediately clear what the number 27 refers to. Since the variance is the mean of the squares of the deviations, a logical step is to take the square root. The root mean square of the deviations (i.e. the square root of the variance) is called the standard deviation, and is given the symbol $\sigma$.

Standard Deviation

The standard deviation of the set of $n$ numbers, $x_{1}, x_{2}\ldots x_{n}$ with a mean of $\bar{x}$ is given by the following.

$\sigma=\sqrt{\sigma^2}=\sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + ... + (x_n - \bar{x})^2}{n}}$

Note: This formula is used for finding the standard deviation of a population, that is, the whole group of data you are interested in. There is an alternative formula for computing the standard deviation of a sample, or a smaller subset of the population.
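The text does not state the alternative formula. The sketch below assumes the usual convention, in which the sample standard deviation divides the sum of squared deviations by $n - 1$ instead of $n$. Python's standard library exposes the two versions as `statistics.pstdev` (population) and `statistics.stdev` (sample), which are used here only as a cross-check.

```python
import math
import statistics

values = [11, 13, 14, 15, 19, 22, 24, 26]
n = len(values)
x_bar = sum(values) / n
sum_sq = sum((x - x_bar) ** 2 for x in values)   # sum of squared deviations

population_sd = math.sqrt(sum_sq / n)        # formula used in this lesson
sample_sd = math.sqrt(sum_sq / (n - 1))      # assumed sample convention (n - 1)

# Cross-check against the standard library implementations.
assert math.isclose(population_sd, statistics.pstdev(values))
assert math.isclose(sample_sd, statistics.stdev(values))

print(round(population_sd, 3))   # 5.196
print(round(sample_sd, 3))       # 5.555
```

The sample value is always slightly larger than the population value for the same data, and the gap shrinks as $n$ grows.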

Example 7

Find the mean, the variance and the standard deviation of the following values.

$121, 122, 193, 197, 199, 222, 223, 227, 265, 276, 303, 366, 397, 401, 433$

Solution

The mean will be needed to find the variance, and from the variance we can determine the standard deviation. The mean is given by the following.

$\begin{aligned}\text{mean}&=\frac{121 + 122 + 193 + 197 + 199 + 222 + 223 + 227 + 265 + 276 + 303 + 366 + 397 + 401 + 433}{15}\\&=\frac{3945}{15}=263.\end{aligned}$

The variance and standard deviation are often best calculated by constructing a table. Using this method, we enter the deviation and the square of the deviation for each separate data point (datum).

| Datum | Value | $(x_i - \bar{x})$ | $(x_i - \bar{x})^2$ |
|---|---|---|---|
| $x_1$ | 121 | -142 | 20,164 |
| $x_2$ | 122 | -141 | 19,881 |
| $x_3$ | 193 | -70 | 4,900 |
| $x_4$ | 197 | -66 | 4,356 |
| $x_5$ | 199 | -64 | 4,096 |
| $x_6$ | 222 | -41 | 1,681 |
| $x_7$ | 223 | -40 | 1,600 |
| $x_8$ | 227 | -36 | 1,296 |
| $x_9$ | 265 | 2 | 4 |
| $x_{10}$ | 276 | 13 | 169 |
| $x_{11}$ | 303 | 40 | 1,600 |
| $x_{12}$ | 366 | 103 | 10,609 |
| $x_{13}$ | 397 | 134 | 17,956 |
| $x_{14}$ | 401 | 138 | 19,044 |
| $x_{15}$ | 433 | 170 | 28,900 |
| Sum | | 0 | 136,256 |

The variance is thus given by

$\sigma^2=\frac{136,256}{15}=\underline{9083.733}.$

The standard deviation is given by

$\sigma=\sqrt{\sigma^2}\approx 95.31.$

If you look at the table, you will see that the standard deviation is a good measure of the spread. It looks to be a reasonable estimate of the average distance that each point lies from the mean.
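The table method for Example 7 can be reproduced with a short Python loop, sketched below. Each pass computes one row (value, deviation, squared deviation) and keeps running sums for the bottom row; the column widths and variable names are illustrative assumptions.

```python
import math

data = [121, 122, 193, 197, 199, 222, 223, 227, 265, 276, 303, 366, 397, 401, 433]

n = len(data)
x_bar = sum(data) / n                      # the mean, needed for every row

print(f"{'value':>6} {'deviation':>10} {'squared':>10}")
sum_dev = 0
sum_sq = 0
for x in data:
    deviation = x - x_bar
    sum_dev += deviation                   # running total of the deviations
    sum_sq += deviation ** 2               # running total of their squares
    print(f"{x:>6} {deviation:>10.0f} {deviation ** 2:>10,.0f}")

variance = sum_sq / n                      # population variance
std_dev = math.sqrt(variance)              # population standard deviation
print(f"sum of deviations = {sum_dev:.0f}, sum of squares = {sum_sq:,.0f}")
print(f"variance = {variance:.3f}, standard deviation = {std_dev:.2f}")
```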

## Calculate and Interpret Measures of Central Tendency and Dispersion for Real-World Situations

Example 8

A number of house sales in a town in Arizona are listed below. Calculate the mean and median house price. Also calculate the standard deviation of the sale prices.

Mesa, Arizona

| Address | Sale Price | Date Of Sale |
|---|---|---|
| 518 CLEVELAND AVE | \$117,424 | 12/28/2006 |
| 1808 MARKESE AVE | \$128,000 | 1/10/2007 |
| 1770 WHITE AVE | \$132,485 | 12/28/2006 |
| 1459 LINCOLN AVE | \$77,900 | 1/4/2007 |
| 1462 ANNE AVE | \$60,000 | 1/24/2007 |
| 2414 DIX HWY | \$250,000 | 1/12/2007 |
| 1523 ANNE AVE | \$110,205 | 1/8/2007 |
| 1763 MARKESE AVE | \$70,000 | 12/19/2006 |
| 1460 CLEVELAND AVE | \$111,710 | 12/11/2006 |
| 1478 MILL ST | \$102,646 | 12/6/2006 |

Solution

We will first make a table, rewriting all sale prices in order. At the bottom, we will leave space to sum up not just the differences, but also the values. This will help to determine the mean.

| Datum | Value (\$) | $(x_i - \bar{x})$ | $(x_i - \bar{x})^2$ |
|---|---|---|---|
| $x_1$ | 60,000 | | |
| $x_2$ | 70,000 | | |
| $x_3$ | 77,900 | | |
| $x_4$ | 102,646 | | |
| $x_5$ | 110,205 | | |
| $x_6$ | 111,710 | | |
| $x_7$ | 117,424 | | |
| $x_8$ | 128,000 | | |
| $x_9$ | 132,485 | | |
| $x_{10}$ | 250,000 | | |
| SUM (10 values) | 1,160,370 | | |

The mean can now be quickly calculated by dividing the sum of all sales values (\$1,160,370) by the number of values (10).

$\text{mean}=\frac{\$1,160,370}{10}=\$116,037$

Remember that the median is the $\left (\frac{n + 1}{2}\right)^{th}$ value. Since $\left (\frac{n + 1}{2}\right)= 5.5$, the median is the mean of $x_5$ and $x_6$.

$\text{median}= \frac{\$110,205 + \$111,710}{2}=\$110,957.50$
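As a cross-check on this example, the three quantities the question asks for can be computed with Python's standard `statistics` module. This is a sketch using the sale prices from the table; `pstdev` divides by $n$, which matches the population formula used in this lesson, and the variable names are assumptions.

```python
import statistics

# Sale prices from the Mesa, Arizona table (US dollars).
prices = [117424, 128000, 132485, 77900, 60000,
          250000, 110205, 70000, 111710, 102646]

mean_price = statistics.mean(prices)
median_price = statistics.median(prices)
# pstdev is the population standard deviation (divides by n).
std_dev_price = statistics.pstdev(prices)

print(f"mean:               {mean_price:,.2f}")
print(f"median:             {median_price:,.2f}")
print(f"standard deviation: {std_dev_price:,.0f}")
```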

Since we found the mean, we can now proceed to fill in the remainder of the table.

| Datum | Value (\$) | $(x_i - \bar{x})$ | $(x_i - \bar{x})^2$ |
|---|---|---|---|
| $x_1$ | 60,000 | -56,037 | 3,140,145,369 |
| $x_2$ | 70,000 | -46,037 | 2,119,405,369 |
| $x_3$ | 77,900 | -38,137 | 1,454,430,769 |
| $x_4$ | 102,646 | -13,391 | 179,318,881 |
| $x_5$ | 110,205 | -5,832 | 34,012,224 |
| $x_6$ | 111,710 | -4,327 | 18,722,929 |
| $x_7$ | 117,424 | 1,387 | 1,923,769 |
| $x_8$ | 128,000 | 11,963 | 143,113,369 |
| $x_9$ | 132,485 | 16,448 | 270,536,704 |
| $x_{10}$ | 250,000 | 133,963 | 17,946,085,369 |
| SUM (10 values) | 1,160,370 | 0 | 25,307,694,752 |

So the standard deviation is given by

$\sigma=\sqrt{\frac{25,307,694,752}{10}}\approx \$50,307$

In this case, the mean and the median are close to each other, indicating that the house prices in this area of Mesa are spread fairly symmetrically about the mean. Although there is one house that is significantly more expensive than the others, there are also a number that are cheaper to balance out the spread.

Example 9

James and John both own fields in which they plant cabbages. James plants cabbages by hand, while John uses a machine to carefully control the distance between the cabbages. The diameters of each grower’s cabbages are measured, and the results are shown in the table.

| | James | John |
|---|---|---|
| Mean Diameter (inches) | 7.10 | 6.85 |
| Standard Deviation (inches) | 2.75 | 0.60 |

John claims his method of machine planting is better. James insists it is better to plant by hand. Use the data to provide a reason to justify both sides of the argument.

Solution

• James’s cabbages have a larger mean diameter, and therefore on average they are larger than John’s. The larger standard deviation means that there will be a number of cabbages which are significantly bigger than the majority of John’s.
• John’s cabbages are, on average, smaller but only by a relatively small amount (one quarter inch). The smaller standard deviation means that the sizes of his cabbages are much more predictable. The spread of sizes is much less, so they all end up being closer to the mean. While he may not have many extra large cabbages, he will not have any that are excessively small either, which may be better for any stores to which he sells his cabbage.

## Review Questions

1. Find the median of the salaries given in Example 2.
2. Find the mean, median and standard deviation of the following numbers. Which, of the mean and median, will give the better average?

$15, 19, 15, 16, 11, 11, 18, 21, 165, 9, 11, 20, 16, 8, 17, 10, 12, 11, 16, 14$

3. Ten house sales in Encinitas, California are shown in the table below. Find the mean, median and standard deviation for the sale prices. Explain, using the data, why the median house price is most often used as a measure of the house prices in an area.

| Address | Sale Price | Date Of Sale |
|---|---|---|
| 643 3RD ST | \$1,137,000 | 6/5/2007 |
| 911 CORNISH DR | \$879,000 | 6/5/2007 |
| 911 ARDEN DR | \$950,000 | 6/13/2007 |
| 715 S VULCAN AVE | \$875,000 | 4/30/2007 |
| 510 4TH ST | \$1,499,000 | 4/26/2007 |
| 415 ARDEN DR | \$875,000 | 5/11/2007 |
| 226 5TH ST | \$4,000,000 | 5/3/2007 |
| 710 3RD ST | \$975,000 | 3/13/2007 |
| 68 LA VETA AVE | \$796,793 | 2/8/2007 |
| 207 WEST D ST | \$2,100,000 | 3/15/2007 |

4. Determine which average (mean, median or mode) would be most appropriate for the following.
   1. The life expectancy of store-bought goldfish.
   2. The age in years of audience for a kids TV program.
   3. The weight of potato sacks that a store labels as “5 pound bag.”
5. Two bus companies run services between Los Angeles and San Francisco. The mean journey times and standard deviation in the times are given below. If Samantha needs to travel between the cities which company should she choose if:
   1. She needs to catch a plane in San Francisco.
   2. She travels weekly to visit friends who live in San Francisco and wishes to minimize the time she spends on a bus over the entire year.

| | Inter-Cal Express | Fast-dog Travel |
|---|---|---|
| Mean Time (hours) | 9.5 | 8.75 |
| Standard Deviation (hours) | 0.25 | 2.5 |

## Review Answers

1. \$44,530
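2. Mean $= 21.75$, Median $= 15$, and Standard Deviation $\approx 33.1$ using the population formula from this lesson (the sample formula, which divides by $n - 1$, gives $\approx 33.9$). Because of the outlier (165), the median gives the better average.
3. Mean $= \$1,408,679.30$, Median $= \$962,500$, and Standard Deviation $\approx \$943,300$ using the population formula from this lesson (the sample formula gives $\approx \$994,311.10$). Because there will often be a few very expensive houses (for example \$4 million), the median is better.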
4. Answers will vary; these are sample answers.
   1. Median - Some goldfish may live for many years, a few may die in a matter of days.
   2. Mode - The target audience may be, for example, 4 year olds but parents and older siblings may swing other averages.
   3. Mean - This has the added advantage of predicting what a large number of bags would weigh. The median (or even mode) would also be useful if the student could justify the answer.
5. Choosing a bus company:
   1. Since she wants to catch a plane, the most predictable company would be best. The smaller standard deviation for Inter-Cal Express means the chance of unexpected delays is smaller.
   2. For a large number of journeys, total time on the bus is approximately the average journey time multiplied by the number of journeys. Fast-dog Travel would minimize overall journey time.

## Date Created:

Feb 22, 2012

Aug 26, 2014