This Concept is an overview of some of the basic statistics used to measure the center of a set of data.

### Watch This

For an explanation and examples of mean, median and mode, see keithpeterb, Mean, Mode and Median from Frequency Tables (7:06).

### Guidance

The students in a statistics class were asked to report the number of children that live in their house (including brothers and sisters temporarily away at college). The data are recorded below:

1, 3, 4, 3, 1, 2, 2, 2, 1, 2, 2, 3, 4, 5, 1, 2, 3, 2, 1, 2, 3, 6

Once data are collected, it is useful to summarize the data set by identifying a value around which the data are centered. Three commonly used measures of center are the mode, the median, and the mean.

**Mode**

The *mode* is defined as the most frequently occurring number in a data set. The mode is most useful in situations that involve categorical (qualitative) data that are measured at the nominal level. In the last chapter, we referred to the data with the Galapagos tortoises and noted that the variable 'Climate Type' was such a measurement. For this example, the mode is the value 'humid'.

#### Example A

*Find the mode for the number of children per house in the data set at the beginning of the Concept.*

**Solution:**

In this case, 2 is the mode, as it is the most frequently occurring number of children in the sample (8 of the 22 students reported it). This tells us that more students in the class come from families with 2 children than from families of any other size.

In this example, the mode could be a useful statistic that would tell us something about the families of statistics students in our school.
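If you would like to check the tally with software, the following is a minimal sketch in Python. It assumes only the standard library's `collections.Counter`; the variable names are our own and are not part of the original example.

```python
from collections import Counter

# Number of children per household reported by the 22 students.
children = [1, 3, 4, 3, 1, 2, 2, 2, 1, 2, 2, 3, 4, 5, 1, 2, 3, 2, 1, 2, 3, 6]

# Tally how many times each value occurs.
frequencies = Counter(children)

# most_common(1) returns a one-item list holding the (value, count)
# pair with the highest count.
mode_value, mode_count = frequencies.most_common(1)[0]

print(sorted(frequencies.items()))  # [(1, 5), (2, 8), (3, 5), (4, 2), (5, 1), (6, 1)]
print(mode_value, mode_count)       # 2 8
```

The frequency table shows that the value 2 occurs 8 times, more often than any other value.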

**More Than One Mode**

If there were seven 3-child households and seven 2-child households, we would say the data set has two modes. In other words, the data would be *bimodal*. When a data set is described as being bimodal, it is clustered about two different modes. Technically, if there were more than two, they would all be the mode. However, the more of them there are, the more trivial the mode becomes. In these cases, we would most likely search for a different statistic to describe the center of such data.

If there is an equal number of each data value, the mode is not useful in helping us understand the data, and thus, we say the data set has no mode.
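As a rough illustration, Python's standard `statistics.multimode` function (available in Python 3.8 and later) returns every value that ties for the highest frequency. The household counts below are hypothetical and were made up to match the bimodal situation described above.

```python
from statistics import multimode

# Hypothetical data: seven 2-child and seven 3-child households,
# plus a few households of other sizes.
households = [2] * 7 + [3] * 7 + [1, 1, 4, 5]

print(multimode(households))  # [2, 3]

# When every value occurs equally often, every value is returned.
uniform = [1, 2, 3, 4]
print(multimode(uniform))     # [1, 2, 3, 4]
```

Note that the function reports all four values in the second case. That is the situation in which we say the data set has no mode, since the result tells us nothing about the center.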

**Mean**

Another measure of central tendency is the arithmetic average, or *mean*. This value is calculated by adding all the data values and dividing the sum by the total number of data points. The mean is the numerical balancing point of the data set.

We can illustrate this physical interpretation of the mean. Below is a graph of the class data from the last example.

If you have snap cubes like you used to use in elementary school, you can make a physical model of the graph, using one cube to represent each student’s family and a row of six cubes at the bottom to hold them together, like this:

#### Example B

*Find the mean for the number of children per house.*

**Solution:**

There are 22 students in this class, and the total number of children in all of their houses is 55, so the mean of this data is \begin{align*}\frac{55}{22}=2.5\end{align*}.

It turns out that the model that you created balances at 2.5. In the pictures below, you can see that a block placed at 3 causes the graph to tip left, while one placed at 2 causes the graph to tip right. However, if you place the block at 2.5, it balances perfectly!
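One way to see why the model balances at the mean is to add up the signed distances of the data values from the balancing point. The sketch below is our own illustration, not part of the original lesson, and the helper name `net_pull` is hypothetical. A positive total means there is more weight to the right of the block, a negative total means there is more weight to the left, and a total of zero means the model balances.

```python
children = [1, 3, 4, 3, 1, 2, 2, 2, 1, 2, 2, 3, 4, 5, 1, 2, 3, 2, 1, 2, 3, 6]

def net_pull(pivot, data):
    """Sum of the signed distances of each data value from the pivot."""
    return sum(x - pivot for x in data)

for pivot in (2, 2.5, 3):
    print(pivot, net_pull(pivot, children))
# 2 11      -> more weight on the right, so the model tips right
# 2.5 0.0   -> the distances cancel, so the model balances
# 3 -11     -> more weight on the left, so the model tips left
```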

Statisticians use the symbol \begin{align*}\overline{x}\end{align*} to represent the mean when \begin{align*}x\end{align*} is the symbol for a single measurement. Read \begin{align*}\overline{x}\end{align*} as “\begin{align*}x\end{align*} bar.”

Symbolically, the formula for the sample mean is as follows:

\begin{align*}\overline{x}= \frac{\sum_{i=1}^n x_i}{n} = \frac{x_1+x_2+\ldots+x_n}{n}\end{align*}

where:

\begin{align*}x_i\end{align*} is the \begin{align*}i^{\text{th}}\end{align*} data value of the sample.

\begin{align*}n\end{align*} is the sample size.

The mean of the population is denoted by the Greek letter, \begin{align*}\mu\end{align*}.

\begin{align*}\overline{x}\end{align*} is a statistic, since it is a measure of a sample, and \begin{align*}\mu\end{align*} is a parameter, since it is a measure of a population. \begin{align*}\overline{x}\end{align*} is an estimate of \begin{align*}\mu\end{align*}.
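The difference between a statistic and a parameter can be sketched with a short simulation. Everything about the population below is an assumption made for illustration: it is a hypothetical set of 200 households generated at random, not real data. The function `arithmetic_mean` applies the formula above, and the fixed seed only makes the run repeatable.

```python
import random

def arithmetic_mean(values):
    """Return the sum of the values divided by the number of values."""
    n = len(values)
    if n == 0:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / n

# Hypothetical population: children per household for 200 households.
rng = random.Random(0)  # fixed seed so the run is repeatable
population = rng.choices([1, 2, 3, 4, 5, 6], weights=[5, 8, 5, 2, 1, 1], k=200)

# Parameter: the mean of the entire population.
mu = arithmetic_mean(population)

# Statistic: the mean of one sample of 22 households drawn from the population.
sample = rng.sample(population, 22)
x_bar = arithmetic_mean(sample)

print(f"mu = {mu:.2f}, x-bar = {x_bar:.2f}")
```

The sample mean will usually be close to the population mean without matching it exactly, and a different sample would give a different estimate.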

**Median**

The *median* is simply the middle number in an ordered set of data.

Suppose a student took five statistics quizzes and received the following grades:

80, 94, 75, 96, 90

To find the median, you must put the data in order. The median will be the data point that is in the middle. Placing the data in order from least to greatest yields: 75, 80, 90, 94, 96.

The middle number in this case is the third grade, or 90, so the median of this data is 90.

When a data set has an even number of values, no single data point is in the middle. In this case, we take the average (mean) of the two middle numbers.

#### Example C

Consider the following quiz scores: 91, 83, 97, 89

Place them in numeric order: 83, 89, 91, 97.

The second and third numbers straddle the middle of this set. The mean of these two numbers is 90, so the median of the data is 90.
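The two cases (an odd or an even number of data values) can be sketched in a few lines of Python. This is a minimal illustration under our own naming; Python's standard library also provides `statistics.median`, which follows the same rule.

```python
def median(values):
    """Return the middle value of the data once it is put in order."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd number of values: one data point sits exactly in the middle.
        return ordered[middle]
    # Even number of values: average the two values that straddle the middle.
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([80, 94, 75, 96, 90]))  # 90
print(median([91, 83, 97, 89]))      # 90.0
```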

**Mean vs. Median**

Both the mean and the median are important and widely used measures of center. Consider the following example: Suppose you got an 85 and a 93 on your first two statistics quizzes, but then you had a really bad day and got a 14 on your next quiz!

The mean of your three grades would be 64, while the median would be 85. Which is a better measure of your performance? The median is the middle number in the ordered set, and that middle does not change whether the lowest grade is an 84 or a 14. However, when you add the three numbers to find the mean, the sum will be much smaller if the lowest grade is a 14.

**Outliers and Resistance**

The mean and the median are so different in this example because there is one grade that is extremely different from the rest of the data. In statistics, we call such extreme values *outliers*. The mean is affected by the presence of an outlier; however, the median is not. A statistic that is not affected by outliers is called *resistant*. We say that the median is a resistant measure of center, and the mean is not resistant. In a sense, the median is able to resist the pull of a far away value, but the mean is drawn to such values. It cannot resist the influence of outlier values. As a result, when we have a data set that contains an outlier, it is often better to use the median to describe the center, rather than the mean.
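To see resistance in action, you can lower the worst quiz grade step by step and recompute both measures. The sketch below uses the standard `statistics` module; the intermediate grade of 50 is our own addition for illustration.

```python
from statistics import mean, median

results = []
for lowest in (84, 50, 14):
    grades = [85, 93, lowest]
    results.append((lowest, median(grades), mean(grades)))

for lowest, med, avg in results:
    print(f"lowest = {lowest}: median = {med}, mean = {avg:.1f}")
```

The median stays at 85 in every case, while the mean falls from about 87.3 to 64 as the lowest grade drops from 84 to 14.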

#### Example D

In 2005, the CEO of Yahoo, Terry Semel, was paid almost $231,000,000 (see http://www.forbes.com/static/execpay2005/rank.html). This is certainly not typical of what the average worker at Yahoo could expect to make. Instead of using the mean salary to describe how Yahoo pays its employees, it would be more appropriate to use the median salary of all the employees.

You will often see medians used to describe the typical value of houses in a given area, as the presence of a very few extremely large and expensive homes could make the mean appear misleadingly large.

*On the Web*

http://edhelper.com/statistics.htm

http://en.wikipedia.org/wiki/Arithmetic_mean

Java applets that help illustrate the relationship between the mean and the median:

http://www.ruf.rice.edu/~lane/stat_sim/descriptive/index.html

http://www.shodor.org/interactivate/activities/PlopIt/

### Guided Practice

*The mean age of 6 people in a room is 35 years. A 40-year-old person comes in. What is now the mean age of the people in the room?*

**Solution:**

We will start by using the definition of the mean:

\begin{align*}\overline{x}=\frac{\Sigma x}{n}.\end{align*}

Since we know that the mean is 35 and that \begin{align*}n=6\end{align*}, we can substitute these values into the equation:

\begin{align*}35=\frac{\Sigma x}{6} \Rightarrow \Sigma x=6 \cdot 35=210. \end{align*}

When a new person of age 40 enters the room, the total becomes 210 + 40 = 250. We find the new mean by dividing by 7, the new number of people. The mean age is now approximately 35.7 years.
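The same steps can be checked with a short calculation. This is only a sketch of the arithmetic above, with variable names of our own choosing.

```python
old_mean = 35   # mean age of the people already in the room
old_count = 6   # number of people already in the room
new_age = 40    # age of the person who comes in

# Recover the total of the ages from the mean, then add the new person.
old_total = old_mean * old_count        # 210
new_total = old_total + new_age         # 250
new_mean = new_total / (old_count + 1)  # divide by the new count, 7

print(round(new_mean, 1))  # 35.7
```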

### Explore More

- In Lois’ \begin{align*}2^{\text{nd}}\end{align*} grade class, all of the students are between 45 and 52 inches tall, except one boy, Lucas, who is 62 inches tall. Which of the following statements is true about the heights of all of the students?
- The mean height and the median height are about the same.
- The mean height is greater than the median height.
- The mean height is less than the median height.
- More information is needed to answer this question.
- None of the above is true.

- Enrique has a 91, 87, and 95 for his statistics grades for the first three quarters. His mean grade for the year must be a 93 in order for him to be exempt from taking the final exam. Assuming grades are rounded following valid mathematical procedures, what is the lowest whole number grade he can get for the \begin{align*}4^{\text{th}}\end{align*} quarter and still be exempt from taking the exam?
- How many data points should be removed from each end of a sample of 300 values in order to calculate a 10% trimmed mean?
- 5
- 10
- 15
- 20
- 30

- In the previous question, after removing the correct number of data points and summing those remaining, what would you divide by to calculate the mean?
- The chart below shows the data from the Galapagos tortoise preservation program with just the number of individual tortoises that were bred in captivity and reintroduced into their native habitat.

| Island or Volcano | Number of Individuals Repatriated |
|---|---|
| Wolf | 40 |
| Darwin | 0 |
| Alcedo | 0 |
| Sierra Negra | 286 |
| Cerro Azul | 357 |
| Santa Cruz | 210 |
| Española | 1293 |
| San Cristóbal | 55 |
| Santiago | 498 |
| Pinzón | 552 |
| Pinta | 0 |

**Figure:** Approximate Distribution of Giant Galapagos Tortoises in 2004 (“Estado Actual De Las Poblaciones de Tortugas Terrestres Gigantes en las Islas Galápagos,” Marquez, Wiedenfeld, Snell, Fritts, MacFarland, Tapia, y Nanjoa, Ecología Aplicada, Vol. 3, Num. 1, 2, pp. 98-11).

For this data, calculate each of the following:

(a) mode

(b) median

(c) mean

(d) a 10% trimmed mean

(e) midrange

(f) upper and lower quartiles

(g) the percentile for the number of Santiago tortoises reintroduced

- In the previous question, why is the answer to (c) significantly higher than the answer to (b)?
- The mean of 10 scores is 12.6. What is the sum of the scores?
- While on vacation John drove an average of 262 miles per day for a period of 12 days. How far did John drive in total while he was on vacation?
- Find x if 5, 9, 11, 12, 13, 14, 15 and x have a mean of 13.
- Find a given that 3, 0, a, a, 4, a, 6, a, and 3 have a mean of 4.
- A sample of 10 measurements has a mean of 15.6 and a sample of 20 measurements has a mean of 13.2. Find the mean of all 30 measurements.
- The table below shows the results when 3 coins were tossed simultaneously 30 times. The number of tails appearing was recorded. Calculate the:
- Mode
- Median
- Mean

| Number of Tails | Number of Times Occurred |
|---|---|
| 3 | 4 |
| 2 | 12 |
| 1 | 11 |
| 0 | 3 |
| Total | 30 |

- Compute the mean, the median and the mode for each of the following sets of numbers:
- 3, 16, 3, 9, 5, 7, 11
- 5, 3, 3, 7, 5, 5, 16, 9, 3, 18, 11, 5, 3, 7
- 7, -4, 0, 12, 8, 121, -3

- Find the mean and the median for each of the list of values:
- 65, 69, 73, 77, 81, 87
- 11, 7, 3, 8, 101
- 31, 11, 41, 31

- Find the mean and median for each of the following datasets:
- 65, 66, 71, 75, 81, 85
- 11, 7, 1, 7, 99
- 31, 11, 41, 31

- Explain why there is such a large difference between the median and the mean in the dataset of part b in the previous question.
- How do you determine which measure of center best describes a particular data set?

*Technology Notes:*

*Calculating the Mean on the TI-83/84 Graphing Calculator*

Step 1: Entering the data

On the home screen, press **[2ND][{]**, and then enter the following data separated by commas. When you have entered all the data, press **[2ND][}][STO][2ND][L1][ENTER]**. You will see the screen on the left below:

1, 3, 4, 3, 1, 2, 2, 2, 1, 2, 2, 3, 4, 5, 1, 2, 3, 2, 1, 2, 3, 6

Step 2: Computing the mean

On the home screen, press **[2ND][LIST]** to enter the **LIST** menu, press the right arrow twice to go to the **MATH** menu (the middle screen above), and either arrow down and press **[ENTER]** or press **[3]** for the mean. Finally, press **[2ND][L1][)]** to insert **L1** and press **[ENTER]** (see the screen on the right above).

*Calculating Weighted Means on the TI-83/84 Graphing Calculator*

Use the data of the number of children in a family. In list **L1**, enter the number of children, and in list **L2**, enter the frequencies, or weights.

The data should be entered as shown in the left screen below:

Press **[2ND][LIST]** (the second function of the **[STAT]** key) to enter the **LIST** menu, press the right arrow twice to go to the **MATH** menu (the middle screen above), and either arrow down and press **[ENTER]** or press **[3]** for the mean. Finally, press **[2ND][L1][,][2ND][L2][)][ENTER]**, and you will see the screen on the right above. Note that the mean is 2.5, as before.
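If you do not have a graphing calculator, the same weighted mean can be sketched in Python. The two lists below play the roles of **L1** and **L2**; the frequencies were tallied from the class data at the beginning of the Concept, and the variable names are our own.

```python
values = [1, 2, 3, 4, 5, 6]        # number of children (the role of L1)
frequencies = [5, 8, 5, 2, 1, 1]   # how many students reported each value (the role of L2)

# Multiply each value by its frequency and add the products.
weighted_sum = sum(v * f for v, f in zip(values, frequencies))

# The total of the frequencies is the number of data points.
total_weight = sum(frequencies)

weighted_mean = weighted_sum / total_weight
print(weighted_sum, total_weight, weighted_mean)  # 55 22 2.5
```

Each value is counted as many times as it occurs, so the result agrees with the mean of the 22 individual data points.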