This Concept is an overview of some of the basic statistics used to measure the center of a set of data.
Watch This
For an explanation and examples of mean, median and mode, see keithpeterb, Mean, Mode and Median from Frequency Tables (7:06).
Guidance
The students in a statistics class were asked to report the number of children that live in their house (including brothers and sisters temporarily away at college). The data are recorded below:
1, 3, 4, 3, 1, 2, 2, 2, 1, 2, 2, 3, 4, 5, 1, 2, 3, 2, 1, 2, 3, 6
Once data are collected, it is useful to summarize the data set by identifying a value around which the data are centered. Three commonly used measures of center are the mode, the median, and the mean.
Mode
The mode is defined as the most frequently occurring number in a data set. The mode is most useful in situations that involve categorical (qualitative) data that are measured at the nominal level. In the last chapter, we referred to the data with the Galapagos tortoises and noted that the variable 'Climate Type' was such a measurement. For this example, the mode is the value 'humid'.
Example A
Find the mode for the number of children per house in the data set at the beginning of the Concept.
Solution:
In this case, 2 is the mode, as it is the most frequently occurring number of children in the sample (8 of the 22 students). This tells us that more students in the class come from families with 2 children than from families of any other size.
In this example, the mode could be a useful statistic that would tell us something about the families of statistics students in our school.
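The tally behind Example A can be checked with a short Python sketch. It is only an illustration, not part of the original lesson; the variable names are chosen here for convenience, and the data are the 22 values listed at the beginning of the Concept.

```python
from collections import Counter

# Number of children per household, as reported by the 22 students.
children = [1, 3, 4, 3, 1, 2, 2, 2, 1, 2, 2, 3, 4, 5, 1, 2, 3, 2, 1, 2, 3, 6]

# Tally how many times each value occurs.
counts = Counter(children)

# most_common(1) returns a one-element list holding the (value, frequency)
# pair with the highest frequency.
mode_value, mode_count = counts.most_common(1)[0]

print(sorted(counts.items()))  # frequency table as (value, count) pairs
print(mode_value, mode_count)  # 2 8
```

The frequency table makes it clear that 2 occurs more often than any other value, even though 2-child households are fewer than half of the sample.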
More Than One Mode
If there were seven 3-child households and seven 2-child households, we would say the data set has two modes. In other words, the data would be bimodal. When a data set is described as being bimodal, it is clustered about two different modes. Technically, if more than two values were tied for most frequent, they would all be modes. However, the more modes there are, the more trivial the mode becomes. In these cases, we would most likely search for a different statistic to describe the center of such data.
If there is an equal number of each data value, the mode is not useful in helping us understand the data, and thus, we say the data set has no mode.
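The two situations described above (several modes, or no useful mode) can be sketched in Python. This is a minimal illustration, assuming Python 3.8 or later for `statistics.multimode`; the helper `modes` and both data sets are hypothetical and were made up for this example.

```python
from statistics import multimode

def modes(data):
    """Return the list of modes, or [] when every value is equally frequent."""
    tied = multimode(data)      # all values sharing the highest frequency
    distinct = set(data)
    # If every distinct value ties for most frequent (and there is more than
    # one distinct value), follow the convention of reporting no mode.
    if len(distinct) > 1 and len(tied) == len(distinct):
        return []
    return tied

# Hypothetical class with seven 2-child and seven 3-child households.
bimodal = [2] * 7 + [3] * 7 + [1] * 4 + [4] * 2
# Hypothetical data in which each value occurs exactly once.
uniform = [1, 2, 3, 4, 5, 6]

print(modes(bimodal))  # [2, 3]
print(modes(uniform))  # []
```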
Mean
Another measure of central tendency is the arithmetic average, or mean. This value is calculated by adding all the data values and dividing the sum by the total number of data points. The mean is the numerical balancing point of the data set.
We can illustrate this physical interpretation of the mean. Below is a graph of the class data from the last example.
If you have snap cubes like you used to use in elementary school, you can make a physical model of the graph, using one cube to represent each student’s family and a row of six cubes at the bottom to hold them together, like this:
Example B
Find the mean for the number of children per house.
Solution:
There are 22 students in this class, and the total number of children in all of their houses is 55, so the mean of this data is \begin{align*}\frac{55}{22}=2.5\end{align*} .
It turns out that the model that you created balances at 2.5. In the pictures below, you can see that a block placed at 3 causes the graph to tip left, while one placed at 2 causes the graph to tip right. However, if you place the block at 2.5, it balances perfectly!
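The balancing behavior can also be checked numerically. The sketch below is an illustration under a simplifying assumption: each cube is treated as an equal weight, and the row of cubes that holds the model together is ignored. The function name `net_tilt` is made up for this example.

```python
# Number of children per household for the 22 students.
children = [1, 3, 4, 3, 1, 2, 2, 2, 1, 2, 2, 3, 4, 5, 1, 2, 3, 2, 1, 2, 3, 6]

def net_tilt(data, pivot):
    """Sum of signed distances from the pivot.

    Positive: more weight to the right, so the model tips right.
    Negative: more weight to the left, so the model tips left.
    Zero: the model balances.
    """
    return sum(x - pivot for x in data)

mean = sum(children) / len(children)
print(mean)  # 2.5

for pivot in (2, 2.5, 3):
    print(pivot, net_tilt(children, pivot))
```

The signed distances add up to 11 for a pivot at 2, to -11 for a pivot at 3, and to exactly 0 for a pivot at the mean, 2.5.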
Statisticians use the symbol \begin{align*}\overline{x}\end{align*} to represent the mean when \begin{align*}x\end{align*} is the symbol for a single measurement. Read \begin{align*}\overline{x}\end{align*} as “ \begin{align*}x\end{align*} bar.”
Symbolically, the formula for the sample mean is as follows:
\begin{align*}\overline{x}= \frac{\sum_{i=1}^n x_i}{n} = \frac{x_1+x_2+\ldots+x_n}{n}\end{align*}
where:
\begin{align*}x_i\end{align*} is the \begin{align*}i^{\text{th}}\end{align*} data value of the sample.
\begin{align*}n\end{align*} is the sample size.
The mean of the population is denoted by the Greek letter, \begin{align*}\mu\end{align*} .
\begin{align*}\overline{x}\end{align*} is a statistic, since it is a measure of a sample, and \begin{align*}\mu\end{align*} is a parameter, since it is a measure of a population. \begin{align*}\overline{x}\end{align*} is an estimate of \begin{align*}\mu\end{align*} .
Median
The median is simply the middle number in an ordered set of data.
Suppose a student took five statistics quizzes and received the following grades:
80, 94, 75, 96, 90
To find the median, you must put the data in order. The median will be the data point that is in the middle. Placing the data in order from least to greatest yields: 75, 80, 90, 94, 96.
The middle number in this case is the third grade, or 90, so the median of this data is 90.
When there is an even number of data points, no single data point is in the middle. In this case, we take the average (mean) of the two middle numbers.
Example C
Consider the following quiz scores: 91, 83, 97, 89
Place them in numeric order: 83, 89, 91, 97.
The second and third numbers straddle the middle of this set. The mean of these two numbers is 90, so the median of the data is 90.
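The procedure for finding the median (sort, then take the middle value or the mean of the two middle values) can be written as a short Python function. This is a minimal sketch that assumes a non-empty list of numbers; Python's standard library offers `statistics.median`, which follows the same rule.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)   # the data must be in order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single value sits in the middle.
        return ordered[mid]
    # Even count: average the two values that straddle the middle.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([80, 94, 75, 96, 90]))  # 90   (five quiz grades)
print(median([91, 83, 97, 89]))      # 90.0 (Example C)
```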
Mean vs. Median
Both the mean and the median are important and widely used measures of center. Consider the following example: Suppose you got an 85 and a 93 on your first two statistics quizzes, but then you had a really bad day and got a 14 on your next quiz!
The mean of your three grades would be 64, while the median, the middle number in the ordered set, is 85. Which is a better measure of your performance? The median does not change whether the lowest grade is an 84 or a 14. However, when you add the three numbers to find the mean, the sum will be much smaller if the lowest grade is a 14.
Outliers and Resistance
The mean and the median are so different in this example because there is one grade that is extremely different from the rest of the data. In statistics, we call such extreme values outliers. The mean is affected by the presence of an outlier; however, the median is not. A statistic that is not affected by outliers is called resistant. We say that the median is a resistant measure of center, and the mean is not resistant. In a sense, the median is able to resist the pull of a faraway value, but the mean is drawn toward such values and cannot resist their influence. As a result, when we have a data set that contains an outlier, it is often better to use the median to describe the center, rather than the mean.
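The effect of the outlier on the quiz grades can be seen by computing both measures with and without the bad day. The sketch below uses Python's `statistics` module; the second list, in which the low grade is 84 instead of 14, is the hypothetical comparison case mentioned in the text.

```python
from statistics import mean, median

with_bad_day = [85, 93, 14]
without_bad_day = [85, 93, 84]   # hypothetical: the low grade is 84 instead

for grades in (with_bad_day, without_bad_day):
    # The median stays at 85 in both cases; only the mean moves.
    print(grades, "mean:", round(mean(grades), 1), "median:", median(grades))
```

Changing a single grade from 84 to 14 pulls the mean from about 87.3 down to 64, while the median stays at 85.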
Example D
In 2005, the CEO of Yahoo, Terry Semel, was paid almost $231,000,000 (see http://www.forbes.com/static/execpay2005/rank.html ). This is certainly not typical of what the average worker at Yahoo could expect to make. Instead of using the mean salary to describe how Yahoo pays its employees, it would be more appropriate to use the median salary of all the employees.
You will often see medians used to describe the typical value of houses in a given area, as the presence of a very few extremely large and expensive homes could make the mean appear misleadingly large.
On the Web
http://edhelper.com/statistics.htm
http://en.wikipedia.org/wiki/Arithmetic_mean
Java Applets helpful to understand the relationship between the mean and the median:
http://www.ruf.rice.edu/~lane/stat_sim/descriptive/index.html
http://www.shodor.org/interactivate/activities/PlopIt/
Vocabulary
When examining a set of data, we use descriptive statistics to provide information about where the data are centered:
The mode is a measure of the most frequently occurring number in a data set and is most useful for categorical data and data measured at the nominal level.
The mean and median are two of the most commonly used measures of center.
The mean, or average, is the sum of the data points divided by the total number of data points in the set. In a data set that is a sample from a population, the sample mean is denoted by \begin{align*}\overline{x}\end{align*} . The population mean is denoted by \begin{align*}\mu\end{align*} .
The median is the numeric middle of a data set. If there is an odd number of data points, this middle value is easy to find. If there is an even number of data values, the median is the mean of the middle two values.
An outlier is a number that has an extreme value when compared with most of the data. The median is resistant. That is, it is not affected by the presence of outliers. The mean is not resistant, and therefore, the median tends to be a more appropriate measure of center to use in examples that contain outliers. Because the mean is the numerical balancing point for the data, it is an extremely important measure of center that is the basis for many other calculations and processes necessary for making useful conclusions about a set of data.
Guided Practice
The mean age of 6 people in a room is 35 years. A 40-year-old person comes in. What is now the mean age of the people in the room?
Solution:
We will start by using the definition of the mean:
\begin{align*}\overline{x}=\frac{\Sigma x}{n}.\end{align*}
Since we know that the mean is 35 and that \begin{align*}n=6\end{align*} , we can substitute these into the equation:
\begin{align*}35=\frac{\Sigma x}{6} \Rightarrow \Sigma x=6 \cdot 35=210. \end{align*}
When a new person of age 40 enters the room, the total becomes 210 + 40 = 250. We find the new average by dividing by 7, the new number of people. The average age is now approximately 35.7 years.
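The same reasoning (recover the total from the old mean, add the new value, divide by the new count) can be sketched as a small Python function. The name `mean_after_adding` is hypothetical and is used only for this illustration.

```python
def mean_after_adding(old_mean, old_n, new_value):
    """Mean of a group after one more value joins it."""
    total = old_mean * old_n + new_value   # 35 * 6 + 40 = 250
    return total / (old_n + 1)             # divide by the new count, 7

new_mean = mean_after_adding(35, 6, 40)
print(round(new_mean, 1))  # 35.7
```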
Practice

In Lois’ \begin{align*}2^{\text{nd}}\end{align*} grade class, all of the students are between 45 and 52 inches tall, except one boy, Lucas, who is 62 inches tall. Which of the following statements is true about the heights of all of the students?
 The mean height and the median height are about the same.
 The mean height is greater than the median height.
 The mean height is less than the median height.
 More information is needed to answer this question.
 None of the above is true.
 Enrique has a 91, 87, and 95 for his statistics grades for the first three quarters. His mean grade for the year must be a 93 in order for him to be exempt from taking the final exam. Assuming grades are rounded following valid mathematical procedures, what is the lowest whole number grade he can get for the \begin{align*}4^{\text{th}}\end{align*} quarter and still be exempt from taking the exam?

How many data points should be removed from each end of a sample of 300 values in order to calculate a 10% trimmed mean?
 5
 10
 15
 20
 30
 In the last example, after removing the correct numbers and summing those remaining, what would you divide by to calculate the mean?
 The chart below shows the data from the Galapagos tortoise preservation program with just the number of individual tortoises that were bred in captivity and reintroduced into their native habitat.
Island or Volcano  Number of Individuals Repatriated 

Wolf  40 
Darwin  0 
Alcedo  0 
Sierra Negra  286 
Cerro Azul  357 
Santa Cruz  210 
Española  1293 
San Cristóbal  55 
Santiago  498 
Pinzón  552 
Pinta  0 
Figure: Approximate Distribution of Giant Galapagos Tortoises in 2004 (“Estado Actual De Las Poblaciones de Tortugas Terrestres Gigantes en las Islas Galápagos,” Marquez, Wiedenfeld, Snell, Fritts, MacFarland, Tapia, y Nanjoa, Ecología Aplicada, Vol. 3, Num. 1,2, pp. 9811).
For this data, calculate each of the following:
(a) mode
(b) median
(c) mean
(d) a 10% trimmed mean
(e) midrange
(f) upper and lower quartiles
(g) the percentile for the number of Santiago tortoises reintroduced
 In the previous question, why is the answer to (c) significantly higher than the answer to (b)?
 The mean of 10 scores is 12.6. What is the sum of the scores?
 While on vacation John drove an average of 262 miles per day for a period of 12 days. How far did John drive in total while he was on vacation?
 Find x if 5, 9, 11, 12, 13, 14, 15 and x have a mean of 13.
 Find a given that 3, 0, a, a, 4, a, 6, a, and 3 have a mean of 4.
 A sample of 10 measurements has a mean of 15.6 and a sample of 20 measurements has a mean of 13.2. Find the mean of all 30 measurements.

The table below shows the results when 3 coins were tossed simultaneously 30 times. The number of tails appearing was recorded. Calculate the:
 Mode
 Median
 Mean
Number of Tails  Number of times occurred 

3  4 
2  12 
1  11 
0  3 
Total  30 

Compute the mean, the median and the mode for each of the following sets of numbers:
 3, 16, 3, 9, 5, 7, 11
 5, 3, 3, 7, 5, 5, 16, 9, 3, 18, 11, 5, 3, 7
 7, 4, 0, 12, 8, 121, 3

Find the mean and the median for each of the list of values:
 65, 69, 73, 77, 81, 87
 11, 7, 3, 8, 101
 31, 11, 41, 31

Find the mean and median for each of the following datasets:
 65, 66, 71, 75, 81, 85
 11, 7, 1, 7, 99
 31, 11, 41, 31
 Explain why there is such a large difference between the median and the mean in the dataset of part b in the previous question.
 How do you determine which measure of center best describes a particular data set?
Technology Notes:
Calculating the Mean on the TI-83/84 Graphing Calculator
Step 1: Entering the data
On the home screen, press [2ND][{] , and then enter the following data separated by commas. When you have entered all the data, press [2ND][}][STO][2ND][L1][ENTER] . You will see the screen on the left below:
1, 3, 4, 3, 1, 2, 2, 2, 1, 2, 2, 3, 4, 5, 1, 2, 3, 2, 1, 2, 3, 6
Step 2: Computing the mean
On the home screen, press [2ND][LIST] (LIST is the second function of the [STAT] key) to enter the LIST menu, press the right arrow twice to go to the MATH menu (the middle screen above), and either arrow down and press [ENTER] or press [3] for the mean. Finally, press [2ND][L1][)] to insert L1 and press [ENTER] (see the screen on the right above).
Calculating Weighted Means on the TI-83/84 Graphing Calculator
Use the data of the number of children in a family. In list L1 , enter the number of children, and in list L2 , enter the frequencies, or weights.
The data should be entered as shown in the left screen below:
Press [2ND][STAT] to enter the LIST menu, press the right arrow twice to go to the MATH menu (the middle screen above), and either arrow down and press [ENTER] or press [3] for the mean. Finally, press [2ND][L1][,][2ND][L2][)][ENTER] , and you will see the screen on the right above. Note that the mean is 2.5, as before.
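For readers without a calculator, the weighted mean can be sketched in Python. The frequencies below were tallied from the 22 data values at the beginning of the Concept; the list names `values` and `freqs` are chosen for this illustration and stand in for the calculator lists L1 and L2.

```python
# Number of children (like list L1) and how many students reported each
# number (like list L2), tallied from the class data.
values = [1, 2, 3, 4, 5, 6]
freqs = [5, 8, 5, 2, 1, 1]

# Weighted mean: multiply each value by its frequency, add the products,
# and divide by the total frequency (the number of students).
weighted_sum = sum(v * f for v, f in zip(values, freqs))
total = sum(freqs)
weighted_mean = weighted_sum / total

print(weighted_sum, total, weighted_mean)  # 55 22 2.5
```

The weighted sum of 55 and the total frequency of 22 match the sum and count used in Example B, so the weighted mean is again 2.5.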