The three measures of central tendency are mean, median, and mode. When would it make sense to use one of these measures and not the others?
Watch This
http://www.youtube.com/watch?v=h8EYEJ32oQ8 Khan Academy: Statistics Intro: Mean, Median, and Mode
Guidance
With descriptive statistics, your goal is to describe the data that you find in a sample or that is given in a problem. Because it would not make sense to present your findings as long lists of numbers, you summarize important aspects of the data. One important aspect of the data is the measure of central tendency, which is a measure of the “middle” value of a set of data. There are three ways to measure central tendency:
- Use the mean, which is the arithmetic average of the data: the sum of the values divided by the number of values.
- Use the median, which is the number exactly in the middle of the data. When the data has an odd number of values, the median is the middle number after the data has been ordered. When the data has an even number of values, the median is the arithmetic average of the two most central numbers.
- Use the mode, which is the most often occurring number in the data. If two or more numbers tie for the highest frequency, then the data is said to be bimodal (two modes) or multimodal (more than two modes).
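The three definitions above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample data are assumptions made for this sketch and do not come from the lesson.

```python
from collections import Counter

def mean(values):
    """Arithmetic average: the sum of the values divided by how many there are."""
    return sum(values) / len(values)

def median(values):
    """Middle value of the ordered data; average of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def modes(values):
    """Every value tied for the highest frequency (more than one if bimodal or multimodal)."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

sample = [2, 3, 3, 5, 7, 10]  # hypothetical data, chosen only for illustration
print(mean(sample))    # 5.0
print(median(sample))  # 4.0
print(modes(sample))   # [3]
```

Returning a list from `modes` is a design choice for this sketch: it lets the same function report one mode or several.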
Calculating the mean, median and mode is straightforward. What is challenging is determining when to use each measure and knowing how to interpret the data using the relationships between the three measures.
Example A
Five people were called on a phone survey to respond to some political opinion questions. Two people were from the zip code 94061, one person was from the zip code 94305 and two people were from 94062.
Which measure of central tendency makes the most sense to use if you want to state where the average person was from?
Solution: None of the measures of central tendency makes sense in this situation. Zip codes are categorical data rather than quantitative data, even though they happen to be written as numbers. Other examples of categorical data are hair color and gender. You could argue that mode is applicable in a broad sense, because you can still count which zip code occurs most often, but in general remember that mean, median, and mode are measures for quantitative data.
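To see why averaging labels is not meaningful, here is a small Python sketch using the five survey responses. Storing the zip codes as strings and the variable names are choices made for this illustration.

```python
from collections import Counter

# Zip codes from the phone survey, stored as strings because they are labels, not quantities.
zip_codes = ["94061", "94061", "94305", "94062", "94062"]

# Counting how often each category occurs is meaningful for categorical data.
counts = Counter(zip_codes)
print(counts["94061"], counts["94305"], counts["94062"])  # 2 1 2

# Forcing the labels into numbers produces an "average" that no respondent is from.
numeric_mean = sum(int(z) for z in zip_codes) / len(zip_codes)
print(numeric_mean)  # 94110.2
```

The counts show a tie between 94061 and 94062, so even a frequency-based summary does not single out one "average" location here.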
Example B
Compute the mean, median and mode for the following numbers.
3, 5, 1, 6, 8, 4, 5, 2, 7, 8, 4, 2, 1, 3, 4, 6, 7, 9, 4, 3, 2
Solution:
Mean: The sum of all these numbers is 94 and there are 21 numbers total, so the mean is @$\frac{94}{21} \approx 4.4762@$.
Median: When you order the numbers from least to greatest you get:
1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9
The @$11^{th}@$ number has ten numbers to the right and ten numbers to the left so it is the median. The median is the number 4.
Mode: the most frequently occurring number is the number 4.
Note: it is common practice to round to 4 decimals in AP Statistics.
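If you want to check the results of Example B by computer, one possible sketch uses Python's standard `statistics` module. The variable name `data` is just a label chosen for this illustration.

```python
import statistics

data = [3, 5, 1, 6, 8, 4, 5, 2, 7, 8, 4, 2, 1, 3, 4, 6, 7, 9, 4, 3, 2]

print(sum(data), len(data))             # 94 21
print(round(statistics.mean(data), 4))  # 4.4762
print(statistics.median(data))          # 4
print(statistics.multimode(data))       # [4]
```

`statistics.multimode` (available in Python 3.8 and later) returns every value tied for most frequent, so a single-element list confirms that 4 is the only mode.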
Example C
You write computer code to produce a random number between 0 and 10 with equal probability. Unfortunately, you suspect your code doesn’t work perfectly because in your first few attempts at running the code, it produces the following numbers:
1, 9, 1, 1, 9, 2, 9, 1, 9, 9, 9, 2, 2
How would you argue using mean, median, or mode that this code is probably not producing a random number between 0 and 10 with equal probability?
Solution: This question is very similar to questions you will see when you study statistical inference.
First you would note that the mean of the data is 4.9231. If the numbers were truly random between 0 and 10 with equal probability, then the mean would probably be close to 5, which it is. So the mean alone is not strong evidence that the random number generating code is broken.
Next you would note that the median of the data is 2. This should make you suspect that something is wrong. You would expect the median of random numbers between 0 and 10 to be somewhere around 5.
Lastly, you would note that the mode of the data is 9. By itself this is not strong evidence of anything, because every sample has at least one mode. What should make you suspicious, however, is the fact that only two other numbers (1 and 2) were produced, and they were almost as frequent as the number 9. You would expect a greater variety of numbers to be produced.
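The figures used in this argument can be reproduced with a short Python sketch. The variable names are assumptions made for this illustration; the data is the list of 13 outputs from the example.

```python
import statistics
from collections import Counter

outputs = [1, 9, 1, 1, 9, 2, 9, 1, 9, 9, 9, 2, 2]

print(round(statistics.mean(outputs), 4))  # 4.9231
print(statistics.median(outputs))          # 2
print(statistics.multimode(outputs))       # [9]

# Frequency table: only three distinct values appear in 13 outputs.
frequencies = Counter(outputs)
print(sorted(frequencies.items()))         # [(1, 4), (2, 3), (9, 6)]
```

The frequency table makes the lack of variety visible at a glance, which is the strongest part of the argument.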
Concept Problem Revisited
In order to decide which measure of central tendency to use, it is a good idea to calculate and interpret all three of the numbers.
For example, if someone asked you how many people can sit in the typical car, it would make more sense to use mode than to use mean. With mode, you could find out that a five-person car is the most frequent car driven and determine that the answer to the question is 5. If you calculate the mean for the number of seats in all cars, you will end up with a decimal like 5.4, which makes less sense in this context because no car has 5.4 seats.
On the other hand, if you were finding the typical height of NBA players, using mean might make a lot more sense than mode.
Vocabulary
The mean is the arithmetic average of the data.
The median is the number in the middle of a data set. When the data has an odd number of values, the median is the middle number after the data has been ordered. When the data has an even number of values, the median is the average of the two most central numbers.
The mode is the most often occurring number in the data. If two or more numbers tie for the highest frequency, then the data is said to be bimodal or multimodal.
With descriptive statistics, your goal is to describe the data that you find in a sample or that is given in a problem.
With inferential statistics, your goal is to use the data in a sample to draw conclusions about a larger population.
Guided Practice
1. Ross is with his friends and they want to play basketball. They decide to choose teams based on the number of cousins everyone has. One team will be the team with fewer cousins and the other team will be the team with more cousins. Should they use the mean, median or mode to compute the cutoff number that will separate the two teams?
2. Compute the mean, median, and mode for the following numbers.
1, 4, 5, 7, 6, 8, 0, 3, 2, 2, 3, 4, 6, 5, 7, 8, 9, 0, 6, 5, 3, 1, 2, 4, 5, 6, 7, 8, 8, 8, 4, 3, 2
3. The costs of fresh blueberries at different times of the year are:
$2.50, $2.99, $3.20, $3.99, $4.99
If you bought blueberries regularly what would you typically pay?
Answers:
1. Ross and his friends should use the median number of cousins as the cutoff number because the median splits the group in half, which allows each team to have the same number of players. If there is an odd number of people playing, then the extra person can join either team or switch in later.
2. The mean is 4.6061. The median is 5. The mode is 8.
3. The word “typically” is used instead of average to allow you to make your own choice as to whether mean, median, or mode would make the most sense. In this case, mean does make the most sense. The average cost is $3.53.
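Answers 2 and 3 can be checked with a short Python sketch using the standard `statistics` module. The variable names `numbers` and `prices` are labels chosen for this illustration.

```python
import statistics

# Guided Practice 2
numbers = [1, 4, 5, 7, 6, 8, 0, 3, 2, 2, 3, 4, 6, 5, 7, 8, 9, 0, 6, 5, 3,
           1, 2, 4, 5, 6, 7, 8, 8, 8, 4, 3, 2]
print(round(statistics.mean(numbers), 4))  # 4.6061
print(statistics.median(numbers))          # 5
print(statistics.multimode(numbers))       # [8]

# Guided Practice 3: blueberry prices in dollars
prices = [2.50, 2.99, 3.20, 3.99, 4.99]
print(round(statistics.mean(prices), 2))   # 3.53
print(statistics.median(prices))           # 3.2
print(len(statistics.multimode(prices)))   # 5
```

Every blueberry price occurs exactly once, so all five values tie as modes and the mode says nothing useful about the typical price.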
Practice
You surveyed the students in your English class to find out how many siblings each student had. Here are your results:
0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 10, 12
1. Find the mean, median, and mode of this data.
2. Why does it make sense that the mean number of siblings is greater than the median number of siblings?
3. Which measure of central tendency do you think is best for describing the typical number of siblings?
4. So far in math you have taken 10 quizzes this semester. The mean of the scores is 88.5. What is the sum of the scores?
5. Find @$x@$ if 5, 9, 11, 12, 13, 14, 16, and @$x@$ have a mean of 12.
6. Meera drove an average of 22 miles a day last week. How many miles did she drive last week?
7. Find @$x@$ if 2, 6, 9, 8, 4, 5, 8, 1, 4, and @$x@$ have a median of 5.
Calculate the mean, median, and mode for each set of numbers:
8. 11, 15, 19, 12, 21, 34, 15, 28, 24, 15, 27, 19, 20, 13, 15
9. 3, 5, 7, 5, 5, 17, 8, 9, 11, 5, 3, 7
10. -3, 0, 5, 8, 12, 4, 2, 1, 6
Calculate the mean and median for each set of numbers:
11. 12, 88, 89, 90
12. 16, 17, 19, 20, 20, 98
13. For which of the previous two questions was the median less than the mean? What in the set of numbers caused this?
14. For which of the previous two questions was the median greater than the mean? What in the set of numbers caused this?
15. In each of the sets of numbers for problems 11 and 12, there is one number that could be considered an outlier. Which numbers do you think are the outliers and why? What would happen to the mean and median if you removed the outliers?