10.3: Box-and-Whisker Plots and Outliers
Introduction
Shot Put Measurement
A track and field coach, Mr. Watson, was measuring shot put distances for his varsity and junior varsity teams. Here is his data, in feet, which he put in order from least to greatest.
Varsity: 36.8, 43.5, 45.8, 46.2, 49.1, 50.7, 52.7, 54.3, 54.4, 55.8, 56.0, 58.5
Junior Varsity: 33.2, 35.4, 36.2, 37.0, 37.6, 39.4, 40.6, 40.8, 41.3, 42.1, 44.5, 50.3
Mr. Watson wants to present this information to both of his teams. He wants to compare them. How do they compare? How can Mr. Watson create a display that will communicate what he wants to tell his team?
To accomplish this task, you will need to know about box-and-whisker plots. Pay close attention and you will be able to help Mr. Watson at the end of the lesson.
What You Will Learn
In this lesson, you will learn the following skills.
- Draw a box-and-whisker plot to represent given data.
- Use a box-and-whisker plot to identify the median, quartiles, interquartile range, extremes and any outliers of a set of data.
- Compare box-and-whisker plots before and after removal of outliers.
- Make, compare and interpret double box-and-whisker plots of real-world data.
Teaching Time
I. Draw a Box-and-Whisker Plot to Represent Given Data
At times it is useful to get a general idea of how data cluster together. Box-and-whisker plots display the distribution of data items along a number line. The data are divided into four equal parts, separated by points called quartiles. You can also see the smallest data point, the extreme minimum, and the largest data point, the extreme maximum.
A box-and-whisker plot is created by determining five points. First, we place the data in order from smallest to largest. Then we create a number line that shows the range of the data using equal intervals. We use the median as the middle point of the box-and-whisker plot and to split the data in half. Next, we calculate the median of each half. These two values are the quartiles, and together with the median they separate the data into quarters. Finally, we use the lowest and highest data values as our endpoints, or extremes. Boxes are drawn between the quartiles and the median, and whiskers are drawn from the quartiles to the extremes.
Example
Draw a box-and-whisker plot for the given data.
16, 51, 32, 16, 24, 37, 7, 22, 19, 40, 10, 31, 29, 38, 21, 11
Step 1: Put the data in order from smallest to largest.
7, 10, 11, 16, 16, 19, 21, 22, 24, 29, 31, 32, 37, 38, 40, 51
Step 2: Draw a number line that includes your extremes, 7 and 51. In this case, we will use a number line from 5 to 55 using intervals of 5.
Step 3: Determine the median of the data. The middle points in the data are 22 and 24 so the median is 23. Mark the median with a point beneath the number line.
Step 4: The median separates the data into two groups as shown below:
\begin{align*}7, 10, 11, 16, 16, 19, 21, 22 \qquad 24, 29, 31, 32, 37, 38, 40, 51\end{align*}
Find the median of each of these groups. These are the quartiles: 16 for the lower group and 34.5 for the upper group. Together with the median, they divide the data into four groups. Mark each quartile with a point, as you did the median.
Step 5: Draw boxes between the quartiles and the median.
Step 6: Mark the extremes, the smallest and largest numbers, with points. In this case, the extremes are 7 and 51.
Step 7: Draw whiskers, or horizontal lines, to connect the quartiles to the extremes.
You can see from the box-and-whisker plot that half of the data is found between the first quartile and the third quartile. A quarter of the data is between the minimum and the first quartile, and the last quarter is between the third quartile and the maximum. The median, of course, marks the halfway point of the data.
In this particular example, we can see that the upper half of the data is spread over a wider range than the lower half, and that about half of the data lies between 15 and 35.
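The calculations in Steps 1, 3, 4 and 6 can also be checked with a short program. The Python sketch below is one way to do this. The function name `five_number_summary` is our own, and the rule for an odd number of data items (leave the median out of both halves) is an assumption, since the example above has an even number of items.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, first quartile, median, third quartile, maximum).

    Quartiles are the medians of the lower and upper halves, as in the lesson.
    Assumption: with an odd number of items, the median itself is left out
    of both halves.
    """
    data = sorted(values)              # Step 1: order the data
    half = len(data) // 2
    lower = data[:half]                # items below the median
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return (data[0], median(lower), median(data), median(upper), data[-1])


scores = [16, 51, 32, 16, 24, 37, 7, 22, 19, 40, 10, 31, 29, 38, 21, 11]
summary = five_number_summary(scores)
print(summary)  # (7, 16.0, 23.0, 34.5, 51)
```

The five values printed match the extremes, quartiles and median found in the steps above.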
II. Use a Box-and-Whisker Plot to Identify the Median, Quartiles, Interquartile Range, Extremes and any Outliers of a Set of Data
In order to construct a box-and-whisker plot, you must calculate several statistical measures. However, once a box-and-whisker plot has been constructed, you can quickly read those statistical measures from its five points.
The first and last points give you the extremes of the data. The third or middle point gives you the median. And the second and fourth points, between the median and the extremes, give you the quartiles.
The interquartile range is the range between the first quartile and the third quartile. This shows you where the middle half of the data is. It can be calculated by subtracting the first quartile from the third quartile. Finally, the outliers, data items that are far away from the general trend, can be located as extremes that cause the whiskers to be exceptionally long. Data does not always have outliers. For example, if no single point is exceptionally far from other points, no outlier exists.
Example
Use the given box-and-whisker plot to identify a) the extremes, b) the median, c) the quartiles, d) the interquartile range, and e) the outliers (if any).
a) The extremes in this data set are approximately 35 and 129.
b) The median is approximately 95.
c) The first quartile is approximately 82 and the third quartile approximately 104.
d) The interquartile range, then, is 104 − 82, or 22.
e) Finally, the extreme minimum, 35, appears to be an outlier as the left whisker is very long compared to the rest of the plot.
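The arithmetic in part d can be written as a short Python sketch. This lesson judges outliers by eye. The check in the second half of the sketch uses a common rule of thumb that is not part of this lesson: a point is flagged when it lies more than 1.5 times the interquartile range beyond a quartile. The function name `iqr_and_fences` and the multiplier `k` are assumptions made for illustration, and the five values are the approximate readings from the example.

```python
def iqr_and_fences(q1, q3, k=1.5):
    """Return the interquartile range and the lower and upper cutoffs.

    Assumption: the cutoffs follow the common 1.5 x IQR rule of thumb,
    which this lesson does not define.
    """
    iqr = q3 - q1
    return iqr, q1 - k * iqr, q3 + k * iqr


# Approximate values read from the plot in the example
minimum, q1, med, q3, maximum = 35, 82, 95, 104, 129

iqr, low_cutoff, high_cutoff = iqr_and_fences(q1, q3)
print(iqr)                     # 22
print(low_cutoff, high_cutoff) # 49.0 137.0
print(minimum < low_cutoff)    # True: 35 is flagged as an outlier
print(maximum > high_cutoff)   # False: 129 is not flagged
```

Under this assumed rule, the result agrees with part e: only the extreme minimum stands out.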
III. Compare Box-and-Whisker Plots Before and After Removal of Outliers
As you know, outliers are points that are unusually large or small compared to the rest of the data. When we discuss measures of central tendency like mean, median, and mode, we must also remember that in the real world there are many exceptions. Sometimes when we consider data, we might choose to remove the outliers in order to draw better conclusions based on the data. Let’s look at an example.
Example
Shanda runs on her school’s track team. The team recently ran a 100-meter dash at a track meet and recorded official times. These are the results in seconds: 11.7, 10.8, 11.1, 10.9, 11.7, 11.6, 12.0, 19.6, 12.2, 11.6, 11.5, 11.6, 11.0, 12.0, 11.6, 11.5, 11.7, 11.3, 12.3, 10.1.
Shanda’s time was 11.1 and she wants to know how she compares to the rest of her team. She will use a box-and-whisker plot to help figure this out. Here are the steps to this process.
Step 1: She places the data in order.
10.1, 10.8, 10.9, 11.0, 11.1, 11.3, 11.5, 11.5, 11.6, 11.6, 11.6, 11.6, 11.7, 11.7, 11.7, 12.0, 12.0, 12.2, 12.3, 19.6
Step 2: She draws a number line that includes the extremes.
Step 3: She finds the median, 11.6, and places a point on the number line.
Step 4: She finds the first and third quartiles, 11.2 and 11.85.
Step 5: She draws boxes between the quartiles and the median.
Step 6: She marks the extremes, 10.1 and 19.6, on the number line with points.
Step 7: She draws whiskers from the quartiles to the extremes.
When Shanda analyzes the box-and-whisker plot, she finds that her time, 11.1 seconds, is barely less than the first quartile. She knows that her friend, Teresa, is super fast. Teresa has already been offered track scholarships from major universities, and Shanda doesn’t think she can realistically catch up to her. Another teammate, Lisa, fell during the race but got up and continued to the finish line. Shanda believes that neither Teresa’s nor Lisa’s time is useful in gauging her own speed. She decides to look at the same data but remove those two outliers.
Here’s her new data:
10.8, 10.9, 11.0, 11.1, 11.3, 11.5, 11.5, 11.6, 11.6, 11.6, 11.6, 11.7, 11.7, 11.7, 12.0, 12.0, 12.2, 12.3
She recalculates her statistical measures and creates a new box-and-whisker plot:
Extremes: 10.8 and 12.3
Median: 11.6
First and third quartiles: 11.3 and 11.7
When the two outliers are removed, Shanda can see that most of the data is grouped closely together. Her time, 11.1, is still below the first quartile, so she remains in the fastest quarter of the team. However, her competition is tight because the rest of the team isn’t far behind. She is proud of her time and motivated to keep ahead of the crowd.
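Shanda’s two sets of calculations can be compared side by side with a short Python sketch. The function name `five_number_summary` is our own. Results are rounded to two decimal places, an assumption made so that the printed values match the hand calculations.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum), rounded to 2 places.

    Quartiles are the medians of the lower and upper halves.
    Assumption: with an odd number of items, the median is left out
    of both halves.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    points = (data[0], median(lower), median(data), median(upper), data[-1])
    return tuple(round(p, 2) for p in points)


times = [11.7, 10.8, 11.1, 10.9, 11.7, 11.6, 12.0, 19.6, 12.2, 11.6,
         11.5, 11.6, 11.0, 12.0, 11.6, 11.5, 11.7, 11.3, 12.3, 10.1]

before = five_number_summary(times)

# Remove the fastest and slowest times (10.1 and 19.6), as Shanda does.
trimmed = sorted(times)[1:-1]
after = five_number_summary(trimmed)

print(before)  # (10.1, 11.2, 11.6, 11.85, 19.6)
print(after)   # (10.8, 11.3, 11.6, 11.7, 12.3)
```

The median stays at 11.6 in both cases, while the distance between the extremes shrinks from 9.5 seconds to 1.5 seconds. This is why the second plot shows the team grouped so closely.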
IV. Make, Compare and Interpret Double Box-and-Whisker Plots of Real-World Data
As we have seen with stem-and-leaf plots and histograms, we can make double plots or graphs when there are two factors that we are comparing. A double box-and-whisker plot can be made by drawing the second factor beneath the first factor. This will allow us to look at both factors on the same plot. We can use what we have learned to accomplish this with the problem from the introduction. Let’s go take a look.
Real-Life Example Completed
Shot Put Measurement
Here is the problem from the introduction. Reread it and then create a data display using what you have learned about box-and-whisker plots. Finally, analyze the data.
A track and field coach, Mr. Watson, was measuring shot put distances for his varsity and junior varsity teams. Here is his data, in feet, which he put in order from least to greatest.
Varsity: 36.8, 43.5, 45.8, 46.2, 49.1, 50.7, 52.7, 54.3, 54.4, 55.8, 56.0, 58.5
Junior Varsity: 33.2, 35.4, 36.2, 37.0, 37.6, 39.4, 40.6, 40.8, 41.3, 42.1, 44.5, 50.3
Mr. Watson wants to present this information to both of his teams. He wants to compare them. How do they compare? How can Mr. Watson create a display that will communicate what he wants to tell his team?
Remember, there are several parts to your answer.
Solution to Real-Life Example
Make a double box-and-whisker plot of this data. How does the data compare?
Measure | Varsity | Junior Varsity |
---|---|---|
Extremes: | 36.8 and 58.5 | 33.2 and 50.3 |
Median: | 51.7 | 40.0 |
First and third quartiles: | 46.0 and 55.1 | 36.6 and 41.7 |
From this box-and-whisker plot, the coach can tell that the teams’ results are what he expected: the varsity is generally better than the junior varsity. There are a number of players whose results overlap. The best junior varsity throw, 50.3 feet, beats every throw in the lowest quarter of the varsity team. Perhaps some adjustments need to be made. However, the coach must also consider the players’ results in other events before switches are made. The lowest varsity player, for example, is also the best long distance runner. It is also apparent that the results are more dispersed, or spread out, in the varsity team than in the junior varsity team.
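The values in the table, and the two comparisons the coach makes, can be checked with a short Python sketch. The function name `five_number_summary` and the variable names are our own. Results are rounded to two decimal places so that they match the hand calculations, and the interquartile range is used here as one possible measure of spread.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum), rounded to 2 places.

    Quartiles are the medians of the lower and upper halves.
    Assumption: with an odd number of items, the median is left out
    of both halves.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    points = (data[0], median(lower), median(data), median(upper), data[-1])
    return tuple(round(p, 2) for p in points)


varsity = [36.8, 43.5, 45.8, 46.2, 49.1, 50.7,
           52.7, 54.3, 54.4, 55.8, 56.0, 58.5]
junior_varsity = [33.2, 35.4, 36.2, 37.0, 37.6, 39.4,
                  40.6, 40.8, 41.3, 42.1, 44.5, 50.3]

v = five_number_summary(varsity)
jv = five_number_summary(junior_varsity)
print(v)   # (36.8, 46.0, 51.7, 55.1, 58.5)
print(jv)  # (33.2, 36.6, 40.0, 41.7, 50.3)

# Spread of the middle half of each team (width of the boxes)
v_iqr = round(v[3] - v[1], 2)
jv_iqr = round(jv[3] - jv[1], 2)
print(v_iqr, jv_iqr)  # 9.1 5.1

# Varsity throws that are shorter than the best junior varsity throw
overlap = sum(1 for d in varsity if d < max(junior_varsity))
print(overlap)  # 5
```

The wider varsity box (9.1 feet compared with 5.1 feet) supports the statement that the varsity results are more spread out, and five varsity throws fall below the best junior varsity throw.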
Vocabulary
Here are the vocabulary words that are found in this lesson.
- Box-and-Whisker Plots
- visual displays of how data is distributed along a number line.
- Quartiles
- the values that divide an ordered data set into four equal parts.
- Median
- the middle value in an ordered data set.
- Extremes
- the smallest and largest values in a data set.
- Interquartile Range
- the range between the first and third quartiles.
- Outliers
- data values that are far away from the general trend of the data.
Time to Practice
Directions: Use the following data set for each instruction.
90, 104, 98, 156, 140, 85, 122, 129, 142, 138, 131, 81, 151, 147, 130, 156
- Create a box-and-whisker plot for the data.
- Identify the extremes.
- Identify the median.
- Identify the quartiles.
Directions: Define the following terms.
- Box-and-whisker plot
- Quartiles
- Median
- Extremes
- Interquartile range
- Outliers
Directions: Use the box-and-whisker plot to answer the following questions.
- What is the median value?
- Identify the quartiles.
- Identify the interquartile range.
- Identify the extremes.
- Identify any outliers.
Directions: Use the data set:
316, 385, 338, 410, 390, 328, 335, 406, 355, 310, 332, 374, 359, 640, 417, 382, 317
- Draw a box-and-whisker plot for the data set.
- Identify the extremes.
- Identify the median.
- Identify the quartiles.
- Identify the outlier(s).
- Remove the outlier(s) and make another box-and-whisker plot.
- Identify the extremes, median, and quartiles of the new plot.
- How do the box-and-whisker plots compare?