# Box-and-Whisker Plots

## Plotting the five-number summary for ascending data.

If you were asked to evaluate a box plot to find the median, quartiles, extremes and outliers, would you know how? What does it mean if the ‘box’ in a box plot is unusually long or short? Does a long ‘whisker’ on one or both sides mean something important?

### Interpreting Box-and-Whisker Plots

Box-and-whisker plots (or “box plots”) condense a data set into five key values, which makes it easy to compare a single value or range of values against the rest of the data and supports quicker, better-informed decisions. Box plots are easy to read, and several of them drawn on a shared scale can summarize data from multiple sources in a single graph.

Use box and whisker plots when you have multiple data sets from independent sources that are related to each other in some way. Examples include comparing test scores between schools or classrooms, and exploring data from before and after a process change.

Remember that the line inside the box marks the median: the middle value when the data points are arranged in numerical order (or the mean of the two middle values when there is an even number of points). Because the median is identified only by its position in the ordered list, it is sometimes a good indicator of the center of the data set as a whole and sometimes not useful for that purpose at all (see the first example).
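
Finding the median by position can be sketched in a few lines of Python. The function name `median` and the two sample lists are hypothetical illustrations (Python's standard library also offers `statistics.median`). The sketch shows that two data sets can share the same median while being spread out very differently, which is why the median alone may not describe the data well.

```python
def median(values):
    """Return the median: the middle value of the sorted data, or the
    mean of the two middle values when the count is even."""
    if not values:
        raise ValueError("median() of empty data")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical data: same median, very different spread
tight = [48, 49, 50, 51, 52]
spread = [1, 2, 50, 98, 99]
print(median(tight), median(spread))  # 50 50
```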

Recall that skewed data appears as a longer “tail” in one direction on a histogram; a box plot shows skew in a similar way. If the box and whisker on one side of the median are stretched out compared with the other side, the data is skewed in that direction. Data skewed right has its values concentrated on the left, with the remaining values more “strung out” toward the right.

A longer box indicates a greater interquartile range (IQR), since the sides of the box mark the 1st and 3rd quartiles and IQR = Q3 − Q1. A greater interquartile range indicates data that are more spread out and therefore less consistent. Since the interquartile range contains the middle 50% of the data, a wide box suggests that the median may not be a great indicator of central tendency.

Long whiskers indicate a greater range for the overall sample, beyond the middle half of the data shown by the box. Data covering a greater range are naturally less reliable as an indicator of highly probable values, but given the option, long whiskers are less of a concern than a long box: a broad range of possible values with a strong concentration near the center is more reliable for prediction than a moderate overall range with little concentration around the median.
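
The lengths discussed above can all be read off a five-number summary. The Python sketch below computes them; the function names `describe_spread` and `skew_hint` and the summary values are hypothetical, and the skew check is only a rough heuristic (it compares how far each extreme lies from the median), not a formal statistical test.

```python
def describe_spread(minimum, q1, med, q3, maximum):
    """Lengths of the parts of a box plot, computed from a five-number summary."""
    return {
        "range": maximum - minimum,     # whisker tip to whisker tip
        "iqr": q3 - q1,                 # length of the box (middle 50% of the data)
        "lower_whisker": q1 - minimum,
        "upper_whisker": maximum - q3,
    }


def skew_hint(minimum, med, maximum):
    """Rough heuristic, not a formal test: the side of the median that
    stretches farther (box half plus whisker) suggests the skew direction."""
    lower, upper = med - minimum, maximum - med
    if upper > lower:
        return "right"
    if lower > upper:
        return "left"
    return "roughly symmetric"


# Hypothetical five-number summary: min, Q1, median, Q3, max
parts = describe_spread(10, 20, 25, 35, 60)
print(parts)                  # {'range': 50, 'iqr': 15, 'lower_whisker': 10, 'upper_whisker': 25}
print(skew_hint(10, 25, 60))  # right
```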

#### Identifying the Five Number Summary

Identify the 5 number summary and any outliers depicted in the box plot below:

The 5 number summary is depicted by the vertical bars in the box and by the endpoints of the ‘whiskers’:

• Minimum: 13
• 1st Quartile: 16
• Median: 19
• 3rd Quartile: 22
• Maximum: 24
• Outliers (depicted by open circles disconnected from the box and whiskers): 4 and 30

#### Understanding the Shape of Box Plots

What is indicated by the shape of the box plot below?

The box in the plot extends nearly to the lower extreme, indicating that the data below the median are relatively consistent, since there is no large jump between the 1st quartile and the minimum. The longer whisker on the upper side suggests greater variability among the larger values, since the distance from the 3rd quartile to the upper extreme is greater than the distance from the median to the 3rd quartile.

#### Interpreting Box Plots

A percentile box plot compares a particular value or range of values with the scores of a reference group. The values on the scale are percentiles: the percentage of scores lower than the plotted value. For instance, a value of 55 (the 55th percentile) indicates that 55% of the other scores were lower than the indicated score and 45% were higher.
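
The percentile idea can be sketched as a short Python function. The name `percentile_rank` and the list of 100 scores are hypothetical, and this is only one of several percentile-rank conventions (some count tied scores as half, for example).

```python
def percentile_rank(score, other_scores):
    """Percent of the other scores that are strictly below `score`."""
    if not other_scores:
        raise ValueError("need at least one score to compare against")
    below = sum(1 for s in other_scores if s < score)
    return 100 * below / len(other_scores)


others = list(range(1, 101))          # 100 hypothetical scores: 1, 2, ..., 100
print(percentile_rank(55.5, others))  # 55.0  (55 scores lower, 45 higher)
```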

Maria recently completed a standardized test, and the box plot below describes her results. The median is her actual calculated percentile, and the rest of the 5 number summary suggests the range of percentiles that her score is expected to lie within once all scores are tabulated. Based on the information in the graph, would you expect Maria to be proud of her score? Why or why not?

Maria’s score is expected to lie between the 62nd and 92nd percentile, with the most likely comparison being the 76th percentile. Since the 76th percentile indicates that her score was higher than that of 76% of all the students who took the test, and only 24% achieved a higher score than hers, yes, I would certainly say she has reason to be proud!

#### Earlier Problem Revisited

If you were asked to evaluate a box plot to find the median, quartiles, extremes and outliers, would you know how? What does it mean if the ‘box’ in a box plot is unusually long or short? Does a long ‘whisker’ on one or both sides mean something important?

With the practice you have had now, these questions should be easy!

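• Median: the center vertical line in the ‘box’
• 1st and 3rd Quartiles: the leftmost and rightmost vertical lines of the ‘box’
• Lower and Upper Extremes: the endpoints of the ‘whiskers’
• Outliers: the open circles plotted apart from the box and whiskers
• An unusually long box: a large interquartile range, so the middle half of the data is spread out and the median may not represent the data well; a short box means the middle half is tightly clustered
• A long whisker: a wide spread among the lowest or highest quarter of the data, which skews the data in that direction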

### Examples

#### Example 1

Make a Box and Whisker plot from the following data sets.

For all three sets, first organize the data by increasing numerical order and identify the five-number summary (FNS). Once you have the FNS, create the box plot for each just as in the examples above. The three plots should resemble the images below:

a. Initial weight (December) of 14 women in a weight loss study (pounds): 190, 175, 187, 199, 205, 187, 176, 180, 187, 191, 200, 193, 188, 196

b. Weights of the same women one month later (January, pounds): 187, 174, 181, 189, 196, 178, 174, 176, 181, 186, 188, 191, 183, 191

c. Weights of the same women in February (pounds): 181, 165, 176, 182, 190, 176, 171, 170, 171, 185, 187, 181, 179, 186
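
The organize-then-summarize steps can be sketched in Python. Quartile conventions differ between textbooks and software (Python's `statistics.quantiles`, for example, interpolates between data points and can return different quartiles), so this sketch assumes the median-of-halves convention; the function names are hypothetical. With that convention, the December and February medians (189 and 180) and maxima (205 and 190) match those given in the examples below. Drawing the plots themselves is not shown.

```python
def median(ordered):
    """Middle value of an already-sorted list (mean of the two middle values if the count is even)."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum).

    Assumed convention: Q1 and Q3 are the medians of the lower and upper
    halves of the sorted data, leaving out the middle value when n is odd.
    """
    if len(data) < 2:
        raise ValueError("need at least two values")
    ordered = sorted(data)  # step 1: increasing numerical order
    n = len(ordered)
    lower_half = ordered[: n // 2]
    upper_half = ordered[(n + 1) // 2 :]
    return (ordered[0], median(lower_half), median(ordered),
            median(upper_half), ordered[-1])


# Weights in pounds from data sets a, b, and c
december = [190, 175, 187, 199, 205, 187, 176, 180, 187, 191, 200, 193, 188, 196]
january = [187, 174, 181, 189, 196, 178, 174, 176, 181, 186, 188, 191, 183, 191]
february = [181, 165, 176, 182, 190, 176, 171, 170, 171, 185, 187, 181, 179, 186]

for label, data in (("a. December", december),
                    ("b. January", january),
                    ("c. February", february)):
    print(label, five_number_summary(data))
# a. December (175, 187, 189.0, 196, 205)
# b. January (174, 178, 184.5, 189, 196)
# c. February (165, 171, 180.0, 185, 190)
```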

#### Example 2

How do the data in a and c compare?

If we compare the data between a and c, we can see the overall weights of the women in the study did indeed go down. In fact, the 1st quartile at the start of the study (187 pounds) was greater than the 3rd quartile two months later (185 pounds), so the February box lies entirely below the December box.

#### Example 3

How did the median change?

The median in December was 189, and in February it was 180.

#### Example 4

How did the maximum weight change?

The maximum in December was 205, and went down to 190 by February.

#### Example 5

How did the minimum weight change?

The minimum weight in December was 175, and it also went down, to 165 by February.

#### Example 6

How did the range change?

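The overall range (maximum minus minimum) actually decreased slightly, from 30 pounds in December to 25 in February. The interquartile range, however, increased notably, from a mere 9 pounds in December to 14 in February, more than 1.5 times as large.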
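
The full range and the interquartile range are easy to confuse, so here is a Python sketch that computes both for data sets a and c. It assumes the same median-of-halves quartile convention as above, and the function names are hypothetical. Under that convention the 9-pound and 14-pound figures are the interquartile ranges (box lengths), while the full range (whisker to whisker) is a separate quantity.

```python
def quartiles(data):
    """(Q1, Q3) as the medians of the lower and upper halves of the sorted data."""
    s = sorted(data)
    n = len(s)

    def med(v):
        m = len(v) // 2
        return v[m] if len(v) % 2 else (v[m - 1] + v[m]) / 2

    return med(s[: n // 2]), med(s[(n + 1) // 2 :])


def spread(data):
    """Full range (whisker to whisker) and interquartile range (box length)."""
    q1, q3 = quartiles(data)
    return {"range": max(data) - min(data), "iqr": q3 - q1}


december = [190, 175, 187, 199, 205, 187, 176, 180, 187, 191, 200, 193, 188, 196]
february = [181, 165, 176, 182, 190, 176, 171, 170, 171, 185, 187, 181, 179, 186]

dec, feb = spread(december), spread(february)
print(dec)  # {'range': 30, 'iqr': 9}
print(feb)  # {'range': 25, 'iqr': 14}
print(round(feb["iqr"] / dec["iqr"], 2))  # 1.56
```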

#### Example 7

How would you judge the effectiveness of the weight loss method used in the study?

It would appear that the method was effective, at least in the short term. The increased interquartile range would indicate that it was somewhat more effective for some participants than for others.

### Review

1. What is the five number summary of the following box and whisker plot?

2. The box plot shows the heights in inches of boys on a High School Baseball Team. What is the 5 number summary of the plot?

3. Listed are the heights in inches of girls on a High School Ski Team. Make a plot of the girls’ heights. 58, 59, 59, 60, 62, 65, 68, 69, 70, 70, 71

4. Comparing the heights between the two teams, which has the taller players on average? How do you know?

Use the box-and-whisker plot below, which shows scores received on an English GED Test, to answer questions 5-9.

5. What was the high score on the test?

6. What percent of the class scored above a 72?

7. What was the median score on the test?

8. What percent of the class scored between 88 and 96?

9. Would you expect the mean to be above or below the median? Explain.

Use the graph below that shows how much girls spent on average per month on clothes during August.

10. How many girls shop for clothes?

11. What percent of girls spent less than \$85.00 in August on clothes?

12. Would you expect the mean number of dollars spent to be higher or lower than the median? Explain.

Use the graphs below to compare the amount of time a teenager spends in the bathroom getting ready for school and the amount of time they spend in the bathroom getting ready to go to a party.

TIME SPENT GETTING READY FOR SCHOOL:

TIME SPENT GETTING READY FOR A PARTY:

13. What percent of teenagers spend at least 15 minutes getting ready for a party?

14. What is the 3rd Quartile for the time spent getting ready for a party?

15. Is it more common for a teenager to spend more than 1 hour getting ready for school or between 1 and 2 hours getting ready for a party? Explain.

Answer True or False for questions 16-24.

16. ______ Some teenagers do not spend time getting ready for parties.

17. ______ The graph of time spent getting ready for a party contains more data than the getting ready for school graph.

18. ______ 25% of teenagers spend between 48 and 60 minutes getting ready for school.

19. ______ 15% of the teenagers did not go to parties that month.

20. ______ In general teenagers spend more time getting ready for a party than getting ready for school.

21. ______ The party data is more varied than the school data.

22. ______ The ratio of teenagers who spend more than 110 minutes getting ready for a party to those who spend less is about 2:1

23. ______ 225 teenagers watch TV.

24. ______ Twice as many teenagers spend more than 1 hour getting ready for school as spend more than 1 hour getting ready for a party.

### Vocabulary

| Term | Definition |
|---|---|
| Extremes | The extremes are the maximum and minimum values in a data set. |
| Five-number summary | The five numbers needed to construct a box-and-whisker plot, also called the five-point summary: the minimum, the lower median (Q1), the median, the upper median (Q3), and the maximum. |
| Line of fit | A line of fit is a straight or continuously curved line representing the trend of changes in the comparison of two data sets (or one set of bivariate data). |
| Median | The median of a data set is the middle value of an ordered data set (or the mean of the two middle values when the count is even). |
| Observed data | Observed data are the values that result from computations performed on the input variable. |
| Outlier | In statistics, an outlier is a data value that is far from other data values. |
| Quartile | A quartile is each of four equal groups that an ordered data set can be divided into; the values that separate the groups (Q1, the median, and Q3) are also called quartiles. |
| Skewed | As with the horizontal skewing of a histogram, stem plots with an obvious skew toward one end or the other tend to indicate an increased number of outliers either less than or greater than the mode. |
| Statistical correlation | Statistical correlation is a representation of possible related changes in values between the two sets of data. |
| Trends | Trends in data sets or samples are indicators found by reviewing the data from a general or overall standpoint. |
| Uniform | A uniform shaped histogram indicates data that is very consistent; the frequency of each class is very similar to that of the others. |