
# Box-and-Whisker Plots

## Plotting the five-number summary for ascending data.

Interpreting Box-and-Whisker Plots

#### Objective

Here you will learn to efficiently pull information from box plots.

#### Concept

If you were asked to evaluate a box plot to find the median, quartiles, extremes and outliers, would you know how? What does it mean if the ‘box’ in a box plot is unusually long or short? Does a long ‘whisker’ on one or both sides mean something important?

#### Guidance

Box-and-whisker plots (or “box plots”) are commonly used to compare the center and spread of one or more data sets, which makes decisions based on the data easier and more effective. Box plots are easy to read, and several of them, each summarizing data from a different source, can be displayed together in a single graph.

Use box and whisker plots when you have multiple data sets from independent sources that are related to each other in some way. Examples include comparing test scores between schools or classrooms, and exploring data from before and after a process change.

Remember that the line inside the box represents the median, the middle value when the data points are arranged numerically. Because the median is identified only by its position in the ordered data, it is sometimes a good indicator of the trend or average of the data set as a whole, and sometimes not useful for that purpose at all (see Example A).
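To make this concrete, here is a minimal Python sketch comparing the median with the mean. Both data sets are hypothetical, invented only for illustration; they share a median of 19, but only in the first one does the median describe the data as a whole.

```python
from statistics import mean, median

# Hypothetical data sets, invented for illustration only.
clustered = [17, 18, 19, 20, 21]   # values bunched around the middle
lopsided = [1, 2, 19, 95, 98]      # same middle value, others far from it

for name, data in (("clustered", clustered), ("lopsided", lopsided)):
    # The median depends only on position; the mean uses every value.
    print(name, "median:", median(data), "mean:", mean(data))
```

In the second set the median is still 19, while the mean is 43, so the median alone says little about where most of the values lie.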

Recall that skewed data appears as a longer “tail” in one direction on a histogram; it looks similar on a box plot. If the box in a box plot is stretched in one direction or the other, then the data is skewed in that direction. Data skewed right has a closer concentration of values on the left, since the plot shows values more “strung out” on the right side.
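One rough way to check which side is stretched is to compare distances on either side of the median. The sketch below is a simple heuristic, not a formal skewness measure; the function name `longer_side` and the five-number summary are hypothetical.

```python
def longer_side(low, mid, high):
    """Report which side of `mid` is longer: 'left', 'right', or 'even'."""
    if high - mid > mid - low:
        return "right"
    if mid - low > high - mid:
        return "left"
    return "even"

# Hypothetical five-number summary, invented for illustration.
minimum, q1, med, q3, maximum = 10, 12, 14, 20, 35

box_stretch = longer_side(q1, med, q3)                # compares the halves of the box
overall_stretch = longer_side(minimum, med, maximum)  # compares the full span

print("box stretched toward:", box_stretch)
print("plot stretched toward:", overall_stretch)
```

Here both comparisons point right, which matches the description of right-skewed data: values are concentrated on the left and strung out on the right.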

A longer box indicates a greater interquartile range, since the sides of the box mark the 1st and 3rd quartiles. A greater interquartile range is an indicator of data that is more spread out and may be somewhat unreliable for prediction. Since the interquartile range spans the middle 50% of the data, the values closest to the median, a greater range in this section of the plot suggests that the median may not be a great indicator of central tendency.

A plot with long whiskers indicates a greater range for the overall sample, whereas a long box indicates greater spread in the middle of the data. Data covering a greater range is naturally less reliable as an indicator of highly probable values, but given the option, longer whiskers are less of a concern than a long box. A broad range of possibilities but a strong likelihood of central values is more reliable to use for prediction than a moderate overall range with little concentration at the median.
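The contrast between long whiskers and a long box can be sketched in Python. Both data sets below are hypothetical, and the helper `five_number_summary` is an illustrative name; it assumes quartiles are taken as the medians of the lower and upper halves, which is one common textbook convention among several.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using medians of the two halves."""
    values = sorted(data)
    half = len(values) // 2
    lower, upper = values[:half], values[-half:]  # middle value excluded if count is odd
    return values[0], median(lower), median(values), median(upper), values[-1]

# Hypothetical data, invented for illustration.
wide_but_concentrated = [2, 18, 19, 19, 20, 20, 21, 21, 22, 40]
moderate_but_spread = [10, 12, 14, 16, 19, 21, 24, 26, 28, 30]

for name, data in (("wide but concentrated", wide_but_concentrated),
                   ("moderate but spread", moderate_but_spread)):
    low, q1, med, q3, high = five_number_summary(data)
    print(name, "range:", high - low, "IQR:", q3 - q1)
```

The first set has the larger range (38 versus 20) but a much shorter box (an interquartile range of 2 versus 12), so its median is the better guide to a typical value.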

Example A

Identify the 5 number summary and any outliers depicted in the box plot below:

Solution:  The 5 number summary is depicted by the vertical bars in the box and by the endpoints of the ‘whiskers’:

• Minimum: 13
• 1st Quartile: 16
• Median: 19
• 3rd Quartile: 22
• Maximum: 24
• Outliers (depicted by open circles disconnected from the box and whiskers): 4 and 30
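The box and whisker lengths discussed in the Guidance follow from this summary by subtraction. A minimal Python sketch using the values listed above (the variable names are illustrative):

```python
# Five-number summary read from Example A.
minimum, q1, med, q3, maximum = 13, 16, 19, 22, 24

box_length = q3 - q1           # the interquartile range
lower_whisker = q1 - minimum   # from the minimum to the 1st quartile
upper_whisker = maximum - q3   # from the 3rd quartile to the maximum

print("box (IQR):", box_length)
print("lower whisker:", lower_whisker)
print("upper whisker:", upper_whisker)
```

The box is 6 units long and the whiskers are 3 and 2 units, so apart from the two outliers this plot is close to symmetric.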

Example B

What is indicated by the shape of the box plot below?

Solution: The box in the plot extends nearly to the lower extreme, indicating that the data less than the median is likely at least relatively consistent, since there is not a large jump between the lower 25% and the minimum. The longer whisker on the upper side suggests that there may be larger variance among the greater values, since there is a greater distance from the 3rd quartile to the upper extreme than from the median to the 3rd quartile.

Example C

A percentile box plot compares a particular value or range of values to an averaged reference point. The values on the scale represent the percentage of scores less than the plotted value. For instance, a score of 55% indicates that 55% of other values were less than the indicated score, and 45% were greater.
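The percentile described here can be computed directly as the share of scores below a given value. The sketch below assumes that "less than" means strictly less; the function name `percentile_rank` and the scores are hypothetical.

```python
def percentile_rank(scores, value):
    """Percentage of scores strictly less than `value`."""
    below = sum(1 for s in scores if s < value)
    return 100 * below / len(scores)

# Hypothetical test scores, invented for illustration: 50, 51, ..., 69.
scores = list(range(50, 70))

rank = percentile_rank(scores, 61)
print("percentile:", rank)
```

Eleven of the twenty scores fall below 61, so that score sits at the 55th percentile: 55% of the values are less than it.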

Maria recently completed a standardized test, and the box plot below describes her results. The median is her actual calculated percentile, and the rest of the 5 number summary suggests the range of percentiles that her score is expected to lie within once all scores are tabulated. Based on the information in the graph, would you expect Maria to be proud of her score? Why or why not?

Solution: Maria’s score is expected to lie between the 62nd and 92nd percentiles, with the most likely comparison being the 76th percentile. Since the 76th percentile indicates that her score was higher than that of 76% of all the students who took the test, and only 24% achieved a higher score than hers, yes, I would certainly say she has reason to be proud!

Concept Problem Revisited

If you were asked to evaluate a box plot to find the median, quartiles, extremes and outliers, would you know how? What does it mean if the ‘box’ in a box plot is unusually long or short? Does a long ‘whisker’ on one or both sides mean something important?

With the practice you have had now, these questions should be easy!

• Median: the center vertical line in the ‘box’
• 1st and 3rd Quartiles: the leftmost and rightmost vertical lines of the ‘box’
• Lower and Upper Extremes: the endpoints of the ‘whiskers’

#### Vocabulary

The interquartile range is calculated by subtracting the 1st quartile from the 3rd quartile and represents the middle 50% of the sample.
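As a minimal sketch, the interquartile range can be computed and then used to flag possible outliers. The cutoff of 1.5 times the interquartile range beyond the quartiles is the common rule for mild outliers; the quartiles and data values below are hypothetical.

```python
# Hypothetical quartiles and data values, invented for illustration.
q1, q3 = 40, 60
values = [5, 12, 38, 55, 71, 88, 97]

iqr = q3 - q1                  # 3rd quartile minus 1st quartile
lower_fence = q1 - 1.5 * iqr   # anything below this is flagged
upper_fence = q3 + 1.5 * iqr   # anything above this is flagged

outliers = [v for v in values if v < lower_fence or v > upper_fence]
print("IQR:", iqr)
print("fences:", lower_fence, upper_fence)
print("outliers:", outliers)
```

With an interquartile range of 20, the cutoffs fall at 10 and 90, so 5 and 97 are flagged while 12 and 88 are not.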

#### Guided Practice

1. Make a box-and-whisker plot from each of the following data sets.
   a. Initial weight (December) of 14 women in a weight loss study (pounds): 190, 175, 187, 199, 205, 187, 176, 180, 187, 191, 200, 193, 188, 196
   b. Weights of the same women one month later (January): 187, 174, 181, 189, 196, 178, 174, 176, 181, 186, 188, 191, 183, 191
   c. Weights of the same women in February: 181, 165, 176, 182, 190, 176, 171, 170, 171, 185, 187, 181, 179, 186
2. How do the data in a and c compare?
3. How did the median change?
4. How did the maximum weight change?
5. How did the minimum weight change?
6. How did the range change?
7. How would you judge the effectiveness of the weight loss method used in the study?

Solutions:

1. For all three sets, first organize the data by increasing numerical order and identify the five-number summary (FNS). Once you have the FNS, create the box plot for each just as in the examples above. The three plots should resemble the images below:
2. If we compare the data between a and c, we can see that the overall weights of the women in the study did indeed go down. In fact, the 1st quartile at the start of the study (187) was greater than the 3rd quartile two months later (185), so the December box lies entirely above the February box.
3. The median in December was 189, and in February it was 180.
4. The maximum in December was 205, and went down to 190 by February.
5. The minimum weight in December was 175, and it also went down, to 165 by February.
6. The overall range decreased slightly, from 30 pounds in December (205 - 175) to 25 pounds in February (190 - 165). The interquartile range, however, increased notably, from a mere 9 pounds in December (196 - 187) to more than 1.5 times that, 14 pounds, in February (185 - 171).
7. It would appear that the method was effective, at least in the short term. The increased interquartile range would indicate that it was somewhat more effective for some participants than others.
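The first step of these solutions, ordering each data set and reading off the five-number summary, can be sketched in Python. The helper `five_number_summary` is an illustrative name, and it assumes quartiles are taken as the medians of the lower and upper halves; other quartile conventions can give slightly different values.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using medians of the two halves."""
    values = sorted(data)
    half = len(values) // 2
    lower, upper = values[:half], values[-half:]  # middle value excluded if count is odd
    return values[0], median(lower), median(values), median(upper), values[-1]

# Weights in pounds, copied from the three data sets in question 1.
december = [190, 175, 187, 199, 205, 187, 176, 180, 187, 191, 200, 193, 188, 196]
january = [187, 174, 181, 189, 196, 178, 174, 176, 181, 186, 188, 191, 183, 191]
february = [181, 165, 176, 182, 190, 176, 171, 170, 171, 185, 187, 181, 179, 186]

for month, weights in (("December", december),
                       ("January", january),
                       ("February", february)):
    low, q1, med, q3, high = five_number_summary(weights)
    print(month, (low, q1, med, q3, high), "range:", high - low, "IQR:", q3 - q1)
```

Under this convention the summaries are 175, 187, 189, 196, 205 for December; 174, 178, 184.5, 189, 196 for January; and 165, 171, 180, 185, 190 for February.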

#### Practice

1. What is the five number summary of the following box and whisker plot?

2. The box plot shows the heights in inches of boys on a High School Baseball Team. What is the 5 number summary of the plot?

3. Listed are the heights in inches of girls on a High School Ski Team. Make a plot of the girls’ heights. 58, 59, 59, 60, 62, 65, 68, 69, 70, 70, 71

4. Comparing the heights between the two teams, which has the taller players on average? How do you know?

Use the box and whisker plot below to examine scores received on an English GED Test to answer questions 5-9

5. What was the high score on the test?

6. What percent of the class scored above a 72?

7. What was the median score on the test?

8. What percent of the class scored between 88 and 96?

9. Would you expect the mean to be above or below the median? Explain.

Use the graph below, which shows how much girls spent on clothes during August, to answer questions 10-12.

10. How many girls shop for clothes?

11. What percent of girls spent less than \$85.00 in August on clothes?

12. Would you expect the mean number of dollars spent to be higher or lower than the median? Explain.

Use the graphs below to compare the amount of time a teenager spends in the bathroom getting ready for school and the amount of time they spend in the bathroom getting ready to go to a party.

TIME SPENT GETTING READY FOR SCHOOL:

TIME SPENT GETTING READY FOR A PARTY:

13. What percent of teenagers spend at least 15 minutes getting ready for a party?

14. What is the 3 rd  Quartile for the time spent getting ready for a party?

15. Is it more common for a teenager to spend more than 1 hour getting ready for school or between 1 and 2 hours getting ready for a party? Explain.

Answer True or False for questions 16-24.

16. ______ Some teenagers do not spend time getting ready for parties.

17. ______ The graph of time spent getting ready for a party contains more data than the getting ready for school graph.

18. ______ 25% of teenagers spend between 48 and 60 minutes getting ready for school.

19. ______ 15% of the teenagers did not go to parties that month

20. ______ In general teenagers spend more time getting ready for a party than getting ready for school.

21. ______ The Party data is more varied than the homework data

22. ______ The ratio of teenagers who spend more than 110 minutes getting ready for a party to those who spend less is about 2:1

23. ______ 225 Teenagers watch TV.

24. ______ Twice as many teenagers spend more than 1 hour on getting ready for school, than they do spending an hour getting ready for a party.

### Vocabulary

arithmetic mean

The arithmetic mean is also called the average.

back-to-back stem plots

A back-to-back stem plot is a modified stem-and-leaf plot with the stem in the center and the leaves on the sides. It is used to compare two different related sets of data (bivariate data).

bell shaped

A bell shaped histogram is a histogram with a prominent ‘mound’ in the center and similar tapering to the left and right.

bins

Bins are groups of data plotted on the x-axis.

bivariate data

Bivariate data consists of two paired sets of data.

box-and-whisker plot

A box-and-whisker plot is a graphic display of quantitative data that demonstrates the five number summary.

calculated data

Calculated data has values that are the result of computations performed on the input variable.

dependent variable

The dependent variable is the output variable in an equation or function, commonly represented by $y$ or $f(x)$.

explanatory variables

Explanatory variables are another name for independent variables.

extreme outliers

Extreme outliers are data points that lie more than 3 times the interquartile range (the middle half of your data) above the upper, or below the lower, quartile.

Extremes

The extremes are the maximum and minimum values in a data set.

five point summary

The numbers needed to construct a box-and-whisker plot are called the five-point summary. The five points are the minimum, the lower median (Q1), the median, the upper median (Q3), and the maximum.

independent variable

The independent variable is the input variable in an equation or function, commonly represented by $x$.

input variables

Input variables are another name for independent variables.

Interquartile range

The interquartile range is the difference between the third quartile and the first quartile (Q3-Q1).

Leaf

The leaves of a stem-and-leaf plot are the rightmost digits of each of the original data values.

line of best fit

A line of best fit is a straight line drawn on a scatter plot such that the sums of the distances to the points on either side of the line are approximately equal and such that there are an equal number of points above and below the line.

line of fit

A line of fit is a straight or continuously curved line representing the trend of changes in the comparison of two data sets (or one set of bivariate data).

linear regression

In statistics, linear regression is a process that attempts to model the relationship between two variables by fitting a linear equation to the data.

lower median

The lower median is the first quartile (Q1) in the box-and-whisker plot.

Median

The median of a data set is the middle value of an organized data set.

mild outliers

Mild outliers are data points that lie more than 1.5 times the interquartile range (the middle half of your data) above the upper, or below the lower, quartile.

modified box-plot

A modified box plot has whiskers that extend to the highest and lowest non-outlier values.

normally distributed

If data is normally distributed, the data set creates a symmetric histogram that looks like a bell.

observed data

Observed data are the values collected directly by measurement or observation, rather than produced by computation.

Outlier

In statistics, an outlier is a data value that is far from other data values.

output variables

Output variables are another name for dependent variables.

Quartile

A quartile is each of four equal groups that a data set can be divided into.

range

The range of a set of data is the difference in value between the least and greatest values in the set.

response variables

Response variables are another name for dependent variables.

skewed

As with the horizontal skewing of a histogram, stem plots with an obvious skew toward one end or the other tend to indicate an increased number of outliers either lesser than or greater than the mode.

statistical correlation

Statistical correlation is a representation of possible related changes in values between the two sets of data.

stem

A stem in a stem plot is a value or column of values that represents the greatest place value(s) in a set of data.

Stem-and-leaf plot

A stem-and-leaf plot is a way of organizing data values from least to greatest using place value. Usually, the last digit of each data value becomes the "leaf" and the other digits become the "stem".

trends

Trends in data sets or samples are indicators found by reviewing the data from a general or overall standpoint.

uniform

A uniform shaped histogram indicates data that is very consistent; the frequency of each class is very similar to that of the others.

upper median

The upper median is the third quartile (Q3) in the box-and-whisker plot.