
# Box-and-Whisker Plots

## Plotting the five-number summary for ascending data.


If you were asked to create a visual representation of the median, upper and lower 25% (quartiles), and maximum and minimum (extremes) scores on the final test in your College Algebra class, how would you go about it? Would a box-and-whisker plot be appropriate? Why or why not? What would the plot look like if the median was 82%, the lowest score was 59%, the highest was 96%, and a quarter of the class scored above 86% while another quarter scored below 70%?

### Creating Box-and-Whisker Plots

Box-and-whisker plots (or box plots) are ideal for visually representing the five number summary of data.

First, organize the data by increasing value, then identify the five number summary.

The five number summary (or five statistical summary) is composed of:

• The minimum and maximum values – called the extremes
• The middle value – called the median
• The values halfway between each extreme and the median – called the quartiles.

It is important to recognize that a five number summary depends more on the position of each value in numerical order than on the value itself. A common mistake when gathering data for a box plot is to think that the plot is based on the arithmetic mean of the data rather than the median. Don’t fall into this trap! To create a box plot, which is based on the five number summary, you first need to organize your data in increasing numerical order, and then identify your five numbers based on position in the ascending series.

A sometimes tricky detail is the handling and identification of outliers. Once you have identified the median and quartiles of your data, you should review the values at the lower and upper limits to see if any seem unusually extreme before considering them to be part of your five number summary. Specifically, data points that lie more than 1.5 times the inter-quartile range (the range of values between the first and third quartiles, representing the middle half of your data) below the first quartile or above the third quartile may be considered mild outliers. Any points more than 3 times the inter-quartile range beyond those quartiles may be considered extreme outliers. Outliers are commonly plotted as stars or asterisks (mild outliers) or open circles (extreme outliers), and are not a part of the actual box plot or the five number summary.
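The outlier rule can be written compactly as a pair of cutoffs on each side of the box. The notation here is introduced only for illustration: \begin{align*}Q_1\end{align*} and \begin{align*}Q_3\end{align*} stand for the first and third quartiles, IQR for the inter-quartile range, and \begin{align*}x\end{align*} for a data value. It assumes the distance is measured from the quartiles, as in the worked problems that follow.

\begin{align*}
\text{IQR} &= Q_3 - Q_1 \\
\text{mild outlier:} \quad & x < Q_1 - 1.5 \cdot \text{IQR} \quad \text{or} \quad x > Q_3 + 1.5 \cdot \text{IQR} \\
\text{extreme outlier:} \quad & x < Q_1 - 3 \cdot \text{IQR} \quad \text{or} \quad x > Q_3 + 3 \cdot \text{IQR}
\end{align*}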

Once you have identified your five number summary, create a number line extending at least 10% past the upper and lower extremes of your data, and plot each of the five numbers above the appropriate locations on the line. Now create a rectangular box with sides on the 1st and 3rd quartiles. Draw a vertical line inside the box to represent the median, and draw horizontal lines from the sides of the box to the extremes. Finally, identify any mild outliers with asterisks/stars and extreme outliers with open circles.

#### Creating a Five Number Summary

1. Create a five number summary for the data below, and identify any outliers:

1, 5, 8, 2, 1, 7, 4, 4, 5, 6, 8, 2, 6, 5, 9

A five number summary includes the median, the upper and lower extremes, and the first and third quartiles. The first step to identifying them is to organize the data by ascending numerical value:

1, 1, 2, 2, 4, 4, 5, 5, 5, 6, 6, 7, 8, 8, 9

• Finding the median: Note that there are 15 values, an odd number, so the middle number in the series is the median. The value “5” has 7 values above and 7 below. 5 is the median.
• The 1st quartile is the median of the lower half of the data. There are 7 values below the median, and the middle number of them is “2”, with three values below and three above before the median. The 1st quartile is 2.
• The 3rd quartile is the median of the upper half of the data. There are 7 values above the median, and the middle value is “7”, with three values above it and three below it before the median. 7 is the 3rd quartile.
• Are there any outliers? The inter-quartile range is the difference between the 1st and 3rd quartiles: \begin{align*}7 - 2 = 5\end{align*}. Recall that a value may be considered a mild outlier if it lies more than 1.5 times the inter-quartile range below the 1st quartile or above the 3rd quartile. In this case, \begin{align*}1.5 \times 5 = 7.5\end{align*}, so that would mean any number more than 7.5 above the 3rd quartile, 7, or more than 7.5 below the 1st quartile, 2. Any value less than -5.5 or greater than 14.5 would be considered a mild outlier. There are no negative values and no values greater than 9, so there are no outliers.
• The minimum and maximum values are the least and greatest values, respectively. Since we have organized our data in ascending order, the minimum value is on the far left, “1”, and the maximum value is on the far right, “9”.

The minimum is 1 and the maximum is 9.
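The steps in this problem can be sketched in Python. This is a minimal illustration, not a standard library routine: `median` and `five_number_summary` are names chosen here, and the quartiles are found with the median-of-halves method used above (for an odd count, the median itself is left out of both halves). It reports the least and greatest values as they are, without checking for outliers.

```python
def median(sorted_values):
    """Return the median of a list that is already in ascending order."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    # Even count: average the two middle values.
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) by the median-of-halves method."""
    values = sorted(data)            # step 1: ascending order
    n = len(values)
    lower = values[: n // 2]         # values before the median position
    upper = values[(n + 1) // 2:]    # values after the median position
    return (values[0], median(lower), median(values), median(upper), values[-1])


data = [1, 5, 8, 2, 1, 7, 4, 4, 5, 6, 8, 2, 6, 5, 9]
print(five_number_summary(data))  # (1, 2, 5, 7, 9)
```

Other software may use a different quartile convention and return slightly different quartiles for the same data.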

2. Identify the five statistical summary and any outliers in the data below:

18, 16, 18, 17, 15, 2, 17, 20, 19, 18, 15, 16, 28, 18

Begin by ordering the data numerically:

2, 15, 15, 16, 16, 17, 17, 18, 18, 18, 18, 19, 20, 28

• Median: There are 14 values, an even number, so the median is the average (arithmetic mean) of the two middle numbers, 17 and 18. 17.5 is the median.
• 1st and 3rd quartiles: The middle number in the lower 50% is 16, and the middle of the upper 50% is 18. 16 is the lower quartile and 18 is the upper quartile.
• The inter-quartile range is \begin{align*}18 - 16 = 2\end{align*}. Any value less than 13 or greater than 21 may be considered a mild outlier, and any value less than 10 or greater than 24 may be considered an extreme outlier. 2 and 28 are both extreme outliers.
• The least value is 2 and the greatest value is 28. 2 is the minimum and 28 is the maximum.
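The outlier check in this problem can be reproduced with a short Python sketch. The function name `classify_outliers` and its parameters are hypothetical, chosen for this example; the quartiles are passed in by hand using the values found above, and the 1.5 and 3 multipliers are the mild and extreme cutoffs described earlier.

```python
def classify_outliers(values, q1, q3):
    """Split values into mild and extreme outliers using the 1.5 and 3 IQR rules."""
    iqr = q3 - q1
    mild_low, mild_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    extreme_low, extreme_high = q1 - 3 * iqr, q3 + 3 * iqr
    mild, extreme = [], []
    for v in sorted(values):
        if v < extreme_low or v > extreme_high:
            extreme.append(v)        # beyond 3 x IQR from the box
        elif v < mild_low or v > mild_high:
            mild.append(v)           # beyond 1.5 x IQR but within 3 x IQR
    return mild, extreme


data = [18, 16, 18, 17, 15, 2, 17, 20, 19, 18, 15, 16, 28, 18]
mild, extreme = classify_outliers(data, q1=16, q3=18)
print(mild, extreme)  # [] [2, 28]
```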

#### Creating Box Plots

Create box plots representing the data from the first two examples.

A. The data from problem 1 was encapsulated in the five number summary:

Median: 5, 1st quartile: 2, 3rd quartile: 7, Minimum: 1, Maximum: 9

Draw a number line running from 0 to 10, and plot the five number summary above it:

Draw a rectangle including the first and third quartiles, and a vertical line for the median. Since there are no outliers, draw a “whisker” from each side of the box to the extremes:
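The drawing steps can also be mimicked in plain text. The sketch below is only a rough illustration, not a substitute for a drawn plot: `text_boxplot` is a hypothetical helper, it assumes all five numbers are whole numbers, and the characters it uses (`-` for the whiskers, `[`, `=`, and `]` for the box, `|` for the median) are arbitrary choices.

```python
def text_boxplot(minimum, q1, median, q3, maximum, axis_min, axis_max):
    """Return a one-line text box plot, one character per whole unit of the axis."""
    row = [" "] * (axis_max - axis_min + 1)
    for x in range(minimum, maximum + 1):
        row[x - axis_min] = "-"   # whiskers reach from the minimum to the maximum
    for x in range(q1, q3 + 1):
        row[x - axis_min] = "="   # the box covers Q1 through Q3
    row[q1 - axis_min] = "["      # left side of the box at Q1
    row[q3 - axis_min] = "]"      # right side of the box at Q3
    row[median - axis_min] = "|"  # median line inside the box
    return "".join(row)


# Five number summary from problem 1, on a number line from 0 to 10.
axis = "".join(str(x % 10) for x in range(0, 11))
print(text_boxplot(1, 2, 5, 7, 9, 0, 10))
print(axis)
```

Printing the axis string directly under the plot line lets each character be read off against its position on the number line.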

B. The data from problem 2 includes:

Median: 17.5

1st quartile: 16                  3rd quartile: 18

Minimum: 2                      Maximum: 28

Outliers (extreme): 2, 28

Draw a number line running from 0 – 30, and plot the five number summary:

Note that since “2” and “28” are both extreme outliers, the whiskers extend only to the least and greatest values that are not outliers, 15 and 20. This is sometimes called a modified boxplot.
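To see where the whiskers of a modified boxplot end, one can keep only the values inside the outlier cutoffs and take the least and greatest of what remains. The sketch below assumes the cutoffs 10 and 24 found for problem 2; `whisker_ends` is a hypothetical helper name, and it assumes at least one value lies inside the cutoffs.

```python
def whisker_ends(values, low_cutoff, high_cutoff):
    """Return the least and greatest values that lie inside the cutoffs."""
    inside = [v for v in values if low_cutoff <= v <= high_cutoff]
    return min(inside), max(inside)


data = [18, 16, 18, 17, 15, 2, 17, 20, 19, 18, 15, 16, 28, 18]
print(whisker_ends(data, 10, 24))  # (15, 20)
```

The outliers 2 and 28 would then be marked separately with open circles rather than reached by a whisker.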

#### Earlier Problem Revisited

If you were asked to create a visual representation of the median, upper and lower 25% (quartiles), and maximum and minimum (extremes) scores on the final test in your College Algebra class, how would you go about it? Would a box-and-whisker plot be appropriate? Why or why not? What would the plot look like if the median was 82%, the lowest score was 59%, the highest was 96%, and a quarter of the class scored above 86% while another quarter scored below 70%?

This would be an excellent application for a box plot. In fact, this is just about the best use of one. You will find, if you haven’t already, that the SAT and ACT college application exams report scores in just this manner. Colleges (and students themselves) inevitably wish to see how a particular score compares to others on the same test, and a box plot is ideal for that purpose.

If the data in the question were plotted as a box plot, it would appear like this:

### Examples

#### Example 1

The box-and-whisker plot below shows the starting salaries for graduates of a small college. What is the range of the starting salaries?

The range is the difference between the maximum and minimum values, that is:

\begin{align*}\$72,000 - \$19,000 = \$53,000\end{align*}

#### Example 2

Mr. Andrews made a box-and-whisker graph of the quiz grades in his chemistry class. What is the median quiz grade for the class?

The median is denoted by the line inside the box of a boxplot. In this case, that would be 77%.

#### Example 3

Mr. Foreman grades on a curve in which the top 25% of the test scores earn A’s, the middle 50% earn C’s, and the bottom 25% earn F’s. The box and whisker plot below shows the distribution of scores on the last test. What is the range of scores for people who earned C's?

If the middle 50% of Mr. Foreman’s class earns C’s, then all of the scores in the interquartile range, the area between Q1 and Q3, would be included. Since the “box” of a boxplot indicates the IQR, that would be 65% to 80%.

### Review

Use the following boxplot to answer questions 1 – 5 below:

1. What is the median?
2. What is the lower quartile?
3. What is the upper quartile?
4. What is the minimum value?
5. What is the maximum value?
6. What are the five values called?
7. What is the range of the data?
8. What percentage of the data is below the upper quartile?
9. What percentage of data is located between the lower quartile and the median?
10. What percentage of data is above the median?
11. What percentage of data is below the lower quartile?
12. Calculate the Range for the following data: 5, 21, 10, 9, 12, 12, 16, 16, 9, 6, 20, 8, 10, 26, 4, 26, and 14.
13. Calculate the First Quartile for the following data: 5, 21, 10, 9, 12, 14, 13, 16, 9, 6, 20, 8, 12, 24, 4, 26, and 14.
14. State the five number summary of the following data set: 13, 14, 10, 4, 18, 17, 11, 10, 5, 7, 10, 19, 13.
15. Construct a box and whisker plot for the data set given in question 12.

### Vocabulary

Extremes

The extremes are the maximum and minimum values in a data set.

five point summary

The numbers needed to construct a box-and-whisker plot are called the five-point-summary. The five points are the minimum, the lower median (Q1), the median, the upper median (Q3), and the maximum.

line of fit

A line of fit is a straight or continuously curved line representing the trend of changes in the comparison of two data sets (or one set of bivariate data).

Median

The median of a data set is the middle value of an organized data set.

observed data

Observed data are the values that result from computations performed on the input variable.

Outlier

In statistics, an outlier is a data value that is far from other data values.

Quartile

A quartile is each of four equal groups that a data set can be divided into.

skewed

As with the horizontal skewing of a histogram, stem plots with an obvious skew toward one end or the other tend to indicate an increased number of outliers either lesser than or greater than the mode.

statistical correlation

Statistical correlation is a representation of possible related changes in values between the two sets of data.

trends

Trends in data sets or samples are indicators found by reviewing the data from a general or overall standpoint.

uniform

A uniform shaped histogram indicates data that is very consistent; the frequency of each class is very similar to that of the others.