13.12: Box-and-Whisker Plots
What if your teacher recorded each of her students' scores on the last math test? How could she display that data in such a way that it was broken up into four distinct segments? After completing this Concept, you'll be able to make and interpret box-and-whisker plots for data such as this.
Watch This
CK-12 Foundation: Box-and-Whisker Plots
Guidance
Consider the following list of numbers: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
The median is the \begin{align*}\left (\frac{n+1}{2} \right)\end{align*} th value. There are 10 values, so the median lies halfway between the \begin{align*}5^{th}\end{align*} and the \begin{align*}6^{th}\end{align*} value. The median is therefore 5.5. This splits the list cleanly into two halves.
The lower list is: 1, 2, 3, 4, 5
And the upper list is: 6, 7, 8, 9, 10
The median of the lower half is 3. The median of the upper half is 8. These numbers, together with the median, cut the list into four quarters. We call the division between the lower two quarters the first quartile. The division between the upper two quarters is the third quartile (the second quartile is, of course, the median).
A box-and-whisker plot is formed by placing vertical lines at five positions, corresponding to the smallest value, the first quartile, the median, the third quartile and the greatest value. (These five numbers are often referred to as the five number summary.) A box is drawn between the position of the first and third quartiles, and horizontal line segments (the whiskers) connect the box with the two extreme values.
The box-and-whisker plot for the integers 1 through 10 is shown below.
With a box-and-whisker plot, a simple measure of dispersion can be gained from the distance from the first quartile to the third quartile. This inter-quartile range is a measure of the spread of the middle half of the data.
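The construction described above can be sketched in a few lines of Python. This is only a minimal sketch: the helper name `five_number_summary` is our own, and it assumes the quartile convention used in this Concept (split the ordered list at the median, leave the median itself out of both halves when the number of values is odd, then take the median of each half). Other books and software packages use slightly different conventions.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves rule.

    Hypothetical helper for illustration; assumes at least two values.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]        # the values below the median position
    upper = ordered[n - half:]    # the values above the median position
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])


summary = five_number_summary(range(1, 11))
print(summary)                    # (1, 3, 5.5, 8, 10)

iqr = summary[3] - summary[1]     # third quartile minus first quartile
print(iqr)                        # 5
```

For the integers 1 through 10 the inter-quartile range is 8 - 3 = 5, so the middle half of the data spans 5 units.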
Example A
Forty students took a college algebra entrance test and the results are summarized in the box-and-whisker plot below. How many students would be allowed to enroll in the class if the pass mark was set at
a) 65%
b) 60%
Solution
From the plot, we can see the following information:
Lowest score = 50%
First quartile = 60%
Median score = 65%
Third quartile = 77%
Highest score = 97%
Since the pass marks given in the question correspond with the median and the first quartile, the question is really asking how many students there are in: a) the upper half and b) the upper three quarters.
a) Since there are 40 students, there are 20 in the upper half; that is, 20 students scored above 65%.
b) Similarly, there are 30 students in the upper three quarters, so 30 students scored above 60%.
Example B
Harika is rolling 3 dice and adding the numbers together. She records the total score for each of 50 rolls, and the scores she gets are shown below. Display the data in a box-and-whisker plot, and find both the range and the inter-quartile range.
9, 10, 12, 13, 10, 14, 8, 10, 12, 6, 8, 11, 12, 12, 9, 11, 10, 15, 10, 8, 8, 12, 10, 14, 10, 9, 7, 5, 11, 15, 8, 9, 17, 12, 12, 13, 7, 14, 6, 17, 11, 15, 10, 13, 9, 7, 12, 13, 10, 12
Solution
First we’ll put the list in order. Since there are 50 data points, \begin{align*}\left ( \frac{n+1}{2} \right ) =25.5\end{align*}, so the median will be the mean of the \begin{align*}25^{th}\end{align*} and \begin{align*}26^{th}\end{align*} values. The median will split the data into two lists of 25 values; we can write them as two distinct lists.
\begin{align*}& 5, 6, 6, 7, 7, 7, 8, 8, 8, 8, 8, 9, \colorbox{yellow}{9}, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10, \colorbox{yellow}{10}, \colorbox{yellow}{11}, 11, \\ & 11, 11, 12, 12, 12, 12, 12, 12, 12, 12, \colorbox{yellow}{12}, 13, 13, 13, 13, 14, 14, 14, 15, 15, 15, 17, 17\end{align*}
Since each sub-list has 25 values, the first and third quartiles of the entire data set can be found from the median of each smaller list. For 25 values, \begin{align*}\left ( \frac{n+1}{2} \right ) =13\end{align*}, and so the quartiles are given by the \begin{align*}13^{th}\end{align*} value from each smaller sub-list.
From the ordered list we can see the five number summary:
- The lowest value is 5.
- The first quartile is 9 (the \begin{align*}13^{th}\end{align*} value of the lower list).
- The median is 10.5.
- The third quartile is 12 (the \begin{align*}13^{th}\end{align*} value of the upper list, which is the \begin{align*}38^{th}\end{align*} value overall).
- The highest value is 17.
The box-and-whisker plot therefore looks like this:
The range is given by subtracting the smallest value from the largest value: \begin{align*}17 - 5 = 12\end{align*}.
The inter-quartile range is given by subtracting the first quartile from the third quartile: \begin{align*}12 - 9 = 3\end{align*}.
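Counting positions by hand in a list of 50 values is easy to get wrong, so it can help to check the work. The sketch below applies the \begin{align*}\left ( \frac{n+1}{2} \right )\end{align*} position rule from this solution directly to Harika's raw data. The helper `value_at_position` is a name chosen here for illustration, and the way the list is split assumes an even number of values, as in this example.

```python
def value_at_position(ordered, position):
    """Value at a 1-based position; a position ending in .5 averages its two neighbors."""
    below = int(position)                    # whole-number part of the position
    if position == below:
        return ordered[below - 1]
    return (ordered[below - 1] + ordered[below]) / 2


rolls = [9, 10, 12, 13, 10, 14, 8, 10, 12, 6, 8, 11, 12, 12, 9, 11, 10,
         15, 10, 8, 8, 12, 10, 14, 10, 9, 7, 5, 11, 15, 8, 9, 17, 12, 12,
         13, 7, 14, 6, 17, 11, 15, 10, 13, 9, 7, 12, 13, 10, 12]

ordered = sorted(rolls)
n = len(ordered)                                      # 50 data points
median = value_at_position(ordered, (n + 1) / 2)      # position 25.5
lower, upper = ordered[:n // 2], ordered[n // 2:]     # two lists of 25 values
q1 = value_at_position(lower, (len(lower) + 1) / 2)   # position 13 of the lower list
q3 = value_at_position(upper, (len(upper) + 1) / 2)   # position 13 of the upper list

print("five number summary:", ordered[0], q1, median, q3, ordered[-1])
print("range:", ordered[-1] - ordered[0])
print("inter-quartile range:", q3 - q1)
```

Printing `ordered` as well gives the sorted list, which can be compared entry by entry with the one written out above.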
Representing Outliers in a Box-and-Whisker Plot
Box-and-whisker plots can be misleading if we don’t take outliers into account. An outlier is a data point that does not fit well with the other data in the list. For box-and-whisker plots, we can define which points are outliers by how far they are from the box part of the diagram. Defining which data are outliers is somewhat arbitrary, but many books use the norm that follows. Our basic measure of distance will be the inter-quartile range (IQR).
- A mild outlier is a point that falls more than 1.5 times the IQR, but no more than 3 times the IQR, outside of the box.
- An extreme outlier is a point that falls more than 3 times the IQR outside of the box.
When we draw a box-and-whisker plot, we don’t include the outliers in the “whisker” part of the plot; instead, we draw them as separate points.
Example C
Draw a box-and-whisker plot for the following ordered list of data:
\begin{align*}1, 2, 5, \colorbox{yellow}{9}, 10, 10, \colorbox{yellow}{11, 12}, 13, 13, \colorbox{yellow}{14}, 19, 25, 30\end{align*}
Solution
From the ordered list we see:
- The lowest value is 1.
- The first quartile \begin{align*}(Q_1)\end{align*} is 9.
- The median is 11.5.
- The third quartile \begin{align*}(Q_3)\end{align*} is 14.
- The highest value is 30.
Before we start to draw our box-and-whisker plot, we can determine the IQR:
\begin{align*}IQR = Q_3 - Q_1 = 14 - 9 = 5\end{align*}
Outliers are points that fall more than 1.5 times the IQR outside of the box; in other words, values that are more than \begin{align*}1.5 \times 5 = 7.5\end{align*} units below 9 or above 14. So any values less than 1.5 or greater than 21.5 are outliers.
Looking back at the data we see:
- The value of 1 is less than 1.5, so it is a mild outlier.
- The value 2 is the lowest value that falls within the included range.
- The value 30 is greater than 21.5. In fact, it’s not just more than 7.5 units outside the box, it’s more than twice that far outside the box. Since it falls more than 3 times the IQR above the third quartile, it’s an extreme outlier.
- The value 25 is also greater than 21.5, so it is a mild outlier.
- The value 19 is the highest value that falls within the included range.
So when we draw our box-and-whisker plot, the whiskers will only go out as far as 2 and 19 respectively. The points outside of that range are all outliers. Here is the plot:
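The outlier check in this example can also be written as a short Python sketch. The function name `classify` and its labels are our own choices for illustration; the 1.5 and 3 multipliers are the convention stated earlier in this section, and the quartiles are simply the ones found above rather than recomputed.

```python
def classify(value, q1, q3):
    """Label a value by how far it lies outside the box (hypothetical helper)."""
    iqr = q3 - q1
    distance = max(q1 - value, value - q3, 0)   # 0 when the value is not outside the box
    if distance > 3 * iqr:
        return "extreme outlier"
    if distance > 1.5 * iqr:
        return "mild outlier"
    return "not an outlier"


data = [1, 2, 5, 9, 10, 10, 11, 12, 13, 13, 14, 19, 25, 30]
q1, q3 = 9, 14                                  # quartiles found in Example C

labels = {x: classify(x, q1, q3) for x in data}
print({x: label for x, label in labels.items() if label != "not an outlier"})
# {1: 'mild outlier', 25: 'mild outlier', 30: 'extreme outlier'}

kept = [x for x in data if labels[x] == "not an outlier"]
print(min(kept), max(kept))                     # 2 19, the ends of the two whiskers
```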
Making Box-and-Whisker Plots Using a Graphing Calculator
Graphing calculators make analyzing large lists of data easy. They have built-in algorithms for finding the median and the quartiles, and can be used to display box-and-whisker plots.
Example D
The ages of all the passengers traveling in a train carriage are shown below.
35, 42, 38, 57, 2, 24, 27, 36, 45, 60, 38, 40, 40, 44, 1, 44, 48, 84, 38, 20, 4, 2, 48, 58, 3, 20, 6, 40, 22, 26, 17, 18, 40, 51, 62, 31, 27, 48, 35, 27, 37, 58, 21
Use a graphing calculator to:
a) obtain the 5 number summary for the data.
b) create a box-and-whisker plot.
c) determine if any of the points are outliers.
Solution
Enter the data in your calculator:
Press [STAT] then choose [EDIT].
Enter all 43 data points in list \begin{align*}L_1\end{align*}.
Find the 5 number summary:
Press [STAT] again. Use the right arrow to choose [CALC].
Highlight the 1-Var Stats option. Press [ENTER].
The single variable statistics summary appears.
Note the mean (\begin{align*} \bar x\end{align*}) is the first item given.
Use the down arrow to bring up the data for the five number summary. \begin{align*}n\end{align*} is the number of data points, and the final five numbers on the screen are the numbers we require.
Statistic | Symbol | Value |
---|---|---|
Lowest Value | minX | 1 |
First Quartile | \begin{align*}Q_1\end{align*} | 21 |
Median | Med | 37 |
Third Quartile | \begin{align*}Q_3\end{align*} | 45 |
Highest Value | maxX | 84 |
Display the box-and-whisker plot:
Bring up the [STAT PLOT] menu by pressing [2nd] [Y=].
Highlight 1:Plot1 and press [ENTER].
There are two types of box-and-whisker plots available. The first automatically identifies outliers. Highlight it and press [ENTER].
Press [WINDOW] and ensure that Xmin and Xmax allow for all data points to be shown. In this example, \begin{align*}Xmin = 0\end{align*} and \begin{align*}Xmax = 100\end{align*}.
Press [GRAPH] and the box-and-whisker plot should appear.
The calculator will automatically identify outliers and plot them as such. You can use the [TRACE] function along with the arrows to identify outlier values. In this case there is one outlier: 84.
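If no graphing calculator is at hand, the same results can be cross-checked in Python. The sketch below uses `statistics.quantiles` from the standard library (Python 3.8 or later) with `method="exclusive"`. For these 43 values that method lands exactly on the 11th, 22nd and 33rd ordered values, so it should agree with the table above. That agreement depends on the size of this data set: for other list lengths the function may interpolate between values and differ slightly from the calculator.

```python
from statistics import quantiles

ages = [35, 42, 38, 57, 2, 24, 27, 36, 45, 60, 38, 40, 40, 44, 1, 44, 48,
        84, 38, 20, 4, 2, 48, 58, 3, 20, 6, 40, 22, 26, 17, 18, 40, 51, 62,
        31, 27, 48, 35, 27, 37, 58, 21]

# Cut points that divide the data into four groups: Q1, median, Q3.
q1, med, q3 = quantiles(ages, n=4, method="exclusive")
print(min(ages), q1, med, q3, max(ages))

# Flag anything more than 1.5 times the IQR outside the box.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [age for age in ages if age < lower_fence or age > upper_fence]
print(outliers)                      # [84]
```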
Watch this video for help with the Examples above.
CK-12 Foundation: Box and Whisker Plots
Vocabulary
- We call the division between the lower two quarters the first quartile. The division between the upper two quarters is the third quartile (the second quartile is, of course, the median).
- A box-and-whisker plot is formed by placing vertical lines at five positions, corresponding to the smallest value, the first quartile, the median, the third quartile and the greatest value. (These five numbers are often referred to as the five number summary.) A box is drawn between the position of the first and third quartiles, and horizontal line segments (the whiskers) connect the box with the two extreme values.
Guided Practice
The box-and-whisker plots below represent the times taken by a school class to complete an obstacle course. The times have been separated into boys and girls. The boys and the girls each think that they did best. Determine the five number summary for both the boys and the girls and give a convincing argument for each of them.
Solution
Comparing two sets of data with a box-and-whisker plot is relatively straightforward. For example, you can see that the data for the boys is more spread out, both in terms of the range and the inter-quartile range.
The five number summary for each is shown in the table below.
Statistic | Boys | Girls |
---|---|---|
Lowest value | 1:30 | 1:40 |
First Quartile | 2:00 | 2:30 |
Median | 2:30 | 2:55 |
Third Quartile | 3:30 | 3:20 |
Highest value | 5:10 | 4:10 |
Here are some points each side could use in their argument:
Boys:
- The boys had the fastest time (1 minute 30 seconds), so the fastest individual was a boy.
- The boys also had the smaller median (2 minutes 30 seconds), meaning half of the boys were finished when only one fourth of the girls were finished (since the girls’ first quartile is also 2:30). In other words, the typical boy was faster than the typical girl.
Girls:
- The boys had the slowest time (5 minutes 10 seconds), so by the time all the girls were finished there was still at least one boy completing the course.
- The girls had the smaller third quartile (3 min 20 seconds), meaning that even without taking the slowest fourth of each group into account, the girls were still quickest.
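The claim at the start of this solution, that the boys' times are more spread out in both range and inter-quartile range, can be checked with a little arithmetic. The sketch below converts the times in the table to seconds; the helper `to_seconds` and the dictionary layout are our own choices for illustration.

```python
def to_seconds(time_text):
    """Convert a time written as 'm:ss' into a number of seconds."""
    minutes, seconds = time_text.split(":")
    return int(minutes) * 60 + int(seconds)


# Five number summaries from the table: lowest, Q1, median, Q3, highest.
summaries = {
    "Boys": ["1:30", "2:00", "2:30", "3:30", "5:10"],
    "Girls": ["1:40", "2:30", "2:55", "3:20", "4:10"],
}

spread = {}
for group, times in summaries.items():
    low, q1, _, q3, high = (to_seconds(t) for t in times)
    spread[group] = {"range": high - low, "iqr": q3 - q1}
    print(group, spread[group])
# Boys {'range': 220, 'iqr': 90}
# Girls {'range': 150, 'iqr': 50}
```

In minutes and seconds, the boys have a range of 3:40 and an inter-quartile range of 1:30, against 2:30 and 0:50 for the girls.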
Practice
- Draw a box-and-whisker plot for the following unordered data: 49, 57, 53, 54, 57, 49, 67, 51, 57, 56, 59, 57, 50, 49, 52, 53, 50, 58
- A simulation of a large number of runs of rolling 3 dice and adding the numbers results in the following 5-number summary: 3, 8, 10.5, 13, 18. Make a box-and-whisker plot for the data and comment on the differences between it and the plot in Example B.
- The box-and-whisker plots below represent the percentage of people living below the poverty line by county in both Texas and California. Determine the 5-number summary for each state, and comment on the spread of each distribution.
- The 5-number summary for the average daily temperature in Atlantic City, \begin{align*}NJ^1\end{align*} (given in \begin{align*}^\circ F\end{align*}) is: 31, 39, 52, 68, 76. Draw the box-and-whisker plot for this data and use it to determine which of the following, if any, would be considered outliers if they were included in the data:
  - January’s record high temperature of \begin{align*}78^{\circ}\end{align*}
  - January’s record low temperature of \begin{align*} -8^{\circ} \end{align*}
  - April’s record high temperature of \begin{align*} 94^{\circ} \end{align*}
  - The all time record high of \begin{align*}106^{\circ}\end{align*}
- In 1879 Albert Michelson conducted an experiment to determine the speed of light. The data for the first 10 runs (5 results in each run) is given below. Each value represents how many kilometers per second over 299,000 km/s was measured. Create a box-and-whisker plot of the data. Be sure to identify outliers and plot them as such. 850, 740, 900, 1070, 930, 850, 950, 980, 980, 880, 960, 940, 960, 940, 880, 800, 850, 880, 900, 840, 880, 880, 800, 860, 720, 720, 620, 860, 970, 950, 890, 810, 810, 820, 800, 770, 760, 740, 750, 760, 890, 840, 780, 810, 760, 810, 790, 810, 820, 850
- Is it possible to have outliers on both ends of a data set? Explain.
- Is it possible for more than half the values in a data set to be outliers? Explain.
- Is it possible for more than a quarter of the values in a data set to be outliers? Explain.
- Is it possible for either of the whiskers in a box-and-whisker plot to be of zero length? Explain.
- Is it possible for either of the whiskers in a box-and-whisker plot to be longer than the box? Explain.
- Is it possible for either of the whiskers in a box-and-whisker plot to be twice as long as the box? Explain.
\begin{align*}^1\end{align*}Information taken from data published by Rutgers University Climate Lab (http://climate.rutgers.edu)
Extremes
The extremes are the maximum and minimum values in a data set.
five point summary
The numbers needed to construct a box-and-whisker plot are called the five-point-summary. The five points are the minimum, the lower median (Q1), the median, the upper median (Q3), and the maximum.
line of fit
A line of fit is a straight or continuously curved line representing the trend of changes in the comparison of two data sets (or one set of bivariate data).
Median
The median of a data set is the middle value of an organized data set.
observed data
Observed data are the values that result from computations performed on the input variable.
Outlier
In statistics, an outlier is a data value that is far from other data values.
Quartile
A quartile is each of four equal groups that a data set can be divided into.
skewed
As with the horizontal skewing of a histogram, stem plots with an obvious skew toward one end or the other tend to indicate an increased number of outliers either lesser than or greater than the mode.
statistical correlation
Statistical correlation is a representation of possible related changes in values between the two sets of data.
trends
Trends in data sets or samples are indicators found by reviewing the data from a general or overall standpoint.
uniform
A uniform shaped histogram indicates data that is very consistent; the frequency of each class is very similar to that of the others.