11.8: Box-and-Whisker Plots
Learning Objectives
- Make and interpret box-and-whisker plots.
- Analyze effects of outliers.
- Make box-and-whisker plots using a graphing calculator.
Making and Interpreting Box-and-Whisker Plots
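Consider the following list of numbers:
\begin{align*}1, 2, 3, 4, 5, 6, 7, 8, 9, 10\end{align*}
The median is the \begin{align*} \left (\frac{n + 1}{2}\right)^{th}\end{align*} value in the ordered list. There are 10 values, so the median lies halfway between the \begin{align*}5^{th}\end{align*} and \begin{align*}6^{th}\end{align*} values and is therefore 5.5. It splits the list into two halves.
The lower list consists of the numbers:
\begin{align*}1, 2, 3, 4, 5\end{align*}
And the upper list contains the numbers: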
\begin{align*}6, 7, 8, 9, 10\end{align*}
The median of the lower half is 3. The median of the upper half is 8. These numbers, together with the median, cut the list into four quarters. We call the division between the lower two quarters the first quartile. The division between the upper two quarters is the third quartile (the second quartile is, of course, the median).
A box-and-whisker plot is formed by placing vertical lines at five positions, corresponding to the smallest value, the first quartile, the median, the third quartile and the greatest value. These five numbers are often referred to as the five number summary. A box is drawn between the position of the first and third quartiles, and horizontal line segments (the whiskers) connect the box with the two extreme values.
The box-and-whisker plot for the integers 1 through 10 is shown below.
With a box-and-whisker plot, a simple measure of dispersion can be gained from the distance from the first quartile to the third quartile. This inter-quartile range is a measure of the spread of the middle half of the data.
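The procedure described above can be sketched in a few lines of Python. This is only an illustration, not part of the original lesson: the function name `five_number_summary` is hypothetical, and the rule for an odd number of values (leaving the median out of both halves) is an assumption chosen to match the worked examples in this section.

```python
from statistics import median

def five_number_summary(values):
    """Return (lowest, Q1, median, Q3, greatest) using the median-of-halves rule.

    Assumption: when the number of values is odd, the median itself is
    left out of both the lower and the upper list.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]        # the lower list
    upper = ordered[n - half:]    # the upper list
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

summary = five_number_summary(range(1, 11))
low, q1, med, q3, high = summary
iqr = q3 - q1                     # inter-quartile range

print(summary)  # (1, 3, 5.5, 8, 10)
print(iqr)      # 5
```

For the integers 1 through 10 the sketch reproduces the quartiles 3 and 8 found above, and the inter-quartile range is their difference.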
Example 1
Forty students took a college algebra entrance test and the results are summarized in the box-and-whisker plot below. How many students would be allowed to enroll in the class if the pass mark was set at
(i) 65%
(ii) 60%
From the plot, we can see the following information.
\begin{align*}\text{Lowest score}&=52 \% \\ \text{First quartile}&=60 \% \\ \text{Median score}&=65 \% \\ \text{Third quartile}&=77 \% \\ \text{Highest score}&=97 \%\end{align*}
Since the pass marks correspond with the median and the first quartile, the question is really asking how many students there are in: (i) the upper half and (ii) the upper three quarters.
Solution
(i) If the pass mark was 65%, then 20 students would pass.
(ii) If the pass mark was 60%, then 30 students would pass.
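As a check on these counts, the arithmetic can be written out. It assumes the 40 scores split evenly at the quartiles, so that each quarter holds 10 students, and that a score equal to the pass mark counts as a pass.
\begin{align*}\text{(i) Upper half}&=\frac{1}{2} \times 40=20 \text{ students}\\ \text{(ii) Upper three quarters}&=\frac{3}{4} \times 40=30 \text{ students}\end{align*}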
Look again at the information we gained from the box-and-whisker plot. A box-and-whisker plot will always represent five quantities in the five number summary: the lowest value, the first quartile, the median, the third quartile and the greatest value.
Example 2
Harika is rolling 3 dice and adding the scores together. She records the total score for 50 rolls, and the scores she gets are shown below. Display the data in a box-and-whisker plot, and find both the range and the inter-quartile range.
\begin{align*}& 9, 10, 12, 13, 10, 14, 8, 10, 12, 6, 8, 11, 12, 12, 9, 11, 10, 15, 10, 8, 8, 12, 10, 14, 10,\\ & 9, 7, 5, 11, 15, 8, 9, 17, 12, 12, 13, 7, 14, 6, 17, 11, 15, 10, 13, 9, 7, 12, 13, 10, 12\end{align*}
Solution
We will first convert the raw data into an ordered list. There are 50 data points and \begin{align*} \left (\frac{n + 1}{2}\right)=25.5\end{align*}, so the median will be the mean of the \begin{align*}25^{th}\end{align*} and \begin{align*}26^{th}\end{align*} values. The median will split the data into two lists of 25 values. It makes sense, therefore, to write the first 25 values and the second 25 values as two distinct lists.
\begin{align*}& 5, 6, 6, 7, 7, 7, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10, 10, 10, 10, 10,\\ & 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 12, 12, 12, 13, 13, 13, 13, 14, 14, 14, 15, 15, 15, 17, 17\end{align*}
Since each sub-list has 25 values, the first and third quartiles of the entire data set can be found from the median of each smaller list. For 25 values \begin{align*} \left (\frac{n + 1}{2}\right)=13\end{align*}, and so the quartiles are given by the \begin{align*}13^{th}\end{align*} value from each smaller sub-list.
From the ordered list, the \begin{align*}13^{th}\end{align*} value of the lower list is 9 and the \begin{align*}13^{th}\end{align*} value of the upper list is 12, so the five number summary is:
- The lowest value is 5.
- The first quartile is 9.
- The median is 10.5.
- The third quartile is 12.
- The highest value is 17.
The box-and-whisker plot therefore looks like this.
The range is given by subtracting the smallest value from the greatest value.
\begin{align*}\text{Range}=17 - 5=12\end{align*}
The inter-quartile range is given by subtracting the first quartile from the third quartile.
\begin{align*}\text{Inter-quartile range}=12 - 9=3\end{align*}
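Sorting 50 values by hand is error-prone, so it can be worth checking the working with a short script. The sketch below is an illustration only; it follows the same steps as the solution (sort, split into two lists of 25, take the median of each), and the variable names are made up for this example.

```python
from statistics import median

# Harika's 50 recorded totals, copied from the example.
rolls = [
    9, 10, 12, 13, 10, 14, 8, 10, 12, 6, 8, 11, 12, 12, 9, 11, 10, 15, 10, 8, 8, 12, 10, 14, 10,
    9, 7, 5, 11, 15, 8, 9, 17, 12, 12, 13, 7, 14, 6, 17, 11, 15, 10, 13, 9, 7, 12, 13, 10, 12,
]

ordered = sorted(rolls)
lower, upper = ordered[:25], ordered[25:]   # the two sub-lists of 25 values

# Lowest value, first quartile, median, third quartile, highest value.
five_numbers = (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])
data_range = ordered[-1] - ordered[0]
iqr = five_numbers[3] - five_numbers[1]

print("Five number summary:", five_numbers)
print("Range:", data_range)
print("Inter-quartile range:", iqr)
```

Printing `lower` and `upper` as well gives the two ordered sub-lists, which makes it easy to confirm the 13th value of each by eye.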
Example 3
The box-and-whisker plots below represent the times taken by a school class to complete a 150-yard obstacle course. The times have been separated into boys and girls. The boys and the girls both think that they did best. Determine the five number summary for both the boys and the girls and give a convincing argument for each of them.
Solution
Comparing two sets of data with a box-and-whisker plot is relatively straightforward. For example, you can see that the data for the boys is more spread out, both in terms of the range and the inter-quartile range.
The five number summary for each is shown in the table below.
Quantity | Boys | Girls |
---|---|---|
Lowest value | 1:30 | 1:40 |
First Quartile | 2:00 | 2:30 |
Median | 2:30 | 2:55 |
Third Quartile | 3:30 | 3:20 |
Highest value | 5:10 | 4:10 |
While any game needs to have set rules to avoid confusion of who wins, each side could use the following in their argument.
Boys
- The boys had the fastest time (1 minute 30 seconds), so the fastest individual was a boy.
- The boys also had the smaller median (2 min 30 seconds) meaning half of the boys were finished when only one fourth of the girls were finished (we know only one-fourth of the girls had finished since their first quartile was also 2:30).
Girls
- The boys had the slowest time (5 minutes 10 seconds), so by the time all the girls were finished there was still at least one boy (and possibly more) completing the course.
- The girls had the smaller third quartile (3 min 20 seconds) meaning that even without taking the slowest fourth of each group into account, the girls were still quickest.
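The claim that the boys' times are more spread out can be checked numerically from the table. The sketch below is illustrative only; the helper `to_seconds` and the dictionary layout are assumptions made for this example. It converts each minutes:seconds time to seconds and then computes the range and inter-quartile range for each group.

```python
def to_seconds(clock):
    """Convert a 'minutes:seconds' string such as '2:30' to whole seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + int(seconds)

# Five number summaries read from the table (lowest, Q1, median, Q3, highest).
summaries = {
    "Boys":  ["1:30", "2:00", "2:30", "3:30", "5:10"],
    "Girls": ["1:40", "2:30", "2:55", "3:20", "4:10"],
}

spread = {}
for group, times in summaries.items():
    low, q1, med, q3, high = (to_seconds(t) for t in times)
    spread[group] = {"range": high - low, "iqr": q3 - q1}

print(spread)
# {'Boys': {'range': 220, 'iqr': 90}, 'Girls': {'range': 150, 'iqr': 50}}
```

In seconds, the boys have a range of 220 and an inter-quartile range of 90, against 150 and 50 for the girls, which agrees with the visual impression from the plots.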
Representing Outliers in a Box-and-Whisker Plot
An outlier is a data point that does not fit well with the other data in the list. For box-and-whisker plots, we can define which points are outliers by how far they are from the box part of the diagram. Which data are outliers is somewhat arbitrary, but many books use the norm that follows. Our basic measure of distance will be the inter-quartile range (IQR).
- A mild outlier is a point that falls more than 1.5 times the IQR outside of the box.
- An extreme outlier is a point that falls more than 3 times the IQR outside of the box.
Example 4
Draw a box-and-whisker plot for the following ordered list of data.
\begin{align*}1, 2, 5, 9, 10, 10, 11, 12, 13, 13, 14, 19, 25, 30\end{align*}
Solution
From the ordered list we see
- The lowest value is 1.
- The first quartile (Q1) is 9.
- The median is 11.5.
- The third quartile (Q3) is 14.
- The highest value is 30.
Before we proceed to draw our box-and-whisker plot, we can determine the IQR:
\begin{align*}IQR=Q_3 - Q_1=14 - 9=5\end{align*}
Outliers are points that fall more than 1.5 times the IQR outside of the box. We can determine this range algebraically.
\begin{align*}\text{Lower limit for included points}&=Q_1 - (1.5 \times IQR)=9 - 7.5=1.5\\ \text{Upper limit for included points}&=Q_3 + (1.5 \times IQR)=14 + 7.5=21.5\end{align*}
Looking back at the data, we see:
- The value of 1 falls more than 1.5 times the IQR below the first quartile. It is a mild outlier.
- The value 2 is the lowest value that falls within the included range.
- The value 30 falls more than 3 times the IQR above the third quartile. It is an extreme outlier.
- The value 25 falls more than 1.5 times the IQR above the third quartile. It is a mild outlier.
- The value 19 is the highest value that falls within the included range.
The box-and-whisker plot is shown below. Outliers are represented, but not included in the whiskers.
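The outlier rule used in this example can also be expressed as a short function. This is a minimal sketch under the 1.5 and 3 times IQR convention stated above; the name `classify_outliers` is hypothetical, and the quartiles are found with the same median-of-halves method used throughout this section.

```python
from statistics import median

def classify_outliers(values):
    """Split values into (mild outliers, extreme outliers, whisker ends)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[n - half:])
    iqr = q3 - q1

    mild, extreme, included = [], [], []
    for x in ordered:
        distance = max(q1 - x, x - q3)   # how far x lies outside the box
        if distance > 3 * iqr:
            extreme.append(x)
        elif distance > 1.5 * iqr:
            mild.append(x)
        else:
            included.append(x)
    # The whiskers stop at the most extreme values that are not outliers.
    whiskers = (included[0], included[-1])
    return mild, extreme, whiskers

data = [1, 2, 5, 9, 10, 10, 11, 12, 13, 13, 14, 19, 25, 30]
mild, extreme, whiskers = classify_outliers(data)
print(mild)      # [1, 25]
print(extreme)   # [30]
print(whiskers)  # (2, 19)
```

The result matches the list above: 1 and 25 are mild outliers, 30 is an extreme outlier, and the whiskers run from 2 to 19.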
Making Box-and-Whisker Plots Using a Graphing Calculator
Graphing calculators make analyzing large lists of data easy. They have built-in algorithms for finding the median and the quartiles, and can be used to display box-and-whisker plots.
Example 5
The ages of all the passengers traveling in a train carriage are shown below.
\begin{align*}& 35, 42, 38, 57, 2, 24, 27, 36, 45, 60, 38, 40, 40, 44, 1, 44, 48, 84, 38, 20, 4, 2,\\ & 48, 58, 3, 20, 6, 40, 22, 26, 17, 18, 40, 51, 62, 31, 27, 48, 35, 27, 37, 58, 21\end{align*}
Use a graphing calculator to
(i) Obtain the 5 number summary for the data.
(ii) Create a box-and-whisker plot.
(iii) Determine if any of the points are outliers.
Solution
Step 1: Enter the data in your calculator.
Press [STAT] then choose [EDIT].
Enter all 43 data points in list \begin{align*}L_1\end{align*}.
Step 2: Finding the 5 number summary
Press [STAT] again. Use the right arrow to choose [CALC].
Highlight the 1-Var Stats option. Press [ENTER].
The single variable statistics summary appears.
Note the mean \begin{align*} (\bar{x})\end{align*} is the first item given.
Use the down arrow to bring up the data for the five number summary.
\begin{align*}n\end{align*} is the number of data points. The final five numbers on the screen are the numbers we require.
Quantity | Symbol | Value |
---|---|---|
Lowest value | minX | 1 |
First Quartile | \begin{align*}Q_1\end{align*} | 21 |
Median | Med | 37 |
Third Quartile | \begin{align*}Q_3\end{align*} | 45 |
Highest value | maxX | 84 |
Step 3: Displaying the box-and-whisker plot.
Bring up the [STAT PLOT] option by pressing [2nd] [Y=].
Highlight 1:Plot1 and press [ENTER].
There are two types of box-and-whisker plots available. The first automatically identifies outliers. Highlight it and press [ENTER].
Press [WINDOW] and ensure that Xmin and Xmax allow for all data points to be shown. In this example, \begin{align*}\text{Xmin}=0\end{align*} and \begin{align*}\text{Xmax}=100\end{align*}.
Press [GRAPH] and the box-and-whisker plot should appear.
The calculator will automatically identify outliers and plot them as such. You can use the [TRACE] function along with the arrows to identify outlier values. In this case there is one outlier (84).
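If no graphing calculator is at hand, the same three tasks can be approximated in software. The sketch below is an illustration, not a description of the calculator's internals: it assumes the quartiles are the medians of the 21 values on either side of the median, and it flags any value more than 1.5 times the IQR outside the box as an outlier.

```python
from statistics import median

# Ages of the 43 passengers, copied from the example.
ages = [
    35, 42, 38, 57, 2, 24, 27, 36, 45, 60, 38, 40, 40, 44, 1, 44, 48, 84, 38, 20, 4, 2,
    48, 58, 3, 20, 6, 40, 22, 26, 17, 18, 40, 51, 62, 31, 27, 48, 35, 27, 37, 58, 21,
]

ordered = sorted(ages)
n = len(ordered)
half = n // 2                       # 21 values on each side of the median
q1 = median(ordered[:half])
med = median(ordered)
q3 = median(ordered[n - half:])
summary = (ordered[0], q1, med, q3, ordered[-1])

iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr        # values below this are outliers
upper_limit = q3 + 1.5 * iqr        # values above this are outliers
outliers = [age for age in ordered if age < lower_limit or age > upper_limit]

print(summary)   # (1, 21, 37, 45, 84)
print(outliers)  # [84]
```

The five number summary agrees with the calculator table, and the upper limit of 45 + 1.5 × 24 = 81 explains why 84 is the only point plotted as an outlier.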
Review Questions
- Draw a box-and-whisker plot for the following unordered data. \begin{align*}49, 57, 53, 54, 49, 67, 51, 57, 56, 59, 57, 50, 49, 52, 53, 50, 58\end{align*}
- A simulation of a large number of runs of rolling three dice and adding the numbers results in the following 5-number summary: 3, 8, 10.5, 13, 18. Make a box-and-whisker plot for the data and comment on the differences between it and the plot in Example 2.
- The box-and-whisker plots below represent the percentage of people living below the poverty line by county in both Texas and California. Determine the 5 number summary for each state, and comment on the spread of each distribution.
- The 5 number summary for the average daily temperature in Atlantic City, NJ (given in \begin{align*}^\circ F\end{align*}) is 31, 39, 52, 68, 76. Draw the box-and-whisker plot for this data and use it to determine which of the following would be considered an outlier if it were included in the data.
  - (a) January’s record high temperature of \begin{align*}78^\circ\end{align*}
  - (b) January’s record low temperature of \begin{align*}-8^\circ\end{align*}
  - (c) April’s record high temperature of \begin{align*}94^\circ\end{align*}
  - (d) The all time record high of \begin{align*}106^\circ\end{align*}
- In 1887 Albert Michelson and Edward Morley conducted an experiment to determine the speed of light. The data for the first 10 runs (5 results in each run) is given below. Each value represents how many kilometers per second over 299,000 km/s was measured. Create a box-and-whisker plot of the data. Be sure to identify outliers and plot them as such. \begin{align*}& 850, 740, 900, 1070, 930, 850, 950, 980, 980, 880, 960, 940, 960, 940, 880, 800, 850,\\ & 880, 900, 840, 880, 880, 800, 860, 720, 720, 620, 860, 970, 950, 890, 810, 810, 820,\\ & 800, 770, 760, 740, 750, 760, 890, 840, 780, 810, 760, 810, 790, 810, 820, 850\end{align*}
Review Answers
- (Upper value is just inside included range - there are no outliers)
- The box-and-whisker plot for many runs is shown. It includes values that are less likely to occur (3 and 18) so the range is greater than for a small number of runs. The median is the same and the IQR is similar, indicating that a small trial makes a good estimate of these quantities.
- California: 6, 9.5, 12, 15.5, 22; Texas: 5, 13, 16, 19.5, 35. Answers will vary, but students should see that although the county that has the lowest poverty rate is in Texas, in general counties in Texas have a greater percentage of people living below the poverty line. \begin{align*}Q_1\end{align*}, the median and \begin{align*}Q_3\end{align*} are all higher for Texas than California. The county with the highest poverty rate is in Texas, and it is worth noting that this value could be considered an outlier as it falls more than 1.5 times the IQR above \begin{align*}Q_3\end{align*}.
- The box-and-whisker plot is shown. The IQR indicates that the only outlier would be point \begin{align*}b\end{align*}.
- See the box-and-whisker plot below. The modern accepted value (299,792 km/s) falls just below \begin{align*}Q_1\end{align*}.
Texas Instruments Resources
In the CK-12 Texas Instruments Algebra I FlexBook, there are graphing calculator activities designed to supplement the objectives for some of the lessons in this chapter. See http://www.ck12.org/flexr/chapter/9621.