In this Concept, you will learn about the effects of outliers and changing units on box-and-whisker plots.
Watch This
For a description of how to draw a box-and-whisker plot from given data, see patrickJMT, Box and Whisker Plot (5:53).
Guidance
Here is some data for reservoirs in California (the names of the lakes and reservoirs have been omitted):
80, 83, 77, 95, 85, 74, 34, 68, 90, 82, 75
At first glance, the 34 should stand out. It appears as if this point is significantly different from the rest of the data. What effect does this one point have on a box-and-whisker plot?
Example A
Use a graphing calculator to investigate the box-and-whisker plot for the California reservoir data.
Solution:
Enter your data into a list as we have done before, and then choose a plot. Under 'Type', you will notice what look like two different box-and-whisker plots. For now, choose the second one (it appears on the second line, but you must still press the right arrow to reach these plots).
Setting a window is not as important for a box plot, so we will use the calculator's ability to automatically scale a window to our data by pressing [ZOOM] and selecting '9:Zoom Stat'.
Outliers in Box-and-Whisker Plots
While box plots give us a nice summary of the important features of a distribution, we lose the ability to identify individual points. The left whisker is elongated, but if we did not have the data, we would not know if all the points in that section of the data were spread out, or if it were just the result of the one outlier. It is more typical to use a modified box plot. This box plot will show an outlier as a single, disconnected point and will stop the whisker at the previous point.
Example B
Make a modified box plot for the California reservoir data.
Solution:
Go back and change your plot to the first box plot option, which is the modified box plot, and then graph it.
Notice that without the outlier, the distribution is roughly symmetric.
Calculating Outliers
The California reservoir data set had one obvious outlier, but when is a point far enough away to be called an outlier? We need a standard, accepted practice for defining an outlier in a box plot. The conventional, though somewhat arbitrary, definition is that any point more than 1.5 times the interquartile range (IQR) below the lower quartile or above the upper quartile is considered an outlier. Because the IQR is the same as the length of the box, any point that is more than one-and-a-half box lengths from either quartile is plotted as an outlier.
A common misconception of students is that you stop the whisker at this boundary line. In fact, the last point on the whisker that is not an outlier is where the whisker stops.
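The difference between the boundary line and the end of the whisker can be sketched in a few lines of Python. This is only an illustration: the function name `whisker_ends` is hypothetical, and the quartiles 74 and 85 are the ones found for the California reservoir data in Example C.

```python
def whisker_ends(values, q1, q3):
    """Return the cutoffs, the whisker ends, and the outliers of a modified box plot."""
    iqr = q3 - q1
    lower_cutoff = q1 - 1.5 * iqr  # boundary line; usually not a data point
    upper_cutoff = q3 + 1.5 * iqr
    # A point is an outlier only if it lies strictly beyond a cutoff.
    inside = [v for v in values if lower_cutoff <= v <= upper_cutoff]
    outliers = sorted(v for v in values if v < lower_cutoff or v > upper_cutoff)
    # Each whisker stops at the most extreme data point that is NOT an outlier.
    return (lower_cutoff, upper_cutoff), (min(inside), max(inside)), outliers


reservoirs = [80, 83, 77, 95, 85, 74, 34, 68, 90, 82, 75]
cutoffs, whiskers, outliers = whisker_ends(reservoirs, q1=74, q3=85)
print(cutoffs)   # (57.5, 101.5)
print(whiskers)  # (68, 95)
print(outliers)  # [34]
```

Here the left boundary line is at 57.5, but the left whisker ends at 68, the smallest data value that is not an outlier.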
Example C
Determine whether there are any outliers for the California reservoir data.
Solution:
The calculations for determining the outlier in this case are as follows:
Lower Quartile: 74
Upper Quartile: 85
Interquartile range (IQR): 85 – 74 = 11
Cutoff for outliers in left whisker: 74 – 1.5*11 = 57.5. Thus, any value less than 57.5 is considered an outlier.
Notice that we did not even bother to test the calculation on the right whisker, because it should be obvious from a quick visual inspection that there are no points that are farther than even one box length away from the upper quartile.
If you press [TRACE] and use the left or right arrows, the calculator will trace the values of the five-number summary, as well as the outlier.
There is only one outlier, and that is the data point 34.
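The calculations in Example C can be checked with a short Python sketch. It assumes the quartiles are found as the medians of the lower and upper halves of the ordered data, leaving the overall median out of both halves when the count is odd. Other software may use a different quartile convention and report slightly different values. Unlike the solution above, the sketch also tests the right whisker.

```python
from statistics import median

data = sorted([80, 83, 77, 95, 85, 74, 34, 68, 90, 82, 75])
n = len(data)

# Assumed convention: quartiles are the medians of the two halves;
# with an odd count, the overall median belongs to neither half.
lower_half = data[: n // 2]
upper_half = data[(n + 1) // 2:]

q1 = median(lower_half)
q3 = median(upper_half)
iqr = q3 - q1

low_cutoff = q1 - 1.5 * iqr
high_cutoff = q3 + 1.5 * iqr
outliers = [x for x in data if x < low_cutoff or x > high_cutoff]

print(q1, q3, iqr)              # 74 85 11
print(low_cutoff, high_cutoff)  # 57.5 101.5
print(outliers)                 # [34]
```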
The Effects of Changing Units on Shape, Center, and Spread
In a previous Concept, we looked at data for the materials in a typical desktop computer.
Material  Kilograms 

Plastics  6.21 
Lead  1.71 
Aluminum  3.83 
Iron  5.54 
Copper  2.12 
Tin  0.27 
Zinc  0.60 
Nickel  0.23 
Barium  0.05 
Other elements and chemicals  6.44 
Here is the same data set given in pounds. Each mass in kilograms was multiplied by 2.2.
Material  Pounds 

Plastics  13.7 
Lead  3.8 
Aluminum  8.4 
Iron  12.2 
Copper  4.7 
Tin  0.6 
Zinc  1.3 
Nickel  0.5 
Barium  0.1 
Other elements and chemicals  14.2 
What effect does this conversion from kilograms to pounds have on some of the statistics we use to summarize data?
Example D
Determine the effect of the conversion from kilograms to pounds on the mean, standard deviation and box plots.
Solutions:
When all values are multiplied by a factor of 2.2, the mean is also multiplied by 2.2, so the center of the distribution increases by the same factor. Similarly, the range, interquartile range, and standard deviation all increase by the same factor. In other words, the center and the measures of spread increase proportionally.
Note: This is easier to see when you work with actual numbers. Suppose that your mean is 20, and that two of the data values in your distribution are 21 and 23. If you multiply every value by 2, then 21 and 23 become 42 and 46, and your mean also changes by a factor of 2 and is now 40. Before, your deviations were 21 – 20 = 1 and 23 – 20 = 3, but now your deviations are 42 – 40 = 2 and 46 – 40 = 6, so your deviations are twice as big as well.
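The same check can be run on the computer materials data. The sketch below is a rough illustration in Python: it multiplies each kilogram value by 2.2 without rounding (so the pounds differ slightly from the rounded table entries) and compares the mean, standard deviation, and range before and after. The sample standard deviation is used here as an assumption; the population version gives the same ratio.

```python
from statistics import mean, stdev

kilograms = [6.21, 1.71, 3.83, 5.54, 2.12, 0.27, 0.60, 0.23, 0.05, 6.44]
factor = 2.2
pounds = [factor * kg for kg in kilograms]  # unrounded conversion


def value_range(values):
    """Range of a data set: largest value minus smallest value."""
    return max(values) - min(values)


# Ratio of each statistic in pounds to the same statistic in kilograms.
ratios = {
    "mean": mean(pounds) / mean(kilograms),
    "standard deviation": stdev(pounds) / stdev(kilograms),
    "range": value_range(pounds) / value_range(kilograms),
}

for name, ratio in ratios.items():
    print(f"{name}: pounds / kilograms = {ratio:.4f}")
```

Each ratio comes out to 2.2 (up to floating-point rounding), which matches the claim that the center and the measures of spread scale by the conversion factor.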
Since each number in the data set is multiplied by 2.2, each value in the five-number summary is also multiplied by 2.2, and so is every value shown in the box plot. The graph keeps the same shape but is stretched out, or elongated. Here are the side-by-side box plots for both distributions showing the effects of changing units.
On the Web
http://en.wikipedia.org/wiki/Box_plot
http://tinyurl.com/3ao9px More investigation of boxplots.
Vocabulary
While an outlier is simply a point that is not typical of the rest of the data, there is an accepted definition of an outlier in the context of a box-and-whisker plot. Any point that is more than 1.5 times the length of the box from either end of the box is considered to be an outlier. When changing the units of a distribution, the center and spread will be affected, but the shape will stay the same.
Guided Practice
Given the following data set:
111, 122, 133, 149, 126, 117, 101, 121
a. Find the median value for the data set.
b. Find the values of the upper and lower quartiles.
c. Find the value of the interquartile range (IQR).
d. Identify any outliers in the dataset.
e. Draw a box and whisker plot for this data.
Solutions:
a. To find the median, put the data in order and find the middle data point. That is, find the data point that has 50% of the data below it and 50% of the data above it. The data in order: 101, 111, 117, 121, 122, 126, 133, 149. There are 8 data points. The median would be between the 4th and 5th data points. In this case, the median is 121.5. Note that the median does not have to be a data point.
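b. The lower quartile separates the lower fourth of the data from the upper 75% of the data, and the upper quartile separates the upper fourth of the data from the lower 75% of the data. The lower quartile is the median of the lower half (101, 111, 117, 121), which is (111 + 117)/2 = 114. The upper quartile is the median of the upper half (122, 126, 133, 149), which is (126 + 133)/2 = 129.5.
c. The interquartile range (IQR) is 129.5 – 114 = 15.5.
d. Use the 1.5*IQR rule: 1.5*IQR = 23.25. 129.5 + 23.25 = 152.75. Any value greater than 152.75 would be an outlier. There are no such values in this data set. 114 – 23.25 = 90.75. Any value less than 90.75 would be considered an outlier. There are no such values in this data set.
e. The box-and-whisker plot has its box from 114 to 129.5 with the median line at 121.5, and whiskers extending to the minimum, 101, and the maximum, 149.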
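Parts a through d can be checked with a short Python sketch. The helper name `five_number_summary` is hypothetical, and the sketch assumes the median-of-halves convention for quartiles; a calculator or software package that uses a different convention may report different quartiles.

```python
from statistics import median


def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile, maximum."""
    data = sorted(values)
    n = len(data)
    # With an even count, the two halves split the data evenly;
    # with an odd count, the middle value is left out of both halves.
    lower_half = data[: n // 2]
    upper_half = data[(n + 1) // 2:]
    return (data[0], median(lower_half), median(data),
            median(upper_half), data[-1])


practice = [111, 122, 133, 149, 126, 117, 101, 121]
summary = five_number_summary(practice)
minimum, q1, med, q3, maximum = summary
iqr = q3 - q1

# Values strictly beyond either cutoff count as outliers.
outliers = [x for x in practice
            if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print("five-number summary:", summary)
print("IQR:", iqr)
print("outliers:", outliers)
```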
Practice
For 1–7, use the table below, which contains recent data on the average price of a gallon of gasoline for states that share a border crossing into Canada.
1. Find the five-number summary for this data.
2. Show all work to test for outliers.
3. Graph the box-and-whisker plot for this data.
4. Canadian gasoline is sold in liters. Suppose a Canadian crossed the border into one of these states and wanted to compare the cost of gasoline. There are 3.7854 liters in a gallon. If we were to convert the distribution to liters, describe the resulting shape, center, and spread of the new distribution.
5. Complete the following table. Convert to cost per liter by dividing by 3.7854, and then graph the resulting box plot.
6. Look up the current data and compare that distribution with the data presented here.
7. Find the exchange rate for Canadian dollars and convert the prices from US dollars into Canadian dollars.
State  Average Price of a Gallon of Gasoline (US$)  Average Price of a Liter of Gasoline (US$) 

Alaska  3.458  
Washington  3.528  
Idaho  3.26  
Montana  3.22  
North Dakota  3.282  
Minnesota  3.12  
Michigan  3.352  
New York  3.393  
Vermont  3.252  
New Hampshire  3.152  
Maine  3.309 
Figure: Average prices of a gallon of gasoline on March 16, 2008. Source: AAA, http://fuelgaugereport.opisnet.com/sbsavg.html
8. What characteristics of a data set make it easier or harder to represent it using dot plots, stem-and-leaf plots, histograms, and box-and-whisker plots?
9. Which plots are most useful to interpret the ideas of shape, center, and spread?
10. What effects do other transformations of the data have on the shape, center, and spread?

11. If the median of a distribution is less than the mean, which of the following statements is the most correct?
a. The distribution is skewed left.
b. The distribution is skewed right.
c. There are outliers on the left side.
d. There are outliers on the right side.
e. Either (b) or (d) could be true.

12. Given the following data set: 111, 122, 133, 149, 126, 117, 101, 121
a. Find the median value for the data set.
b. Find the values of the upper and lower quartiles.
c. Find the value of the interquartile range (IQR).
d. Identify any outliers in the data set.
e. Draw a box-and-whisker plot for this data.