7.11: Box-and-Whisker Plots
You're a meteorologist and you're collecting temperature data from various locations in your state for the month of February. You've collected over 2,000 temperatures at the same time each day and want to organize them to see if there are patterns. You want to find out the lowest temperature, the highest temperature, the median temperature, the median of the first half of the month and the median of the second half of the month. How would you organize your data to get these answers?
Guidance
In traditional statistics, data is organized by using a frequency distribution. The results of the frequency distribution can then be used to create various graphs, such as a histogram or a frequency polygon, which indicate the shape or nature of the distribution. The shape of the distribution will allow you to confirm various conjectures about the nature of the data.
To examine data in order to identify patterns, trends, or relationships, exploratory data analysis is used. In exploratory data analysis, organized data is displayed in order to make decisions or suggestions regarding further actions. A box-and-whisker plot (often called a box plot) can be used to represent the data set graphically, and the graph involves plotting 5 specific values. These 5 values are often referred to as the five-number summary of the organized data set. The five-number summary consists of the following:
- The lowest number in the data set (minimum value)
- The first quartile, Q1 (the median of the first half of the data set)
- The median of the entire data set (median)
- The third quartile, Q3 (the median of the second half of the data set)
- The highest number in the data set (maximum value)
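The five-number summary can be computed with a short Python sketch. It assumes the median-of-halves rule used in this section's worked examples: sort the data, and when the number of values is odd, leave the overall median out of both halves. The function name `five_number_summary` is illustrative, not part of any library, and many statistics packages use interpolation-based quartile rules that can give slightly different values for Q1 and Q3.

```python
from statistics import median


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers.

    Assumes the median-of-halves rule: when the count is odd, the middle
    value is excluded from both the lower and the upper half.
    """
    values = sorted(data)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data values")
    half = n // 2
    lower_half = values[:half]          # first half of the ordered data
    upper_half = values[half + n % 2:]  # second half, skipping the middle value if n is odd
    return (values[0], median(lower_half), median(values),
            median(upper_half), values[-1])


# An odd count (7 values) and an even count (8 values)
print(five_number_summary([12, 16, 36, 10, 31, 23, 58]))     # (10, 12, 23, 36, 58)
print(five_number_summary([18, 20, 24, 21, 5, 23, 19, 22]))  # (5, 18.5, 20.5, 22.5, 24)
```

Both data sets come from the Guided Practice and Example C later in this section, so the printed values can be compared with the hand calculations there.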
Plotting the five-number summary on a number line produces a box-and-whisker plot, as shown below:
[Figure: model box-and-whisker plot]
The model box-and-whisker plot above shows 2 horizontal lines (the whiskers) that each contain 25% of the data and are the same length. It also shows the median of the data set in the middle of the box, which contains 50% of the data. The lengths of the whiskers and the location of the median relative to the center of the box are used to describe the distribution of the data. Note that this is just an example: not every box-and-whisker plot has its median in the middle of the box and whiskers of equal length.
Information about the data set that can be determined from the box-and-whisker plot with respect to the location of the median includes the following:
a. If the median is located in the center or near the center of the box, the distribution is approximately symmetric.
b. If the median is located to the left of the center of the box, the distribution is positively skewed.
c. If the median is located to the right of the center of the box, the distribution is negatively skewed.
Information about the data set that can be determined from the box-and-whisker plot with respect to the length of the whiskers includes the following:
a. If the whiskers are the same or almost the same length, the distribution is approximately symmetric.
b. If the right whisker is longer than the left whisker, the distribution is positively skewed.
c. If the left whisker is longer than the right whisker, the distribution is negatively skewed.
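The two sets of rules above can be turned into a small Python helper. This is a sketch under an assumption the text leaves open: how close two lengths must be to count as "almost the same." Here a hypothetical `tolerance` of 10% of the longer length is used; the function names are illustrative.

```python
def compare_sides(left, right, tolerance=0.1):
    """Classify skew from the lengths of a left part and a right part.

    A longer right side means positive skew and a longer left side means
    negative skew. Lengths within `tolerance` (a fraction of the longer one)
    count as approximately equal.
    """
    longer = max(left, right)
    if longer == 0 or abs(right - left) <= tolerance * longer:
        return "approximately symmetric"
    return "positively skewed" if right > left else "negatively skewed"


def describe_skew(summary, tolerance=0.1):
    """Apply the median-location rule and the whisker-length rule."""
    minimum, q1, med, q3, maximum = summary
    # Median left of the box's center <=> the right part of the box is longer.
    by_median = compare_sides(med - q1, q3 - med, tolerance)
    by_whiskers = compare_sides(q1 - minimum, maximum - q3, tolerance)
    return by_median, by_whiskers


print(describe_skew((10, 12, 23, 36, 58)))
# ('positively skewed', 'positively skewed')
```

The two rules are checked separately because they can disagree for a particular data set.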
The length of the whiskers also gives you information about how spread out the data is.
A box-and-whisker plot is often used when the number of data values is large. The center of the distribution, the nature of the distribution, and the range of the data are easy to see in the graph. The five-number summary divides the data into quarters by using the medians of the lower and upper halves of the data. Many data sets contain values that are either extremely high or extremely low compared to the rest of the data values. These values are called outliers. There are several reasons why a data set may contain an outlier. Some of these are listed below:
- The value may be the result of an error made in measurement or in observation. The researcher may have measured the variable incorrectly.
- The value may simply be an error made by the researcher in recording the value. The value may have been written or typed incorrectly.
- The value could be a result obtained from a subject not within the defined population. A researcher recording marks from a Math 12 examination may have recorded a mark by a student in Grade 11 who was taking Math 12.
- The value could be one that is legitimate but is extreme compared to the other values in the data set. (This rarely occurs, but it is a possibility.)
If an outlier is present because of an error in measurement, observation, or recording, then either the error should be corrected, or the outlier should be omitted from the data set. If the outlier is a legitimate value, then the statistician must make a decision as to whether or not to include it in the set of data values. There is no rule that tells you what to do with an outlier in this case.
One method for checking a data set for the presence of an outlier is to follow the procedure below:
1. Organize the given data set and determine the values of Q1 and Q3.
2. Calculate the difference between Q3 and Q1. This difference is called the interquartile range (IQR): IQR = Q3 - Q1.
3. Multiply the IQR by 1.5, subtract this result from Q1, and add it to Q3.
4. The two results from Step 3 are the limits of the range into which all values of the data set should fit. Any values below or above this range are considered outliers.
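Written as formulas, Steps 2 and 3 give the two limits of the acceptable range, where Q_1 and Q_3 are the first and third quartiles:

$$
\mathrm{IQR} = Q_3 - Q_1, \qquad
\text{lower limit} = Q_1 - 1.5\,\mathrm{IQR}, \qquad
\text{upper limit} = Q_3 + 1.5\,\mathrm{IQR}
$$

A data value x is an outlier if x < lower limit or x > upper limit.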
Example A
For each box-and-whisker plot, list the five-number summary and describe the distribution based on the location of the median.
a. Minimum value
Median
Maximum value
The median of the data set is located to the right of the center of the box, which indicates that the distribution is negatively skewed.
b. Minimum value
Median
Maximum value
The median of the data set is located to the right of the center of the box, which indicates that the distribution is negatively skewed.
c. Minimum value
Median
Maximum value
The median of the data set is located to the left of the center of the box, which indicates that the distribution is positively skewed.
Example B
The numbers of square feet (in 100s) of 10 of the largest museums in the world are shown below:
650, 547, 204, 213, 343, 288, 222, 250, 287, 269
Construct a box-and-whisker plot for the above data set and describe the distribution.
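The first step is to organize the data values as follows:
204, 213, 222, 250, 269, 287, 288, 343, 547, 650
Now calculate the median, Q1, and Q3.
Median = (269 + 287)/2 = 278
Q1 = 222 (the median of 204, 213, 222, 250, 269)
Q3 = 343 (the median of 287, 288, 343, 547, 650)
Next, complete the following list:
Minimum value = 204
Q1 = 222
Median = 278
Q3 = 343
Maximum value = 650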
The right whisker is longer than the left whisker, which indicates that the distribution is positively skewed.
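The calculation for Example B can be checked with a short Python sketch. It assumes the same median-of-halves rule as the example (the function name is illustrative) and also reports the two whisker lengths used to judge skew.

```python
from statistics import median


def five_number_summary(data):
    """(minimum, Q1, median, Q3, maximum) using medians of the two halves."""
    values = sorted(data)
    half = len(values) // 2
    q1 = median(values[:half])
    q3 = median(values[half + len(values) % 2:])  # skip the middle value if the count is odd
    return values[0], q1, median(values), q3, values[-1]


museums = [650, 547, 204, 213, 343, 288, 222, 250, 287, 269]  # square feet, in 100s
minimum, q1, med, q3, maximum = five_number_summary(museums)
print(minimum, q1, med, q3, maximum)   # 204 222 278.0 343 650
print("left whisker:", q1 - minimum)   # left whisker: 18
print("right whisker:", maximum - q3)  # right whisker: 307
```

The right whisker (307) is much longer than the left whisker (18), which agrees with the conclusion that the distribution is positively skewed.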
Example C
Using the procedure outlined above, check the following data sets for outliers:
a. 18, 20, 24, 21, 5, 23, 19, 22
b. 13, 15, 19, 14, 26, 17, 12, 42, 18
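a. Organize the given data set as follows:
5, 18, 19, 20, 21, 22, 23, 24
Determine the values for Q1 and Q3.
Q1 = (18 + 19)/2 = 18.5 and Q3 = (22 + 23)/2 = 22.5
Calculate the difference between Q3 and Q1: IQR = 22.5 - 18.5 = 4
Multiply this difference by 1.5: 1.5 × 4 = 6.
Finally, compute the range: 18.5 - 6 = 12.5 and 22.5 + 6 = 28.5, so the range is 12.5 to 28.5.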
Are there any data values below 12.5? Yes, the value of 5 is below 12.5 and is, therefore, an outlier.
Are there any values above 28.5? No, there are no values above 28.5.
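b. Organize the given data set as follows:
12, 13, 14, 15, 17, 18, 19, 26, 42
Determine the values for Q1 and Q3. The median, 17, is not included in either half.
Q1 = (13 + 14)/2 = 13.5 and Q3 = (19 + 26)/2 = 22.5
Calculate the difference between Q3 and Q1: IQR = 22.5 - 13.5 = 9
Multiply this difference by 1.5: 1.5 × 9 = 13.5.
Finally, compute the range: 13.5 - 13.5 = 0 and 22.5 + 13.5 = 36.0, so the range is 0 to 36.0.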
Are there any data values below 0? No, there are no values below 0.
Are there any values above 36.0? Yes, the value of 42 is above 36.0 and is, therefore, an outlier.
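Both parts of Example C can be reproduced with the Python sketch below. It assumes the median-of-halves quartile rule used in the example; `iqr_limits` and `find_outliers` are illustrative names, and the multiplier `k` defaults to the 1.5 from the procedure.

```python
from statistics import median


def iqr_limits(data, k=1.5):
    """Return (lower limit, upper limit) for the k x IQR outlier rule."""
    values = sorted(data)
    half = len(values) // 2
    q1 = median(values[:half])                    # median of the lower half
    q3 = median(values[half + len(values) % 2:])  # median of the upper half
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(data, k=1.5):
    """List the values that fall outside the limits, in their original order."""
    low, high = iqr_limits(data, k)
    return [x for x in data if x < low or x > high]


set_a = [18, 20, 24, 21, 5, 23, 19, 22]
set_b = [13, 15, 19, 14, 26, 17, 12, 42, 18]
for name, data in (("a", set_a), ("b", set_b)):
    print(name, iqr_limits(data), find_outliers(data))
# a (12.5, 28.5) [5]
# b (0.0, 36.0) [42]
```

Passing k=3 instead would apply the stricter cutoff that the glossary describes for extreme outliers.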
Points to Consider
 Are there still other ways to represent data graphically?
 Are there other uses for a box-and-whisker plot?
 Can box-and-whisker plots be used for comparing data sets?
Guided Practice
a. For the following data sets, determine the five-number summaries:
i. 12, 16, 36, 10, 31, 23, 58
ii. 144, 240, 153, 629, 540, 300
b. Use the data set for part i of the previous question and the five-number summary to construct a box-and-whisker plot to model the data set.
Answer:
a. i. The first step is to organize the values in the data set as shown below:
10, 12, 16, 23, 31, 36, 58
Now complete the following list:
Minimum value = 10
Q1 = 12
Median = 23
Q3 = 36
Maximum value = 58
ii. The first step is to organize the values in the data set as shown below:
144, 153, 240, 300, 540, 629
Now complete the following list:
Minimum value = 144
Q1 = 153
Median = (240 + 300)/2 = 270
Q3 = 540
Maximum value = 629
b. The five-number summary can now be used to construct a box-and-whisker plot for part i. Be sure to provide a scale on the number line that includes the range from the minimum value to the maximum value.
Minimum value = 10
Q1 = 12
Median = 23
Q3 = 36
Maximum value = 58
[Figure: box-and-whisker plot of data set i]
The right whisker is clearly much longer than the left whisker. This indicates that the distribution is positively skewed.
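Without graph paper, a rough text version of the plot in part b can be drawn with the Python sketch below. It is only an illustration under assumed conventions (the characters, the `width` parameter, and the function name are all hypothetical): the number line from the minimum to the maximum is scaled onto a fixed number of character columns, '|' marks the minimum and maximum, '-' the whiskers, '[' and ']' Q1 and Q3, '=' the rest of the box, and 'M' the median.

```python
def text_box_plot(summary, width=49):
    """Draw a five-number summary as a one-line text box plot."""
    minimum, q1, med, q3, maximum = summary
    if maximum <= minimum:
        raise ValueError("maximum must be greater than minimum")

    def col(x):
        # Map a data value linearly onto a character column 0..width-1.
        return round((x - minimum) / (maximum - minimum) * (width - 1))

    line = ["-"] * width                  # whiskers span the whole range
    for c in range(col(q1), col(q3) + 1):
        line[c] = "="                     # the box holds the middle 50% of the data
    line[col(q1)], line[col(q3)] = "[", "]"
    line[col(med)] = "M"                  # median marker
    line[0], line[-1] = "|", "|"          # minimum and maximum
    return "".join(line)


print(text_box_plot((10, 12, 23, 36, 58)))
```

With the default width of 49, each column stands for one unit between 10 and 58, so the long run of '-' on the right of the output shows the long right whisker.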
Explore More

Which of the following is not a part of the five-number summary?
 Q1 and Q3
 the mean
 the median
 minimum and maximum values

What percent of the data is contained in the box of a box-and-whisker plot?
 25%
 100%
 50%
 75%

What name is given to the horizontal lines to the left and right of the box of a box-and-whisker plot?
 axis
 whisker
 range
 plane

What term describes the distribution of a data set if the median of the data set is located to the left of the center of the box in a box-and-whisker plot?
 positively skewed
 negatively skewed
 approximately symmetric
 not skewed

What 2 values of the five-number summary are connected with 2 horizontal lines on a box-and-whisker plot?
 Minimum value and the median
 Maximum value and the median
 Minimum and maximum values
 Q1 and Q3

For the following data sets, determine the five-number summaries:
 74, 69, 83, 79, 60, 75, 67, 71
 6, 9, 3, 12, 11, 9, 15, 5, 7

For each of the following box-and-whisker plots, list the five-number summary and comment on the distribution of the data:
 The following data represent the number of coins that 12 randomly selected people had in their piggy banks: Construct a box-and-whisker plot for the above data.
 The following data represent the time (in minutes) that each of 20 people waited in line at a local book store to purchase the latest Harry Potter book: Construct a box-and-whisker plot for the above data. Are the data skewed in any direction?
 Firman’s Fitness Factory is a new gym that offers reasonably priced family packages. The following table represents the number of family packages sold during the opening month: Construct a box-and-whisker plot for the data. Are the data symmetric or skewed?
 Shown below is the number of new stage shows that appeared in Las Vegas for each of the past several years. Construct a box-and-whisker plot for the data and comment on the shape of the distribution.
 The following data represent the average snowfall (in centimeters) for 18 Canadian cities for the month of January. Construct a box-and-whisker plot to model the data. Are the data skewed? Justify your answer.
Name of City | Amount of Snow (cm)
Calgary | 123.4
Charlottetown | 74.5
Edmonton | 80.6
Fredericton | 73.8
Halifax | 64.0
Labrador City | 110.4
Moncton | 82.4
Montreal | 63.6
Ottawa | 48.9
Quebec City | 53.8
Regina | 35.9
Saskatoon | 25.4
St. John’s | 97.5
Sydney | 44.2
Toronto | 21.8
Vancouver | 12.8
Victoria | 8.3
Winnipeg | 76.2

Using the procedure outlined in this concept, check the following data sets for outliers:
 25, 33, 55, 32, 17, 19, 15, 18, 21
 149, 123, 126, 122, 129, 120
box-and-whisker plot
A box-and-whisker plot is a graph based upon medians. It shows the minimum value, the lower median, the median, the upper median, and the maximum value of a data set. It is also known as a box plot.
five-number summary
The numbers needed to construct a box-and-whisker plot are called the five-number summary. The five-number summary consists of the minimum value, Q1, the median, Q3, and the maximum value.
lower median
The lower median is the first quartile (Q1) in the box-and-whisker plot.
upper median
The upper median is the third quartile (Q3) in the box-and-whisker plot.
arithmetic mean
The arithmetic mean is also called the average.
back-to-back stem plots
A back-to-back stem plot is a modified stem-and-leaf plot with the stem in the center and the leaves on the sides; it is used to compare two different related sets of data (bivariate data).
bell shaped
A bell-shaped histogram is a histogram with a prominent ‘mound’ in the center and similar tapering to the left and right.
bins
Bins are groups of data plotted on the x-axis.
bivariate data
Bivariate data consists of two paired sets of data.
box and whisker plot
A box-and-whisker plot is a graphic display of quantitative data that demonstrates the five-number summary.
calculated data
Calculated data has values that are the result of computations performed on the input variable.
dependent variable
The dependent variable is the output variable in an equation or function, commonly represented by y or f(x).
explanatory variables
Explanatory variables are another name for independent variables.
extreme outliers
Extreme outliers are data points that lie more than 3 times the interquartile range (the middle half of your data) below the lower quartile or above the upper quartile.
extremes
The extremes are the maximum and minimum values in a data set.
five-point summary
The numbers needed to construct a box-and-whisker plot are called the five-point summary. The five points are the minimum, the lower median (Q1), the median, the upper median (Q3), and the maximum.
independent variable
The independent variable is the input variable in an equation or function, commonly represented by x.
input variables
Input variables are another name for independent variables.
interquartile range
The interquartile range is the difference between the third quartile and the first quartile (Q3 - Q1).
leaf
The leaves of a stem-and-leaf plot are the rightmost digits of each of the original data values.
line of best fit
A line of best fit is a straight line drawn on a scatter plot such that the sums of the distances to the points on either side of the line are approximately equal and such that there are an equal number of points above and below the line.
line of fit
A line of fit is a straight or continuously curved line representing the trend of changes in the comparison of two data sets (or one set of bivariate data).
linear regression
In statistics, linear regression is a process that attempts to model the relationship between two variables by fitting a linear equation to the data.
median
The median of a data set is the middle value of an organized data set.
mild outliers
Mild outliers are data points that lie more than 1.5 times the interquartile range (the middle half of your data) above the upper quartile or below the lower quartile.
modified box plot
A modified box plot has whiskers that extend to the highest and lowest non-outlier values.
normally distributed
If data are normally distributed, the data set creates a symmetric histogram that looks like a bell.
observed data
Observed data are values recorded by observation or measurement, rather than values computed from the input variable.
outlier
In statistics, an outlier is a data value that is far from other data values.
output variables
Output variables are another name for dependent variables.
quartile
A quartile is each of four equal groups that a data set can be divided into.
range
The range of a set of data is the difference in value between the least and greatest values in the set.
response variables
Response variables are another name for dependent variables.
skewed
As with the horizontal skewing of a histogram, stem plots with an obvious skew toward one end or the other tend to indicate an increased number of outliers either less than or greater than the mode.
statistical correlation
Statistical correlation is a representation of possible related changes in values between the two sets of data.
stem
A stem in a stem plot is a value or column of values that represents the greatest place value(s) in a set of data.
stem-and-leaf plot
A stem-and-leaf plot is a way of organizing data values from least to greatest using place value. Usually, the last digit of each data value becomes the "leaf" and the other digits become the "stem".
trends
Trends in data sets or samples are indicators found by reviewing the data from a general or overall standpoint.
uniform
A uniform-shaped histogram indicates data that is very consistent; the frequency of each class is very similar to that of the others.
Learning Objectives
Here you'll learn how to construct and interpret box-and-whisker plots.