Objective
Here you will learn how box-and-whisker plots are created, and some common uses of them.
Concept
If you were asked to create a visual representation of the mean, upper and lower 25% (quartiles), and maximum and minimum (extremes) scores on the final test in your College Algebra class, how would you go about it? Would a box-and-whisker plot be appropriate? Why or why not? What would the plot look like if the mean was 82%, the lowest score was 59%, highest was 96%, and if a quarter of the class scored above 86% while another quarter scored below 70%?
Watch This
http://youtu.be/GMb6HaLXmjY PatrickJMT – Box and Whisker Plot
Guidance
Box-and-whisker plots (or box plots) are ideal for visually representing the five number summary of data.
First, organize the data by increasing value, then identify the five number summary.
The five number summary (or five statistical summary) is composed of:
- The minimum and maximum values – called the extremes
- The middle value – called the median
- The values halfway between each extreme and the median – called the quartiles.
It is important to recognize that a five number summary depends more on the position of each value in numerical order than on the value itself. A common mistake when gathering data for a box plot is to think that the plot is based on the arithmetic mean of the data rather than the median. Don’t fall into this trap! To create a box plot, which is based on the five number summary, you first need to organize your data in increasing numerical order, and then identify your five numbers based on position in the ascending series.
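The position-based procedure can be sketched in a few lines of Python. This is a minimal sketch, assuming the convention used in this lesson: each quartile is the median of one half of the ordered data, and the overall median is left out of both halves when the number of values is odd. Other textbooks and software may use slightly different quartile rules. The function name `five_number_summary` and the sample list are made up for illustration, and the sketch does not screen for outliers.

```python
from statistics import median

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) by position in sorted order."""
    ordered = sorted(data)                  # step 1: ascending numerical order
    n = len(ordered)
    lower_half = ordered[: n // 2]          # values before the median position
    upper_half = ordered[(n + 1) // 2 :]    # values after the median position
    return (
        ordered[0],            # minimum
        median(lower_half),    # 1st quartile
        median(ordered),       # median
        median(upper_half),    # 3rd quartile
        ordered[-1],           # maximum
    )

# Hypothetical data, chosen only to show the call.
sample = [3, 9, 1, 7, 5, 11, 13]
print(five_number_summary(sample))  # (1, 3, 7, 11, 13)
```

Note that the mean never appears in the function: every number it returns is picked by position in the ordered list.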
A sometimes tricky detail is the handling and identification of outliers. Once you have identified the median and quartiles of your data, you should review the values at the lower and upper limits to see if there are any that seem unusually extreme before considering them to be part of your 5 number summary. Specifically, data points that lie more than 1.5 times the inter-quartile range (the range of values between the first and third quartiles – representing the middle half of your data) below the first quartile or above the third quartile may be considered mild outliers. Any points more than 3 times the inter-quartile range beyond those quartiles may be considered extreme outliers. Outliers are commonly plotted as stars or asterisks (mild outliers) or open circles (extreme outliers), and are not a part of the actual box plot or the five number summary.
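The outlier rule can be sketched as a small Python function. It is a rough illustration, assuming the quartiles have already been found and that distances are measured from the nearer quartile (more than 1.5 times the inter-quartile range for mild, more than 3 times for extreme). The name `classify_outliers` and the sample data are hypothetical.

```python
def classify_outliers(data, q1, q3):
    """Split data into (mild, extreme) outliers by distance from the quartiles."""
    iqr = q3 - q1
    mild, extreme = [], []
    for value in sorted(data):
        # Distance beyond the nearer quartile; 0 if the value lies inside the box.
        distance = max(q1 - value, value - q3, 0)
        if distance > 3 * iqr:
            extreme.append(value)
        elif distance > 1.5 * iqr:
            mild.append(value)
    return mild, extreme

# Hypothetical data whose quartiles are 10 and 14, so the IQR is 4.
sample = [2, 10, 11, 12, 13, 14, 30]
print(classify_outliers(sample, q1=10, q3=14))  # ([2], [30])
```

Here 2 lies 8 below the first quartile (more than 6 but not more than 12), so it is mild, while 30 lies 16 above the third quartile, so it is extreme.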
Once you have identified your five number summary, create a number line extending at least 10% past the upper and lower extremes of your data, and plot each of the five numbers above the appropriate locations on the line. Now create a rectangular box with sides on the 1st and 3rd quartiles. Draw a vertical line inside the box to represent the median, and draw horizontal lines from the sides of the box to the extremes. Finally, identify any mild outliers with asterisks/stars and extreme outliers with open circles.
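To make the drawing steps concrete, here is a rough text-only sketch in Python that lays the five numbers out on a number line. The function `ascii_boxplot`, its parameters, and the characters used (`|` for the extremes, `[` and `]` for the quartiles, `M` for the median) are all illustrative choices, not a standard notation, and the sketch does not draw outliers. The sample values are the summary 1, 2, 5, 7, 9 from Example A, drawn on a number line from 0 to 10.

```python
def ascii_boxplot(minimum, q1, median, q3, maximum, axis_min, axis_max, width):
    """Return a one-line text sketch of a box plot (illustrative only)."""
    def col(value):
        # Map a data value to a character column on the number line.
        return round((value - axis_min) * (width - 1) / (axis_max - axis_min))

    line = [" "] * width
    # Whiskers: horizontal line running between the two extremes.
    for i in range(col(minimum), col(maximum) + 1):
        line[i] = "-"
    # Box: spans the 1st to the 3rd quartile, drawn over the whisker line.
    for i in range(col(q1), col(q3) + 1):
        line[i] = "="
    line[col(minimum)] = "|"
    line[col(maximum)] = "|"
    line[col(q1)] = "["
    line[col(q3)] = "]"
    line[col(median)] = "M"
    return "".join(line)

plot = ascii_boxplot(1, 2, 5, 7, 9, axis_min=0, axis_max=10, width=11)
print(plot)
```

With one character per unit, the box runs from column 2 to column 7, the median sits at column 5, and the whiskers reach out to columns 1 and 9.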
Example A
Create a five number summary for the data below, and identify any outliers:
1, 5, 8, 2, 1, 7, 4, 4, 5, 6, 8, 2, 6, 5, 9
Solution: A five number summary includes the median, the upper and lower extremes, and the first and third quartiles. The first step to identifying them is to organize the data by ascending numerical value:
1, 1, 2, 2, 4, 4, 5, 5, 5, 6, 6, 7, 8, 8, 9
- Finding the median: Note that there are 15 values, an odd number, so the middle number in the series is the median. The value “5” has 7 values above and 7 below. 5 is the median.
- The 1st quartile is the median of the lower half of the data. There are 7 values below the median, and the middle number of them is “2”, with three values below and three above before the median. The 1st quartile is 2.
- The 3rd quartile is the median of the upper half of the data. There are 7 values above the median, and the middle value is “7”, with three values above it and three below it before the median. 7 is the 3rd quartile.
- Are there any outliers? The inter-quartile range is the difference between the 3rd and 1st quartiles: 7 - 2 = 5. Recall that a value may be considered a mild outlier if it lies more than 1.5 times the inter-quartile range beyond the quartiles. In this case, that would mean any number more than 1.5 × 5 = 7.5 above the 3rd quartile, 7, or below the 1st quartile, 2. That would make any value less than -5.5 or greater than 14.5 a mild outlier. There are no negative values and no values greater than 9, so there are no outliers.
- The minimum and maximum values are the least and greatest values, respectively. Since we have organized our data in ascending order, the minimum value is on the far left, “1”, and the maximum value is on the far right, “9”. The minimum is 1 and the maximum is 9.
Example B
Identify the five statistical summary and any outliers in the data below:
18, 16, 18, 17, 15, 2, 17, 20, 19, 18, 15, 16, 28, 18
Solution: Begin by ordering the data numerically:
2, 15, 15, 16, 16, 17, 17, 18, 18, 18, 18, 19, 20, 28
- Median: There are 14 values, an even number, so the median is the average (arithmetic mean) of the two middle numbers, 17 and 18. 17.5 is the median.
- 1st and 3rd quartiles: The middle number in the lower 50% is 16, and the middle of the upper 50% is 18. 16 is the lower quartile and 18 is the upper quartile.
- The inter-quartile range is 18 - 16 = 2. Any value less than 13 or greater than 21 may be considered a mild outlier, and any value less than 10 or greater than 24 may be considered an extreme outlier. 2 and 28 are both extreme outliers.
- The least value is 2 and the greatest value is 28. 2 is the minimum and 28 is the maximum.
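The cutoff values quoted in this solution follow from the quartiles by simple arithmetic. As a worked check, writing $Q_1 = 16$ and $Q_3 = 18$ (the word "cutoff" is only a label used here for the boundary values):

```latex
\begin{align*}
\text{IQR} &= Q_3 - Q_1 = 18 - 16 = 2 \\
\text{mild cutoffs:}\quad Q_1 - 1.5 \cdot \text{IQR} &= 16 - 3 = 13,
  & Q_3 + 1.5 \cdot \text{IQR} &= 18 + 3 = 21 \\
\text{extreme cutoffs:}\quad Q_1 - 3 \cdot \text{IQR} &= 16 - 6 = 10,
  & Q_3 + 3 \cdot \text{IQR} &= 18 + 6 = 24
\end{align*}
```

Since 2 is less than 10 and 28 is greater than 24, both values fall past the extreme cutoffs.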
Example C
Create box plots representing the data from examples A and B.
Solution:
A. The data from example A was encapsulated in the five number summary:
Median: 5
1st quartile: 2 3rd quartile: 7
Minimum: 1 Maximum: 9
Draw a number line running from 0 to 10, and plot the five number summary above it:
Draw a rectangle including the first and third quartiles, and a vertical line for the median. Since there are no outliers, draw a “whisker” from each side of the box to the extremes:
B. The data from example B includes:
Median: 17.5
1st quartile: 16 3rd quartile: 18
Minimum: 2 Maximum: 28
Outliers (extreme): 2, 28
Draw a number line running from 0 – 30, and plot the five number summary:
Note that since “2” and “28” are both extreme outliers, the box-and-whiskers only extend to the greatest and least non-extreme values. This is sometimes called a modified boxplot.
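Where the whiskers stop in a modified boxplot can be computed directly. The sketch below assumes the common convention that each whisker ends at the most extreme data value still within `k` times the inter-quartile range of the quartiles, with `k = 1.5` as the default. Both the function name `whisker_ends` and the parameter `k` are hypothetical, and the quartiles 16 and 18 are taken from Example B.

```python
def whisker_ends(data, q1, q3, k=1.5):
    """Return the least and greatest values within k * IQR of the quartiles."""
    iqr = q3 - q1
    low_cutoff = q1 - k * iqr
    high_cutoff = q3 + k * iqr
    # Keep only the values that are not flagged as outliers.
    inside = [v for v in data if low_cutoff <= v <= high_cutoff]
    return min(inside), max(inside)

example_b = [18, 16, 18, 17, 15, 2, 17, 20, 19, 18, 15, 16, 28, 18]
print(whisker_ends(example_b, q1=16, q3=18))  # (15, 20)
```

For this data the result is the same with `k=3`, because no values lie between the mild and extreme cutoffs.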
Concept Problem Revisited
If you were asked to create a visual representation of the mean, upper and lower 25% (quartiles), and maximum and minimum (extremes) scores on the final test in your College Algebra class, how would you go about it? Would a box-and-whisker plot be appropriate? Why or why not? What would the plot look like if the mean was 82%, the lowest score was 59%, highest was 96%, and if a quarter of the class scored above 86% while another quarter scored below 70%?
This would be an excellent application for a box plot. In fact, this is just about the best use of one. You will find, if you haven’t already, that the SAT and ACT college application exams report grades in just this manner. Colleges (and students themselves) inevitably wish to see how a particular score compares to others on the same test, and a box plot is ideal for that purpose.
If the data in the question were plotted as a box plot, it would appear like this:
Vocabulary
The 5 number summary (or 5 statistic summary) is the collective term used to describe the minimum and maximum, middle, and 25% and 75% values in a data set.
The extremes are the minimum and maximum values in a set of data.
The median is the middle value in a set of data, when the data is organized in numerical order.
The values halfway between each extreme and the median are called the quartiles.
The arithmetic mean is a measure of central tendency calculated by finding the sum of the data, divided by the number of data entries. This value is referred to as the average in common language.
The inter-quartile range is the range of values between the first and third quartiles – representing the middle half of your data. In other words: the IQR = Q3 - Q1, and 50% of the data is in the box.
Outliers are values uncommonly distant from the rest of the data. Mild outliers are values more than 1.5 times the inter-quartile range above the 3rd quartile or below the 1st quartile. Extreme outliers are values more than 3 times the inter-quartile range beyond the upper or lower quartiles.
Guided Practice
1. The box-and-whisker plot below shows the starting salaries for graduates of a small college. What is the range of the starting salaries?
2. Mr. Andrews made a box-and-whisker graph of the quiz grades in his chemistry class. What is the median quiz grade for the class?
3. Mr. Foreman grades on a curve in which the top 25% of the test scores earn A’s, the middle 50% earn C’s, and the bottom 25% earn F’s. The box and whisker plot below shows the distribution of scores on the last test. What is the range of scores for people who earned C's?
Solutions:
1. The range is the difference between the maximum and minimum values, that is:
2. The median is denoted by the line inside the box of a boxplot. In this case, that would be 77%.
3. If the middle 50% of Mr. Foreman’s class earns C’s, then all of the scores in the interquartile range, the area between Q1 and Q3, would be included. Since the “box” of a boxplot indicates the IQR, that would be 65% - 80%.
Practice
Use the following boxplot to answer questions 1 – 5 below:
- What is the median?
- What is the lower quartile?
- What is the upper quartile?
- What is the minimum value?
- What is the maximum value?
- What are the five values called?
- What is the range of the data?
- What percentage of the data is below the upper quartile?
- What percentage of data is located between the lower quartile and the median?
- What percentage of data is above the median?
- What percentage of data is below the lower quartile?
- Calculate the Range for the following data: 5, 21, 10, 9, 12, 12, 16, 16, 9, 6, 20, 8, 10, 26, 4, 26, and 14.
- Calculate the First Quartile for the following data: 5, 21, 10, 9, 12, 14, 13, 16, 9, 6, 20, 8, 12, 24, 4, 26, and 14
- State the five number summary of the following data set: 13, 14, 10, 4, 18, 17, 11, 10, 5, 7, 10, 19, 13
- Construct a box and whisker plot for the data set given in question 12