7.8: Histograms
Suppose you're working in a card store and you notice that the cards have different prices. You pull 15 cards out of inventory, and their prices are 0.75, 0.95, 1.25, 1.65, 1.75, 2.25, 0.95, 1.10, 3.55, 5.00, 1.35, 2.25, 3.75, 4.25, and 5.65. If you divide this data into intervals of 0.50, how can you construct a graph of your findings?
Guidance
An extension of the bar graph is the histogram. A histogram is a type of vertical bar graph in which the bars represent grouped continuous data. The shape of a histogram can tell you a lot about the distribution of the data, as well as provide you with information about the mean, median, and mode of the data set. The following are some typical histograms, with a caption below each one explaining the distribution of the data, as well as the characteristics of the mean, median, and mode. Distributions can have other shapes besides the ones shown below, but these represent the most common ones that you will see when analyzing data. In each of the graphs below, the distributions are not perfectly shaped, but are shaped enough to identify an overall pattern.
a)
Figure a represents a bell-shaped distribution, which has a single peak and tapers off to both the left and to the right of the peak. The shape appears to be symmetric about the center of the histogram. The single peak indicates that the distribution is unimodal. The highest peak of the histogram represents the location of the mode of the data set. The mode is the data value that occurs the most often in a data set. For a symmetric histogram, the values of the mean, median, and mode are all the same and are all located at the center of the distribution.
b)
Figure b represents a distribution that is approximately uniform and forms a rectangular, flat shape. The frequency of each class is approximately the same.
c)
Figure c represents a right-skewed distribution, which has a peak to the left of the distribution and data values that taper off to the right. This distribution has a single peak and is also unimodal. For a histogram that is skewed to the right, the mean is located to the right on the distribution and is the largest value of the measures of central tendency. The mean has the largest value because it is strongly affected by the outliers on the right tail that pull the mean to the right. The mode is the smallest value, and it is located to the left on the distribution. The mode always occurs at the highest point of the peak. The median is located between the mode and the mean.
d)
Figure d represents a left-skewed distribution, which has a peak to the right of the distribution and data values that taper off to the left. This distribution has a single peak and is also unimodal. For a histogram that is skewed to the left, the mean is located to the left on the distribution and is the smallest value of the measures of central tendency. The mean has the smallest value because it is strongly affected by the outliers on the left tail that pull the mean to the left. The mode is the largest value, and it is located to the right on the distribution, at the highest point of the peak. The median is located between the mode and the mean.
e)
Figure e has no shape that can be defined. The only defining characteristic about this distribution is that it has 2 peaks of the same height. This means that the distribution is bimodal.
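The ordering of the mean, median, and mode described for the skewed distributions can be checked numerically. The following is a minimal Python sketch, not part of the original lesson; the two data sets are hypothetical and were invented only to illustrate a long right tail and a long left tail.

```python
from statistics import mean, median, mode

# Hypothetical data sets, invented for illustration only.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]   # long tail to the right
left_skewed = [15 - x for x in right_skewed]     # mirror image: long tail to the left


def summarize(data):
    """Return (mode, median, mean) for a list of numbers."""
    return mode(data), median(data), mean(data)


print("right-skewed (mode, median, mean):", summarize(right_skewed))
print("left-skewed  (mode, median, mean):", summarize(left_skewed))
```

For the right-skewed set, the mode is the smallest of the three measures and the mean is the largest; for the left-skewed set, the order is reversed. In both cases the median falls between the other two.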
While there are similarities between a bar graph and a histogram, such as each bar being the same width, a histogram has no spaces between the bars. The quantitative data is grouped according to a determined bin size, or interval. The bin size refers to the width of each bar, and the data is placed in the appropriate bin.
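The bins, or groups of data, are plotted on the \begin{align*}x\end{align*}-axis, and the frequencies of the bins are plotted on the \begin{align*}y\end{align*}-axis. The bins are written so that there are no gaps between them. With a bin size of 10, for example, the bins are \begin{align*}[0-10)\end{align*}, \begin{align*}[10-20)\end{align*}, \begin{align*}[20-30)\end{align*}, and so on. The square bracket means that the left endpoint belongs to the bin, and the parenthesis means that the right endpoint does not; that value is counted in the next bin instead.
For whole-number data, the first bin includes the values 0 through 9, and the next bin includes the values 10 through 19, so each bin covers exactly 10 values. This makes the bins the proper size. Bin sizes are written in this manner to simplify the process of grouping the data. The first bin can begin with the smallest number of the data set and end with the value determined by adding the bin width to this value, or the bin can begin with a reasonable value that is smaller than the smallest data value.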
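Placing a value in the correct bin can be sketched in a few lines of Python. This is only an illustration and assumes the bracket notation used in the tables of this lesson, where a bin such as [0-10) includes 0 but not 10. The function name `bin_for` and its parameters are hypothetical.

```python
def bin_for(value, bin_size, start=0):
    """Return (low, high) for the half-open bin [low, high) that contains value."""
    index = (value - start) // bin_size   # how many whole bins lie below the value
    low = start + index * bin_size
    return (low, low + bin_size)


for value in (0, 9, 10, 19, 20):
    low, high = bin_for(value, bin_size=10)
    print(f"{value} goes in [{low}-{high})")
```

A value that sits exactly on a boundary, such as 10, lands in the bin that starts there, so no value is ever counted twice.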
Example A
Construct a frequency distribution table with a bin size of 10 for the following data, which represents the ages of 30 lottery winners:
\begin{align*}& 38 \quad 41 \quad 29 \quad 33 \quad 40 \quad 74 \quad 66 \quad 45 \quad 60 \quad 55\\
& 25 \quad 52 \quad 54 \quad 61 \quad 46 \quad 51 \quad 59 \quad 57 \quad 66 \quad 62\\
& 32 \quad 47 \quad 65 \quad 50 \quad 39 \quad 22 \quad 35 \quad 72 \quad 77 \quad 49\end{align*}
Step 1: Determine the range of the data by subtracting the smallest value from the largest value.
\begin{align*}\text{Range:} \ 77-22=55\end{align*}
Step 2: Divide the range by the bin size to ensure that you have at least 5 groups of data. A histogram should have from 5 to 10 bins to make it meaningful: \begin{align*}\frac{55}{10}=5.5 \approx 6\end{align*}
Step 3: Construct the table.
Bin | Frequency |
---|---|
\begin{align*}[20-30)\end{align*} | 3 |
\begin{align*}[30-40)\end{align*} | 5 |
\begin{align*}[40-50)\end{align*} | 6 |
\begin{align*}[50-60)\end{align*} | 7 |
\begin{align*}[60-70)\end{align*} | 6 |
\begin{align*}[70-80)\end{align*} | 3 |
Step 4: Determine the sum of the frequency column to ensure that all the data has been grouped.
\begin{align*}3+5+6+7+6+3=30\end{align*}
When data is grouped in a frequency distribution table, the actual data values are lost. The table indicates how many values are in each group, but it doesn't show the actual values.
There are many different ways to create a distribution table and many different distribution tables that can be created. However, for the purpose of constructing a histogram, the method shown works very well, and it is not difficult to complete.
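The four steps of Example A can also be carried out with a short program. The following Python sketch is one possible way to do it, not the only one. It assumes the first bin starts at the largest multiple of the bin size that does not exceed the smallest age, which happens to give exactly the number of bins found in Step 2 for this data; the variable names are illustrative.

```python
import math

# Ages of the 30 lottery winners from Example A.
ages = [38, 41, 29, 33, 40, 74, 66, 45, 60, 55,
        25, 52, 54, 61, 46, 51, 59, 57, 66, 62,
        32, 47, 65, 50, 39, 22, 35, 72, 77, 49]
bin_size = 10

# Steps 1 and 2: range of the data and number of bins (rounded up).
data_range = max(ages) - min(ages)
num_bins = math.ceil(data_range / bin_size)

# Step 3: start at a convenient value at or below the minimum (here 20)
# and count how many ages fall in each half-open bin [low, low + bin_size).
start = (min(ages) // bin_size) * bin_size
frequencies = [0] * num_bins
for age in ages:
    frequencies[(age - start) // bin_size] += 1

# Step 4: the frequencies must add up to the number of data values.
assert sum(frequencies) == len(ages)

print("Range:", data_range, "| Bins:", num_bins)
for i, count in enumerate(frequencies):
    low = start + i * bin_size
    print(f"[{low}-{low + bin_size}) | {count}")
```

The check in Step 4 catches values that were skipped or counted twice, which is the same purpose the sum of the frequency column serves when the table is built by hand.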
Example B
The numbers of years of service for 75 teachers in a small town are listed below:
\begin{align*}& 1, \ 6, \ 11, \ 26, \ 21, \ 18, \ 2, \ 5, \ 27, \ 33, \ 7, \ 15, \ 22, \ 30, \ 8\\
& 31, \ 5, \ 25, \ 20, \ 19, \ 4, \ 9, \ 19, \ 34, \ 3, \ 16, \ 23, \ 31, \ 10, \ 4\\
& 2, \ 31, \ 26, \ 19, \ 3, \ 12, \ 14, \ 28, \ 32, \ 1, \ 17, \ 24, \ 34, \ 16, \ 1,\\
& 18, \ 29, \ 10, \ 12, \ 30, \ 13, \ 7, \ 8, \ 27, \ 3, \ 11, \ 26, \ 33, \ 29, \ 20\\
& 7, \ 21, \ 11, \ 19, \ 35, \ 16, \ 5, \ 2, \ 19, \ 24, \ 13, \ 14, \ 28, \ 10, \ 31\end{align*}
Using the above data, construct a frequency distribution table with a bin size of 5.
\begin{align*}\text{Range:} \ 35-1 & = 34\\
\frac{34}{5} & = 6.8 \approx 7\end{align*}
You will have 7 bins.
When the number of data values is very large, another column is often inserted in the distribution table. This column is a tally column, and it is used to account for the number of values within a bin. A tally column facilitates the creation of the distribution table and usually allows the task to be completed more quickly. For each value that is in a bin, draw a stroke in the Tally column. To make counting the strokes easier, draw 4 strokes and cross them out with the fifth stroke. This process bundles the strokes in groups of 5, and the frequency can be readily determined.
Bin | Tally | Frequency |
---|---|---|
\begin{align*}[0-5)\end{align*} | \begin{align*}\cancel{||||} \ \cancel{||||} \ |\end{align*} | 11 |
\begin{align*}[5-10)\end{align*} | \begin{align*}\cancel{||||} \ ||||\end{align*} | 9 |
\begin{align*}[10-15)\end{align*} | \begin{align*}\cancel{||||} \ \cancel{||||} \ ||\end{align*} | 12 |
\begin{align*}[15-20)\end{align*} | \begin{align*}\cancel{||||} \ \cancel{||||} \ ||||\end{align*} | 14 |
\begin{align*}[20-25)\end{align*} | \begin{align*}\cancel{||||} \ ||\end{align*} | 7 |
\begin{align*}[25-30)\end{align*} | \begin{align*}\cancel{||||} \ \cancel{||||}\end{align*} | 10 |
\begin{align*}[30-35)\end{align*} | \begin{align*}\cancel{||||} \ \cancel{||||} \ ||\end{align*} | 12 |
\begin{align*}11+9+12+14+7+10+12 = 75\end{align*}
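The bundling of tally strokes into groups of 5 can be illustrated with a small Python sketch. It takes the frequencies from the table above as given. The function name `tally` is hypothetical, and the text `||||/` is used here to stand for 4 strokes crossed out by a fifth.

```python
def tally(count):
    """Render a count as tally marks bundled in groups of 5."""
    bundles, leftover = divmod(count, 5)
    # "||||/" stands for 4 strokes crossed out by a fifth stroke.
    groups = ["||||/"] * bundles
    if leftover:
        groups.append("|" * leftover)
    return " ".join(groups)


# Frequencies from the table above, keyed by the lower edge of each bin.
frequencies = {0: 11, 5: 9, 10: 12, 15: 14, 20: 7, 25: 10, 30: 12}

for low, count in frequencies.items():
    print(f"[{low}-{low + 5}) | {tally(count)} | {count}")

print("Total:", sum(frequencies.values()))
```

Counting the complete bundles by fives and then adding the leftover strokes gives the frequency for each bin, and the total confirms that all 75 values were grouped.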
Now that you have constructed the frequency table, the grouped data can be used to draw a histogram. Like a bar graph, a histogram requires a title and properly labeled \begin{align*}x\end{align*}- and \begin{align*}y\end{align*}-axes.
Example C
Use the data from Example A that displays the ages of the lottery winners to construct a histogram. The data is shown again below. What percentage of the winners were 50 years of age or older?
Bin | Frequency |
---|---|
\begin{align*}[20-30)\end{align*} | 3 |
\begin{align*}[30-40)\end{align*} | 5 |
\begin{align*}[40-50)\end{align*} | 6 |
\begin{align*}[50-60)\end{align*} | 7 |
\begin{align*}[60-70)\end{align*} | 6 |
\begin{align*}[70-80)\end{align*} | 3 |
Use the data as it is represented in the distribution table to construct the histogram.
From looking at the tops of the bars, you can see how many winners were in each category, and by adding these numbers, you can determine the total number of winners. You can also determine how many winners were within a specific category. For example, you can see that 9 winners were 60 years of age or older. The graph can also be used to determine percentages. For example, it can answer the question, “What percentage of the winners were 50 years of age or older?” as follows:
\begin{align*}\frac{16}{30} = 0.5\overline{3} \qquad (0.533)(100\%) \approx 53.3\%.\end{align*}
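The same percentage can be checked by counting directly from the ages listed in Example A. This is a minimal Python sketch with illustrative variable names; it works from the raw ages rather than from the histogram.

```python
# Ages of the 30 lottery winners from Example A.
ages = [38, 41, 29, 33, 40, 74, 66, 45, 60, 55,
        25, 52, 54, 61, 46, 51, 59, 57, 66, 62,
        32, 47, 65, 50, 39, 22, 35, 72, 77, 49]

# Count the winners who were 50 years of age or older.
older = sum(1 for age in ages if age >= 50)

# Convert the fraction of winners to a percentage.
percentage = older / len(ages) * 100

print(f"{older} of {len(ages)} winners = {percentage:.1f}%")
```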
Vocabulary
A frequency distribution is a table that lists all of the classes and the number of data values that belong to each of the classes. A distribution in which most of the data values are located to the right of the mean is called a left-skewed distribution, while a distribution in which most of the data values are located to the left of the mean is called a right-skewed distribution.
A histogram is a graph in which the groups of quantitative data, or bins, are on the horizontal axis, and the frequencies are plotted on the vertical axis. Bins are also known as classes. The frequencies on a histogram are represented by vertical bars that are drawn adjacent to each other. A symmetric histogram is a histogram for which the values of the mean, median, and mode are all the same and are all located at the center of the distribution.
Guided Practice
a. Use the data and the distribution table that represent the years of service of the teachers from Example B to construct a histogram to display the data. The distribution table is shown again below:
Bin | Tally | Frequency |
---|---|---|
\begin{align*}[0-5)\end{align*} | \begin{align*}\cancel{||||} \ \cancel{||||} \ |\end{align*} | 11 |
\begin{align*}[5-10)\end{align*} | \begin{align*}\cancel{||||} \ ||||\end{align*} | 9 |
\begin{align*}[10-15)\end{align*} | \begin{align*}\cancel{||||} \ \cancel{||||} \ ||\end{align*} | 12 |
\begin{align*}[15-20)\end{align*} | \begin{align*}\cancel{||||} \ \cancel{||||} \ ||||\end{align*} | 14 |
\begin{align*}[20-25)\end{align*} | \begin{align*}\cancel{||||} \ ||\end{align*} | 7 |
\begin{align*}[25-30)\end{align*} | \begin{align*}\cancel{||||} \ \cancel{||||}\end{align*} | 10 |
\begin{align*}[30-35)\end{align*} | \begin{align*}\cancel{||||} \ \cancel{||||} \ ||\end{align*} | 12 |
b. Now use the histogram to answer the following questions.
i. How many teachers teach in this small town?
ii. How many teachers have worked for less than 5 years?
iii. If teachers are able to retire when they have taught for 30 years or more, how many are eligible to retire?
iv. What percentage of the teachers still have to teach for 10 years or fewer before they are eligible to retire?
v. Do you think that the majority of the teachers are young or old? Justify your answer.
Answer:
a.
b. i. \begin{align*}11+9+12+14+7+10+12=75\end{align*}
In this small town, 75 teachers are teaching.
ii. 11 teachers have taught for less than 5 years.
iii. 12 teachers are eligible to retire.
iv. \begin{align*}\frac{17}{75}=0.22\overline{66} \qquad (0.2266)(100\%) \approx 23\%\end{align*}
Approximately 23% of the teachers must teach for 10 years or fewer before they are eligible to retire.
v. Answers will vary, but one possible answer is that the majority of the teachers are young, because 46 have taught for less than 20 years.
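The answers to part b can be reproduced from the frequency column alone. The following Python sketch is only an illustration and takes the frequencies in the distribution table as given. The helper `count_between` is hypothetical and counts only bins that lie entirely inside the requested interval.

```python
# Frequencies from the distribution table, keyed by the lower edge of each
# 5-year bin: frequencies[0] counts the teachers in [0-5), and so on.
frequencies = {0: 11, 5: 9, 10: 12, 15: 14, 20: 7, 25: 10, 30: 12}
bin_size = 5


def count_between(low, high):
    """Number of teachers whose bin lies entirely inside [low, high)."""
    return sum(n for edge, n in frequencies.items()
               if low <= edge and edge + bin_size <= high)


total = sum(frequencies.values())      # question i
under_5 = count_between(0, 5)          # question ii
can_retire = count_between(30, 35)     # question iii
within_10 = count_between(20, 30)      # question iv: 20 to fewer than 30 years
under_20 = count_between(0, 20)        # question v

print("Total teachers:", total)
print("Fewer than 5 years:", under_5)
print("Eligible to retire:", can_retire)
print(f"Within 10 years of retiring: {within_10}/{total} = {within_10 / total:.1%}")
print("Fewer than 20 years:", under_20)
```

Because a histogram only shows counts per bin, questions like these can be answered exactly only when the cutoffs fall on bin boundaries, as they do here.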
Practice
- What name is given to a distribution that has 2 peaks of the same height?
  - uniform
  - unimodal
  - bimodal
  - discrete
The following histogram shows data collected during a recent fishing derby. The number of fish caught is being compared to the size of the fish caught. Use the histogram to answer the following questions:
- How many fish were caught?
- How many fish caught were over 35 cm in length?
- How many fish caught were between 20 cm and 29 cm in length?
- Why is there a blank space between 38 cm and 41 cm on the histogram?
The following histogram displays the heights of students in a classroom. Use the information represented in the histogram to answer the following questions:
- How many students are in the class?
- How many students are over 60 inches in height?
- How many students have a height between 54 in and 62 in?
- Is the distribution unimodal or bimodal? How do you know?
- The following data represents the results of a test taken by a group of students: \begin{align*}& 95 \quad 56 \quad 70 \quad 83 \quad 59 \quad 66 \quad 88 \quad 52 \quad 50 \quad 77 \quad 69 \quad 80\\ & 54 \quad 75 \quad 68 \quad 78 \quad 51 \quad 64 \quad 55 \quad 67 \quad 74 \quad 57 \quad 73 \quad 53\end{align*} Construct a frequency distribution table using a bin size of 10 and display the results in a properly labeled histogram.
frequency distribution
A table that lists all of the classes and the number of data values that belong to each of the classes. A distribution in which most of the data values are located to the right of the mean is called a left-skewed distribution, while a distribution in which most of the data values are located to the left of the mean is called a right-skewed distribution.
bar graph
A bar graph is a plot made of bars whose heights (vertical bars) or lengths (horizontal bars) represent the frequencies of each category, with space between each bar.
frequency density
The vertical axis of a histogram is labelled frequency density.
Frequency table
A frequency table is a table that summarizes a data set by stating the number of times each value occurs within the data set.
Histogram
A histogram is a display that indicates the frequency of specified ranges of continuous data values on a graph in the form of immediately adjacent bars.
Interval
An interval is a range of data in a data set.
Range
The range of a data set is the difference between the smallest value and the greatest value in the data set.
right-skewed distribution
A right-skewed distribution has a peak to the left of the distribution and data values that taper off to the right.
unimodal
If a data set has only 1 value that occurs most often, the set is called unimodal.
Here you'll learn how to create a frequency distribution chart and how to construct and interpret a histogram.