7.8: Histograms
Suppose you're working in a card store and you notice that the cards have different prices. You pull 15 cards out of inventory and their prices are 0.75, 0.95, 1.25, 1.65, 1.75, 2.25, 0.95, 1.10, 3.55, 5.00, 1.35, 2.25, 3.75, 4.25, and 5.65. If you divide this data into intervals of 0.50, how can you construct a graph of your findings?
Watch This
First watch this video to learn about histograms.
CK12 Foundation: Chapter7HistogramsA
Then watch this video to see some examples.
CK12 Foundation: Chapter7HistogramsB
Guidance
An extension of the bar graph is the histogram. A histogram is a type of vertical bar graph in which the bars represent grouped continuous data. The shape of a histogram can tell you a lot about the distribution of the data, as well as provide you with information about the mean, median, and mode of the data set. The following are some typical histograms, with a caption below each one explaining the distribution of the data, as well as the characteristics of the mean, median, and mode. Distributions can have other shapes besides the ones shown below, but these represent the most common ones that you will see when analyzing data. In each of the graphs below, the distributions are not perfectly shaped, but are shaped enough to identify an overall pattern.
a)
Figure a represents a bell-shaped distribution, which has a single peak and tapers off to both the left and the right of the peak. The shape appears to be symmetric about the center of the histogram. The single peak indicates that the distribution is unimodal. The highest peak of the histogram represents the location of the mode of the data set. The mode is the data value that occurs most often in a data set. For a symmetric histogram, the values of the mean, median, and mode are all the same and are all located at the center of the distribution.
b)
Figure b represents a distribution that is approximately uniform and forms a rectangular, flat shape. The frequency of each class is approximately the same.
c)
Figure c represents a right-skewed distribution, which has a peak to the left of the distribution and data values that taper off to the right. This distribution has a single peak and is also unimodal. For a histogram that is skewed to the right, the mean is located to the right on the distribution and is the largest value of the measures of central tendency. The mean has the largest value because it is strongly affected by the outliers on the right tail that pull the mean to the right. The mode is the smallest value, and it is located to the left on the distribution. The mode always occurs at the highest point of the peak. The median is located between the mode and the mean.
d)
Figure d represents a left-skewed distribution, which has a peak to the right of the distribution and data values that taper off to the left. This distribution has a single peak and is also unimodal. For a histogram that is skewed to the left, the mean is located to the left on the distribution and is the smallest value of the measures of central tendency. The mean has the smallest value because it is strongly affected by the outliers on the left tail that pull the mean to the left. The median is located between the mode and the mean.
e)
Figure e has no shape that can be defined. The only defining characteristic about this distribution is that it has 2 peaks of the same height. This means that the distribution is bimodal.
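The ordering of the mode, median, and mean described for skewed distributions can be checked numerically. The sketch below uses a small made-up sample (not data from this lesson) and Python's standard statistics module; the variable names are illustrative only.

```python
from statistics import mean, median, mode

# Made-up right-skewed sample: most values are small, and a few large
# values form the tail on the right.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]

r_mode = mode(right_skewed)      # value that occurs most often
r_median = median(right_skewed)  # middle value of the sorted data
r_mean = mean(right_skewed)      # arithmetic average, pulled toward the tail

# Reflecting each value (15 - x) mirrors the sample, giving a left-skewed one.
left_skewed = [15 - x for x in right_skewed]

l_mode = mode(left_skewed)
l_median = median(left_skewed)
l_mean = mean(left_skewed)

print("right-skewed:", r_mode, r_median, r_mean)
print("left-skewed: ", l_mode, l_median, l_mean)
```

In the right-skewed sample the mode is the smallest of the three measures and the mean is the largest; in the mirrored sample the order is reversed, with the median between the other two in both cases.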
While there are similarities between a bar graph and a histogram, such as each bar being the same width, a histogram has no spaces between the bars. The quantitative data is grouped according to a determined bin size, or interval. The bin size refers to the width of each bar, and the data is placed in the appropriate bin.
The bins, or groups of data, are plotted on the x-axis, and the frequencies of the bins are plotted on the y-axis. A grouped frequency distribution is constructed for the numerical data, and this table is used to create the histogram. In most cases, the grouped frequency distribution is designed so there are no breaks in the intervals. The last value of one bin is actually the first value counted in the next bin. This means that if you had groups of data with a bin size of 10, the bins would be represented by the notation [0-10), [10-20), [20-30), etc. If both end values were counted, each bin would contain 11 values, which is 1 more than the desired bin size of 10. Therefore, the last value of each bin is not counted in that bin; it is counted as the first value of the following bin.
For whole-number data, the first bin includes the values 0 through 9, and the next bin includes the values 10 through 19. This makes the bins the proper size. Bin sizes are written in this manner to simplify the process of grouping the data. The first bin can begin with the smallest number of the data set and end with the value determined by adding the bin width to this value, or the bin can begin with a reasonable value that is smaller than the smallest data value.
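The bin rule can be sketched in a few lines of Python. The helper name bin_start is hypothetical, and the example applies it to the card prices from the opening problem, converted to cents so that all arithmetic uses whole numbers. Starting the first bin at 0.50 is an assumption: it is simply a round value below the smallest price.

```python
from collections import Counter

def bin_start(value, first_start, width):
    """Return the lower limit of the bin [start, start + width) that holds value.

    The upper limit is excluded, so a value equal to an upper limit
    falls into the following bin.
    """
    return first_start + ((value - first_start) // width) * width

# Card prices from the opening problem, written in cents.
prices_cents = [75, 95, 125, 165, 175, 225, 95, 110, 355, 500,
                135, 225, 375, 425, 565]

# Count how many prices fall into each 50-cent bin, starting at 50 cents.
counts = Counter(bin_start(p, 50, 50) for p in prices_cents)

for start in range(50, 600, 50):
    lower = start / 100
    upper = (start + 50) / 100
    print(f"[{lower:.2f}-{upper:.2f})  {counts[start]}")
```

With a bin size of 10 starting at 0, bin_start(9, 0, 10) gives 0 and bin_start(10, 0, 10) gives 10, which matches the rule that the last value of one bin is counted in the next.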
Example A
Construct a frequency distribution table with a bin size of 10 for the following data, which represents the ages of 30 lottery winners:
Step 1: Determine the range of the data by subtracting the smallest value from the largest value.
Step 2: Divide the range by the bin size to ensure that you have at least 5 groups of data. A histogram should have from 5 to 10 bins to make it meaningful. Here the quotient is 5.5. Since you cannot have 0.5 of a bin, the result indicates that you will have at least 6 bins.
Step 3: Construct the table.
Bin  Frequency
[20-30)  3
[30-40)  5
[40-50)  6
[50-60)  8
[60-70)  5
[70-80)  3
Step 4: Determine the sum of the frequency column to ensure that all the data has been grouped: 3 + 5 + 6 + 8 + 5 + 3 = 30, which matches the 30 lottery winners.
When data is grouped in a frequency distribution table, the actual data values are lost. The table indicates how many values are in each group, but it doesn't show the actual values.
There are many different ways to create a distribution table and many different distribution tables that can be created. However, for the purpose of constructing a histogram, the method shown works very well, and it is not difficult to complete.
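The four steps can be sketched as a short Python function. The function name frequency_table and the list of ages are hypothetical; the ages are made-up values chosen only to illustrate the method and are not the lesson's data set. The sketch assumes whole-number data and starts the first bin at a multiple of the bin size at or below the smallest value.

```python
import math

def frequency_table(data, bin_size):
    """Group whole-number data into bins [start, start + bin_size)."""
    # Step 1: range = largest value - smallest value.
    data_range = max(data) - min(data)
    # Step 2: round the quotient up, since a fraction of a bin is not possible.
    min_bins = math.ceil(data_range / bin_size)
    # Step 3: build the table, starting at a round value at or below the minimum.
    first_start = (min(data) // bin_size) * bin_size
    n_bins = (max(data) - first_start) // bin_size + 1
    table = []
    for i in range(n_bins):
        start = first_start + i * bin_size
        end = start + bin_size
        frequency = sum(1 for x in data if start <= x < end)
        table.append((start, end, frequency))
    # Step 4: total the frequency column to confirm every value was grouped.
    total = sum(frequency for _, _, frequency in table)
    return data_range, min_bins, table, total

# Hypothetical ages, for illustration only.
ages = [22, 25, 31, 34, 38, 41, 45, 47, 52, 55, 58, 59, 63, 66, 71, 77]
data_range, min_bins, table, total = frequency_table(ages, 10)

print("Range:", data_range, " Minimum number of bins:", min_bins)
for start, end, frequency in table:
    print(f"[{start}-{end})  {frequency}")
print("Total:", total)
```

Because the first bin may start below the smallest value, the finished table can contain more bins than the minimum found in Step 2.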
Example B
The numbers of years of service for 75 teachers in a small town are listed below:
Using the above data, construct a frequency distribution table with a bin size of 5.
You will have 7 bins.
When the number of data values is very large, another column is often inserted in the distribution table. This column is a tally column, and it is used to account for the number of values within a bin. A tally column facilitates the creation of the distribution table and usually allows the task to be completed more quickly. For each value that is in a bin, draw a stroke in the Tally column. To make counting the strokes easier, draw 4 strokes and cross them out with the fifth stroke. This process bundles the strokes in groups of 5, and the frequency can be readily determined.
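Bin  Tally  Frequency
[0-5)  ||||| ||||| |  11
[5-10)  ||||| ||||  9
[10-15)  ||||| ||||| ||  12
[15-20)  ||||| ||||| ||||  14
[20-25)  ||||| ||  7
[25-30)  ||||| |||||  10
[30-35)  ||||| ||||| ||  12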
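The tally bundling can be imitated in plain text. In the sketch below, the function name tally_marks is hypothetical, and because a crossing stroke cannot be drawn in plain text, each bundle is written as five strokes in a row.

```python
def tally_marks(count):
    """Write count as tally strokes bundled in groups of five."""
    bundles, leftover = divmod(count, 5)
    groups = ["|||||"] * bundles          # one group per full bundle of 5
    if leftover:
        groups.append("|" * leftover)     # remaining strokes, fewer than 5
    return " ".join(groups)

# Frequencies from the teachers' years-of-service table.
for frequency in [11, 9, 12, 14, 7, 10, 12]:
    print(f"{tally_marks(frequency)}  {frequency}")
```

Counting the full bundles by fives and then adding the leftover strokes gives the frequency for each bin.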
Now that you have constructed the frequency table, the grouped data can be used to draw a histogram. Like a bar graph, a histogram requires a title and properly labeled x- and y-axes.
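As a rough illustration, a frequency table can be turned into a text-only chart. The function name text_histogram is hypothetical, the bars are drawn sideways rather than vertically, and the bin labels are an assumption inferred from the answers in the Guided Practice (the first bin holds fewer than 5 years and the last holds 30 years or more). A real histogram would place the bins on the horizontal axis with adjacent vertical bars.

```python
def text_histogram(title, bin_label, freq_label, labels, freqs):
    """Return a sideways text chart with a title and both axis labels."""
    width = max(len(label) for label in labels + [bin_label])
    lines = [title, f"{bin_label:<{width}} | {freq_label}"]
    for label, freq in zip(labels, freqs):
        # One '#' per data value, followed by the frequency itself.
        lines.append(f"{label:<{width}} | {'#' * freq} {freq}")
    return "\n".join(lines)

labels = ["[0-5)", "[5-10)", "[10-15)", "[15-20)", "[20-25)", "[25-30)", "[30-35)"]
freqs = [11, 9, 12, 14, 7, 10, 12]

chart = text_histogram("Years of Service of Teachers",
                       "Years of service", "Number of teachers",
                       labels, freqs)
print(chart)
```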
Example C
Use the data from Example A that displays the ages of the lottery winners to construct a histogram. The data is shown again below. What percentage of the winners were 50 years of age or older?
Bin  Frequency
[20-30)  3
[30-40)  5
[40-50)  6
[50-60)  8
[60-70)  5
[70-80)  3
Use the data as it is represented in the distribution table to construct the histogram.
From looking at the tops of the bars, you can see how many winners were in each category, and by adding these numbers, you can determine the total number of winners. You can also determine how many winners were within a specific category. For example, you can see that 8 winners were 60 years of age or older. The graph can also be used to determine percentages. For example, it can answer the question, “What percentage of the winners were 50 years of age or older?” as follows: the bars for ages 50 and older represent 8 + 5 + 3 = 16 of the 30 winners, and 16/30 ≈ 0.53, so approximately 53% of the winners were 50 years of age or older.
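A percentage question like this one can also be checked with a short calculation. In the sketch below, the function name percent_at_or_above is hypothetical, and the lower bin limits (20 through 70) are an assumption inferred from the statement that 8 winners were 60 years of age or older.

```python
def percent_at_or_above(table, threshold):
    """Percentage of values in bins whose lower limit is at least threshold.

    table is a list of (lower_limit, frequency) pairs, and threshold is
    expected to be one of the lower limits.
    """
    total = sum(frequency for _, frequency in table)
    selected = sum(frequency for lower, frequency in table if lower >= threshold)
    return 100 * selected / total

# (lower limit of bin, frequency) for the lottery winners' ages.
lottery = [(20, 3), (30, 5), (40, 6), (50, 8), (60, 5), (70, 3)]

winners_60_plus = sum(f for lower, f in lottery if lower >= 60)
pct_50_plus = percent_at_or_above(lottery, 50)

print("Winners aged 60 or older:", winners_60_plus)
print(f"Winners aged 50 or older: {pct_50_plus:.1f}%")
```

The threshold has to fall on a bin boundary, because the grouped table no longer shows where the individual values lie inside a bin.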
Vocabulary
A frequency distribution is a table that lists all of the classes and the number of data values that belong to each of the classes. A distribution in which most of the data values are located to the right of the mean is called a left-skewed distribution, while a distribution in which most of the data values are located to the left of the mean is called a right-skewed distribution.
A histogram is a graph in which the quantitative classes, or bins, are on the horizontal axis, and the frequencies are plotted on the vertical axis. Bins are also known as classes. The frequencies on a histogram are represented by vertical bars that are drawn adjacent to each other. A symmetric histogram is a histogram for which the values of the mean, median, and mode are all the same and are all located at the center of the distribution.
Guided Practice
a. Use the data and the distribution table that represent the years of service of the teachers from Example B to construct a histogram to display the data. The distribution table is shown again below:
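Bin  Tally  Frequency
[0-5)  ||||| ||||| |  11
[5-10)  ||||| ||||  9
[10-15)  ||||| ||||| ||  12
[15-20)  ||||| ||||| ||||  14
[20-25)  ||||| ||  7
[25-30)  ||||| |||||  10
[30-35)  ||||| ||||| ||  12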
b. Now use the histogram to answer the following questions.
i. How many teachers teach in this small town?
ii. How many teachers have worked for less than 5 years?
iii. If teachers are able to retire when they have taught for 30 years or more, how many are eligible to retire?
iv. What percentage of the teachers still have to teach for 10 years or fewer before they are eligible to retire?
v. Do you think that the majority of the teachers are young or old? Justify your answer.
Answer:
a.
b. i. 11 + 9 + 12 + 14 + 7 + 10 + 12 = 75
In this small town, 75 teachers are teaching.
ii. 11 teachers have taught for less than 5 years.
iii. 12 teachers are eligible to retire.
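iv. Teachers who have taught for at least 20 years but fewer than 30 years are within 10 years of retirement: 7 + 10 = 17, and 17/75 ≈ 0.23.
Approximately 23% of the teachers must teach for 10 years or fewer before they are eligible to retire.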
v. Answers will vary, but one possible answer is that the majority of the teachers are young, because 46 have taught for less than 20 years.
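The answers to part b can be verified from the frequency column alone. The sketch below assumes the bins have lower limits 0, 5, 10, ..., 30, which is consistent with the answers given above; the variable names are illustrative.

```python
# Lower limit of each bin (in years of service) and its frequency.
starts = [0, 5, 10, 15, 20, 25, 30]
freqs = [11, 9, 12, 14, 7, 10, 12]
table = list(zip(starts, freqs))

total = sum(freqs)                                            # part i
under_5 = sum(f for s, f in table if s < 5)                   # part ii
eligible = sum(f for s, f in table if s >= 30)                # part iii
within_10 = sum(f for s, f in table if 20 <= s < 30)          # part iv
percent_within_10 = round(100 * within_10 / total)
under_20 = sum(f for s, f in table if s < 20)                 # part v

print(total, under_5, eligible, percent_within_10, under_20)
```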
Practice

What name is given to a distribution that has 2 peaks of the same height?
 uniform
 unimodal
 bimodal
 discrete
The following histogram shows data collected during a recent fishing derby. The number of fish caught is being compared to the size of the fish caught. Use the histogram to answer the following questions:
 How many fish were caught?
 How many fish caught were over 35 cm in length?
 How many fish caught were between 20 cm and 29 cm in length?
 Why is there a blank space between 38 cm and 41 cm on the histogram?
The following histogram displays the heights of students in a classroom. Use the information represented in the histogram to answer the following questions:
 How many students are in the class?
 How many students are over 60 inches in height?
 How many students have a height between 54 in and 62 in?
 Is the distribution unimodal or bimodal? How do you know?
 The following data represents the results of a test taken by a group of students: Construct a frequency distribution table using a bin size of 10 and display the results in a properly labeled histogram.
frequency distribution
A table that lists all of the classes and the number of data values that belong to each of the classes. A distribution in which most of the data values are located to the right of the mean is called a left-skewed distribution, while a distribution in which most of the data values are located to the left of the mean is called a right-skewed distribution.
bar chart
A bar chart is a graphic display of categorical variables that uses bars to represent the frequency of the count in each category.
bar graph
A bar graph is a plot made of bars whose heights (vertical bars) or lengths (horizontal bars) represent the frequencies of each category, with space between each bar.
bell curve
A normal distribution curve is also known as a bell curve.
bell shaped
A bell-shaped histogram is a histogram with a prominent "mound" in the center and similar tapering to the left and right.
binning
Binning involves separating your data into classes or categories.
bins
Bins are groups of data plotted on the x-axis.
class limits
Class limits are, collectively, the upper and lower limits of an interval.
class mark
A class mark is the middle value, or average, of the class limits.
extreme outliers
Extreme outliers include data points that are more than 3 times the width of the middle half of your data (the interquartile range) above the upper, or below the lower, quartile.
frequency density
The vertical axis of a histogram is labelled frequency density.
frequency distribution table
A frequency distribution table lists the data values, as well as the number of times each value appears in the data set.
frequency polygon
A frequency polygon is a graph constructed by using lines to join the midpoints of each interval, or bin.
frequency table
A frequency table is a table that summarizes a data set by stating the number of times each value occurs within the data set.
histogram
A histogram is a display that indicates the frequency of specified ranges of continuous data values on a graph in the form of immediately adjacent bars.
interval
An interval is a range of data in a data set.
left-skewed distribution
A left-skewed distribution has a peak to the right of the distribution and data values that taper off to the left.
mild outliers
Mild outliers include data points that are more than 1.5 times the width of the middle half of your data (the interquartile range) above the upper, or below the lower, quartile.
multimodal
When a set of data has more than 2 values that occur with the same greatest frequency, the set is called multimodal.
normally distributed
If data is normally distributed, the data set creates a symmetric histogram that looks like a bell.
outlier
In statistics, an outlier is a data value that is far from other data values.
range
The range of a data set is the difference between the smallest value and the greatest value in the data set.
relative cumulative frequency plot (ogive plot)
A relative cumulative frequency plot, or ogive plot, shows how the data accumulate across the different values of the variable.
relative frequency histogram
A relative frequency histogram is a histogram in which the heights of the vertical bars are the relative frequencies.
right-skewed distribution
A right-skewed distribution has a peak to the left of the distribution and data values that taper off to the right.
shape
The shape of a histogram can lead to valuable conclusions about the trend(s) of the data.
skewed
As with the horizontal skewing of a histogram, stem plots with an obvious skew toward one end or the other tend to indicate an increased number of outliers either less than or greater than the mode.
symmetric
In statistics, a distribution is considered symmetric if the data set is mound-shaped.
symmetric histogram
For a symmetric histogram, the values of the mean, median, and mode are all the same and are all located at the center of the distribution.
undefined bimodal
An undefined bimodal histogram has a shape that is not specifically defined, but we can note regardless that it is bimodal, having two separated classes or intervals equally representing the maximum frequency of the distribution.
uniform
A uniform-shaped histogram indicates data that is very consistent; the frequency of each class is very similar to that of the others.
unimodal
If a data set has only 1 value that occurs most often, the set is called unimodal.
Learning Objectives
Here you'll learn how to create a frequency distribution chart and how to construct and interpret a histogram.