Understand the importance of a stem-and-leaf plot in statistics.
Construct and interpret a bar graph.
Create a frequency distribution chart.
Construct and interpret a histogram.
Use technology to create graphical representations of data.
Suppose you have a younger sister or brother and it is your job to entertain him or her every Saturday morning. You decide to take the youngster to the community pool to swim. Since swimming is a new thing to do, your little buddy isn’t too sure about the water and is a bit scared of the new adventure. You decide to keep a record of the length of time they stay in the water each morning. You recorded the following times (in minutes):
Your brother or sister is too young to understand the meaning of the times that you’ve recorded so you decide that you have to draw a picture of these numbers to show to the child. How are you going to represent these numbers?
By the end of this lesson you will have several ideas of how to represent these numbers and you can choose the one that you think your little buddy will understand the best.
A bar chart or bar graph is often used for data that can be described by categories (months, colors, activities, ...), which is referred to as qualitative data. A bar graph can also be used to represent numerical data (quantitative data) if the number of data values is not too large. A bar graph plots the number of times a category or value occurs in the data set. The height of each bar represents the number of times the value or observation appeared in the data set. The y−axis most often records the frequency and the x−axis records the category or value interval. The axes must be labeled to indicate what each one represents, and a title should be placed on the graph. When a bar graph is used to display quantitative data, the data is grouped in bins or intervals. These bins, and the frequency of the data located in each bin, can be shown in a frequency distribution table. For a bar graph, there is a break between the bins because the data is not continuous. A set of data grouped with a bin size of 10 could have its bins written as 10−19, 20−29 and 30−39.
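The counting step behind a bar graph can be sketched in a few lines of Python. This is only an illustration: the function name `category_frequencies` and the list of colors are made up for the example and are not data from this lesson.

```python
from collections import Counter

def category_frequencies(observations):
    """Count how many times each category occurs, sorted by category name."""
    return sorted(Counter(observations).items())

# Hypothetical survey answers, invented for illustration only.
colors = ["red", "blue", "red", "green", "blue", "red"]

for category, freq in category_frequencies(colors):
    # Each '#' stands for one observation, like one unit of bar height.
    print(f"{category:<6} | {'#' * freq} {freq}")
```

Each printed row plays the role of one bar: the category is the label on the x−axis and the number of `#` marks is the frequency on the y−axis.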
Example 1: Sara is doing a project on winter weather for her Science project. She has decided to research the amount of snowfall (in inches) that fell last year for cities in Canada. Here is the information that she has collected:
She is going to represent this qualitative data in a bar graph.
Sara has created a very colorful bar graph which includes a title, the category (City) on the x−axis and the frequency (Snowfall in.) on the y−axis. There is an equal space between each of the bars and each of the bars is the same width.
Example 2: The School Board for your district has to submit a report to the state that tells what percent of their casual employees work in the transportation department and the ages of these employees. The Board decides to create a frequency distribution table and then to display this information on a quantitative bar graph.
Bin (Age in yr.)
This bar graph contains the information that the Board wanted to send to the state, but the actual data has been lost. The ages of the employees have been put into bins that hold groups of ages. As a result, you know that 22% of the employees are between the ages of 20 and 29, but you do not know the exact ages of those employees. It is possible that 3 people are 20, 2 people are 25 and 3 people are 28. There are numerous combinations that could belong in this age group, but you cannot tell which one is correct from this graph. The only information that can be learned from this graph is the percentage of the employees that fit in each age group.
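The loss of the original values can be checked with a short Python sketch. The first list starts with the combination mentioned above (3 people aged 20, 2 aged 25 and 3 aged 28); every other age, and the function name `bin_percentages`, is hypothetical and chosen only to make the point.

```python
def bin_percentages(ages, bin_size=10):
    """Return the percentage of ages that falls in each bin (e.g. 20-29)."""
    counts = {}
    for age in ages:
        start = (age // bin_size) * bin_size
        counts[start] = counts.get(start, 0) + 1
    total = len(ages)
    return {
        f"{start}-{start + bin_size - 1}": round(100 * count / total)
        for start, count in sorted(counts.items())
    }

# Two different (hypothetical) sets of employee ages.
group_a = [20, 20, 20, 25, 25, 28, 28, 28, 35, 41]
group_b = [21, 22, 23, 24, 26, 27, 29, 29, 38, 47]

print(bin_percentages(group_a))
print(bin_percentages(group_b))
# The two lists differ, yet their binned percentages are identical.
print(bin_percentages(group_a) == bin_percentages(group_b))
```

Because both lists produce the same table, a graph drawn from that table cannot tell them apart.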
Bar graphs, whether they display qualitative or quantitative data, can be extended to double bar graphs. Graphs of this nature are used to compare two sets of data.
Example 3: The new manager of the school cafeteria decided to ask students to choose a favorite food from the following list:
Once the students had made their decisions he created a double bar graph to compare the choices of boys and girls. The following graph shows the results:
The graph compares the preferences in food of the girls with those of the boys.
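A double bar graph can be roughed out in plain text before it is drawn properly. The sketch below is one possible way to do this in Python; the foods and the counts are hypothetical, and `double_bar_rows` is an illustrative name, not part of any library.

```python
def double_bar_rows(categories, first, second, labels=("Girls", "Boys")):
    """Return text rows that place two bars side by side for each category."""
    rows = []
    for name in categories:
        rows.append(name)
        rows.append(f"  {labels[0]:<6}{'#' * first[name]} {first[name]}")
        rows.append(f"  {labels[1]:<6}{'#' * second[name]} {second[name]}")
    return rows

# Hypothetical survey counts, invented for illustration only.
foods = ["Pizza", "Salad"]
girls = {"Pizza": 3, "Salad": 5}
boys = {"Pizza": 6, "Salad": 2}

for row in double_bar_rows(foods, girls, boys):
    print(row)
```

Placing the two bars for a category directly under each other makes the comparison between the groups visible at a glance, which is the purpose of a double bar graph.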
A histogram is very similar to a bar graph, but with no spaces between the bars: the bars sit alongside each other. The groups of data, or bins, are plotted on the x−axis and their frequencies on the y−axis. In most cases, the bins are designed so that there is no break between the groups. This means that if you had a set of data grouped in bins of size ten, and the data ranged from zero to fifty, the bins would be written as [0−10), [10−20), [20−30), [30−40), [40−50) and [50−60). If you count the whole numbers from 0 to 10 inclusive, you get 11, yet the bin size is supposed to be 10. The notation [ , ) resolves this: the square bracket means that the first number is included in the bin, while the round bracket means that the last number is not included and instead counts in the next bin. So although the bins are written in this manner, for whole-number data each bin really extends from 0 to 9, 10 to 19, and so on. Histograms are usually drawn with the data from a frequency distribution table, often called a frequency table. Like a bar graph, a histogram requires a title and properly labeled x and y axes.
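The bracket notation is easiest to see at the boundary values. The following Python sketch assumes non-negative whole-number data and a bin size of 10; the names `bin_label` and `histogram_counts` and the sample values are illustrative only.

```python
def bin_label(value, bin_size=10):
    """Return the half-open bin, written [start-end), that contains value."""
    start = (value // bin_size) * bin_size
    return f"[{start}-{start + bin_size})"

def histogram_counts(values, bin_size=10):
    """Count how many values fall in each bin, keyed by the bin's first number."""
    counts = {}
    for value in values:
        start = (value // bin_size) * bin_size
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))

# Boundary values: 9 stays in the first bin, 10 moves to the second.
for value in (0, 9, 10, 19, 20, 50):
    print(value, "belongs to", bin_label(value))

print(histogram_counts([0, 9, 10, 19, 20, 50]))
```

Every value lands in exactly one bin, so there are no gaps and no double counting, which is why the bars of a histogram can touch.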
Example 1: Studies (and logic) show that the more homework you do, the better your grade in a course. In a study conducted at a local school, students in grade 10 were asked to check off the box that represented the average amount of time they spent on homework each night. The following results were recorded:
Time Spent on Homework (Hours)
Frequency (# of students)
This data will now be represented by drawing a histogram.
As with the bar graph, the actual data values are not plotted because the data has been grouped in bins.
An extension of the histogram is a frequency polygon graph. A frequency polygon simply joins the midpoints of the tops of the histogram bars (the centers of the class intervals) with straight lines and then extends these to the horizontal axis. The distribution is extended one unit before the smallest recorded data and one unit beyond the largest recorded data. Looking at the histogram below, we can draw the frequency polygon on top of the histogram. The area under the frequency polygon is the same as the area under the histogram and therefore corresponds to the total of the frequency values in the table. The frequency polygon also shows the shape of the distribution of the data, and in this case it resembles the bell curve.
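The equal-area property can be checked numerically. The Python sketch below assumes equal-width bins and assumes the polygon is closed by adding one empty class interval on each side; the frequencies are hypothetical, and the function names are chosen only for this example.

```python
def frequency_polygon(bin_starts, frequencies, width):
    """Return (x, y) points: bar-top midpoints plus an empty class on each end."""
    midpoints = [start + width / 2 for start in bin_starts]
    points = [(midpoints[0] - width, 0)]
    points += list(zip(midpoints, frequencies))
    points.append((midpoints[-1] + width, 0))
    return points

def polygon_area(points):
    """Area under the polygon, summing one trapezoid per line segment."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

def histogram_area(frequencies, width):
    """Area of the histogram: every bar is width times its frequency."""
    return width * sum(frequencies)

# Hypothetical frequency table with bins [0-10), [10-20), ..., [40-50).
starts = [0, 10, 20, 30, 40]
freqs = [2, 5, 9, 5, 2]

points = frequency_polygon(starts, freqs, width=10)
print(polygon_area(points), histogram_area(freqs, width=10))
```

The two areas agree because each triangle the polygon cuts off the corner of a bar is matched by a triangle of the same size that it adds beside the neighboring bar.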
A stem-and-leaf plot is an organization of numerical data into categories based on place value. It is a graph that is similar to a histogram, but it displays more information: the data values themselves are kept in the plot and can be used to describe the shape of the distribution of the data. For a stem-and-leaf plot, each number is divided into two parts using place value. The stem is the left-hand column and contains the digits in the largest place. The leaf is the right-hand column and contains the digit in the smallest place. For example, the number 65 would be separated such that the 6 would be the stem (tens place) and the 5 would be the leaf (ones place).
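Splitting each number into a stem and a leaf is a division by 10 with remainder, which suggests the following minimal Python sketch. It assumes non-negative whole numbers; the data values and the function names `stem_and_leaf` and `format_plot` are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (all digits but the last) to its sorted leaves (ones digits)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # 65 -> stem 6, leaf 5
        plot[stem].append(leaf)
    return dict(plot)

def format_plot(plot):
    """Return one text row per stem, including stems that have no leaves."""
    rows = []
    for stem in range(min(plot), max(plot) + 1):
        leaves = ", ".join(str(leaf) for leaf in plot.get(stem, []))
        rows.append(f"{stem:>3} | {leaves}")
    return rows

# Hypothetical data, invented for illustration only.
data = [65, 42, 47, 51, 65, 68, 40]

for row in format_plot(stem_and_leaf(data)):
    print(row)
```

Sorting the values first means the leaves on every row come out from least to greatest, and every original value can still be read back from the plot.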
Example 1: In a recent study of male students at a local high school, students were asked how much money they spent socially on Prom night. The following numbers represent the number of dollars spent by each of a random selection of 40 students.
The above data values are not arranged in any order. For purposes of observing and analyzing data, the values can be distributed into smaller groups using a stem-and-leaf plot. The stems will be arranged vertically in ascending order (smallest to largest) and each leaf will be written to the right of its stem horizontally in order from least to greatest.
Dollars Spent by Males on Prom Night
4, 5, 5, 5
0, 0, 2, 7, 9
0, 0, 0, 0, 5, 5, 8
0, 0, 4, 5
0, 0, 0, 5, 5
0, 0, 0, 4, 5
0, 0, 3, 5, 5
The stem-and-leaf plot can be interpreted very easily. By quickly looking at stem 6, you see that 4 males spent 60 ‘some dollars’ on Prom night. By counting the number of leaves, you know that 40 males responded to the question concerning how much money they spent on Prom night. The smallest and largest data values are known by looking at the first and last stem-and-leaf. The stem-and-leaf plot is a ‘quick look’ chart that can quickly provide information from the data. It also serves as an easy method for sorting numbers manually.
Example 2: The women from the senior citizens’ complex bowl every day of the month. Lizzie had never bowled before and was enjoying this newfound pastime. She decided to keep track of her best score of the day for the month of September. Here are the scores that she recorded:
In order for Lizzie to see how well she is doing, create a stem-and-leaf plot of her scores.
Lizzie’s Bowling Scores
1, 1, 2, 2, 4, 5, 5, 5, 7, 8, 9
0, 0, 1, 3, 6, 7, 7, 9, 9
0, 0, 0, 2, 2, 3, 7
Let’s return to the problem that was posed at the beginning of the lesson. You are supposed to display the amount of time your young brother or sister stayed in the water each time you went swimming. Let’s look at some options.
Minutes in Water
3, 4, 5, 7,
0, 0, 1
Frequency Distribution Table
Minutes in Water
In this lesson you learned how to display both qualitative and quantitative data. You created bar graphs that were both single and double. Double bar graphs are very good for comparing two sets of data quickly. The histogram was another way of representing data. It is similar to a bar graph, but without the spaces. You also learned that both of these graphs lose the actual data when they are plotted, because the data itself remains in bins or categories. A stem-and-leaf plot keeps the actual data and is really an ‘at a glance’ graph. Although it is quicker to create a stem-and-leaf plot by hand than a bar graph or a histogram, the latter two graphs are much more appealing to the eye.
Points to Consider:
Is there any other way to display data that is useful when comparing the values of two data sets?
Other than sorting the data into categories or bins, there were no mathematical calculations that had to be done to create these graphs. Are calculations necessary to represent data on another type of graph?
For the following graph answer the questions below:
What is displayed on the vertical axis?
What scale is used on the vertical axis?
What is displayed on the horizontal axis?
Which city had the least amount of snow in 2008?
Which city had the most snow in 2008?
Which two cities showed little difference in the amount of snow they received?
Do some research in your area and create a bar graph similar to that in question one, concerning weather for cities in your country.
For the following graph, answer the questions below.
What is the total percent of people that work in the transportation department?
Why do you think this total is not 100%?
Which age group has the most people that work in the transportation department?
Which age group has the fewest number of people who work in the transportation department?
For each of the following examples, describe why you would likely use a bar graph or a histogram.
Frequency of the favorite drinks for the first 100 people to enter the school dance.
Frequency of the average time it takes the people in your class to finish a math assignment.
Frequency of the average distance people park their cars away from the mall in order to walk a little more.
Prepare a histogram using the following scores from a recent science test. When done, use a different colored pencil to draw a frequency polygon on your graph. Does the area under your frequency polygon look equal to the area colored in your histogram?
A research firm has just developed a streak-free glass cleaner. The product is sold at a number of local chain stores and its sales are being closely monitored. At the end of one year, the sales of the product are released. The company is planning on starting up an Ad Campaign to promote the product. The data is found in the chart below.
266 94 204 164 219 163 87 248 137 193
144 89 175 164 118 248 159 123 220 141
122 143 250 168 100 217 165 226 138 131
Display the sales of the product before the Ad campaign in a stem-and-leaf plot.
Answer the following questions with respect to the above stem-and-leaf plot.
How many chain stores were involved in selling the streak-free glass cleaner?
In stem 1, what does the number 11 represent? What does the number 8 represent?
What percentage of stores sold less than 175 bottles of streak-free glass cleaner?
The snowfall amount in inches.
The scale is each block = 20 inches.
The name of the city.
Edmonton and Toronto
Answers will vary
Some casual workers work in other departments
The responses for the question “What is your favorite beverage?” would be specific names. There is no range in the data, so a bar graph would be used. The beverage would be on the x−axis and the number of students would be on the y−axis.
The results would have to be grouped in intervals since each result represents a specific time. The time intervals would be on the x−axis and the number of students would be on the y−axis. A Histogram would be used.
Once again, a histogram would be used: each result represents a specific distance, so the results would have to be grouped in intervals. The distance intervals would be on the x−axis and the number of people would be on the y−axis.
The area under the frequency polygon appears to be equal to the area of the histogram.
1, 7, 8
1, 3, 4
3, 4, 4, 5, 8
(a) 30 stores
(b) 118 bottles of streak free cleaner sold by 1 store