You are reading an older version of this FlexBook® textbook: CK-12 Algebra I.

# 11.7: Stem-and-Leaf Plots and Histograms

Difficulty Level: At Grade Created by: CK-12

## Learning Objectives

• Make and interpret stem-and-leaf plots.
• Make and interpret histograms.
• Make histograms using a graphing calculator.

## Introduction - Grouping and Visualizing Data

Imagine asking a class of 20 algebra students how many brothers and sisters they had. You would probably get a range of answers from zero on up. Some students would have no siblings, but most would have at least one. The results may look like this.

$1, 4, 2, 1, 0, 2, 1, 0, 1, 2, 1, 0, 0, 2, 2, 3, 1, 1, 3, 6$

We could organize this data in many ways. The first might be to create an ordered list, relisting all the numbers in order, starting with the smallest.

$0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 4, 6$

Another way to list the results is in a table.

| Number of Siblings | Number of Matching Students |
|---|---|
| 0 | 4 |
| 1 | 7 |
| 2 | 5 |
| 3 | 2 |
| 4 | 1 |
| 5 | 0 |
| 6 | 1 |
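Both the ordered list and the frequency table can be produced mechanically. The following is a minimal Python sketch, assuming the survey answers are stored in a list; the variable names `siblings`, `ordered` and `counts` are our own and are not part of the original lesson.

```python
from collections import Counter

# Answers from the 20 algebra students in the survey above.
siblings = [1, 4, 2, 1, 0, 2, 1, 0, 1, 2, 1, 0, 0, 2, 2, 3, 1, 1, 3, 6]

# Ordered list: the same values sorted from smallest to largest.
ordered = sorted(siblings)
print(ordered)

# Frequency table: how many students gave each answer.
# A Counter returns 0 for values nobody gave (such as 5 siblings).
counts = Counter(siblings)
for value in range(min(siblings), max(siblings) + 1):
    print(value, counts[value])
```

Looping over every value from the smallest to the largest (rather than only over the values that occur) is what keeps the empty "5 siblings" row, printed as `5 0`, in the table.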

We could also make a visual representation of the data by making categories for the number of siblings on the $x$-axis, and stacking representations of each student above the category marker. We could use crosses, stick-men or even photographs of the students to show how many students are in each category.

## Make and Interpret Stem-and-Leaf Plots

Another useful way to display data is with a stem-and-leaf plot. Stem-and-leaf plots are especially useful because they give a visual representation of how the data is clustered but preserve all of the numerical information. A plot consists of the stem, a vertical scale on the left that represents the first digit, and the leaves, the second digits placed to the right of the stem. In the stem-and-leaf plot below, the first number represented is 21. It is the only number with a stem of 2, so it is the only number in the 20s. The next two numbers share a common stem of 3: they are 33 and 36. The next numbers are 40, 45 and 47.
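The stem/leaf split described above is just "tens digit" and "units digit". A minimal Python sketch, assuming whole numbers from 0 to 99 and using the six values named in the paragraph; the helper names `stem_and_leaf` and `show` are hypothetical, chosen for this illustration.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group whole numbers from 0 to 99 into {stem: [leaves]}.

    The stem is the tens digit and the leaf is the units digit.
    Leaves stay in the order the values are read in.
    """
    plot = defaultdict(list)
    for value in values:
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)

def show(plot):
    """Print one row per stem, including stems with no leaves."""
    for stem in range(min(plot), max(plot) + 1):
        leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
        print(f"{stem} | {leaves}")

show(stem_and_leaf([21, 33, 36, 40, 45, 47]))
# 2 | 1
# 3 | 3 6
# 4 | 0 5 7
```

Because leaves are appended in reading order, this sketch can be used while data is still being collected; sorting the values first would give an ordered plot instead.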

Stem-and-leaf plots have a number of advantages over simply listing the data in a single line.

• They show how data is distributed, and whether it is symmetric around the center.
• They can be used as the data is being collected.
• They make it easy to determine the median and mode.

Stem-and-leaf plots are not ideal for all situations; in particular, they are not practical when the data is too tightly clustered. For example, with the student sibling data, all data points would occupy the same stem (zero). In that case, no additional information could be gained from a stem-and-leaf plot.

Example 1

While traveling on a long train journey, Rowena collected the ages of all the passengers traveling in her carriage. The ages for the passengers are shown below. Arrange the data into a stem-and-leaf plot, and use the plot to find the median and mode ages.

$35, 42, 38, 57, 2, 24, 27, 36, 45, 60, 38, 40, 40, 44, 1, 44, 48, 84, 38, 20, 4, 2, 48, 58, 3, 20, 6, 40, 22, 26, 17, 18, 40, 51, 62, 31, 27, 48, 35, 27, 37, 58, 21$

Solution

The first step is to determine a sensible stem. Since all the values fall between 1 and 84, the stem should represent the tens column, and run from 0 to 8 so that the numbers represented can range from 00 (which we would represent by placing a leaf of 0 next to the 0 on the stem) to 89 (a leaf of 9 next to the 8 on the stem). We then go through the data and fill out our plot.

You can see immediately that the interval with the most passengers is the 40 - 49 group. In order to correctly determine the median and the mode, it is helpful to construct a second, ordered stem-and-leaf plot, placing the leaves on each branch in ascending order.
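The ordered plot can be checked with a short Python sketch. It assumes the 43 ages from Example 1 are typed into a list (the names `ages` and `plot` are ours); sorting the data before splitting each age into tens and units is what places the leaves in ascending order.

```python
from collections import defaultdict

# Passenger ages from Example 1 (43 values, in the order collected).
ages = [35, 42, 38, 57, 2, 24, 27, 36, 45, 60, 38, 40, 40, 44, 1, 44, 48,
        84, 38, 20, 4, 2, 48, 58, 3, 20, 6, 40, 22, 26, 17, 18, 40, 51, 62,
        31, 27, 48, 35, 27, 37, 58, 21]

# Sorting first puts the leaves on every branch in ascending order.
plot = defaultdict(list)
for age in sorted(ages):
    stem, leaf = divmod(age, 10)   # stem = tens digit, leaf = units digit
    plot[stem].append(leaf)

# Print stems 0 through 8, including the empty 7-branch.
for stem in range(9):
    print(stem, "|", " ".join(map(str, plot[stem])))

# Output:
# 0 | 1 2 2 3 4 6
# 1 | 7 8
# 2 | 0 0 1 2 4 6 7 7 7
# 3 | 1 5 5 6 7 8 8 8
# 4 | 0 0 0 0 2 4 4 5 8 8 8
# 5 | 1 7 8 8
# 6 | 0 2
# 7 |
# 8 | 4
```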

The mode is now apparent: there are 4 zeros in a row on the 4-branch, so the $\text{mode}=40.$

To find the median, we will use the $\left ( \frac{n + 1}{2}\right)^{th}$ value that we used earlier. There are 43 data points, so $\left ( \frac{n + 1}{2}\right)=\frac{44}{2}=22$. Counting out the $22^{nd}$ value we find that the $\text{median}=37.$
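The $\left(\frac{n+1}{2}\right)^{th}$ rule and the mode can be checked in Python. This is a sketch under the assumption that the ages are in a plain list; `statistics.multimode` is a standard-library function (Python 3.8+), while the variable names are ours. The even-`n` branch (mean of the two middle values) is included only for completeness and is not needed here.

```python
from statistics import multimode

ages = [35, 42, 38, 57, 2, 24, 27, 36, 45, 60, 38, 40, 40, 44, 1, 44, 48,
        84, 38, 20, 4, 2, 48, 58, 3, 20, 6, 40, 22, 26, 17, 18, 40, 51, 62,
        31, 27, 48, 35, 27, 37, 58, 21]

ordered = sorted(ages)
n = len(ordered)

# Median: the ((n + 1) / 2)th value when n is odd; when n is even,
# the mean of the two middle values. Python lists count from 0, hence the -1.
if n % 2 == 1:
    median = ordered[(n + 1) // 2 - 1]
else:
    median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

# Mode: the value(s) that occur most often.
modes = multimode(ages)

print(n, (n + 1) // 2, median, modes)   # 43 22 37 [40]
```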

## Make and Interpret Histograms

Look again at the example of the algebra students and their siblings. The data was collected in the following list.

$1, 4, 2, 1, 0, 2, 1, 0, 1, 2, 1, 0, 0, 2, 2, 3, 1, 1, 3, 6$

We were able to organize the data into a table. Now, we will rewrite the table, but this time we will use the word frequency as a header to indicate the number of times each value occurs in the list.

| Number of Siblings | Frequency |
|---|---|
| 0 | 4 |
| 1 | 7 |
| 2 | 5 |
| 3 | 2 |
| 4 | 1 |
| 5 | 0 |
| 6 | 1 |

Now we could use this table as an $(x, y)$ coordinate list to plot a line diagram, and such a diagram is shown on the right.

While this diagram does indeed show the data, it is somewhat misleading. For example, because the line joins the points for students with one and two siblings, you might think it tells us something about how many students have 1.5 siblings (which, of course, is impossible). In this case, where the data points are all integers, it is wrong to suggest that the function is continuous between the points!

When the data we are representing falls into well-defined categories (such as the integers 0 through 6), it is more appropriate to use a histogram to display that data. A histogram for this data is shown below.

Each number on the $x$-axis has an associated column, the height of which determines how many students have that number of siblings. For example, the column at $x=2$ is 5 units high, indicating that there are 5 students with 2 siblings.

The categories on the $x$-axis are called bins. Histograms differ from bar charts in that they do not necessarily have fixed widths for the bins. They are also useful for displaying continuous data (data that varies continuously rather than in integer amounts). To illustrate this, look at the next examples.

Example 2

Rowena made a survey of the ages of passengers in a train carriage, and collected the results in a table. Display the results as a histogram.

| Age range | Frequency |
|---|---|
| 0 - 9 | 6 |
| 10 - 19 | 2 |
| 20 - 29 | 9 |
| 30 - 39 | 8 |
| 40 - 49 | 11 |
| 50 - 59 | 4 |
| 60 - 69 | 2 |
| 70 - 79 | 0 |
| 80 - 89 | 1 |
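Rowena's table can be rebuilt from the raw ages of Example 1. A minimal sketch, assuming 10-year bins that start at 0; each age is assigned to the bin whose lower end is the age rounded down to a multiple of 10. The names `ages` and `bins` are ours.

```python
from collections import Counter

ages = [35, 42, 38, 57, 2, 24, 27, 36, 45, 60, 38, 40, 40, 44, 1, 44, 48,
        84, 38, 20, 4, 2, 48, 58, 3, 20, 6, 40, 22, 26, 17, 18, 40, 51, 62,
        31, 27, 48, 35, 27, 37, 58, 21]

# Each age goes in the bin whose lower end is the age rounded down
# to a multiple of 10, e.g. 38 -> 30, which is the 30 - 39 bin.
bins = Counter(10 * (age // 10) for age in ages)
for low in range(0, 90, 10):
    print(f"{low} - {low + 9}: {bins[low]}")
```

The counts match the table above, including the empty 70 - 79 bin.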

Solution

Since the data is already collected into intervals, we will use these as our bins for the histogram. Even though the top end of the first interval is 9, the bin on our histogram will extend to 10. This is because, as we move to continuous data, we have a range of numbers that goes up to (but does not include) the lower end of the following bin. The range of values for the first bin would therefore be:

Age is greater than or equal to 0, but less than 10.

Algebraically, we would write:

$0 \le \text{Age} < 10$

We will use this notation to label our bins in the next example.
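The "greater than or equal to the lower end, but less than the next lower end" rule guarantees that every value lands in exactly one bin. A minimal sketch of that rule for equal-width bins; `bin_index` is a hypothetical helper name, and the start and width are parameters we chose to match the 10-year bins above.

```python
import math

def bin_index(x, start, width):
    """Return k such that start + k*width <= x < start + (k+1)*width."""
    return math.floor((x - start) / width)

# With 10-year bins starting at 0, bin 0 is 0 <= Age < 10,
# so 9.99 lands in bin 0 while exactly 10 starts bin 1.
print(bin_index(0, 0, 10), bin_index(9.99, 0, 10), bin_index(10, 0, 10))   # 0 0 1
```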

Example 3

Monthly rainfall (in millimeters) for Beaver Creek, Oregon was collected over a five-year period, and the data is shown below. Display the data in a histogram.

$41.1, 254.7, 91.6, 60.9, 75.6, 36.0, 16.5, 10.6, 62.2, 89.4, 124.9, 176.7, 121.6, 135.6, 141.6, 77.0, 82.8, 28.9, 6.7, 22.1, 29.9, 110.0, 179.3, 97.6, 176.8, 143.5, 129.8, 94.9, 77.0, 60.8, 60.0, 32.5, 61.7, 117.2, 194.5, 208.6, 176.8, 143.5, 129.8, 94.9, 77.0, 60.8, 20.0, 32.5, 61.7, 117.2, 194.5, 208.6, 133.1, 105.2, 92.0, 60.7, 52.8, 37.8, 14.8, 23.1, 41.3, 75.7, 134.6, 148.8$

Solution:

There are many ways we can organize this data. Notice the similarity between histograms and stem-and-leaf plots: a stem-and-leaf plot resembles a histogram on its side. We could start by making a stem-and-leaf plot of our data.

For our data above, the stem would be the tens, and run from 0 to 25. We do not round the decimals in the data; we truncate them, meaning we simply drop the decimal part. For example, 165.7 would have a stem of 16 and a leaf of 5. We don't include the seven tenths.
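Truncating rather than rounding can be expressed in one step. A minimal sketch, assuming non-negative readings (for which Python's `int()` simply drops the decimal part); `stem_leaf_truncated` is a hypothetical helper name. Applying it to the smallest and largest rainfall values shows why the stem has to run from 0 to 25.

```python
def stem_leaf_truncated(x):
    """Split a non-negative reading into (stem, leaf), truncating the decimals.

    int() drops the decimal part without rounding (165.7 -> 165); the stem is
    then everything above the units digit and the leaf is the units digit.
    """
    whole = int(x)
    return divmod(whole, 10)

print(stem_leaf_truncated(165.7))   # (16, 5)
print(stem_leaf_truncated(6.7))     # (0, 6)   smallest rainfall value
print(stem_leaf_truncated(254.7))   # (25, 4)  largest rainfall value
```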

By outlining the numbers on the stem-and-leaf plot, we can see what a histogram with a bin width of 10 would look like. We can even form a rudimentary histogram by outlining the data. You can see that with so many bins, the histogram looks random, and no clear pattern can be seen. In a situation like this, we need to reduce the number of bins. We will increase the bin width to 25 and collect the data in a table.

| Rainfall (mm) | Frequency |
|---|---|
| $0 \le x < 25$ | 7 |
| $25 \le x < 50$ | 8 |
| $50 \le x < 75$ | 9 |
| $75 \le x < 100$ | 12 |
| $100 \le x < 125$ | 6 |
| $125 \le x < 150$ | 9 |
| $150 \le x < 175$ | 0 |
| $175 \le x < 200$ | 6 |
| $200 \le x < 225$ | 2 |
| $225 \le x < 250$ | 0 |
| $250 \le x < 275$ | 1 |
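The frequencies for a bin width of 25 can be checked with a short sketch. It assumes the 60 monthly readings are typed into a list called `rain` (our name) and uses floor division so that bin `k` holds readings with $25k \le x < 25(k+1)$.

```python
from collections import Counter

# Monthly rainfall (mm) for Beaver Creek from Example 3 (60 months).
rain = [41.1, 254.7, 91.6, 60.9, 75.6, 36.0, 16.5, 10.6, 62.2, 89.4,
        124.9, 176.7, 121.6, 135.6, 141.6, 77.0, 82.8, 28.9, 6.7, 22.1,
        29.9, 110.0, 179.3, 97.6, 176.8, 143.5, 129.8, 94.9, 77.0, 60.8,
        60.0, 32.5, 61.7, 117.2, 194.5, 208.6, 176.8, 143.5, 129.8, 94.9,
        77.0, 60.8, 20.0, 32.5, 61.7, 117.2, 194.5, 208.6, 133.1, 105.2,
        92.0, 60.7, 52.8, 37.8, 14.8, 23.1, 41.3, 75.7, 134.6, 148.8]

width = 25
# Bin k holds the readings with 25k <= x < 25(k + 1).
counts = Counter(int(x // width) for x in rain)
for k in range(11):
    print(f"{k * width} <= x < {(k + 1) * width}: {counts[k]}")
```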

The histogram associated with this bin width is below.

The pattern in the distribution is far more apparent with fewer bins, so let's look at what the histogram would look like with even fewer. We will combine bins in pairs to give 6 bins with a bin width of 50. Our table and histogram now look like this.

| Rainfall (mm) | Frequency |
|---|---|
| $0 \le x < 50$ | 15 |
| $50 \le x < 100$ | 21 |
| $100 \le x < 150$ | 15 |
| $150 \le x < 200$ | 6 |
| $200 \le x < 250$ | 2 |
| $250 \le x < 300$ | 1 |
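Combining bins in pairs needs no access to the raw data: adding neighboring frequencies from the width-25 table is enough. A minimal sketch; `freq25` and `freq50` are our names. Since there are 11 bins (an odd number), we assume an extra empty bin $275 \le x < 300$ so that the last bin has a partner.

```python
# Frequencies for bin width 25, copied from the first table (11 bins, 0 to 275 mm).
freq25 = [7, 8, 9, 12, 6, 9, 0, 6, 2, 0, 1]

# Combining neighboring bins in pairs doubles the bin width to 50.
# With an odd number of bins, pad with one empty bin (275 <= x < 300).
padded = freq25 + [0] * (len(freq25) % 2)
freq50 = [padded[i] + padded[i + 1] for i in range(0, len(padded), 2)]
print(freq50)  # [15, 21, 15, 6, 2, 1]
```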

Here is the histogram.

You can now clearly see the pattern. The normal monthly rainfall is around 75 mm, but sometimes it will be a very wet month and be higher (even much higher). It may be counter-intuitive, but sometimes by reducing the number of intervals (or bins) in a histogram you can see more information!

## Make Histograms Using a Graphing Calculator

Look again at the data from Example 1. We saw how the raw data we were given can be manipulated to give a stem-and-leaf plot and a histogram. We can take some of the tedious sorting work out of the process by using a graphing calculator to automatically sort our data into bins.

Example 4

The following unordered data represents the ages of passengers on a train carriage.

$35, 42, 38, 57, 2, 24, 27, 36, 45, 60, 38, 40, 40, 44, 1, 44, 48, 84, 38, 20, 4, 2, 48, 58, 3, 20, 6, 40, 22, 26, 17, 18, 40, 51, 62, 31, 27, 48, 35, 27, 37, 58, 21$

Use a graphing calculator to display the data as a histogram with bin widths of 10, 5 and 20:

Step 1 Input the data in your calculator.

Press [STAT] and choose the [EDIT] option.

Input the data into the table in column $L_1$.

Continue until you have entered all 43 data points.

Step 2 Select plot type.

Bring up the [STATPLOT] option by pressing [2nd], [Y=].

Highlight 1:Plot1 and press [ENTER]. This will bring up the plot options screen. Highlight the histogram and press [ENTER]. Make sure the Xlist is the list that contains your data.

Step 3 Select bin widths and plot.

Press [WINDOW] and ensure that Xmin and Xmax allow for all data points to be shown. The Xscl value determines the bin width.

Press [GRAPH] to display the histogram.

We can change the bin width by varying Xscl and see how the histogram changes.

On the right are histograms with a bin width of 10, 5 and 20.

In this example, $\text{Xmin}=0$ and $\text{Xmax}=100$ work for every bin width we choose, but notice that the Ymax value needed to display the histogram correctly is different for each.
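Why Ymax must change can be seen by counting the tallest bar for each bin width. This is not a calculator program, only a Python sketch of the same binning; it assumes that with $\text{Xmin}=0$ each bar covers $k\cdot\text{Xscl} \le \text{age} < (k+1)\cdot\text{Xscl}$, and the name `tallest_by_width` is ours.

```python
from collections import Counter

ages = [35, 42, 38, 57, 2, 24, 27, 36, 45, 60, 38, 40, 40, 44, 1, 44, 48,
        84, 38, 20, 4, 2, 48, 58, 3, 20, 6, 40, 22, 26, 17, 18, 40, 51, 62,
        31, 27, 48, 35, 27, 37, 58, 21]

# Assumed calculator behavior: with Xmin = 0 and Xscl = width, bar k
# covers k*width <= age < (k + 1)*width.
tallest_by_width = {}
for width in (10, 5, 20):
    counts = Counter(age // width for age in ages)
    tallest_by_width[width] = max(counts.values())
    print(f"Xscl = {width}: tallest bar = {tallest_by_width[width]}")

# Xscl = 10: tallest bar = 11
# Xscl = 5: tallest bar = 7
# Xscl = 20: tallest bar = 17
```

Under this assumption, Ymax must be at least 11, 7 and 17 respectively for the tallest bar to fit in the window.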

## Review Questions

1. Complete the following stem-and-leaf plot. Use the first digit (hundreds) as the stem, and the second (tens) as the leaf. Truncate any units and decimals. Order the plot to find the median and the mode.

   Data: $607.4, 886.0, 822.2, 755.7, 900.6, 770.9, 780.8, 760.1, 936.9, 962.9, 859.9, 848.3, 898.7, 670.9, 946.7, 817.8, 868.1, 887.1, 881.3, 744.6, 984.9, 941.5, 851.8, 905.4, 810.6, 765.3, 881.9, 851.6, 815.7, 989.7, 723.4, 869.3, 951.0, 794.7, 807.6, 841.3, 741.5, 822.2, 966.2, 950.1$
2. Make a frequency table for the data in Question 1. Use a bin width of 50.
3. Plot the data from Question 1 as a histogram with a bin width of:
   - (a) 50
   - (b) 100
4. The following stem-and-leaf plot shows data collected for the speed of 40 cars in a 35 mph limit zone in Culver City, California.
   - (a) Find the mean, median and mode speed.
   - (b) Complete the frequency table, starting at 25 mph with a bin width of 5 mph.
   - (c) Use the table to construct a histogram with the intervals from your frequency table.
5. The histogram shown on the right displays the results of a larger-scale survey of the number of siblings. Use it to find:
   - (a) The median of the data
   - (b) The mean of the data
   - (c) The mode of the data
   - (d) The number of people who have an odd number of siblings
   - (e) The percentage of the people surveyed who have 4 or more siblings

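## Review Answers

1. The median from the ordered plot is 850 (the 20th and 21st of the 40 values are both 850); the mode is 880 (since all values are truncated at the tens).
2. Frequency table with a bin width of 50:

| Bin interval | Frequency |
|---|---|
| $600 \le x < 650$ | 1 |
| $650 \le x < 700$ | 1 |
| $700 \le x < 750$ | 3 |
| $750 \le x < 800$ | 6 |
| $800 \le x < 850$ | 8 |
| $850 \le x < 900$ | 10 |
| $900 \le x < 950$ | 5 |
| $950 \le x < 1000$ | 6 |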
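The answers to Questions 1 and 2 can be checked with a short sketch. It assumes the 40 values are in a list (our name `data`); `statistics.median` returns the mean of the two middle values when the count is even, which is the case here.

```python
from collections import Counter
from statistics import median, multimode

# Data from Review Question 1 (40 values).
data = [607.4, 886.0, 822.2, 755.7, 900.6, 770.9, 780.8, 760.1, 936.9, 962.9,
        859.9, 848.3, 898.7, 670.9, 946.7, 817.8, 868.1, 887.1, 881.3, 744.6,
        984.9, 941.5, 851.8, 905.4, 810.6, 765.3, 881.9, 851.6, 815.7, 989.7,
        723.4, 869.3, 951.0, 794.7, 807.6, 841.3, 741.5, 822.2, 966.2, 950.1]

# Truncate each value to the tens (drop units and decimals): 886.0 -> 880.
tens = [int(x) // 10 * 10 for x in data]
med = median(tens)        # n = 40 is even: mean of the 20th and 21st values
modes = multimode(tens)
print(med, modes)         # 850.0 [880]

# Frequency table for Question 2: bins of width 50 starting at 600.
bins = Counter(int(x // 50) * 50 for x in data)
for low in range(600, 1000, 50):
    print(f"{low} <= x < {low + 50}: {bins[low]}")
```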
4. (a) The mean is 36.9 mph ($\approx 37$ mph). The median is 35.5 mph. The mode is 32 mph.

(b) Frequency table:

| Speed (mph) | Frequency |
|---|---|
| $25 \le x < 30$ | 5 |
| $30 \le x < 35$ | 12 |
| $35 \le x < 40$ | 10 |
| $40 \le x < 45$ | 7 |
| $45 \le x < 50$ | 4 |
| $50 \le x < 55$ | 1 |
| $55 \le x < 60$ | 1 |

5. In order to determine the answers, it is useful to construct a frequency table, reading the values directly from the histogram.

| Number of Siblings | Frequency |
|---|---|
| 0 | 10 |
| 1 | 15 |
| 2 | 21 |
| 3 | 17 |
| 4 | 9 |
| 5 | 5 |
| 6 | 3 |
| 7 | 1 |
| 8 | 1 |
| 9 | 0 |
| 10 | 1 |
| Total | 83 |

The total tells us there were 83 people in the survey.

(a) Since $\frac{83 + 1}{2} = 42$, the median is the 42nd value, so the median $= 2$.

(b) $\text{Mean}=\frac{0(10) + 1(15) + 2(21) + 3(17) + 4(9) + 5(5) + 6(3) + 7(1) + 8(1) + 10(1)}{83}=\frac{212}{83}\approx 2.55$

(c) $\text{Mode}=2$

(d) $\text{n(odd)}=15 + 17 + 5 + 1=38$

(e) $\text{Percentage with 4 or more siblings}=\frac{9 + 5 + 3 + 1 + 1 + 1}{83}\times 100 \% \approx 24.1 \%$
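All five answers for Question 5 follow from the frequency table alone. A minimal sketch, assuming the table is stored as a dictionary from number of siblings to frequency (the name `freq` and the other variable names are ours); the median is found by walking the cumulative frequencies up to position $\frac{n+1}{2}$, which works here because $n = 83$ is odd.

```python
# Frequency table read from the histogram: siblings -> number of people.
freq = {0: 10, 1: 15, 2: 21, 3: 17, 4: 9, 5: 5, 6: 3, 7: 1, 8: 1, 9: 0, 10: 1}

n = sum(freq.values())                             # 83 people
mean = sum(k * f for k, f in freq.items()) / n     # 212 / 83
mode = max(freq, key=freq.get)                     # category with the tallest bar

# Median: walk the cumulative frequencies until reaching position (n + 1) / 2.
target = (n + 1) // 2                              # 42nd value (n is odd)
running = 0
for k in sorted(freq):
    running += freq[k]
    if running >= target:
        median = k
        break

odd = sum(f for k, f in freq.items() if k % 2 == 1)
pct_4_plus = 100 * sum(f for k, f in freq.items() if k >= 4) / n

print(n, round(mean, 2), median, mode, odd, round(pct_4_plus, 1))
# 83 2.55 2 2 38 24.1
```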

Feb 22, 2012

Aug 26, 2014