### Stem-and-Leaf Plots and Histograms

Imagine asking a class of 20 algebra students how many brothers and sisters they had. You would probably get a range of answers from zero on up. Some students would have no siblings, but most would have at least one. The results might look like this:

1, 4, 2, 1, 0, 2, 1, 0, 1, 2, 1, 0, 0, 2, 2, 3, 1, 1, 3, 6

We could organize this information in many ways. The simplest is to create an ordered list, rewriting all the numbers in order from smallest to largest:

0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 4, 6

Another way to list the results is in a table:

| Number of siblings | Number of matching students |
|---|---|
| 0 | 4 |
| 1 | 7 |
| 2 | 5 |
| 3 | 2 |
| 4 | 1 |
| 5 | 0 |
| 6 | 1 |

We could also make a visual representation of the data by making categories for the number of siblings on the \begin{align*}x-\end{align*}axis, and stacking representations of each student above the category marker. We could use crosses, stick-men or even photographs of the students to show how many students are in each category.

**Make and Interpret Stem-and-Leaf Plots**

Another useful way to display data is with a **stem-and-leaf plot**. Stem-and-leaf plots are especially useful because they show visually how the data is clustered while preserving all of the numerical information. For two-digit data, the vertical “stem” holds the tens digit of each number, and the units digit is written to the right of the stem like a “leaf.” In the stem-and-leaf plot below, the first number represented is 21. It is the only number with a stem of 2, so it is the only number in the 20s. The next two numbers share a stem of 3: they are 33 and 36. The next numbers are 40, 46 and 47.
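The splitting rule can be sketched in a few lines of Python. This is a minimal illustration, assuming whole numbers from 0 to 99 so that the stem is the tens digit and the leaf is the units digit; the function names `stem_and_leaf` and `format_plot` are our own, and only the six numbers described in the paragraph are used.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Split whole numbers 0-99 into stems (tens digit) and leaves (units digit)."""
    plot = defaultdict(list)
    for value in values:
        stem, leaf = divmod(value, 10)  # e.g. 46 -> stem 4, leaf 6
        plot[stem].append(leaf)
    return dict(plot)


def format_plot(plot):
    """Return one text line per stem, including stems that have no leaves."""
    lines = []
    for stem in range(min(plot), max(plot) + 1):
        leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines


plot = stem_and_leaf([21, 33, 36, 40, 46, 47])
for line in format_plot(plot):
    print(line)
```

Because `divmod` returns the quotient and remainder together, one call gives both the stem and the leaf.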

Stem-and-leaf plots have a number of advantages over simply listing the data in a single line.

- They show how data is distributed, and whether it is symmetric around the center.
- They can be used as the data is being collected.
- They make it easy to determine the median and mode.

Stem-and-leaf plots are not ideal for all situations; in particular they are not practical when the data is too tightly clustered. For example, with the data above about students’ siblings, all the data points would occupy the same stem (zero). In that case, no additional information could be gained from a stem-and-leaf plot.
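A quick check makes the point about the sibling data concrete: applying the same tens-digit rule to every value puts them all on stem 0. The snippet is a small Python sketch using integer division.

```python
siblings = [1, 4, 2, 1, 0, 2, 1, 0, 1, 2, 1, 0, 0, 2, 2, 3, 1, 1, 3, 6]

# The stem is the tens digit, found by integer division by 10.
stems = {value // 10 for value in siblings}
print(stems)  # every value is below 10, so the only stem is 0
```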

#### Creating a Stem-and-Leaf Plot

While traveling on a long train journey, Rowena collected the ages of all the passengers traveling in her carriage. The ages for the passengers are shown below. Arrange the data into a stem-and-leaf plot, and use the plot to find the median and mode ages.

*35, 42, 38, 57, 2, 24, 27, 36, 45, 60, 38, 40, 40, 44, 1, 44, 48, 84, 38, 20, 4, 2, 48, 58, 3, 20, 6, 40, 22, 26, 17, 18, 40, 51, 62, 31, 27, 48, 35, 27, 37, 58, 21*

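The first step is to determine a sensible **stem**. Since all the values fall between 1 and 84, the stem should represent the tens column and run from 0 to 8, so that the numbers represented can range from 00 (which we would represent by placing a leaf of 0 next to the 0 on the stem) to 89 (a leaf of 9 next to the 8 on the stem). We then go through the data in the order it was collected and fill out our plot:

| Stem | Leaves |
|---|---|
| 0 | 2 1 4 2 3 6 |
| 1 | 7 8 |
| 2 | 4 7 0 0 2 6 7 7 1 |
| 3 | 5 8 6 8 8 1 5 7 |
| 4 | 2 5 0 0 4 4 8 8 0 0 8 |
| 5 | 7 8 1 8 |
| 6 | 0 2 |
| 7 | |
| 8 | 4 |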

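You can see immediately that the interval with the most passengers is the 40-49 group. To determine the median and the mode correctly, it is helpful to construct a second, **ordered stem-and-leaf plot**, placing the leaves on each branch in ascending order:

| Stem | Leaves |
|---|---|
| 0 | 1 2 2 3 4 6 |
| 1 | 7 8 |
| 2 | 0 0 1 2 4 6 7 7 7 |
| 3 | 1 5 5 6 7 8 8 8 |
| 4 | 0 0 0 0 2 4 4 5 8 8 8 |
| 5 | 1 7 8 8 |
| 6 | 0 2 |
| 7 | |
| 8 | 4 |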
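Building the ordered plot can also be sketched in Python. Sorting the data first means each branch's leaves come out in ascending order automatically. This is a minimal sketch; `ordered_stem_and_leaf` is a hypothetical helper name, and the stems run from 0 to 8 as chosen above, so the empty 7-branch is still shown.

```python
ages = [
    35, 42, 38, 57, 2, 24, 27, 36, 45, 60, 38, 40, 40, 44, 1,
    44, 48, 84, 38, 20, 4, 2, 48, 58, 3, 20, 6, 40, 22, 26,
    17, 18, 40, 51, 62, 31, 27, 48, 35, 27, 37, 58, 21,
]


def ordered_stem_and_leaf(values, first_stem=0, last_stem=8):
    """Return {stem: leaves in ascending order}, with an empty list for unused stems."""
    plot = {stem: [] for stem in range(first_stem, last_stem + 1)}
    for value in sorted(values):  # sorting first puts every branch in ascending order
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return plot


plot = ordered_stem_and_leaf(ages)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(map(str, leaves))}".rstrip())
```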

The mode is now apparent: there are 4 zeros in a row on the 4-branch, so the mode is 40. The median is the middle value; since there are 43 data points, the median is the \begin{align*}22^{nd}\end{align*} value. (Using our formula from earlier, \begin{align*}\frac{43+1}{2} = 22\end{align*}.) Counting along the ordered plot, the 0-, 1- and 2-branches hold 6 + 2 + 9 = 17 values, so the 22nd value is the fifth leaf on the 3-branch. So the median is 37.
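Both measures can be checked with a short Python sketch. It assumes the same 43 ages; `median` follows the (n + 1)/2 position rule when the count is odd and averages the two middle values when it is even, and `modes` returns a list so that ties are not hidden. Both helper names are our own.

```python
from collections import Counter

ages = [
    35, 42, 38, 57, 2, 24, 27, 36, 45, 60, 38, 40, 40, 44, 1,
    44, 48, 84, 38, 20, 4, 2, 48, 58, 3, 20, 6, 40, 22, 26,
    17, 18, 40, 51, 62, 31, 27, 48, 35, 27, 37, 58, 21,
]


def median(values):
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Position (n + 1) / 2 counting from 1 is index n // 2 counting from 0.
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def modes(values):
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


print(median(ages))  # 37
print(modes(ages))   # [40]
```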

**Make and Interpret Histograms**

Look again at the example of the algebra students and their siblings. The data was collected in the following list.

1, 4, 2, 1, 0, 2, 1, 0, 1, 2, 1, 0, 0, 2, 2, 3, 1, 1, 3, 6

We were able to organize the data into a table. Here is the table again, but this time we will use the word *frequency* as a header to indicate the number of times each value occurs in the list.

| Number of siblings | Frequency |
|---|---|
| 0 | 4 |
| 1 | 7 |
| 2 | 5 |
| 3 | 2 |
| 4 | 1 |
| 5 | 0 |
| 6 | 1 |
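A frequency table like this one can be produced by counting each value. The sketch below uses Python's `collections.Counter`; listing every whole number from 0 up to the largest value keeps the row for 5 siblings, whose frequency is 0.

```python
from collections import Counter

siblings = [1, 4, 2, 1, 0, 2, 1, 0, 1, 2, 1, 0, 0, 2, 2, 3, 1, 1, 3, 6]

counts = Counter(siblings)
# A Counter returns 0 for values that never occur, so 5 siblings still gets a row.
frequency = {value: counts[value] for value in range(max(siblings) + 1)}

for value, freq in frequency.items():
    print(value, freq)
```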

Now we could use this table as an \begin{align*}(x, y)\end{align*} coordinate list to plot a line diagram like this one:

While this diagram does indeed show the data, it is somewhat misleading. For example, the continuous line joining the number of students with one and two siblings makes it look like we know something about how many students have 1.5 siblings (which, of course, is impossible). In this case, where the data points are all integers, it’s wrong to suggest that the function is continuous between the points!

When the data we are representing falls into well-defined categories (such as the integers 0, 1, 2, 3, 4, 5 and 6), it is more appropriate to use a **histogram** to display that data. A histogram for this data is shown below.

Each number on the \begin{align*}x-\end{align*}axis has an associated column, whose height shows how many students have that number of siblings. For example, the column at \begin{align*}x = 2\end{align*} is 5 units high, indicating that there are 5 students with 2 siblings.
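To see how the column heights come straight from the frequency table, here is a rough, text-only Python sketch of the same histogram, printed sideways with one `#` per student. It is only an illustration; `text_histogram` is our own helper, and a real plotting tool would draw vertical columns.

```python
# Frequencies copied from the table: number of siblings -> number of students.
frequency = {0: 4, 1: 7, 2: 5, 3: 2, 4: 1, 5: 0, 6: 1}


def text_histogram(freq):
    """Return one row per bin; the length of each bar equals that bin's frequency."""
    return [f"{value} | {'#' * count}".rstrip() for value, count in freq.items()]


for row in text_histogram(frequency):
    print(row)
```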

The categories on the \begin{align*}x-\end{align*}axis are called **bins**. Histograms differ from bar charts in that they don’t necessarily have fixed widths for the bins. They are also useful for displaying **continuous data** (data that can take any value in a range, such as rainfall measured to a tenth of a millimeter, rather than only whole-number amounts). The next example illustrates this.

#### Displaying Data in a Histogram

Monthly rainfall (in millimeters) for Beaver Creek, Oregon, was collected over a five-year period, and the data is shown below. Display the data in a histogram.

*41.1, 254.7, 91.6, 60.9, 75.6, 36.0, 16.5, 10.6, 62.2, 89.4, 124.9, 176.7, 121.6, 135.6, 141.6, 77.0, 82.8, 28.9, 6.7, 22.1, 29.9, 110.0, 179.3, 97.6, 176.8, 143.5, 129.8, 94.9, 77.0, 60.8, 60.0, 32.5, 61.7, 117.2, 194.5, 208.6, 176.8, 143.5, 129.8, 94.9, 77.0, 60.8, 20.0, 32.5, 61.7, 117.2, 194.5, 208.6, 133.1, 105.2, 92.0, 60.7, 52.8, 37.8, 14.8, 23.1, 41.3, 75.7, 134.6, 148.8*

Notice the similarity between histograms and stem-and-leaf plots. A stem-and-leaf plot resembles a histogram on its side. We could start by making a stem-and-leaf plot of our data.

For the data above, the stem is the number of tens (every digit except the units digit), so it runs from 0 to 25. Instead of rounding the decimals in the data, we **truncate** them, meaning we simply drop the decimal part. For example, 165.7 would have a stem of 16 and a leaf of 5; we just leave out the seven tenths.
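Truncation is easy to get wrong by accident, because rounding would change some leaves. The Python sketch below uses `math.trunc` to drop the decimal part before splitting; `truncated_stem_leaf` is a hypothetical helper name. It also confirms the range of stems for the rainfall data.

```python
import math

rainfall = [
    41.1, 254.7, 91.6, 60.9, 75.6, 36.0, 16.5, 10.6, 62.2, 89.4,
    124.9, 176.7, 121.6, 135.6, 141.6, 77.0, 82.8, 28.9, 6.7, 22.1,
    29.9, 110.0, 179.3, 97.6, 176.8, 143.5, 129.8, 94.9, 77.0, 60.8,
    60.0, 32.5, 61.7, 117.2, 194.5, 208.6, 176.8, 143.5, 129.8, 94.9,
    77.0, 60.8, 20.0, 32.5, 61.7, 117.2, 194.5, 208.6, 133.1, 105.2,
    92.0, 60.7, 52.8, 37.8, 14.8, 23.1, 41.3, 75.7, 134.6, 148.8,
]


def truncated_stem_leaf(x):
    """Split a measurement into (stem, leaf) by truncating, not rounding."""
    whole = math.trunc(x)     # 165.7 -> 165: the tenths are simply dropped
    return divmod(whole, 10)  # 165 -> stem 16, leaf 5


print(truncated_stem_leaf(165.7))     # (16, 5)
print(divmod(round(165.7), 10))       # (16, 6): rounding would change the leaf

stems = [truncated_stem_leaf(x)[0] for x in rainfall]
print(min(stems), max(stems))         # 0 25
```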

By outlining the numbers on the stem-and-leaf plot, we can see what a histogram with a bin width of 10 would look like. With so many bins, the histogram looks random, with no clear pattern. In a situation like this we need to reduce the number of bins, so we will increase the bin width to 25 and collect the data in a table:

| Rainfall (mm) | Frequency |
|---|---|
| \begin{align*}0 \le x < 25\end{align*} | 7 |
| \begin{align*}25 \le x < 50\end{align*} | 8 |
| \begin{align*}50 \le x < 75\end{align*} | 9 |
| \begin{align*}75 \le x < 100\end{align*} | 12 |
| \begin{align*}100 \le x < 125\end{align*} | 6 |
| \begin{align*}125 \le x < 150\end{align*} | 9 |
| \begin{align*}150 \le x < 175\end{align*} | 0 |
| \begin{align*}175 \le x < 200\end{align*} | 6 |
| \begin{align*}200 \le x < 225\end{align*} | 2 |
| \begin{align*}225 \le x < 250\end{align*} | 0 |
| \begin{align*}250 \le x < 275\end{align*} | 1 |
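Sorting values into bins by hand is tedious, so here is a minimal Python sketch of the same tally. It assumes half-open bins that include their lower edge and exclude their upper edge, exactly as in the table, and that every value is at least `start`; `frequency_table` is a hypothetical helper name. Floor division of a value by the bin width gives the index of the bin it belongs to.

```python
rainfall = [
    41.1, 254.7, 91.6, 60.9, 75.6, 36.0, 16.5, 10.6, 62.2, 89.4,
    124.9, 176.7, 121.6, 135.6, 141.6, 77.0, 82.8, 28.9, 6.7, 22.1,
    29.9, 110.0, 179.3, 97.6, 176.8, 143.5, 129.8, 94.9, 77.0, 60.8,
    60.0, 32.5, 61.7, 117.2, 194.5, 208.6, 176.8, 143.5, 129.8, 94.9,
    77.0, 60.8, 20.0, 32.5, 61.7, 117.2, 194.5, 208.6, 133.1, 105.2,
    92.0, 60.7, 52.8, 37.8, 14.8, 23.1, 41.3, 75.7, 134.6, 148.8,
]


def frequency_table(data, width, start=0):
    """Count values in bins start <= x < start + width, and so on.

    Bins run from `start` up to the bin that holds the largest value.
    """
    n_bins = int((max(data) - start) // width) + 1
    counts = [0] * n_bins
    for x in data:
        counts[int((x - start) // width)] += 1  # floor division picks the bin index
    return [(start + i * width, start + (i + 1) * width, c) for i, c in enumerate(counts)]


for lower, upper, count in frequency_table(rainfall, 25):
    print(f"{lower} <= x < {upper}: {count}")
```

Changing the second argument to 50 produces the wider bins with the same function.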

The histogram associated with this bin width is below.

The pattern in the distribution is far more apparent with fewer bins, so let's see what the histogram looks like with even fewer. We will combine the bins in pairs to give 6 bins with a bin width of 50. Our table and histogram now look like this.

| Rainfall (mm) | Frequency |
|---|---|
| \begin{align*}0 \le x < 50\end{align*} | 15 |
| \begin{align*}50 \le x < 100\end{align*} | 21 |
| \begin{align*}100 \le x < 150\end{align*} | 15 |
| \begin{align*}150 \le x < 200\end{align*} | 6 |
| \begin{align*}200 \le x < 250\end{align*} | 2 |
| \begin{align*}250 \le x < 300\end{align*} | 1 |
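Combining bins in pairs only requires adding neighboring frequencies, so the raw data is not needed again. A short Python sketch, starting from the width-25 frequencies in the earlier table:

```python
# Frequencies for a bin width of 25, copied from the first rainfall table.
counts_25 = [7, 8, 9, 12, 6, 9, 0, 6, 2, 0, 1]

# Each new bin of width 50 covers two neighboring old bins, so add the counts in pairs.
# The last old bin (250 <= x < 275) has no partner; since no data falls in
# 275 <= x < 300, it becomes the bin 250 <= x < 300 on its own.
counts_50 = [sum(counts_25[i:i + 2]) for i in range(0, len(counts_25), 2)]
print(counts_50)
```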

The pattern is much clearer now. A typical month has around 75 mm of rain (the tallest bin is \begin{align*}50 \le x < 100\end{align*}), but some months are very wet, with rainfall well above that.

Although it may seem counter-intuitive, reducing the number of intervals (or bins) in a histogram can sometimes reveal more. It’s a bit like zooming out on a picture: you can’t see as many of the details, but the overall shape of what you are looking at may become clearer.

**Make Histograms Using a Graphing Calculator**

Look again at the data from the first example. We’ve seen how to manipulate raw data to give a stem-and-leaf plot and a histogram. Now let’s take some of the tedious sorting work out of the process by using a graphing calculator to automatically sort our data into bins.

The following unordered data represents the ages of passengers on a train carriage.

*35, 42, 38, 57, 2, 24, 27, 36, 45, 60, 38, 40, 40, 44, 1, 44, 48, 84, 38, 20, 4, 2, 48, 58, 3, 20, 6, 40, 22, 26, 17, 18, 40, 51, 62, 31, 27, 48, 35, 27, 37, 58, 21.*

Use a graphing calculator to display the data as a histogram with bin-widths of 10, 5 and 20.

**Input the data in your calculator:**

Press **[STAT]** and choose the **[EDIT]** option.

Input all 43 data points into the table in column \begin{align*}L_1\end{align*}.

**Select plot type:**

Bring up the **[STATPLOT]** option by pressing **[2nd]**, **[Y=]**.

Highlight **1:Plot1** and press **[ENTER]**. This will bring up the plot options screen. Make sure the plot is turned **On**, then highlight the histogram icon in the **Type** row and press **[ENTER]**. Make sure the **Xlist** is the list that contains your data.

**Select bin widths and plot:**

Press **[WINDOW]** and ensure that **Xmin** and **Xmax** allow for all data points to be shown. The **Xscl** value determines the bin width.

Press **[GRAPH]** to display the histogram.

You can change the bin width by varying **Xscl** and see how the histogram changes. Below are histograms with bin widths of 10, 5 and 20. (In this example \begin{align*}Xmin = 0\end{align*} and \begin{align*}Xmax = 100\end{align*} will work whatever bin width we choose, but notice that to display the histogram correctly we need to use a different **Ymax** value for each.)
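The calculator does the binning for us, but the logic can be sketched in Python to see why each bin width needs its own **Ymax**: the height of the tallest bin changes with **Xscl**. This is an emulation under stated assumptions, not the calculator's own code; `histogram_counts` is a hypothetical helper, and it assumes each bin includes its lower edge and excludes its upper edge, matching the tables in this section.

```python
ages = [
    35, 42, 38, 57, 2, 24, 27, 36, 45, 60, 38, 40, 40, 44, 1,
    44, 48, 84, 38, 20, 4, 2, 48, 58, 3, 20, 6, 40, 22, 26,
    17, 18, 40, 51, 62, 31, 27, 48, 35, 27, 37, 58, 21,
]


def histogram_counts(data, xmin, xmax, xscl):
    """Count values in bins xmin <= x < xmin + xscl, ... up to xmax (hypothetical helper)."""
    n_bins = int((xmax - xmin) // xscl)
    counts = [0] * n_bins
    for x in data:
        if xmin <= x < xmax:  # values outside the window are not drawn
            counts[int((x - xmin) // xscl)] += 1
    return counts


for width in (10, 5, 20):
    counts = histogram_counts(ages, 0, 100, width)
    print(width, counts, "tallest bin:", max(counts))
```

The tallest bin is 11 for a width of 10, 7 for a width of 5 and 17 for a width of 20, so **Ymax** must be at least that large in each case for the whole histogram to fit on the screen.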

### Example

#### Example 1

Rowena made a survey of the ages of passengers in a train carriage, and collected the results in a frequency table. Display the results as a histogram.

| Age range | Frequency |
|---|---|
| 0 – 9 | 6 |
| 10 – 19 | 2 |
| 20 – 29 | 9 |
| 30 – 39 | 8 |
| 40 – 49 | 11 |
| 50 – 59 | 4 |
| 60 – 69 | 2 |
| 70 – 79 | 0 |
| 80 – 89 | 1 |

**Solution**

Since the data is already grouped into intervals, we will use these as the bins for the histogram. Even though the top end of the first interval is 9, the bin on our histogram will extend to 10. This is because we treat age as continuous data: each bin covers every value up to, but not including, the lower end of the following bin (a passenger aged 9 years and 11 months still belongs in the first interval). The range of values for the first bin is therefore \begin{align*}0 \le x < 10\end{align*}, and all the other bins have similarly described ranges.
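The conversion from whole-number age ranges to continuous bins can be sketched in Python. This is a minimal illustration, keyed by the lower end of each range from the frequency table; it prints each bin as a half-open interval with a text bar and checks that the frequencies account for all 43 passengers.

```python
# Frequency table from Example 1: lower end of each age range -> frequency.
frequency = {0: 6, 10: 2, 20: 9, 30: 8, 40: 11, 50: 4, 60: 2, 70: 0, 80: 1}
width = 10

# Treat age as continuous: the range "0 – 9" becomes the bin 0 <= x < 10, and so on.
bins = [(lower, lower + width, count) for lower, count in frequency.items()]

for lower, upper, count in bins:
    print(f"{lower} <= x < {upper}: {'#' * count}")

print(sum(count for _, _, count in bins))  # 43 passengers in total
```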

### Review

1. Create a stem-and-leaf plot for the following data. Use the first digit (**hundreds**) as the stem and the second digit (**tens**) as the leaf. Truncate any **units** and **decimals**. Order the plot to find the median and the mode.
   **Data:** 607.4, 886.0, 822.2, 755.7, 900.6, 770.9, 780.8, 760.1, 936.9, 962.9, 859.9, 848.3, 898.7, 670.9, 946.7, 817.8, 868.1, 887.1, 881.3, 744.6, 984.9, 941.5, 851.8, 905.4, 810.6, 765.3, 881.9, 851.6, 815.7, 989.7, 723.4, 869.3, 951.0, 794.7, 807.6, 841.3, 741.5, 822.2, 966.2, 950.1.
2. Make a frequency table for the data in Question 1. Use a bin width of 50.
3. Plot the data from Question 1 as a histogram with a bin width of (a) 50 and (b) 100.

For 4-6, use the following stem-and-leaf plot which shows data collected for the speed of 40 cars in a 35 mph limit zone in Culver City, California.

4. Find the mean, median and mode speed.
5. Create a frequency table, starting at 25 mph with a bin width of 5 mph.
6. Use the table to construct a histogram with the intervals from your frequency table.

For 7-11, use the histogram shown below to find each quantity. The data is the result of a survey of each subject's number of siblings.

7. The median of the data.
8. The mean of the data.
9. The mode of the data.
10. The number of people who have an odd number of siblings.
11. The percentage of the people surveyed who have 4 or more siblings.

### Review (Answers)

To view the Review answers, open this PDF file and look for section 13.11.