# 5.4: Numerical Data: Histograms

Created by: Bruce DeWItt

### Learning Objectives

- Construct histograms
- Describe distributions including shape, outliers, center, context, and spread.

### Histograms

When it is not necessary to show every value the way a stem plot does, a histogram is a useful graph. Histograms organize numerical data into ranges, but do not show the actual values. The **histogram** is a summary graph showing how many of the data points fall within various ranges. Even though a histogram looks similar to a bar graph, it is not the same. Histograms are for numerical data, and each 'bar' covers a range of values. Each of these 'bars' is called a **class** or **bin.** Histograms are a great way to see the shape of a distribution and can be used even when working with a large set of data.

The width of the bins is the most important decision when constructing a histogram. The bins need to be of consistent width (for example, every bin covers a range of 10, or every bin covers a range of 25). It is generally a good idea to have 7 to 15 bins. Start with the range and divide by 10. This will give you a rough idea of how wide to make your bins. From there it becomes a judgment call as to what is a reasonable bin width. For example, it does not make sense to count by 11.24 just because that is what the range divided by 10 equals. In such a case, it might make more sense to count by 10's or 12's, depending on the specific data.
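The rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the function names `rough_bin_width` and `bins_needed` are made up for this sketch, and the range of 112.4 is a hypothetical value chosen so that dividing by 10 gives the 11.24 mentioned above.

```python
import math


def rough_bin_width(data_range, target_bins=10):
    """Starting point only: the range divided by about 10."""
    return data_range / target_bins


def bins_needed(data_range, width):
    """Approximate number of bins of this width needed to span the range."""
    return math.ceil(data_range / width)


# Hypothetical range, chosen so that range / 10 = 11.24.
data_range = 112.4

print("rough width:", round(rough_bin_width(data_range), 2))

# Compare the friendlier widths suggested in the text.
for width in (10, 12):
    print("width", width, "needs about", bins_needed(data_range, width), "bins")
```

Both candidate widths land inside the suggested 7 to 15 bins, so the final choice comes down to which one reads more naturally for the data. The exact bin count also depends on where the first bin starts, which this sketch ignores.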

#### Example 1

Suppose that the test scores of 27 students were recorded. The scores were: 8, 12, 17, 22, 24, 28, 31, 37, 37, 39, 40, 42, 43, 47, 48, 51, 57, 58, 59, 60, 65, 65, 74, 75, 84, 88, 91. The lowest score was an 8 and the highest was a 91. Construct a histogram.

#### Solution

Plan bin width: The first step is to look at the range (91 - 8 = 83). Then divide the range by 10 (83/10 = 8.3). It doesn't make sense to count by bins of 8.3 points, so we might use 8, 10, or 12. Next we look at where to start. The first number is 8. It doesn't make sense to start counting at 8 either, or to end at 91. We will probably want to start from 0 and end at 100, so counting by 10's should work nicely.

*Note:* Where to begin and what to count by are not obvious to a calculator or to many computer software programs. The graphing calculator would probably start at 8 and count by 8.3, leaving you with bins of [8, 16.3); [16.3, 24.6); [24.6, 32.9); etc. So, if you are using technology to create a histogram, you will generally need to fix the window so that the bins make sense.
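To see the difference, here is a small Python sketch that lists both sets of bin edges for Example 1. The "default" edges simply follow the description above (start at the minimum, count by the range divided by 10); this is an assumption for illustration, not the documented behavior of any particular calculator.

```python
# Example 1 values: minimum score 8, maximum score 91.
low, high = 8, 91
width = (high - low) / 10  # 8.3

# Edges a tool might pick if it starts at the minimum and counts by 8.3.
# Rounding to one decimal hides tiny floating-point noise.
default_edges = [round(low + i * width, 1) for i in range(11)]

# Edges chosen by hand: start at 0, end at 100, count by 10.
chosen_edges = list(range(0, 101, 10))

print("default:", default_edges)
print("chosen: ", chosen_edges)
```

The hand-picked edges are much easier to read and explain, which is why the window usually needs to be adjusted.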

Mark horizontal axis: Mark your scale along the horizontal axis to cover your entire range and to count by your decided upon bin width. Include numbers.

Count number of values within each bin: How many values fall between 0 and <10? One, so we make the bin one unit tall. Between 10 and <20? Two, so we make the bin two units tall, etc. A frequency table may be helpful here. You need to know how tall to make each bin. You especially need to know how tall to make the tallest of the bins.
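A frequency table for the Example 1 scores can be built with a short Python sketch like the one below. The variable names are arbitrary; the bins are the ones chosen above (start at 0, end at 100, width 10).

```python
# The 27 test scores from Example 1.
scores = [8, 12, 17, 22, 24, 28, 31, 37, 37, 39, 40, 42, 43, 47,
          48, 51, 57, 58, 59, 60, 65, 65, 74, 75, 84, 88, 91]
width = 10

# One counter per bin, keyed by the left edge of the bin.
counts = {start: 0 for start in range(0, 100, width)}
for score in scores:
    start = (score // width) * width  # left edge of the bin holding this score
    counts[start] += 1

# Print the frequency table.
for start, count in counts.items():
    print(f"[{start:>2}, {start + width:>3}): {count}")

tallest = max(counts.values())
print("tallest bin height:", tallest)
```

The counts add up to 27, which is a quick check that no score was skipped or counted twice.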

Mark vertical axis: Your vertical axis needs to reach the height of the tallest bin. Mark your vertical axis by consistent steps so that it will reach the number needed. Include numbers.

*Note:* For instance, if you need to reach 2,460, you should probably count by steps of 250 or an even larger number.
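One way to make this choice systematic is sketched below in Python. The rule used here (pick the smallest "friendly" step that needs at most 10 marks on the axis) is an assumption made for this sketch, not a rule from the text, and the helper names `nice_steps` and `axis_step` are made up.

```python
import math


def nice_steps(limit=10**6):
    """Friendly step sizes in increasing order: 1, 2, 5, 10, 20, 25, 50, ..."""
    steps = []
    power = 1
    while power <= limit:
        for multiple in (1, 2, 2.5, 5):
            step = multiple * power
            if step == int(step):  # skip 2.5, keep 25, 250, ...
                steps.append(int(step))
        power *= 10
    return steps


def axis_step(tallest, max_marks=10):
    """Smallest friendly step that reaches `tallest` in at most `max_marks` steps."""
    for step in nice_steps():
        if math.ceil(tallest / step) <= max_marks:
            return step
    return None  # tallest is beyond the steps considered here


step = axis_step(2460)
top = step * math.ceil(2460 / step)
print("step:", step, "top of axis:", top)
print("step for a tallest bin of 5:", axis_step(5))
```

Under this assumed rule, a tallest bin of 2,460 leads to steps of 250, which matches the suggestion above, while a small histogram with a tallest bin of 5 can simply count by 1's.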

Make your histogram: Make the bins the correct heights, shade or color them in, add labels including any units, a title, and a key if needed.
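As a rough stand-in for a drawn graph, the Python sketch below prints a sideways text version of the Example 1 histogram, with one `#` per student. It is only meant to show how the counts turn into bar heights; the title and axis labels are the kind you would add to a paper or calculator sketch.

```python
from collections import Counter

# The 27 test scores from Example 1.
scores = [8, 12, 17, 22, 24, 28, 31, 37, 37, 39, 40, 42, 43, 47,
          48, 51, 57, 58, 59, 60, 65, 65, 74, 75, 84, 88, 91]
width = 10

# Bin index 0 is [0, 10), index 1 is [10, 20), and so on.
counts = Counter(score // width for score in scores)

lines = ["TEST SCORES", "score range | number of students"]
for index in range(10):
    start = index * width
    label = f"[{start:>2}, {start + width:>3})"
    lines.append(f"{label} | {'#' * counts[index]}")

print("\n".join(lines))
```

Read from top to bottom, the bars grow to a peak in the 40's and then shrink again, which is the shape you should see in your own sketch.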

Figure: Test score histogram, titled "TEST SCORES". Source: http://www.netmba.com

The bins in this example are [0 to 10); [10 to 20); etc. This means that zero up to, but not including, 10 are in the first bin (9.999 would be in bin #1, but 10 would be in bin #2).
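The "up to, but not including" rule can be checked with a tiny Python sketch. The function name `bin_number` is made up for illustration; it assumes bins of equal width that begin at `start`.

```python
def bin_number(value, width=10, start=0):
    """1-based bin number for bins [start, start + width), [start + width, ...)."""
    return int((value - start) // width) + 1


for value in (0, 9.999, 10, 19.5, 91):
    print(value, "-> bin", bin_number(value))
```

A value sitting exactly on a boundary, such as 10, always goes into the bin that starts there, never the bin that ends there.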

You may be creating your histograms with paper and pencil. However, graphing calculators are a great way to create histograms as well. It takes a little practice to learn how to adjust the window, but you have the opportunity to try out different bin widths without needing to erase or start all over. You may also want to see how to create histograms in Excel. When you use a graphing calculator to create your graphs, you should sketch what the calculator shows you. Your sketch should look similar to the graphing window shown, and it will still need labels and a title.

#### Example 2

a) Construct a histogram to look at the distribution of acceptance rates for these U.S. universities.

b) Describe your findings.

#### Solution

a) Try this on your calculator: Enter the data in a list and set up a histogram.

Plan bin width: Determine the range (72 - 11 = 61). Divide by 10 (61/10 = 6.1) to get a rough idea of a good bin width. We could use a bin width of 5, 7.5, 8, or 10, among others. We must start at or before the minimum of 11 (start at 0 or 10) and go past the maximum of 72 (end at 80). After trying a few of these bin widths, we decide to use bins of 10, starting at 10 and ending at 80. Here is the window that was used: {x-min = 10, x-max = 80, x-scl = 10, y-min = -2, y-max = 5, y-scl = 1}
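The window settings translate directly into bins. The Python sketch below uses only the numbers given above (the window values, the minimum of 11, and the maximum of 72); the variable names are arbitrary and it is not a model of any specific calculator.

```python
# Window settings from the example.
x_min, x_max, x_scl = 10, 80, 10

# Smallest and largest acceptance rates in the data.
data_min, data_max = 11, 72

# Each bin is half-open: it includes its left edge but not its right edge.
bins = [(left, left + x_scl) for left in range(x_min, x_max, x_scl)]

# The window works only if every value falls inside some bin.
covers = x_min <= data_min and data_max < x_max

# Find the bin that holds the largest value.
max_bin = next(b for b in bins if b[0] <= data_max < b[1])

print("bins:", bins)
print("window covers the data:", covers)
print("bin holding the maximum:", max_bin)
```

This window gives 7 bins, the low end of the suggested 7 to 15, and the maximum of 72 lands in the last bin, [70, 80).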

Mark horizontal axis: Mark your scale along the horizontal axis to cover your entire range and to count by your decided upon bin width. Include numbers.

Count number of values within each bin: A frequency table may be helpful here. You need to know how tall to make each bin. You especially need to know how tall to make the tallest of the bins.

Mark vertical axis: Your vertical axis needs to reach the height of the tallest bin. Mark your vertical axis by consistent steps so that it will reach the number needed. Include numbers.

Make your histogram: Make the bins the correct heights, shade or color them in, add labels including any units, a title, and a key if needed.

b) Describe: The median and mean are difficult to identify from just a histogram. You will often only be able to estimate them. In this case, we were given all of the original data, so we can find the exact values. When possible, identify outliers specifically.

The median acceptance rate for these universities is 30%. The percentage of applicants accepted by these universities ranged from 11% to 72%. However, the 72% is an extremely high outlier; the next highest rate is 49%. The majority of these schools accepted 36% or fewer of those who applied. The distribution is heavily skewed to the right because of the high outlier, American University.

### Problem Set 5.4

#### Section 5.4 Exercises

1) This graph shows the distribution of salaries (in thousands of dollars) for the employees of a large school district. Answer the questions that follow.

Source: http://4.bp.blogspot.com

a) Approximately how many employees make $77,000 or more per year?

b) What is the bin width here? Be careful.

c) Without calculating anything, how would you describe the typical salary of an employee of this school district?

2) Jessica is a freshman at the University of Minnesota, Duluth. She has been watching her weight because she is afraid of gaining that 'freshman fifteen' she keeps hearing about. She has weighed herself every Monday morning since school started. Here is a histogram showing the results, in pounds, of all of these *Monday-Morning-Weigh-Ins*.

a) Describe the distribution. Remember your S.O.C.C.S!

b) What is the range for the bin that has 6 observations?

c) For her height, Jessica feels that 140 lbs. is her ideal weight. What percent of the time has she been within 5 lbs. of her ideal weight?

3) Pretend you are a journalist.

a) What do you notice that is wrong with this graph?

b) Based on only what you can see in the graph and labels, write several sentences that could go with this graph. (Think S.O.C.C.S!) Ignore the mistake from part (a).

Men and exercise graph: http://www2.le.ac.uk

4) Here are the statistics from several of the Minnesota Wild players. We are going to analyze the Penalties in Minutes (PIM) data.

a) Construct a histogram for the penalties in minutes for the Wild players included on that list.

b) Describe the distribution. Remember your S.O.C.C.S!

5) The following table lists the average life expectancy for people in several countries, as of 2010. Source: http://dataworldbank.org.

a) Construct a histogram for the distribution of life expectancies for these countries (start at Xmin = 45 and use a bin width of 5).

b) Based on the shape of your graph, do you expect the mean or median to be higher?

c) Calculate the range and the three measures of central tendency (mean, median & mode).

d) Which of these three measures of central tendency is most appropriate in this context? Explain.

6) Sketch a histogram that fits the following scenarios:

a) Symmetrical with a few high outliers and a few low outliers.

b) Strongly skewed right with no outliers.

c) Bimodal and symmetrical.

d) Skewed left with a few outliers.

e) Doesn't fit any of the descriptions we have learned.

#### Review Exercises

7) The local booster club is holding a raffle. There will be one prize of $1000, two prizes of $250, five prizes of $50, and 10 prizes of $25. They are selling 500 tickets at $10 each.

a) Construct a probability distribution table that shows the prizes and the probabilities of winning them.

b) What is the expected value of a single raffle ticket?

c) Is this raffle considered a "fair game"? Explain why or why not.

8) There is a fish bowl on the counter with 4 gold fish, 7 turquoise fish, and 5 pink fish. Simon the cat is playing a game where he closes his eyes, reaches into the bowl, grabs a fish, and sees what color the fish is. He then puts the fish back and repeats the process. Find the following probabilities.

a) P(2 turquoise fish)

b) P(exactly one of the fish is gold)

c) P(a pink fish, then a gold fish)

9) If Simon changes the game so that he eats the fish after he takes them out of the bowl, find the following probabilities.

a) P(2 pink fish)

b) P(exactly one of the fish is turquoise)

c) P(no gold fish)