5.1: Categorical Data
Learning Objectives
- Organize categorical data in tables
- Construct bar graphs and pie charts by hand and with computer software programs
- Describe, summarize and compare categorical data
Each student in the class should complete the following survey. The data collected will be used in your homework problems. Notice that the variables in each question are categorical.
Frequency Tables and Bar Graphs
When analyzing categorical data (also called qualitative data), bar graphs are commonly used. A bar graph is a graph in which each bar shows how frequently a given category occurs. Before constructing the bar graph, it is usually helpful to organize the data in a frequency table, a table that shows the number of occurrences for each category. The bars can run either horizontally or vertically, but they should be of consistent width and equally spaced. The categories are separate and can be put in any order along the axis. Alphabetical order is common but not required. As with all of the graphs you will construct, be sure to use a consistent scale and include a title, labels for the axes, numbers to mark the axes as necessary, and a key whenever needed.
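As a minimal sketch of how a frequency table feeds a bar graph, the Python snippet below tallies a list of survey responses and prints one text bar per category. The response list and the names `responses`, `frequency_table`, and `text_bar_graph` are hypothetical, made up for illustration; they are not data collected from this class.

```python
from collections import Counter

# Hypothetical responses to a "favorite season" survey question (illustrative only).
responses = ["summer", "fall", "summer", "winter", "spring",
             "summer", "fall", "spring", "summer", "fall"]


def frequency_table(data):
    """Return a dict mapping each category to its number of occurrences."""
    return dict(Counter(data))


def text_bar_graph(table, title):
    """Build a horizontal bar graph as text, one '#' per occurrence."""
    width = max(len(category) for category in table)
    lines = [title]
    # Alphabetical order is a common convention, but any order is allowed.
    for category in sorted(table):
        count = table[category]
        lines.append(f"{category:<{width}} | {'#' * count} {count}")
    return "\n".join(lines)


table = frequency_table(responses)
print(text_bar_graph(table, "Favorite Season (hypothetical data)"))
```

Every bar uses the same unit (one `#` per response), which plays the role of the consistent scale described above.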
Example 1
For example, a bar graph could show the types of pets owned by a group of students. Here are the types of pets owned by a class of 33 geometry students.
a) Why do the numbers add up to more than 33?
b) Construct a bar graph to show this class's data.
c) Describe what the graph shows.
Solution
a) They add up to more than 33 because some students own more than one type of pet and are being counted in more than one category.
b) Here is a bar graph that was created using Excel:
[Figure5] License: CC BY-NC 3.0
c) For this class, the most common pet is a dog. Fourteen students, or 42% of the class, own a dog. Owning a cat and having no pet at all are the next most common responses. Five students own some type of rodent, three have fish, and two have reptiles. Two students also own some other type of pet.
Example 2
A great deal of electronic equipment ends up in landfills as people update their computers, TVs, cell phones, etc. This is a concern because the chemicals from batteries and other electronics add toxins to the environment. Electronic waste has been studied in an effort to decrease the amount of pollution and hazardous waste. The following frequency table shows the tonnage of the most common types of electronic equipment discarded in the United States in 2005. Construct a bar graph and comment on what it shows.
Electronic Equipment | Thousands of Tons Discarded |
Cathode Ray Tube (CRT) TVs | 7591.1 |
CRT Monitors | 389.8 |
Printers, Keyboards, Mice | 324.9 |
Desktop Computers | 259.5 |
Laptop Computers | 30.8 |
Projection TVs | 132.8 |
Cell Phones | 11.7 |
LCD Monitors | 4.9 |
Electronics Discarded in the US (2005). Source: National Geographic, January 2008. Volume 213 No.1, pg 73.
Solution
The type of electronic equipment is a categorical variable, and therefore, this data can easily be represented using the bar graph below:
[Figure6]
According to this 2005 data, CRT TVs were by far the most heavily discarded type of electronic equipment, at more than 19 times the tonnage of the next type, CRT monitors.
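The comparison in this solution can be checked with a short calculation. The sketch below, using values copied from the frequency table in Example 2, ranks the categories by tonnage, computes the ratio between the top two, and prints a rough text bar graph. The variable names and the scale of one `#` per 200 thousand tons are choices made for this sketch, not part of the original example.

```python
# Thousands of tons discarded in the US in 2005, copied from the Example 2 table.
discarded = {
    "CRT TVs": 7591.1,
    "CRT Monitors": 389.8,
    "Printers, Keyboards, Mice": 324.9,
    "Desktop Computers": 259.5,
    "Laptop Computers": 30.8,
    "Projection TVs": 132.8,
    "Cell Phones": 11.7,
    "LCD Monitors": 4.9,
}

# Sort categories from largest to smallest tonnage.
ranked = sorted(discarded.items(), key=lambda item: item[1], reverse=True)

(top_name, top_tons), (second_name, second_tons) = ranked[0], ranked[1]
ratio = top_tons / second_tons

# One '#' per 200 thousand tons (an arbitrary scale chosen for this sketch);
# small categories round down to an empty bar.
for name, tons in ranked:
    print(f"{name:<26} | {'#' * int(tons // 200)} {tons}")

print(f"{top_name} exceed {second_name} by a factor of about {ratio:.1f}")
```

Because CRT TVs dominate so heavily, most of the other bars are barely visible at this scale, which is itself worth mentioning when commenting on the graph.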
Pie Charts
Pie charts (or circle graphs) are used extensively in statistics. These graphs display categorical data and appear often in newspapers and magazines. A pie chart shows each category as a sector (slice) of a circle that represents the whole. Comparing the sizes of the sectors shows how the parts relate to one another and to the whole. Constructing a pie chart relies on the fact that the whole of anything equals 100%, so all of the sectors together make up the whole circle. Remember from geometry that the central angles of a circle total 360°. So, for pie charts, 360° = 100% of the circle. The sectors should have different colors or patterns so that an observer can clearly see the difference in size between them.
Pie charts are an appropriate choice when you are working with categorical data that covers 100% of the whole. They are not appropriate when the data does not cover 100% or when the choices may overlap. For example, when we asked every student in this class to list the pets they currently have, we found some students who have more than one pet. A pie chart would not be an appropriate way to display that data, because the sectors in a circle graph do not allow for overlaps such as this. Pie charts are also not appropriate when the choices do not cover all possibilities. For example, the electronic waste example above does not include every possibility, so the categories would not add to 100%. In such cases a bar graph is a more appropriate choice, because it allows for overlaps and does not need to cover exactly 100% of the choices.
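One rough way to test the "covers 100%" condition is to compare the sum of the category counts with the number of respondents. The sketch below assumes a hypothetical helper, `pie_chart_is_appropriate`, and two made-up surveys of 10 students. Passing this check is necessary but not sufficient, since an overlap and a gap could cancel each other out.

```python
def pie_chart_is_appropriate(counts, total_respondents):
    """Return True when the category counts add up to exactly the whole group.

    A sum above the total signals overlapping categories; a sum below it
    signals that some possibilities are not covered.
    """
    return sum(counts.values()) == total_respondents


# Hypothetical pet survey of 10 students; some own more than one type of pet.
pets = {"dog": 5, "cat": 4, "fish": 2, "none": 2}

# Hypothetical favorite-season survey of the same 10 students, one answer each.
seasons = {"summer": 4, "fall": 3, "spring": 2, "winter": 1}

print(pie_chart_is_appropriate(pets, 10))     # the counts sum to 13, not 10
print(pie_chart_is_appropriate(seasons, 10))  # the counts sum to 10
```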
Example 3: How to Construct a Pie Chart
The Red Cross Blood Donor Clinic had a very successful morning collecting blood donations. Within three hours, twenty-five people had made donations. The types of blood donated are:
Blood Type | A | B | O | AB |
Number of donors | 7 | 5 | 9 | 4 |
Construct a pie chart to represent the data.
Solution
Step 1: Determine the total number of donors: 7 + 5 + 9 + 4 = 25.
Step 2: Express each donor number as a percent of the whole by using the formula $\text{Percent} = \frac{f}{n} \times 100\%$, where $f$ is the frequency and $n$ is the total number.
Step 3: Express each donor number as the number of degrees of a circle that it represents by using the formula $\text{Degrees} = \frac{f}{n} \times 360^{\circ}$, where $f$ is the frequency and $n$ is the total number.
Step 4: Use a protractor or technology to draw the central angles and graph each section of the circle.
Step 5: Write the label and correct percentage inside the section. Color each section a different color. Be sure to include a title, and a key if needed.
From the graph, you can see that more donations were of Type O than of any other type. The fewest donations were of Type AB. To create a pie chart from a circle, use each section's percent to compute the degree measure of its central angle. The blood type graph labels each section with its category and percent, not its degrees, because degrees would not be meaningful to an observer trying to interpret the graph. If the sections are not labeled directly, as they are in this example, include a key so that observers will know what each section represents.
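The arithmetic in Steps 1 through 3 of Example 3 can be sketched in a few lines of Python. The donor counts come from the table in the example; the variable names are chosen for this sketch only.

```python
# Donor counts from Example 3.
donors = {"A": 7, "B": 5, "O": 9, "AB": 4}

# Step 1: total number of donors.
total = sum(donors.values())

# Step 2: percent of the whole for each blood type.
percents = {blood_type: count / total * 100 for blood_type, count in donors.items()}

# Step 3: central angle in degrees for each blood type.
angles = {blood_type: count / total * 360 for blood_type, count in donors.items()}

for blood_type in donors:
    print(f"Type {blood_type}: {percents[blood_type]:.0f}%, {angles[blood_type]:.1f} degrees")
```

For these counts the percents work out to 28%, 20%, 36%, and 16% for types A, B, O, and AB, with central angles of 100.8°, 72°, 129.6°, and 57.6°. The percents total 100% and the angles total 360°, which is a quick check that no category was missed.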
Graphs on Computer Software
The pie chart above could be created by using a protractor to draw each section of the circle with the number of degrees it requires. However, bar graphs and pie charts are most frequently made with computer software programs such as Excel or Google Docs. You will be asked to create bar graphs and pie charts using computer software. When you do this, be sure to include titles, labels, and keys when needed. Be sure to 'fix' the graph generated by the software program so that it looks the way you want it to look and clearly shows whatever you are trying to convey.
Example 4
Comment on what the graph shows:
Solution
Several people were asked to choose their favorite fruits from a list of six options. Apples were the favorite choice with 35% of the participants choosing them. The second favorite fruit was cherries at 25%, followed by grapes with 20%. Ten percent of the people said that dates were their favorite fruit. However, only 7% chose bananas from the choices provided and the remaining 3% liked some fruit other than those listed.
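When commenting on a pie chart, it can help to confirm that the sectors account for the whole group and then list the categories from largest to smallest. The sketch below does both, using the percentages quoted in this solution; the variable names are chosen for illustration.

```python
# Percentages read from the description of the favorite-fruit pie chart.
fruit_percents = {
    "apples": 35,
    "cherries": 25,
    "grapes": 20,
    "dates": 10,
    "bananas": 7,
    "other": 3,
}

# A pie chart must account for the whole group.
total_percent = sum(fruit_percents.values())

# List the categories from most to least popular, as the written summary does.
ranking = sorted(fruit_percents, key=fruit_percents.get, reverse=True)

print(total_percent)
print(ranking)
```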
Pictographs
Another type of graph that is sometimes used to display categorical data is a pictograph. A pictograph is basically a bar graph with pictures instead of bars. A problem with pictures in graphs is that the area they take up can mislead the observer, because both the width and the height increase as the picture gets larger. Pictographs are often used in ads and magazines, and they can be a fun way to make graphs more interesting in appearance. However, because pictographs can be misleading and distracting, they are generally avoided in serious statistical representations.
Example 5
The following graph compares the number of wins for high school football teams during the 2010 season. Explain why the pictograph is misleading.
Solution
The pictures increase in both height and width, so when a value doubles, the picture looks four times as big. For example, Adams had 4 times as many wins as Eisenhower, so its picture should look 4 times as large. However, in this pictograph it looks as though Adams had 16 times as many wins (4 times as wide × 4 times as tall).
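The distortion described in this solution follows from how area scales: multiplying both the width and the height of a picture by the same factor multiplies its area by the square of that factor. A minimal sketch, using a hypothetical helper named `apparent_area_ratio`:

```python
def apparent_area_ratio(scale_factor):
    """Area ratio when both width and height are multiplied by scale_factor."""
    return scale_factor ** 2


# Doubling a value doubles both dimensions, so the picture covers 4 times the area.
print(apparent_area_ratio(2))
# A value 4 times as large is drawn with 16 times the area.
print(apparent_area_ratio(4))
```

One way to avoid the problem is to keep every picture the same size and repeat it, so that only the number of pictures changes.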
Problem Set 5.1
Section 5.1 Exercises
1) Many students at SRHS were given a questionnaire regarding their interests outside of school. The results of one of the questions, "Favorite After-School Activity?", are shown in the table below.
Source: http://www.mathgoodies.com
a) Create a bar graph for this data.
b) Why is a pie chart also appropriate for this example?
c) Calculate the percent of total for each category and the central angle for each category.
d) Create a pie chart for this data.
2) Based on what you can see in the graph, write a brief description of what it is showing. This should be at least three sentences and in context.
Source: http://www.mathworksheetscenter.com, Aug. 5, 2011.
3) Type of Pet?
a) Construct a frequency table to show the Type of Pet data from our class.
b) Use Excel or Google Docs to create a bar graph that shows the types of pets the students in our class have.
c) Write a brief description of what your graph shows.
4) Favorite Season?
a) Construct a frequency table to show the Favorite Season data from our class.
b) Use Excel or Google Docs to create a pie chart that shows the favorite season of the year for the students in our class.
c) Write a brief description of what your graph shows.
5) Look at the school lunch graph that was created by some students:
a) In what way is this graphical representation misleading? Explain.
b) Create a better graphical representation for this same data.
6) Favorite Foods?
a) Construct a frequency table to show the Favorite Food data separately for males and females for our class.
b) Use Excel or Google Docs to create two pie charts that compare the favorite food types for the boys and girls in our class. The charts should 'match' as much as possible-- they should be the same size and use the same colors, fonts, etc.
c) Write a brief description comparing the boys' and girls' choices for favorite food. Look for similarities and differences.
7) The following table has Minnesota Wild statistics for 2010-2011, for some of the Wild players. Thirteen variables are listed across the top and have been highlighted.
a) Identify the individuals.
b) Identify what each variable is (Example GP = games played). You may need to do some research.
c) Classify each variable as numerical or categorical.
Review Exercises
8) John forgot to study for his history quiz, so he will guess on each question. The quiz has 5 true-false questions and 5 multiple-choice questions (with 4 choices each). He will guess an answer for each question. In how many possible ways might John answer all of the questions?
9) What is the probability that John will get all of the questions correct?