Understand the importance of a stem-and-leaf plot in statistics.
Construct and interpret a pie chart.
Construct and interpret a bar graph.
Create a frequency distribution chart.
Construct and interpret a histogram.
Use technology to create graphical representations of data.
What is the puppet doing? She can’t be cutting a pizza, because the pieces are all different colors and sizes. It seems like she is drawing some type of a display to show different amounts of a whole circle. The colors must represent different parts of the whole. As you proceed through this lesson, refer back to this picture so that you will be able to create a meaningful and detailed answer to the question, “What is the puppet doing?”
Pie charts, or circle graphs, are used extensively in statistics. These graphs appear often in newspapers and magazines. A pie chart shows the relationship of the parts to the whole by visually comparing the sizes of the sections (slices). Pie charts can be constructed by using a hundreds disk or by using a circle. The hundreds disk is built on the concept that the whole of anything is 100%, while the circle is built on the concept that 360° is the whole of anything. Both methods of creating a pie chart are acceptable, and both will produce the same result. The sections have different colors to enable an observer to clearly see the differences in the sizes of the sections. The following example will first be done by using a hundreds disk and then by using a circle.
The Red Cross Blood Donor Clinic had a very successful morning collecting blood donations. Within 3 hours, 25 people had made donations, and the following is a table showing the blood types of the donations:
Blood type: A, B, O, AB
Number of donors: 7, 5, 9, 4
Construct a pie chart to represent the data.
Step 1: Determine the total number of donors: 7+5+9+4=25.
Step 2: Express each donor number as a percent of the whole by using the formula $\text{Percent} = \frac{f}{n} \cdot 100\%$, where $f$ is the frequency and $n$ is the total number.
Step 3: Use a hundreds disk and simply count the correct number for each blood type (1 line = 1 percent).
Step 4: Graph each section. Write the name and correct percentage inside the section. Color each section a different color.
The above pie chart was created by using a hundreds disk, which is a circle with 100 divisions in groups of 5. Each division (line) represents 1 percent. From the graph, you can see that more donations were of Type O than any other type. The fewest number of donations of blood collected was of Type AB. If the percentages had not been entered in each section, these same conclusions could have been made based simply on the size of each section.
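The percent calculation from Step 2 can be sketched in a few lines of Python. This is only an illustration: the function name `percent_of_whole` is invented here, and pairing the counts 7 and 5 with Types A and B is an assumption (the text confirms only that Type O is the largest group and Type AB the smallest).

```python
def percent_of_whole(frequencies):
    """Return each frequency as a percent of the total: (f / n) * 100."""
    n = sum(frequencies.values())
    return {label: f * 100 / n for label, f in frequencies.items()}

# Donor counts from the example; the A/B pairing is an assumption.
donors = {"A": 7, "B": 5, "O": 9, "AB": 4}
percents = percent_of_whole(donors)

for blood_type, pct in percents.items():
    print(f"Type {blood_type}: {pct:.0f}%")
```

Because the percentages add up to 100, each one tells you directly how many lines of the hundreds disk to count off for that section.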
Step 1: Determine the total number of donors: 7+5+9+4=25.
Step 2: Express each donor number as the number of degrees of a circle that it represents by using the formula $\text{Degrees} = \frac{f}{n} \cdot 360°$, where $f$ is the frequency and $n$ is the total number.
Step 3: Using a protractor, graph each section of the circle.
Step 4: Write the name and correct percentage inside each section. Color each section a different color.
The above pie chart was created by using a protractor and graphing each section of the circle according to the number of degrees needed. From the graph, you can see that more donations were of Type O than any other type. The fewest number of donations of blood collected was of Type AB. Notice that the percentages have been entered in each section of the graph and not the numbers of degrees. This is because degrees would not be meaningful to an observer trying to interpret the graph. In order to create a pie chart by using a circle, it is necessary to use the formula to calculate the number of degrees for each section, and in order to create a pie chart by using a hundreds disk, it is necessary to use the formula to determine the percentage for each section. In the end, however, both methods result in identical graphs.
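The degree calculation for the circle method can be sketched the same way. The function name `degrees_of_circle` is hypothetical, and the pairing of 7 and 5 with Types A and B is again an assumption rather than something the text states.

```python
def degrees_of_circle(frequencies):
    """Return the central angle for each category: (f / n) * 360 degrees."""
    n = sum(frequencies.values())
    return {label: f * 360 / n for label, f in frequencies.items()}

# Donor counts from the example; the A/B pairing is an assumption.
donors = {"A": 7, "B": 5, "O": 9, "AB": 4}
degrees = degrees_of_circle(donors)

for blood_type, angle in degrees.items():
    # Each angle divided by 360 gives the same fraction as the percent method.
    print(f"Type {blood_type}: {angle:.1f} degrees ({angle / 360:.0%} of the circle)")
```

The angles add up to 360, and dividing any angle by 360 recovers the percentage used with the hundreds disk, which is why the two methods produce identical graphs.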
A new restaurant is opening in town, and the owner is trying very hard to complete the menu. He wants to include a choice of 5 salads and has presented his partner with the following pie chart to represent the results of a recent survey that he conducted of the townspeople. The survey asked the question, "What is your favorite kind of salad?"
Use the pie chart to answer the following questions:
Which salad was the most popular choice?
Which salad was the least popular choice?
If 300 people were surveyed, how many people chose each type of salad?
What is the difference between the number of people who chose the spinach salad and the number of people who chose the garden salad?
1. The most popular salad was the caesar salad.
2. The least popular salad was the taco salad.
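3. Caesar salad: $35\% = \frac{35}{100} = 0.35$, so $300 \times 0.35 = 105$ people
Taco salad: $10\% = \frac{10}{100} = 0.10$, so $300 \times 0.10 = 30$ people
Spinach salad: $17\% = \frac{17}{100} = 0.17$, so $300 \times 0.17 = 51$ people
Garden salad: $13\% = \frac{13}{100} = 0.13$, so $300 \times 0.13 = 39$ people
Chef salad: $25\% = \frac{25}{100} = 0.25$, so $300 \times 0.25 = 75$ people
4. The difference between the number of people who chose the spinach salad and the number of people who chose the garden salad is $51 - 39 = 12$ people.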
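As a quick check on answers 3 and 4, the conversion from percentages to head counts can be sketched in Python. The variable names are chosen here for illustration; the percentages and the survey size of 300 come from the example.

```python
survey_size = 300
salad_percent = {"Caesar": 35, "Taco": 10, "Spinach": 17, "Garden": 13, "Chef": 25}

# Number of people = percent of the whole, applied to the 300 people surveyed.
counts = {name: pct * survey_size // 100 for name, pct in salad_percent.items()}
difference = counts["Spinach"] - counts["Garden"]

for name, people in counts.items():
    print(f"{name} salad: {people} people")
print(f"Spinach minus garden: {difference} people")
```

The counts should add back up to the survey size, which is a useful way to catch an arithmetic slip.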
If we revisit the puppet who was introduced at the beginning of the lesson, you should now be able to create a story that details what she is doing. An example would be that she is in charge of the student body and is presenting to the students the results of a questionnaire regarding student activities for the first semester. Of the 5 activities, the one that is orange in color is the most popular. The students have decided that they want to have a winter carnival week more than any other activity.
In statistics, data is represented in tables, charts, and graphs. One disadvantage of representing data in these ways is that the actual data values are often not retained. One way to ensure that the data values are kept intact is to graph the values in a stem-and-leaf plot. A stem-and-leaf plot is a method of organizing the data that includes sorting the data and graphing it at the same time. This type of graph uses a stem as the leading part of a data value and a leaf as the remaining part of the value. The result is a graph that displays the sorted data in groups, or classes. A stem-and-leaf plot is used most when the number of data values is large.
At a local veterinarian school, the number of animals treated each day over a period of 20 days was recorded. Construct a stem-and-leaf plot for the data set, which is as follows:
Step 1: Create the stem-and-leaf plot.
Some people prefer to arrange the data in order before the stems and leaves are created. This will ensure that the values of the leaves are in order. However, this is not necessary and can take a great deal of time if the data set is large. We will first create the stem-and-leaf plot, and then we will organize the values of the leaves.
The leading digit of a data value is used as the stem, and the trailing digit is used as the leaf. The numbers in the stem column should be consecutive numbers that begin with the smallest class and continue to the largest class. If there are no values in a class, do not enter a value in the leaf; just leave it blank.
Step 2: Organize the values in each leaf row.
Now that the graph has been constructed, there is a great deal of information that can be learned from it.
The number of values in the leaf column should equal the number of data values that were given in the table. The value that appears the most often in the same leaf row is the trailing digit of the mode of the data set. The mode of this data set is 35. For 7 of the 20 days, the number of animals receiving treatment was between 34 and 39. The veterinarian school treated a minimum of 5 animals and a maximum of 60 animals on any one day. The median of the data can be quickly calculated by using the values in the leaf column to locate the value in the middle position. In this stem-and-leaf plot, the median is the mean of the numbers represented by the 10th and the 11th leaves: $\frac{35 + 35}{2} = \frac{70}{2} = 35$.
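The two construction steps (split each value into stem and leaf, then order the leaves in each row) can be sketched in Python. The function name `stem_and_leaf` and the ten sample values are made up for illustration; they are not the veterinarian school data. The sketch assumes non-negative whole numbers whose last digit is the leaf.

```python
def stem_and_leaf(values):
    """Return (stem, sorted leaves) rows for non-negative whole numbers."""
    rows = {}
    for v in values:
        # Leading part is the stem, trailing digit is the leaf.
        rows.setdefault(v // 10, []).append(v % 10)
    # List every consecutive stem; a class with no values keeps an empty leaf row.
    return [(stem, sorted(rows.get(stem, [])))
            for stem in range(min(rows), max(rows) + 1)]

sample = [12, 35, 5, 41, 35, 18, 47, 60, 9, 33]  # hypothetical values
plot = stem_and_leaf(sample)

for stem, leaves in plot:
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Notice that the stems 2 and 5 still appear in the output with blank leaf rows, and that the total number of leaves equals the number of data values.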
The following numbers represent the growth (in centimeters) of some plants after 25 days.
Construct a stem-and-leaf plot to represent the data, and list 3 facts that you know about the growth of the plants.
Answers will vary, but the following are some possible responses:
From the stem-and-leaf plot, the growth of the plants ranged from a minimum of 10 cm to a maximum of 61 cm.
The median of the data set is the value in the 13th position, which is 41 cm.
There was no growth recorded in the class of 20 cm, so there is no number in the leaf row.
The data set is multimodal.
The different types of graphs that you have seen so far are plots to use with quantitative variables. A qualitative variable can be plotted using a bar graph. A bar graph is a plot made of bars whose heights (vertical bars) or lengths (horizontal bars) represent the frequencies of each category. There is 1 bar for each category, with space between each bar, and the data that is plotted is discrete data. Each category is represented by intervals of the same width. When constructing a bar graph, the category is usually placed on the horizontal axis, and the frequency is usually placed on the vertical axis. These values can be reversed if the bar graph has horizontal bars.
Construct a bar graph to represent the depth of the Great Lakes:
Lake Superior – 1,333 ft.
Lake Michigan – 923 ft.
Lake Huron – 750 ft.
Lake Ontario – 802 ft.
Lake Erie – 210 ft.
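If no grid paper is at hand, a rough horizontal bar graph of these depths can be sketched as text in Python. The function name `text_bar_graph` and the scale of one `#` per 50 ft are choices made here for illustration only.

```python
depths_ft = {"Superior": 1333, "Michigan": 923, "Huron": 750, "Ontario": 802, "Erie": 210}

def text_bar_graph(data, unit=50):
    """Return one line per category, drawing one '#' per `unit` (rounded)."""
    width = max(len(name) for name in data)
    return [f"{name:<{width}} | {'#' * round(value / unit)} {value}"
            for name, value in data.items()]

bars = text_bar_graph(depths_ft)
for line in bars:
    print(line)
```

Each lake gets exactly one bar, and the bar lengths make it easy to see that Lake Superior is the deepest and Lake Erie the shallowest.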
The following bar graph represents the results of a survey to determine the type of TV shows watched by high school students:
Use the bar graph to answer the following questions:
What type of show is watched the most?
What type of show is watched the least?
Approximately how many students participated in the survey?
Does the graph show the differences between the preferences of males and females?
Sit-coms are watched the most.
Quiz shows are watched the least.
Approximately 45+20+18+6+35+16=140 students participated in the survey.
No, the graph does not show the differences between the preferences of males and females.
If bar graphs are constructed on grid paper, it is very easy to keep the intervals the same size and to keep the bars evenly spaced. In addition to improving the appearance of the graph, grid paper also enables you to determine the frequency of each class more accurately.
The following bar graph represents the part-time jobs held by a group of grade 10 students:
Using the above bar graph, answer the following questions:
What was the most popular part-time job?
What was the part-time job held by the least number of students?
Which part-time jobs employed 10 or more of the students?
Is it possible to create a table of values for the bar graph? If so, construct the table of values.
What percentage of the students worked as a delivery person?
1. The most popular part-time job was in the fast food industry.
2. The part-time job of tutoring was the one held by the least number of students.
3. The part-time jobs that employed 10 or more students were in the fast food, delivery, lawn maintenance, and grocery store businesses.
4. Yes, it's possible to create a table of values for the bar graph.
Number of Students
5. The percentage of the students who worked as a delivery person was approximately 19.4%.
An extension of the bar graph is the histogram. A histogram is a type of vertical bar graph in which the bars represent grouped continuous data. While there are similarities between a bar graph and a histogram, such as each bar being the same width, a histogram has no spaces between the bars. The quantitative data is grouped according to a determined bin size, or interval. The bin size refers to the width of each bar, and the data is placed in the appropriate bin.
The bins, or groups of data, are plotted on the x-axis, and the frequencies of the bins are plotted on the y-axis. A grouped frequency distribution is constructed for the numerical data, and this table is used to create the histogram. In most cases, the grouped frequency distribution is designed so there are no breaks in the intervals. The last value of one bin is actually the first value counted in the next bin. This means that if you had groups of data with a bin size of 10, the bins would be represented by the notation [0-10), [10-20), [20-30), etc. Each bin appears to contain 11 values, which is 1 more than the desired bin size of 10. Therefore, the last digit of each bin is counted as the first digit of the following bin.
The first bin includes the values 0 through 9, and the next bin includes the values 10 through 19. This makes the bins the proper size. Bin sizes are written in this manner to simplify the process of grouping the data. The first bin can begin with the smallest number of the data set and end with the value determined by adding the bin width to this value, or the bin can begin with a reasonable value that is smaller than the smallest data value.
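The bracket notation can be made concrete with a short Python sketch. The function name `bin_label` is invented here, and the sketch assumes whole-number data with bins that start at 0.

```python
def bin_label(value, width=10, start=0):
    """Return the half-open bin [low-high) that contains `value`."""
    low = start + ((value - start) // width) * width
    return f"[{low}-{low + width})"

for value in (0, 9, 10, 19, 20):
    print(value, "falls in", bin_label(value))
```

The square bracket means the lower value belongs to the bin, and the round bracket means the upper value does not, so 10 is counted in [10-20) rather than in [0-10).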
Construct a frequency distribution table with a bin size of 10 for the following data, which represents the ages of 30 lottery winners:
Step 1: Determine the range of the data by subtracting the smallest value from the largest value.
Step 2: Divide the range by the bin size to ensure that you have at least 5 groups of data. A histogram should have from 5 to 10 bins to make it meaningful: $\frac{55}{10} = 5.5 \approx 6$. Since you cannot have 0.5 of a bin, the result indicates that you will have at least 6 bins.
Step 3: Construct the table.
Step 4: Determine the sum of the frequency column to ensure that all the data has been grouped.
When data is grouped in a frequency distribution table, the actual data values are lost. The table indicates how many values are in each group, but it doesn't show the actual values.
There are many different ways to create a distribution table and many different distribution tables that can be created. However, for the purpose of constructing a histogram, the method shown works very well, and it is not difficult to complete. When the number of data values is very large, another column is often inserted in the distribution table. This column is a tally column, and it is used to account for the number of values within a bin. A tally column facilitates the creation of the distribution table and usually allows the task to be completed more quickly.
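The four steps for building a frequency distribution table can be sketched in Python. The function name `frequency_table` is hypothetical, and the eleven ages below are made-up values chosen only so that their range is 55; they are not the lottery winners' data. The sketch assumes the starting value is no larger than the smallest data value.

```python
import math
from collections import Counter

def frequency_table(values, width, start):
    """Count values in consecutive half-open bins [low, low + width)."""
    counts = Counter((v - start) // width for v in values)
    n_bins = max(counts) + 1
    return [(f"[{start + i * width}-{start + (i + 1) * width})", counts[i])
            for i in range(n_bins)]

# Step 2 from the example: a range of 55 with a bin size of 10, rounded up.
bins_needed = math.ceil(55 / 10)

ages = [22, 25, 31, 38, 38, 44, 47, 52, 59, 63, 77]  # hypothetical values
table = frequency_table(ages, width=10, start=20)

for label, frequency in table:
    print(label, frequency)
# Step 4: the frequencies must add up to the number of data values.
print("Total:", sum(frequency for _, frequency in table))
```

The table keeps only the count for each bin, which illustrates why the individual data values cannot be recovered from it.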
The numbers of years of service for 75 teachers in a small town are listed below:
Using the above data, construct a frequency distribution table with a bin size of 5.
You will have 7 bins.
For each value that is in a bin, draw a stroke in the Tally column. To make counting the strokes easier, draw 4 strokes and cross them out with the fifth stroke. This process bundles the strokes in groups of 5, and the frequency can be readily determined.
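The bundling of strokes can be imitated in Python. The function name `tally` and the text form `||||/` for a bundle of 5 (4 strokes crossed by a fifth) are conventions chosen here for illustration.

```python
def tally(count):
    """Render a count as tally marks bundled in groups of 5."""
    bundles, rest = divmod(count, 5)
    return " ".join(["||||/"] * bundles + (["|" * rest] if rest else []))

for count in (3, 5, 7, 11):
    print(f"{count:>2}  {tally(count)}")
```

Reading a frequency back is then a matter of counting the bundles by fives and adding the leftover strokes.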
Now that you have constructed the frequency table, the grouped data can be used to draw a histogram. Like a bar graph, a histogram requires a title and properly labeled x- and y-axes.
Use the data from Example 17 that displays the ages of the lottery winners to construct a histogram. The data is shown again below:
Use the data as it is represented in the distribution table to construct the histogram.
From looking at the tops of the bars, you can see how many winners were in each category, and by adding these numbers, you can determine the total number of winners. You can also determine how many winners were within a specific category. For example, you can see that 8 winners were 60 years of age or older. The graph can also be used to determine percentages. For example, it can answer the question, “What percentage of the winners were 50 years of age or older?” as follows:
a) Use the data and the distribution table that represent the years of service of the teachers from Example 18 to construct a histogram to display the data. The distribution table is shown again below:
b) Now use the histogram to answer the following questions.
i) How many teachers teach in this small town?
ii) How many teachers have worked for less than 5 years?
iii) If teachers are able to retire when they have taught for 30 years or more, how many are eligible to retire?
iv) What percentage of the teachers still have to teach for 10 years or fewer before they are eligible to retire?
v) Do you think that the majority of the teachers are young or old? Justify your answer.
b) i) 11+9+12+14+7+10+12=75
In this small town, 75 teachers are teaching.
ii) 11 teachers have taught for less than 5 years.
iii) 12 teachers are eligible to retire.
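iv) The teachers with 20 to fewer than 30 years of service number $7 + 10 = 17$, and $\frac{17}{75} \approx 0.227$.
Approximately 23% of the teachers must teach for 10 years or fewer before they are eligible to retire.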
v) Answers will vary, but one possible answer is that the majority of the teachers are young, because 46 have taught for less than 20 years.
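The answers above can be checked against the bin frequencies with a short Python sketch. The frequencies are the ones added in answer i); the assumption made here is that they belong, in order, to the bins [0-5) through [30-35), which is consistent with answers ii) and iii).

```python
width = 5
frequencies = [11, 9, 12, 14, 7, 10, 12]  # assumed order: [0-5), [5-10), ..., [30-35)
lows = [i * width for i in range(len(frequencies))]  # lower edge of each bin

def count_where(condition):
    """Add the frequencies of the bins whose lower edge satisfies `condition`."""
    return sum(f for low, f in zip(lows, frequencies) if condition(low))

total = sum(frequencies)                              # i)
under_5 = count_where(lambda low: low < 5)            # ii)
eligible = count_where(lambda low: low >= 30)         # iii)
within_10 = count_where(lambda low: 20 <= low < 30)   # iv) 20 or more years, not yet 30
percent_within_10 = within_10 * 100 / total
under_20 = count_where(lambda low: low < 20)          # v)

print(total, under_5, eligible, within_10, round(percent_within_10), under_20)
```

Working from the lower edge of each bin is enough here because every question uses a cutoff that falls on a bin boundary.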
Technology can also be used to plot a histogram. The TI-83 can be used to create a histogram by using STAT and STAT PLOT on the calculator.
Scientists have invented a new dietary supplement that is supposed to increase the weight of a piglet within its first 3 months of growth. Farmer John fed this supplement to his stock of piglets, and at the end of 3 months, he recorded the weights of 50 randomly selected piglets.
The following table is the recorded weights (in pounds) of the 50 selected piglets:
Using the above data set and the TI-83, construct a histogram to represent the data.
Using the TRACE feature will give you information about the data in each bar of the histogram.
The TRACE feature tells you that in the first bin, which is [60-70), there are 4 values.
The TRACE feature tells you that in the second bin, which is [70-80), there are 6 values.
To advance to the next bin, or bar, of the histogram, use the cursor and move to the right. The information obtained by using the TRACE feature will enable you to create a frequency table and to draw the histogram on paper.
The shape of a histogram can tell you a lot about the distribution of the data, as well as provide you with information about the mean, median, and mode of the data set. The following are some typical histograms, with a caption below each one explaining the distribution of the data, as well as the characteristics of the mean, median, and mode. Distributions can have other shapes besides the ones shown below, but these represent the most common ones that you will see when analyzing data. In each of the graphs below, the distributions are not perfectly shaped, but are shaped enough to identify an overall pattern.
Figure a represents a bell-shaped distribution, which has a single peak and tapers off to both the left and to the right of the peak. The shape appears to be symmetric about the center of the histogram. The single peak indicates that the distribution is unimodal. The highest peak of the histogram represents the location of the mode of the data set. The mode is the data value that occurs the most often in a data set. For a symmetric histogram, the values of the mean, median, and mode are all the same and are all located at the center of the distribution.
Figure b represents a distribution that is approximately uniform and forms a rectangular, flat shape. The frequency of each class is approximately the same.
Figure c represents a right-skewed distribution, which has a peak to the left of the distribution and data values that taper off to the right. This distribution has a single peak and is also unimodal. For a histogram that is skewed to the right, the mean is located to the right on the distribution and is the largest value of the measures of central tendency. The mean has the largest value because it is strongly affected by the outliers on the right tail that pull the mean to the right. The mode is the smallest value, and it is located to the left on the distribution. The mode always occurs at the highest point of the peak. The median is located between the mode and the mean.
Figure d represents a left-skewed distribution, which has a peak to the right of the distribution and data values that taper off to the left. This distribution has a single peak and is also unimodal. For a histogram that is skewed to the left, the mean is located to the left on the distribution and is the smallest value of the measures of central tendency. The mean has the smallest value because it is strongly affected by the outliers on the left tail that pull the mean to the left. The median is located between the mode and the mean.
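The ordering of the mean, median, and mode in skewed data can be seen in a small Python sketch. Both data sets are invented for illustration, and the ordering shown is the typical pattern for skewed data rather than a guarantee for every data set.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed data: most values are small, with a long right tail.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]
# Mirror image of the same values, giving a long left tail.
left_skewed = [16 - x for x in right_skewed]

for name, data in (("right-skewed", right_skewed), ("left-skewed", left_skewed)):
    print(f"{name}: mode={mode(data)}, median={median(data)}, mean={mean(data)}")
```

In the right-skewed set the large values in the tail pull the mean above the median, and in the mirrored set the small values in the tail pull it below.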
Figure e has no shape that can be defined. The only defining characteristic about this distribution is that it has 2 peaks of the same height. This means that the distribution is bimodal.
Another type of graph that can be drawn to represent the same set of data as a histogram represents is a frequency polygon. A frequency polygon is a graph constructed by using lines to join the midpoints of each interval, or bin. The heights of the points represent the frequencies. A frequency polygon can be created from the histogram or by calculating the midpoints of the bins from the frequency distribution table. The midpoint of a bin is calculated by adding the upper and lower boundary values of the bin and dividing the sum by 2.
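The midpoint calculation can be sketched in Python. The function name `bin_midpoints` is hypothetical; the frequencies are those of the teachers' years of service, and the bins [0-5) through [30-35) are assumed to match them in order.

```python
def bin_midpoints(bins):
    """Midpoint of each bin: (lower boundary + upper boundary) / 2."""
    return [(low + high) / 2 for low, high in bins]

bins = [(low, low + 5) for low in range(0, 35, 5)]  # assumed bins [0-5) to [30-35)
frequencies = [11, 9, 12, 14, 7, 10, 12]

# Each point of the frequency polygon is (midpoint, frequency).
points = list(zip(bin_midpoints(bins), frequencies))
for midpoint, frequency in points:
    print(midpoint, frequency)
```

Joining these points in order with straight line segments gives the frequency polygon for the data.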
The following histogram represents the marks made by 40 students on a math 10 test.
Use the histogram to construct a frequency polygon to represent the data.
There is no data value greater than 0 and less than 20. The jagged line that is inserted on the x-axis is used to represent this fact. The area under the frequency polygon is the same as the area under the histogram and is, therefore, equal to the frequency values that would be displayed in a distribution table. The frequency polygon also shows the shape of the distribution of the data, and in this case, it resembles a bell curve.
The following distribution table represents the number of miles run by 20 randomly selected runners during a recent road race:
Using this table, construct a frequency polygon.
Step 1: Calculate the midpoint of each bin by adding the 2 numbers of the interval and dividing the sum by 2.
Step 2: Plot the midpoints on a grid, making sure to number the x-axis with a scale that will include the bin sizes. Join the plotted midpoints with lines.
A frequency polygon usually extends 1 unit below the smallest bin value and 1 unit beyond the greatest bin value. This extension gives the frequency polygon an appearance of having a starting point and an ending point, which provides a view of the distribution of data. If the data set were very large so that the number of bins had to be increased and the bin size decreased, the frequency polygon would appear as a smooth curve.
In this lesson, you learned how to represent data that was presented in various forms. Data that could be represented as percentages was displayed in a pie chart, or circle graph. Discrete data that was qualitative was displayed on a bar graph. Finally, continuous data that was grouped was graphed on a histogram or on a frequency polygon. You also learned to detect characteristics of a distribution by simply observing the shape of a histogram. Once again, technology was shown to be an asset when constructing a histogram.
Points to Consider
Can any of these graphs be used for comparing data?
Can these graphs be used to display solutions to problems in everyday life?
How do these graphs compare to ones presented in previous lessons?