
Displaying Univariate Data

Examining the distribution of single variable numerical data from dot plots and stem-and-leaf plots


In this Concept, we will investigate the different types of graphs that can be used to represent single numerical variables (univariate data). We will compare the distribution of the data, and look at the effect of outliers.

Watch This

For a description of how to draw a stem-and-leaf plot, as well as how to derive information from one (14.0), see APUS07, Stem-and-Leaf Plot (8:08).

Guidance

Dot Plots

A dot plot is one of the simplest ways to represent numerical data. After choosing an appropriate scale for the axis, each data point is plotted as a single dot. Multiple points at the same value are stacked on top of each other using equal spacing to help convey the shape and center.

Example A

The following is a data set representing the percentage of paper packaging manufactured from recycled materials for a select group of countries.

Percentage of the paper packaging used in a country that is recycled. Source: National Geographic, January 2008. Volume 213 No.1, pg 86-87.
Country % of Paper Packaging Recycled
Estonia 34
New Zealand 40
Poland 40
Cyprus 42
Portugal 56
United States 59
Italy 62
Spain 63
Australia 66
Greece 70
Finland 70
Ireland 70
Netherlands 70
Sweden 70
France 76
Germany 83
Austria 83
Belgium 83
Japan 98

The dot plot for this data would look like this:

Notice that this data set is centered at a recycled-content rate of between 65 and 70 percent. It is spread from 34% to 98% and appears very roughly symmetric, perhaps even slightly skewed left. Dot plots have the advantage of showing all the data points and giving a quick and easy snapshot of the shape, center, and spread. They are less helpful when there is little repetition in the data, because the dots then rarely stack. They can also be very tedious to create by hand for large data sets, though computer software makes quick work of creating dot plots from such data sets.
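As a rough illustration of how software can stack the dots, here is a minimal Python sketch that draws a text-only dot plot of the paper packaging percentages from the table above. The function name `dot_plot` and the layout (one character column per whole percent, with only the two ends of the axis labeled) are assumptions made for this sketch, not part of the original lesson.

```python
from collections import Counter

# Paper-packaging percentages from the table in Example A.
paper = [34, 40, 40, 42, 56, 59, 62, 63, 66, 70,
         70, 70, 70, 70, 76, 83, 83, 83, 98]

def dot_plot(values):
    """Return the rows of a text dot plot, one column per integer value."""
    counts = Counter(values)
    lo, hi = min(values), max(values)
    rows = []
    # Build from the tallest stack down so the plot prints top to bottom.
    for level in range(max(counts.values()), 0, -1):
        row = "".join("o" if counts[v] >= level else " "
                      for v in range(lo, hi + 1))
        rows.append(row.rstrip())
    width = hi - lo + 1
    rows.append("-" * width)  # the number line
    # Label only the two ends of the axis to keep the sketch short.
    rows.append(str(lo) + str(hi).rjust(width - len(str(lo))))
    return rows

print("\n".join(dot_plot(paper)))
```

The tallest stack sits above 70, which is the value shared by five countries in the table.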

Stem-and-Leaf Plots

One of the shortcomings of dot plots is that they do not show the actual values of the data; you have to read or infer them from the graph. From the previous example, you might have been able to guess that the lowest value is 34%, but you would have to look in the data table itself to know for sure. A stem-and-leaf plot is a similar plot in which it is much easier to read the actual data values. In a stem-and-leaf plot, each data value is split into two parts: the stem and the leaf. In this example, it makes sense to use the tens digits for the stems and the ones digits for the leaves. The stems (3 through 9 for this data) are listed to the left of a dividing line.

Once the stems are decided, the leaves representing the ones digits are listed in numerical order from left to right:

3 | 4
4 | 002
5 | 69
6 | 236
7 | 000006
8 | 333
9 | 8

It is important to explain the meaning of the data in the plot for someone who is viewing it without seeing the original data. For example, you could place the following sentence at the bottom of the chart:

Note: $5|69$ means 56% and 59% are the two values in the 50's.

If you could rotate this plot on its side, you would see the similarities with the dot plot. The general shape and center of the plot are easily found, and we know exactly what each point represents. This plot also shows the slight skewing to the left that we suspected from the dot plot. Stem plots can be difficult to create, depending on the number of digits in the values and the spread of the data. If the data values contain more than two digits, you will need to remove some of the information by rounding or truncating. A data set that has large gaps between values can also make the stem plot hard to create and less useful when interpreting the data.
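The construction described in this section (tens digit as the stem, ones digit as the leaf, leaves in increasing order) can be sketched in a few lines of Python. The function name `stem_and_leaf` is hypothetical, and the sketch assumes non-negative whole-number values such as the paper packaging percentages.

```python
from collections import defaultdict

# Paper-packaging percentages from the table in Example A.
paper = [34, 40, 40, 42, 56, 59, 62, 63, 66, 70,
         70, 70, 70, 70, 76, 83, 83, 83, 98]

def stem_and_leaf(values):
    """Return stem-and-leaf rows using tens digits as stems."""
    leaves = defaultdict(list)
    for v in sorted(values):          # sorting puts the leaves in order
        leaves[v // 10].append(v % 10)
    rows = []
    # Walk every stem from smallest to largest, including empty ones,
    # so gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        digits = "".join(str(d) for d in leaves.get(stem, []))
        rows.append(f"{stem} | {digits}".rstrip())
    return rows

for row in stem_and_leaf(paper):
    print(row)
```

The row for stem 5 prints as `5 | 69`, matching the note above about 56% and 59%.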

Example B

Consider the following populations of counties in California.

Butte - 220,748

Calaveras - 45,987

Del Norte - 29,547

Fresno - 942,298

Humboldt - 132,755

Imperial - 179,254

San Francisco - 845,999

Santa Barbara - 431,312

To construct a stem-and-leaf plot, we first need to make sure each data value has the same number of digits. In our data, we will add a 0 at the beginning of the five-digit values so that all values have six digits. Then we can either round or truncate every value to two digits. The following table shows both options for some three-digit values:

Value | Value Rounded | Value Truncated
149 | 15 | 14
657 | 66 | 65
188 | 19 | 18

$2|2$ represents $220,000 - 229,999$ when data has been truncated.

$2|2$ represents $215,000 - 224,999$ when data has been rounded.

If we decide to round the above data, we have:

Butte - 220,000

Calaveras - 050,000

Del Norte - 030,000

Fresno - 940,000

Humboldt - 130,000

Imperial - 180,000

San Francisco - 850,000

Santa Barbara - 430,000

And the stem-and-leaf plot will be as follows:

where:

$2|2$ represents $215,000 - 224,999$.

Source: California State Association of Counties http://74.205.125.191/default.asp?id=399
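The rounding and truncating steps of Example B can be checked with a short Python sketch. The names `two_leading_digits` and `stem_plot` are hypothetical, and the choice to print every stem from 0 to 9 (including empty ones) is an assumption about the layout, since the original figure is not reproduced here. Rounding is done with integer arithmetic so that halves always round up.

```python
# County populations from Example B.
populations = {
    "Butte": 220_748, "Calaveras": 45_987, "Del Norte": 29_547,
    "Fresno": 942_298, "Humboldt": 132_755, "Imperial": 179_254,
    "San Francisco": 845_999, "Santa Barbara": 431_312,
}

def two_leading_digits(value, mode="round", unit=10_000):
    """Reduce a six-digit value to its two leading digits.

    Assumes values below 995,000 so the result stays under 100.
    """
    if mode == "round":
        return (value + unit // 2) // unit   # round half up
    return value // unit                     # truncate

def stem_plot(two_digit_values):
    """Rows of a stem plot with stems 0-9 (hundred-thousands digit)."""
    rows = {stem: [] for stem in range(10)}
    for v in sorted(two_digit_values):
        rows[v // 10].append(v % 10)
    return [f"{stem} | {''.join(map(str, leaves))}".rstrip()
            for stem, leaves in rows.items()]

rounded = [two_leading_digits(p) for p in populations.values()]
truncated = [two_leading_digits(p, mode="truncate")
             for p in populations.values()]

print("\n".join(stem_plot(rounded)))
```

With rounding, Calaveras (45,987) and Del Norte (29,547) land on stem 0 as leaves 5 and 3. With truncation they would be leaves 4 and 2 instead.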

Back-to-Back Stem Plots

Stem plots can also be a useful tool for comparing two distributions when placed next to each other. These are commonly called back-to-back stem plots.

Example C

In a previous example, we looked at recycling in paper packaging. Here are the same countries and their percentages of recycled material used to manufacture glass packaging:

Percentage of the glass packaging used in a country that is recycled. Source: National Geographic, January 2008. Volume 213 No.1, pg 86-87.
Country % of Glass Packaging Recycled
Cyprus 4
United States 21
Poland 27
Greece 34
Portugal 39
Spain 41
Australia 44
Ireland 56
Italy 56
Finland 56
France 59
Estonia 64
New Zealand 72
Netherlands 76
Germany 81
Austria 86
Japan 96
Belgium 98
Sweden 100

In a back-to-back stem plot, one of the distributions simply works off the left side of the stems. In this case, the spread of the glass distribution is wider, so we will have to add a few extra stems. Even if there are no data values in a stem, you must include it to preserve the spacing, or you will not get an accurate picture of the shape and spread.

We have already mentioned that the spread was larger in the glass distribution, and it is easy to see this in the comparison plot. You can also see that the glass distribution is more symmetric and is centered lower (around the mid-50's), which seems to indicate that overall, these countries manufacture a smaller percentage of glass from recycled material than they do paper. It is interesting to note in this data set that Sweden actually imports glass from other countries for recycling, so its effective percentage is actually more than 100.
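A back-to-back stem plot for the two recycling data sets can be sketched as follows. The function name `back_to_back` is hypothetical, and placing glass on the left and paper on the right is an assumption, since the lesson does not say which side each distribution occupies. Leaves on the left are written so that they increase moving away from the stems, and empty stems are kept to preserve the spacing.

```python
paper = [34, 40, 40, 42, 56, 59, 62, 63, 66, 70,
         70, 70, 70, 70, 76, 83, 83, 83, 98]
glass = [4, 21, 27, 34, 39, 41, 44, 56, 56, 56,
         59, 64, 72, 76, 81, 86, 96, 98, 100]

def back_to_back(left, right):
    """Rows of a back-to-back stem plot sharing one column of stems."""
    lo = min(left + right) // 10
    hi = max(left + right) // 10

    def leaves(values, stem):
        return "".join(str(v % 10) for v in sorted(values) if v // 10 == stem)

    # Pad the left side to the width of its longest row of leaves.
    width = max(len(leaves(left, s)) for s in range(lo, hi + 1))
    rows = []
    for stem in range(lo, hi + 1):
        left_leaves = leaves(left, stem)[::-1]  # smallest leaf nearest the stem
        right_leaves = leaves(right, stem)
        rows.append(f"{left_leaves:>{width}} | {stem:>2} | {right_leaves}".rstrip())
    return rows

print("\n".join(back_to_back(glass, paper)))
```

The extra stems 0, 1, 2, and 10 appear only because of the glass data, which is the wider spread described above.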

Vocabulary

A dot plot is a convenient way to represent univariate numerical data by plotting individual dots along a single number line to represent each value. Dot plots are especially useful in giving a quick impression of the shape, center, and spread of the data set, but are tedious to create by hand when dealing with large data sets.

Stem-and-leaf plots show similar information with the added benefit of showing the actual data values.

Guided Practice

Here are the ages, arranged in order, of the CEOs of the 60 top-ranked small companies in America in 1993 (source: http://lib.stat.cmu.edu/DASL/Datafiles/ceodat.html):

32, 33, 36, 37, 38, 40, 41, 43, 43, 44, 44, 45, 45, 45, 45, 46, 46, 47, 47, 47, 48, 48, 48, 48, 49, 50, 50, 50, 50, 50, 50, 51, 51, 52, 53, 53, 53, 55, 55, 55, 56, 56, 56, 56, 57, 57, 58, 58, 59, 60, 61, 61, 61, 62, 62, 63, 69, 69, 70, 74

a) Create a stem-and-leaf plot for these ages.

b) Create a dot plot for these ages.

c) Describe the shape of this dataset.

d) Are there any outliers in this dataset?

Solutions:

a. Here is the stem-and-leaf plot:

b. Here is the dot plot:

c. The data set is approximately symmetric, with the ages concentrated in the forties and fifties.

d. There do not appear to be any outliers.
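The answers to parts c and d can be checked numerically. The sketch below counts the ages in each decade and then applies the 1.5 × IQR rule for flagging outliers. That rule is one common convention and is an assumption here, since this lesson does not state which outlier criterion it uses. The quartiles are taken as the medians of the lower and upper halves of the sorted data; other quartile conventions can give slightly different fences. The function name `iqr_fences` is hypothetical.

```python
from collections import Counter
from statistics import median

ages = [32, 33, 36, 37, 38, 40, 41, 43, 43, 44, 44, 45, 45, 45, 45,
        46, 46, 47, 47, 47, 48, 48, 48, 48, 49, 50, 50, 50, 50, 50,
        50, 51, 51, 52, 53, 53, 53, 55, 55, 55, 56, 56, 56, 56, 57,
        57, 58, 58, 59, 60, 61, 61, 61, 62, 62, 63, 69, 69, 70, 74]

def iqr_fences(sorted_values):
    """Lower and upper fences under the (assumed) 1.5 * IQR rule."""
    n = len(sorted_values)
    half = n // 2
    lower = sorted_values[:half]
    upper = sorted_values[half + n % 2:]  # skip the middle value when n is odd
    q1, q3 = median(lower), median(upper)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# How many CEOs fall in each decade (3 = thirties, 4 = forties, ...).
by_decade = dict(sorted(Counter(a // 10 for a in ages).items()))

low, high = iqr_fences(sorted(ages))
outliers = [a for a in ages if a < low or a > high]

print(by_decade)
print(low, high, outliers)
```

Under these assumptions every age lies inside the fences, which agrees with the answer to part d.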

Practice

For 1-4, the following table gives the percentages of municipal waste recycled by state in the United States, including the District of Columbia, in 1998. Data was not available for Idaho or Texas.

State Percentage
Alabama 23
Arizona 18
Arkansas 36
California 30
Connecticut 23
Delaware 31
District of Columbia 8
Florida 40
Georgia 33
Hawaii 25
Illinois 28
Indiana 23
Iowa 32
Kansas 11
Kentucky 28
Louisiana 14
Maine 41
Maryland 29
Massachusetts 33
Michigan 25
Minnesota 42
Mississippi 13
Missouri 33
Montana 5
New Hampshire 25
New Jersey 45
New Mexico 12
New York 39
North Carolina 26
North Dakota 21
Ohio 19
Oklahoma 12
Oregon 28
Pennsylvania 26
Rhode Island 23
South Carolina 34
South Dakota 42
Tennessee 40
Utah 19
Vermont 30
Virginia 35
Washington 48
West Virginia 20
Wisconsin 36
Wyoming 5
1. Create a dot plot for this data.
2. Discuss the shape, center, and spread of this distribution.
3. Create a stem-and-leaf plot for the data.
4. Use your stem-and-leaf plot to find the median percentage for this data.

For 5-8, identify the important features of the shape of the distribution.

For 9-12, refer to the following dot plots:

9. Identify the overall shape of each distribution.
10. How would you characterize the center(s) of these distributions?
11. Which of these distributions has the smallest standard deviation?
12. Which of these distributions has the largest standard deviation?
13. What characteristics of a data set make it easier or harder to represent using dot plots, stem-and-leaf plots, or histograms?
14. Here are the ages, arranged in order, of the CEOs of the 60 top-ranked small companies in America in 1993 (source: http://lib.stat.cmu.edu/DASL/Datafiles/ceodat.html): 32, 33, 36, 37, 38, 40, 41, 43, 43, 44, 44, 45, 45, 45, 45, 46, 46, 47, 47, 47, 48, 48, 48, 48, 49, 50, 50, 50, 50, 50, 50, 51, 51, 52, 53, 53, 53, 55, 55, 55, 56, 56, 56, 56, 57, 57, 58, 58, 59, 60, 61, 61, 61, 62, 62, 63, 69, 69, 70, 74
   a) Create a stem-and-leaf plot for these ages.
   b) Create a dot plot for these ages.
   c) Describe the shape of this dataset.
   d) Are there any outliers in this dataset?
15. Give an example in which the same measurement taken on the same individual would be considered to be an outlier in one dataset but not in another dataset.
16. Does a stem-and-leaf plot provide enough information to determine if there are any outliers in the dataset? Explain.
17. Does a five-number summary provide enough information to determine if there are any outliers in the dataset? Explain.
18. A set of 17 exam scores is 67, 94, 88, 76, 85, 93, 55, 87, 80, 81, 80, 61, 90, 84, 75, 93, 75
   a) Draw a stem-and-leaf plot of the scores.
   b) Draw a dot plot of the scores.
19. Make a stem-and-leaf plot of the mean high temperature in December (Fahrenheit) in 15 cities in California. The “stem” gives the first digit of a temperature, while the “leaf” gives the second digit. You can find the data at: http://countrystudies.us/united-states/weather/California/beverly-hills.htm
   a) Describe the shape of the dataset. Is it skewed or is it symmetric?
   b) What is the highest temperature in the dataset?
   c) What is the lowest temperature in the dataset?
   d) What percent of the 15 cities have a mean high December temperature in the 60s?

Vocabulary

back-to-back stem plot

A back-to-back stem plot is a modified stem-and-leaf plot with the stems in the center and the leaves on either side. It is used to compare two related sets of data.

dot plot

A dot plot is a convenient way to represent univariate numerical data by plotting individual dots along a single number line to represent each value.

stem-and-leaf plot

A stem-and-leaf plot is a way of organizing data values from least to greatest using place value. Usually, the last digit of each data value becomes the "leaf" and the other digits become the "stem".

univariate

Univariate data has only one variable.