
# Displaying Bivariate Data

## Describing the relationship between two variables from a scatterplot


In this Concept, you will be introduced to using a scatterplot and a line graph to show the relationship between two variables.

### Watch This

For a description of how to make a scatter plot on a TI-84 (14.0), see maysterchief, Scatter Plots on TI-84 (3:55).

### Guidance

Scatterplots and Line Plots

Bivariate simply means two variables. All of our previous work was with univariate, or single-variable, data. The goal of examining bivariate data is usually to show some sort of relationship or association between the two variables.

#### Example A

We have looked at recycling rates for paper packaging and glass. It would be interesting to see if there is a predictable relationship between the percentages of each material that a country recycles. Following is a data table that includes both percentages.

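| Country | % of Paper Packaging Recycled | % of Glass Packaging Recycled |
|---|---|---|
| Estonia | 34 | 64 |
| New Zealand | 40 | 72 |
| Poland | 40 | 27 |
| Cyprus | 42 | 4 |
| Portugal | 56 | 39 |
| United States | 59 | 21 |
| Italy | 62 | 56 |
| Spain | 63 | 41 |
| Australia | 66 | 44 |
| Greece | 70 | 34 |
| Finland | 70 | 56 |
| Ireland | 70 | 55 |
| Netherlands | 70 | 76 |
| Sweden | 70 | 100 |
| France | 76 | 59 |
| Germany | 83 | 81 |
| Austria | 83 | 44 |
| Belgium | 83 | 98 |
| Japan | 98 | 96 |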

Figure: Paper and Glass Packaging Recycling Rates for 19 countries

Scatterplots

We will place the paper recycling rates on the horizontal axis and those for glass on the vertical axis. Next, we will plot a point that shows each country's rate of recycling for the two materials. This series of disconnected points is referred to as a scatterplot.

Recall from the stem-and-leaf plot that, in general, a country's recycling rate for glass is lower than its paper recycling rate. On the next graph, we have added the line on which the paper and glass recycling rates are equal. If every country recycled the same percentage of paper and glass, each point in the scatterplot would lie on this line. Because most of the points lie below the line, most countries recycle a smaller percentage of glass than of paper.
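The comparison with the equal-rates line can also be checked by counting. The sketch below is a minimal Python illustration, assuming the 19 rows of the table are typed in as (paper, glass) pairs; a point lies below the line glass = paper exactly when the country's glass rate is smaller than its paper rate. The function name `split_by_equal_line` is hypothetical, chosen only for this example.

```python
# Paper and glass packaging recycling rates (%) from the table in Example A.
rates = {
    "Estonia": (34, 64), "New Zealand": (40, 72), "Poland": (40, 27),
    "Cyprus": (42, 4), "Portugal": (56, 39), "United States": (59, 21),
    "Italy": (62, 56), "Spain": (63, 41), "Australia": (66, 44),
    "Greece": (70, 34), "Finland": (70, 56), "Ireland": (70, 55),
    "Netherlands": (70, 76), "Sweden": (70, 100), "France": (76, 59),
    "Germany": (83, 81), "Austria": (83, 44), "Belgium": (83, 98),
    "Japan": (98, 96),
}

def split_by_equal_line(data):
    """Group countries by where their point falls relative to the line glass = paper."""
    below, on, above = [], [], []
    for country, (paper, glass) in data.items():
        if glass < paper:
            below.append(country)   # recycles a smaller share of glass than paper
        elif glass > paper:
            above.append(country)   # recycles a larger share of glass than paper
        else:
            on.append(country)      # equal rates: the point sits on the line
    return below, on, above

below, on, above = split_by_equal_line(rates)
print(len(below), "below,", len(on), "on,", len(above), "above the line")
# prints: 14 below, 0 on, 5 above the line
print("Above:", ", ".join(above))
```

Only 5 of the 19 points lie above the line, which matches the visual impression from the scatterplot.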

With univariate data, we initially characterize a data set by describing its shape, center, and spread. For bivariate data, we will also discuss three important characteristics: shape, direction, and strength. These characteristics will inform us about the association between the two variables. The easiest way to describe these traits for this scatterplot is to think of the data as a cloud. If you draw an ellipse around the data, you can see that the ellipse rises from left to right.

Data that are oriented in this manner are said to have a positive linear association. That is, as one variable increases, the other variable also tends to increase. In this example, it is mostly true that countries with higher paper recycling rates have higher glass recycling rates. Lines that rise from left to right have a positive slope, and lines that fall from left to right have a negative slope. If the ellipse cloud trended downward from left to right, we would say the data had a negative linear association. For example, we might expect this type of relationship if we graphed a country's glass recycling rate against the percentage of glass that ends up in a landfill: as the recycling rate increases, the landfill percentage would have to decrease.

The ellipse cloud also gives us some information about the strength of the linear association. If there were a strong linear relationship between the glass and paper recycling rates, the cloud of data would be much longer than it is wide. Long and narrow ellipses mean a strong linear association, while shorter and wider ones show a weaker linear relationship. In this example, there are some countries for which the glass and paper recycling rates do not seem to be related.

New Zealand, Estonia, and Sweden (circled in yellow) have much lower paper recycling rates than their glass recycling rates, and Austria (circled in green) is an example of a country with a much lower glass recycling rate than its paper recycling rate. These data points are spread away from the rest of the data enough to make the ellipse much wider, weakening the association between the variables.

On the Web

http://tinyurl.com/y8vcm5y Guess the correlation.

Line Plots

#### Example B

The following data set shows the change in the total amount of municipal waste generated in the United States during the 1990s:

| Year | Municipal Waste Generated (Millions of Tons) |
|---|---|
| 1990 | 269 |
| 1991 | 294 |
| 1992 | 281 |
| 1993 | 292 |
| 1994 | 307 |
| 1995 | 323 |
| 1996 | 327 |
| 1997 | 327 |
| 1998 | 340 |

Figure: Total Municipal Waste Generated in the US by Year in Millions of Tons. Source: http://www.zerowasteamerica.org/MunicipalWasteManagementReport1998.htm

In this example, the time in years is considered the explanatory variable, or independent variable, and the amount of municipal waste is the response variable, or dependent variable. It is not only the passage of time that causes our waste to increase. Other factors, such as population growth, economic conditions, and societal habits and attitudes also contribute as causes. However, it would not make sense to view the relationship between time and municipal waste in the opposite direction.

When one of the variables is time, it will almost always be the explanatory variable. Because time is a continuous variable, and we are very often interested in the change a variable exhibits over a period of time, there is some meaning to the connection between the points in a plot involving time as an explanatory variable. In this case, we use a line plot. A line plot is simply a scatterplot in which we connect successive chronological observations with a line segment to give more information about how the data values are changing over a period of time. Here is the line plot for the US Municipal Waste data:

Interpreting Graphs for Bivariate Data

It is easy to see general trends from scatter plots or line plots. For Example B, the steepest segment shows that the most dramatic increase occurred from 1990 to 1991. We can also spot the years in which the waste output decreased (from 1991 to 1992) or remained the same (from 1996 to 1997). It would be interesting to investigate some possible reasons for the behavior in these individual years.
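Reading a line plot this way amounts to looking at the change along each segment between consecutive years. The sketch below is a minimal Python illustration using the Example B table; `segment_changes` is a hypothetical helper name. It sorts the years first, because a line plot connects observations in chronological order.

```python
# US municipal waste generated (millions of tons), from the table in Example B.
waste = {1990: 269, 1991: 294, 1992: 281, 1993: 292, 1994: 307,
         1995: 323, 1996: 327, 1997: 327, 1998: 340}

def segment_changes(series):
    """Return (start year, end year, change) for each segment of the line plot."""
    years = sorted(series)  # connect observations in time order
    return [(start, end, series[end] - series[start])
            for start, end in zip(years, years[1:])]

changes = segment_changes(waste)
steepest = max(changes, key=lambda seg: seg[2])    # largest increase
flat_or_down = [seg for seg in changes if seg[2] <= 0]

print("Largest increase:", steepest)    # prints: Largest increase: (1990, 1991, 25)
print("No increase:", flat_or_down)     # prints: No increase: [(1991, 1992, -13), (1996, 1997, 0)]
```

The steepest segment and the two segments without an increase are the same ones identified by eye from the graph.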

Let's look at another example to see if we can interpret the trend.

#### Example C

Following is a scatterplot of the number of live births per 10,000 23-year-old women in the United States between 1917 and 1975. Comment on the pattern this shows of birthrate over time.

Solution:

Birthrate, over time, appears to be cyclic. There was a dip in birthrate in 1932, then a gradual increase to a high in 1956. After that there was a drop in the birthrate.

### Vocabulary

Bivariate data can be represented using a scatterplot to show what, if any, association there is between the two variables. Usually one of the variables, the explanatory (independent) variable, can be identified as having an impact on the value of the other variable, the response (dependent) variable. The explanatory variable should be placed on the horizontal axis, and the response variable should be on the vertical axis. Each point is plotted individually on a scatterplot. If there is an association between the two variables, it can be identified as being strong if the points form a very distinct shape with little variation from that shape in the individual points. It can be identified as being weak if the points appear more randomly scattered. If the values of the response variable generally increase as the values of the explanatory variable increase, the data have a positive association. If the response variable generally decreases as the explanatory variable increases, the data have a negative association. In a line graph, there is significance to the change between consecutive points, so these points are connected. Line graphs are often used when the explanatory variable is time.

### Guided Practice

Data from a British government survey of household spending may be used to examine the relationship between household spending on tobacco products and alcoholic beverages. Following is the data gathered.

| Region | Alcohol | Tobacco |
|---|---|---|
| North | 6.47 | 4.03 |
| Yorkshire | 6.13 | 3.76 |
| Northeast | 6.19 | 3.77 |
| East Midlands | 4.89 | 3.34 |
| West Midlands | 5.63 | 3.47 |
| East Anglia | 4.52 | 2.92 |
| Southeast | 5.89 | 3.20 |
| Southwest | 4.79 | 2.71 |
| Wales | 5.27 | 3.53 |
| Scotland | 6.08 | 4.51 |
| No. Ireland | 4.02 | 4.56 |

Use the Technology Notes at the end of this Concept to make a scatter plot of this data. Comment on the relationship between household spending on alcohol and tobacco products.

Solution:

Here is what the image on your graphing calculator should look like for your scatter plot:

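For most regions, household spending on alcohol products and household spending on tobacco products appear to be directly related. That is, as one goes up, the other tends to go up. Northern Ireland is a clear exception: it has the lowest alcohol spending but the highest tobacco spending, so its point lies far from the pattern formed by the other regions.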

### Practice

For 1-4, remember a previous practice problem where you looked at the percentage of waste recycled in each state. Do you think there is a relationship between the percentage recycled and the total amount of waste that a state generates? Here are the data, including both variables.

| State | Percentage Recycled | Total Amount of Municipal Waste (Thousands of Tons) |
|---|---|---|
| Alabama | 23 | 5549 |
| Arizona | 18 | 5700 |
| Arkansas | 36 | 4287 |
| California | 30 | 45000 |
| Connecticut | 23 | 2950 |
| Delaware | 31 | 1189 |
| District of Columbia | 8 | 246 |
| Florida | 40 | 23617 |
| Georgia | 33 | 14645 |
| Hawaii | 25 | 2125 |
| Illinois | 28 | 13386 |
| Indiana | 23 | 7171 |
| Iowa | 32 | 3462 |
| Kansas | 11 | 4250 |
| Kentucky | 28 | 4418 |
| Louisiana | 14 | 3894 |
| Maine | 41 | 1339 |
| Maryland | 29 | 5329 |
| Massachusetts | 33 | 7160 |
| Michigan | 25 | 13500 |
| Minnesota | 42 | 4780 |
| Mississippi | 13 | 2360 |
| Missouri | 33 | 7896 |
| Montana | 5 | 1039 |
| New Hampshire | 25 | 1200 |
| New Jersey | 45 | 8200 |
| New Mexico | 12 | 1400 |
| New York | 39 | 28800 |
| North Carolina | 26 | 9843 |
| North Dakota | 21 | 510 |
| Ohio | 19 | 12339 |
| Oklahoma | 12 | 2500 |
| Oregon | 28 | 3836 |
| Pennsylvania | 26 | 9440 |
| Rhode Island | 23 | 477 |
| South Carolina | 34 | 8361 |
| South Dakota | 42 | 510 |
| Tennessee | 40 | 9496 |
| Utah | 19 | 3760 |
| Vermont | 30 | 600 |
| Virginia | 35 | 9000 |
| Washington | 48 | 6527 |
| West Virginia | 20 | 2000 |
| Wisconsin | 36 | 3622 |
| Wyoming | 5 | 530 |
1. Identify the variables in this example, and specify which one is the explanatory variable and which one is the response variable.
2. How much municipal waste was created in Illinois?
3. Draw a scatterplot for this data.
4. Describe the direction and strength of the association between the two variables.

For 5-8, the following line graph shows the recycling rates of two different types of plastic bottles in the US from 1995 to 2001.

5. Explain the general trends for both types of plastics over these years.
6. What was the total change in PET bottle recycling from 1995 to 2001?
7. Can you think of a reason to explain this change?

References

National Geographic, January 2008. Volume 213 No.1

1. Which plots are most useful to interpret the ideas of shape, center, and spread?
2. What effects does the shape of a data set have on the statistical measures of center and spread?

Technology Notes:

Scatterplots on the TI-83/84 Graphing Calculator

Press [STAT][ENTER], and enter the following data, with the explanatory variable in L1 and the response variable in L2. (Note that this data set contains 18 points, so not all of them are visible on the screen at once.) Next, press [2ND][STAT-PLOT] to enter the STAT-PLOTS menu, and choose the first plot.

Change the settings to match the following screenshot:

This selects a scatterplot with the explanatory variable in L1 and the response variable in L2 . In order to see the points better, you should choose either the square or the plus sign for the mark. The square has been chosen in the screenshot. Finally, set the window as shown below to match the data. In this case, we looked at our lowest and highest data values in each variable and added a bit of room to create a pleasant window. Press [GRAPH] to see the result, shown below.
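The window-setting step (start from the lowest and highest values of each variable, then add a bit of room) can be written down as a small calculation. The sketch below is a Python illustration, not a calculator feature: the helper name `padded_window` and the choice of 10% of the data range as padding are assumptions made for this example, and the data are the paper and glass rates from Example A. On the calculator you would usually round these bounds to convenient values.

```python
def padded_window(values, pad_fraction=0.1):
    """Return (min, max) bounds that leave some room around the data.

    pad_fraction is an assumed choice: this fraction of the data range
    is added below the smallest value and above the largest value.
    """
    low, high = min(values), max(values)
    room = (high - low) * pad_fraction
    return low - room, high + room

# Paper (explanatory, x) and glass (response, y) recycling rates from Example A.
paper = [34, 40, 40, 42, 56, 59, 62, 63, 66, 70, 70, 70, 70, 70, 76, 83, 83, 83, 98]
glass = [64, 72, 27, 4, 39, 21, 56, 41, 44, 34, 56, 55, 76, 100, 59, 81, 44, 98, 96]

xmin, xmax = padded_window(paper)
ymin, ymax = padded_window(glass)
print(f"Xmin={xmin:g} Xmax={xmax:g} Ymin={ymin:g} Ymax={ymax:g}")
```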

Line Plots on the TI-83/84 Graphing Calculator

Your graphing calculator will also draw a line plot, and the process is almost identical to that for creating a scatterplot. Enter the data into your lists, and choose a line plot in the Plot1 menu, as in the following screenshot.

Next, set an appropriate window (not necessarily the one shown below), and graph the resulting plot.

### Vocabulary

explanatory (independent) variable

The explanatory (independent) variable is the variable that can be identified as having an impact on the value of the other variable.

line graph

A line graph is a visual way to show how data changes over time.

negative association

If the response variable generally decreases as the explanatory variable increases, the data have a negative association.

positive association

If the values of the response variable generally increase as the values of the explanatory variable increase, the data have a positive association.

response (dependent) variable

The response (dependent) variable is the variable whose value is affected by the explanatory variable.

scatter plot

A scatter plot is a plot of the dependent variable versus the independent variable and is used to investigate whether or not there is a relationship or connection between two sets of data.