In this Concept, you will be introduced to using a scatterplot and a line graph to show the relationship between two variables.
For a description of how to make a scatter plot on a TI-84 (14.0), see maysterchief, Scatter Plots on TI-84 (3:55).
Scatterplots and Line Plots
Bivariate simply means two variables. All our previous work was with univariate, or single-variable data. The goal of examining bivariate data is usually to show some sort of relationship or association between the two variables.
We have looked at recycling rates for paper packaging and glass. It would be interesting to see if there is a predictable relationship between the percentages of each material that a country recycles. Following is a data table that includes both percentages.
| Country | % of Paper Packaging Recycled | % of Glass Packaging Recycled |
Figure: Paper and Glass Packaging Recycling Rates for 19 countries
We will place the paper recycling rates on the horizontal axis and those for glass on the vertical axis. Next, we will plot a point that shows each country's rate of recycling for the two materials. This series of disconnected points is referred to as a scatterplot.
Recall that one of the things you saw from the stem-and-leaf plot is that, in general, a country's recycling rate for glass is lower than its paper recycling rate. On the next graph, we have plotted a line that represents the paper and glass recycling rates being equal. If all the countries had the same paper and glass recycling rates, each point in the scatterplot would be on the line. Because most of the points are actually below this line, you can see that the glass rate is lower than would be expected if they were similar.
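The comparison against the equal-rates line can also be checked by counting. The following Python sketch counts how many points fall below the line where the glass rate equals the paper rate. The rates are made-up placeholder values, not the figures from the table above, and the function name `count_below_equal_line` is our own.

```python
# Hypothetical (paper %, glass %) pairs, invented purely for illustration.
# They are NOT the recycling rates from the table in this Concept.
rates = [(70, 55), (60, 62), (45, 30), (80, 66), (50, 50), (35, 20)]


def count_below_equal_line(points):
    """Count points whose vertical value (glass) is less than
    their horizontal value (paper), i.e. points below the line y = x."""
    return sum(1 for paper, glass in points if glass < paper)


below = count_below_equal_line(rates)
print(f"{below} of {len(rates)} points lie below the line glass = paper")
```

If most of the points are counted as below the line, the glass rates tend to be lower than the paper rates, which is the same conclusion we drew from the graph.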
With univariate data, we initially characterize a data set by describing its shape, center, and spread. For bivariate data, we will also discuss three important characteristics: shape, direction, and strength. These characteristics will inform us about the association between the two variables. The easiest way to describe these traits for this scatterplot is to think of the data as a cloud. If you draw an ellipse around the data, the general trend is that the ellipse is rising from left to right.
Data that are oriented in this manner are said to have a positive linear association. That is, as one variable increases, the other variable also increases. In this example, it is mostly true that countries with higher paper recycling rates have higher glass recycling rates. Lines that rise in this direction have a positive slope, and lines that trend downward from left to right have a negative slope. If the ellipse cloud were trending down in this manner, we would say the data had a negative linear association. For example, we might expect this type of relationship if we graphed a country's glass recycling rate with the percentage of glass that ends up in a landfill. As the recycling rate increases, the landfill percentage would have to decrease.
The ellipse cloud also gives us some information about the strength of the linear association. If there were a strong linear relationship between the glass and paper recycling rates, the cloud of data would be much longer than it is wide. Long and narrow ellipses mean a strong linear association, while shorter and wider ones show a weaker linear relationship. In this example, there are some countries for which the glass and paper recycling rates do not seem to be related.
New Zealand, Estonia, and Sweden (circled in yellow) have much lower paper recycling rates than their glass recycling rates, and Austria (circled in green) is an example of a country with a much lower glass recycling rate than its paper recycling rate. These data points are spread away from the rest of the data enough to make the ellipse much wider, weakening the association between the variables.
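Direction and strength can also be summarized with a single number. One common choice, which this Concept does not develop, is the correlation coefficient: its sign gives the direction, and values closer to 1 or -1 indicate a narrower cloud. The Python sketch below computes it from its standard formula for two made-up data sets. The data values and the function name `correlation` are our own assumptions, chosen only to contrast a narrow cloud with a wide one.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data, invented for illustration only.
x_values = [10, 20, 30, 40, 50]
narrow_y = [12, 19, 33, 41, 48]   # points hug a rising line
wide_y = [30, 10, 45, 20, 50]     # points are spread far from any line

r_narrow = correlation(x_values, narrow_y)
r_wide = correlation(x_values, wide_y)
print(f"narrow cloud: r = {r_narrow:.2f}")
print(f"wide cloud:   r = {r_wide:.2f}")
```

Both values are positive because both clouds rise from left to right, but the narrow cloud gives a value much closer to 1, matching the idea that a long, narrow ellipse shows a stronger linear association.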
On the Web
http://tinyurl.com/y8vcm5y Guess the correlation.
The following data set shows the change in the total amount of municipal waste generated in the United States during the 1990s:
| Year | Municipal Waste Generated (Millions of Tons) |
Figure: Total Municipal Waste Generated in the US by Year in Millions of Tons. Source: http://www.zerowasteamerica.org/MunicipalWasteManagementReport1998.htm
In this example, the time in years is considered the explanatory variable, or independent variable, and the amount of municipal waste is the response variable, or dependent variable. It is not only the passage of time that causes our waste to increase. Other factors, such as population growth, economic conditions, and societal habits and attitudes, also contribute as causes. However, it would not make sense to view the relationship between time and municipal waste in the opposite direction.
When one of the variables is time, it will almost always be the explanatory variable. Because time is a continuous variable, and we are very often interested in the change a variable exhibits over a period of time, there is some meaning to the connection between the points in a plot involving time as an explanatory variable. In this case, we use a line plot. A line plot is simply a scatterplot in which we connect successive chronological observations with a line segment to give more information about how the data values are changing over a period of time. Here is the line plot for the US Municipal Waste data:
Interpreting Graphs for Bivariate Data
It is easy to see general trends from scatter plots or line plots. For Example B, we can spot the year in which the most dramatic increase occurred (1990) by looking at the steepest line. We can also spot the years in which the waste output decreased and/or remained about the same (1991 and 1996). It would be interesting to investigate some possible reasons for the behaviors of these individual years.
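Reading the steepest segment off a line plot amounts to comparing the changes between consecutive years. The Python sketch below does this for made-up values; they are not the municipal waste figures. The function name `yearly_changes` and the choice to label each change by the later of its two years are our own assumptions.

```python
# Hypothetical (year, amount) observations, invented for illustration
# and deliberately listed out of order.
observations = [(2003, 105), (2001, 100), (2002, 112), (2004, 105), (2005, 109)]


def yearly_changes(data):
    """Sort observations chronologically and return (later_year, change)
    for each pair of successive observations, i.e. for each line segment."""
    ordered = sorted(data)
    return [
        (later_year, later_value - earlier_value)
        for (earlier_year, earlier_value), (later_year, later_value)
        in zip(ordered, ordered[1:])
    ]


changes = yearly_changes(observations)

# The steepest rising segment is the one with the largest change.
steepest = max(changes, key=lambda item: item[1])

# Segments that are flat or falling have a change of zero or less.
flat_or_down = [year for year, change in changes if change <= 0]

print("changes:", changes)
print("largest increase ends in:", steepest[0])
print("no increase in:", flat_or_down)
```

Sorting first matters because a line plot connects observations in chronological order, not in the order they happen to be listed.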
Let's look at another example to see if we can interpret the trend.
Following is a scatterplot of the number of live births per 10,000 23-year-old women in the United States between 1917 and 1975. Comment on the pattern this shows of birthrate over time.
Birthrate, over time, appears to be cyclic. There was a dip in birthrate in 1932, then a gradual increase to a high in 1956. After that there was a drop in the birthrate.
Bivariate data can be represented using a scatterplot to show what, if any, association there is between the two variables. Usually one of the variables, the explanatory (independent) variable, can be identified as having an impact on the value of the other variable, the response (dependent) variable. The explanatory variable should be placed on the horizontal axis, and the response variable should be on the vertical axis. Each point is plotted individually on a scatterplot. If there is an association between the two variables, it can be identified as being strong if the points form a very distinct shape with little variation from that shape in the individual points. It can be identified as being weak if the points appear more randomly scattered. If the values of the response variable generally increase as the values of the explanatory variable increase, the data have a positive association. If the response variable generally decreases as the explanatory variable increases, the data have a negative association. In a line graph, there is significance to the change between consecutive points, so these points are connected. Line graphs are often used when the explanatory variable is time.
Data from a British government survey of household spending may be used to examine the relationship between household spending on tobacco products and alcoholic beverages. Following is the data gathered.
Use the Technology Notes at the end of this Concept to make a scatter plot of this data. Comment on the relationship between household spending on alcohol and tobacco products.
Here is what the image on your graphing calculator should look like for your scatter plot:
It appears that household spending on alcohol products and household spending on tobacco products are directly related. That is, as one goes up, the other tends to go up, so the data have a positive association.
For 1-4, remember a previous practice problem where you looked at the percentage of waste recycled in each state. Do you think there is a relationship between the percentage recycled and the total amount of waste that a state generates? Here are the data, including both variables.
| State | Percentage | Total Amount of Municipal Waste in Thousands of Tons |
| District of Columbia | 8 | 246 |
- Identify the variables in this example, and specify which one is the explanatory variable and which one is the response variable.
- How much municipal waste was created in Illinois?
- Draw a scatterplot for this data.
- Describe the direction and strength of the association between the two variables.
For 5-8, the following line graph shows the recycling rates of two different types of plastic bottles in the US from 1995 to
- Explain the general trends for both types of plastics over these years.
- What was the total change in PET bottle recycling from 1995 to 2001?
- Can you think of a reason to explain this change?
- During what years was this change the most rapid?
National Geographic, January 2008. Volume 213 No.1
- Which plots are most useful to interpret the ideas of shape, center, and spread?
- What effects does the shape of a data set have on the statistical measures of center and spread?
Scatterplots on the TI-83/84 Graphing Calculator
Press [STAT][ENTER], and enter the following data, with the explanatory variable in L1 and the response variable in L2. (Note that this data set contains 18 points, and not all of them are visible on the screen at once.) Next, press [2ND][STAT-PLOT] to enter the STAT-PLOTS menu, and choose the first plot.
Change the settings to match the following screenshot:
This selects a scatterplot with the explanatory variable in L1 and the response variable in L2. In order to see the points better, you should choose either the square or the plus sign for the mark. The square has been chosen in the screenshot. Finally, set the window as shown below to match the data. In this case, we looked at our lowest and highest data values in each variable and added a bit of room to create a pleasant window. Press [GRAPH] to see the result, shown below.
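The idea of taking the lowest and highest values and adding a bit of room can be written as a small calculation. The Python sketch below is one possible way to do it, not the calculator's own method. The function name `padded_window`, the 25% padding, and the list contents are all assumptions made for illustration; the results would be entered by hand as the Xmin, Xmax, Ymin, and Ymax window settings.

```python
def padded_window(values, padding_fraction=0.25):
    """Return (low, high) bounds that extend past the smallest and largest
    data values by a fraction of the data's range."""
    low, high = min(values), max(values)
    margin = (high - low) * padding_fraction
    return low - margin, high + margin


# Hypothetical lists standing in for L1 and L2, invented for illustration.
list_1 = [2, 4, 10]
list_2 = [20, 60, 40]

x_min, x_max = padded_window(list_1)
y_min, y_max = padded_window(list_2)
print(f"Xmin = {x_min}, Xmax = {x_max}")
print(f"Ymin = {y_min}, Ymax = {y_max}")
```

The amount of padding is a matter of taste. Any value that keeps the extreme points away from the edges of the screen will do, and rounding the bounds to convenient numbers is usually fine.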
Line Plots on the TI-83/84 Graphing Calculator
Your graphing calculator will also draw a line plot, and the process is almost identical to that for creating a scatterplot. Enter the data into your lists, and choose a line plot in the Plot1 menu, as in the following screenshot.
Next, set an appropriate window (not necessarily the one shown below), and graph the resulting plot.