# 6.1: Displaying Bivariate Data

Created by: Bruce DeWItt

### Learning Objectives

- Construct and interpret scatterplots
- Identify explanatory and response variables
- Describe bivariate distributions in context—including strength, outliers, form and direction

### Scatterplots

Scatterplots are graphs that represent the relationship between two variables. Two numerical values are measured for each individual being studied. When these two values are treated as ordered pairs and graphed on a coordinate plane, the resulting graph is called a **scatterplot**. We often suspect that one of these variables might explain, cause changes in, or help to predict the other variable. The **explanatory variable** is the variable that we believe may explain or affect the other variable; it is plotted along the x-axis. The **response variable** is the variable that we believe may respond to, or be affected by, the other variable; it is plotted along the y-axis. The explanatory variable is often referred to as the independent variable, and the response variable as the dependent variable. Even though we often look for an explanatory-response relationship between the two variables, we can create a scatterplot even if no such relationship exists.
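The pairing step described above can be sketched in a few lines of Python. The study-hours and exam-score values are invented for illustration, and the helper name `to_ordered_pairs` is hypothetical. The only point is that each individual contributes one (x, y) pair, with the explanatory variable listed first.

```python
# Hypothetical data: one entry per student (values invented for illustration).
hours_studied = [1, 2, 3, 4, 5]       # explanatory variable -> x-axis
exam_scores = [62, 70, 71, 85, 90]    # response variable -> y-axis


def to_ordered_pairs(explanatory, response):
    """Pair the two measurements taken on each individual as (x, y)."""
    if len(explanatory) != len(response):
        raise ValueError("each individual needs exactly one x and one y value")
    return list(zip(explanatory, response))


points = to_ordered_pairs(hours_studied, exam_scores)
for x, y in points:
    print(f"({x}, {y})")
```

If a plotting library such as matplotlib is available, the same two lists could be passed to `matplotlib.pyplot.scatter(hours_studied, exam_scores)`, with the explanatory variable as the first argument so that it lands on the x-axis.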

#### Example 1

State whether or not you suspect that there will be an explanatory-response relationship between each of the following pairs of data. If yes, identify the explanatory and response variables.

a) A college professor decided to examine whether or not there is a relationship between the amount of time that a student studies and his or her score on the mid-term exam. At the end of the exam each student was asked to record the number of hours he or she had spent studying for the mid-term. The professor then made a scatterplot to examine the data.

b) A different professor wanted to see whether or not there is an association between her students’ heights and their IQ scores. She gave each of her students an IQ test and had her TA (teaching assistant) measure each student’s height to the nearest inch. She constructed a scatterplot to examine the data.

#### Solution

a) It is reasonable to believe that the amount of studying does somehow have an effect on students’ exam scores. The explanatory variable is hours studying and the response variable is exam score. Often thinking in terms of a cause and effect relationship can help identify which variable is which. As a hint, try to determine if one of the variables comes first. If one comes first, then it is most likely the explanatory variable. In our example, studying should come before the exam.

b) It is not reasonable to believe that there is an association between height and IQ scores. Neither of these variables comes before the other and neither would be useful in predicting the other. However, even though we do not believe that there is an explanatory-response relationship between these variables, we can still construct a scatterplot.

#### Example 2

The following table reports the recycling rates for paper packaging and glass for several individual countries. It would be interesting to see if there is a predictable relationship between the percentages of each material that countries recycle. Construct a scatterplot to examine the relationship. Treat percentage of paper packaging recycled as the explanatory variable.

#### Solution

We will place the paper recycling rates on the horizontal axis because we are treating it as the explanatory variable. Glass recycling rates are then plotted along the vertical axis. Next, plot a point that shows each country's rate of recycling for the two materials. Be sure to label your axes.

Percent of Paper & Glass Recycled for 19 Countries

Notice that we do not always need to start at zero on either axis when making scatterplots.
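One way to choose axis ranges that fit the data, rather than starting at zero, is to pad the smallest and largest values by a small margin. The sketch below assumes a 5% margin, and the percentages are invented; they are not the values from the table in this example. The helper name `axis_limits` is hypothetical.

```python
def axis_limits(values, pad_fraction=0.05):
    """Return (low, high) axis limits that hug the data instead of starting at zero."""
    low, high = min(values), max(values)
    pad = (high - low) * pad_fraction   # small margin so no point sits on the border
    return low - pad, high + pad


# Hypothetical recycling rates in percent (invented for illustration).
paper_pct = [40, 55, 62, 70, 80]   # explanatory variable -> horizontal axis
glass_pct = [30, 45, 50, 75, 90]   # response variable -> vertical axis

x_low, x_high = axis_limits(paper_pct)
y_low, y_high = axis_limits(glass_pct)
print(f"x-axis (paper %): {x_low:.1f} to {x_high:.1f}")
print(f"y-axis (glass %): {y_low:.1f} to {y_high:.1f}")
```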

### Describing Bivariate Data

When we described single-variable data, we addressed several characteristics. We used the acronym *S.O.C.C.S.* to help us remember to describe the shape, outliers, center, and spread of a distribution, and to do all of this in the context of the variables and individuals being studied. For bivariate data, we will again discuss several characteristics in context. The important characteristics to describe when looking at the relationship between two numerical variables are strength, outliers, form, and direction, again in the context of the variables and individuals being compared. The acronym that will help us remember what to include in our descriptions is **S.C.O.F.D.** (strength, context, outliers, form, and direction).

When looking at a scatterplot, it is helpful to imagine drawing a line-of-best-fit through the data. A **line-of-best-fit** is a line that follows the trend of the data. It may go through some, all, or none of the actual points on the scatterplot. Do not actually draw such a line on your plot; just try to determine whether or not such a line would make sense and, if so, where it would fit. As you observe a scatterplot and imagine drawing such a line, you can ask yourself questions such as: *How close to a line do the points lie? Would a curved pattern fit better? Are there points that would be far away from the line? Would the line have a positive or negative slope?*

### Strength

Once you have constructed a scatterplot, you can examine the strength of the relationship between the two variables. The **strength** refers to how closely the points form a pattern. The more closely the points fit a pattern, the stronger the relationship between the variables. The more spread out and scattered the points are, the weaker the relationship. The first plot shows an extremely strong, linear pattern because the points form an obvious line. The second plot is more scattered, so it is only moderately strong. The third plot does not show much of a pattern at all, so it is moderately to very weak. Keep in mind that an association may be very strong but not linear; we could find a very clear curved pattern in the data, for example. In the next section of this book we will learn about a statistic, called correlation, that measures the strength of the linear relationship between two variables.

In example #2, the relationship between paper and glass recycling rates for these countries is very weak.
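As a preview of the correlation statistic mentioned above, the following sketch computes the usual Pearson correlation by hand for two invented data sets. The next section gives the formal treatment; here the only takeaway is that tightly clustered linear data gives a value near 1 (or near -1 for a downward trend), while widely scattered data gives a value near 0. The data and the function name are assumptions made for illustration.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation: near +1 or -1 is strong and linear, near 0 is weak."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Invented data sets, for illustration only.
xs = [1, 2, 3, 4, 5, 6]
tight = [2, 4, 6, 8, 10, 12]   # points fall exactly on a line
loose = [5, 1, 6, 2, 7, 4]     # points are widely scattered

print(round(correlation(xs, tight), 2))
print(round(correlation(xs, loose), 2))
```

Note that this statistic only measures linear strength, so a very clear curved pattern could still produce a value far from 1 or -1.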

### Context

Do not forget that the graph, the numbers and equations, and the descriptions are all about something: the **context**. All of these elements should be described in the context of the variables and the individuals being examined. These graphs and statistics are not meaningless; they are about something!

In example #2, the scatterplot explores the relationship between glass and paper recycling rates for several countries.

### Outliers

When examining a scatterplot, look for any data values that do not fit the pattern, or points that stand out from the rest of the data. An **outlier** will be a point that lies away from the rest of the data or one that seems to affect the strength of the relationship between the two variables. Many outliers will weaken the association between the variables, but they often would not significantly change where a line-of-best-fit would be drawn. An **influential point** is an outlier that actually seems to influence the line-of-best-fit. Imagine what the plot would look like without the point in question. If it would change the strength, then the point is an outlier. If it would change the slope of a line-of-best-fit, or where the line would be drawn, then the point is influential.

In example #2, there seem to be some outliers. For example, Estonia and New Zealand have much lower paper recycling rates than their glass rates. Without these data values, the relationship would be stronger.
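The "imagine the plot without the point" test can be tried numerically. The sketch below assumes the line-of-best-fit is the least-squares line, which this section has not formally defined, and uses invented data. It fits the line with and without a suspect point and reports how much the slope shifts. The function names are hypothetical.

```python
def least_squares_line(points):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def slope_change_without(points, suspect):
    """How much the fitted slope shifts when the suspect point is removed."""
    with_point, _ = least_squares_line(points)
    remaining = [p for p in points if p != suspect]
    without_point, _ = least_squares_line(remaining)
    return without_point - with_point


# Invented data: five points on the line y = 2x, plus one point far to the right and low.
data = [(1, 2), (2, 4), (3, 6), (4, 8), (5, 10), (12, 3)]
print(slope_change_without(data, (12, 3)))
```

A large shift in slope, as with the point (12, 3) here, suggests the point is influential. A point that weakens the pattern but barely moves the line would be an outlier that is not influential.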

### Form

Many scatterplots show a clear **form** or pattern. The first plot below shows a clearly linear pattern or form. It is easy to imagine drawing a line-of-best-fit through these points. The second plot shows a clearly curved form. A line would not make any sense, so this is non-linear. The third plot shows a great deal of scatter among the points, so it has no form whatsoever.

In example #2, the scatterplot for paper and glass recycling rates shows a very weak linear form. The relationship is very weak, but no curved pattern is visible. If the outliers were removed, it would become more linear.

### Direction

The direction of the graph is also important to mention. A graph that goes down to the right has a **negative association**. That is, as the explanatory variable increases, the response variable decreases. The first plot below has a negative relationship between the variables. A graph that goes up to the right has a **positive association**. That is, as the explanatory variable increases, the response variable also increases. The second plot shows a positive relationship between the variables. The third plot is an example of a graph that has neither a positive nor a negative direction. If the relationship is linear and a line-of-best-fit is added to the graph, the slope of the line will be positive if the association is positive, and negative if the association is negative.

In example #2, the scatterplot for paper and glass recycling rates shows a positive association. As the paper recycling rate for these countries increases, so does the glass recycling rate.
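Direction can also be checked with a short calculation. The sketch below sums the products of each point's deviations from the mean of x and the mean of y; the sign of that sum matches the sign of the slope of a least-squares line. The car weights and mileages are invented for illustration, and the function name `direction` is hypothetical.

```python
def direction(xs, ys):
    """Classify the association from the sign of the summed cross-deviations."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if sxy > 0:
        return "positive"
    if sxy < 0:
        return "negative"
    return "neither"


# Invented data: heavier cars tend to get fewer miles per gallon.
weights_lb = [2000, 2500, 3000, 3500, 4000]
mileage_mpg = [38, 33, 29, 24, 20]

print(direction(weights_lb, mileage_mpg))
```

The sign alone says nothing about strength: a widely scattered plot will still be labeled positive or negative by this check, so it should be read alongside the plot itself.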

### S.C.O.F.D.

When you describe the relationship in a set of bivariate data, there are several characteristics to include. The acronym **S.C.O.F.D.** will help you remember to describe the strength of the relationship, keep your description in context, mention any outliers, and describe the form and direction of the graph.

#### Example 3

The following example is a scatterplot showing the weights (in pounds) and gas mileage (miles per gallon) for several cars.

a) Identify the explanatory and response variables.

b) Describe what the scatterplot shows. Be sure to address strength, context, outliers, form and direction (S.C.O.F.D.).

#### Solution

a) The explanatory variable is the weight of the cars in pounds, and the response variable is the gas mileage of the cars (mpg).

b) The relationship between these vehicles' weights in pounds and gas mileage (mpg) is strong and very linear. There are no extreme outliers visible in the graph. The association between a vehicle's weight and gas mileage is negative: as the weight of the vehicles increases, the gas mileage decreases.

#### Example 4

The following scatterplot shows the data collected by the professor who wanted to see whether or not there is an association between her students’ heights and their IQ scores. She gave each of her students an IQ test and had her TA measure each student’s height to the nearest inch. Describe what the scatterplot shows. Be sure to address strength, context, outliers, form and direction (S.C.O.F.D.).

#### Solution

There appears to be no relationship between height and IQ scores for these students. The graph has no form and no direction. With no pattern for any point to depart from, there are no outliers. The relationship has zero strength: there is no pattern or trend between IQ scores and students' heights.

### Problem Set 6.1

#### Section 6.1 Exercises

1) State whether or not you suspect that there will be an explanatory-response relationship between each of the following pairs of data. If yes, identify the explanatory and response variables.

a) The number of semesters that students have been enrolled in college and the number of credits that they have earned.

b) Students' grades on a statistics test and their weights.

c) Employees' annual salary and the number of years that they have been employed by the company.

d) The number of songs each person has on his or her iPod and the number of months that he or she has owned the iPod.

2) A college professor decided to examine whether or not there is a relationship between the amount of time that a student studies and his or her score on the mid-term exam (out of 100 points possible). At the end of the exam each student was asked to record the number of hours he or she had spent studying for the mid-term. The professor then made a scatterplot to examine the data. Describe what the scatterplot shows. Be sure to address strength, context, outliers, form and direction (S.C.O.F.D.).

3) Malia turned the water on in her bathtub full blast. She then measured the depth of the water every two minutes until the bathtub was full (and her mother started to freak out). Her findings are listed in the following table.

a) Identify the explanatory and response variables for this situation.

b) Construct a scatterplot to show the results.

c) Describe what the scatterplot shows. Be sure to address strength, context, outliers, form and direction (S.C.O.F.D.).

4) Several brands of peanut butter were rated for quality. The following graph compares the price per ounce (in cents) and the quality rating (scale of 0 = lowest to 100 = highest) for each of these brands of peanut butter.

a) Identify the explanatory and response variables for this situation.

b) Describe what the scatterplot shows. Be sure to address strength, context, outliers, form and direction (S.C.O.F.D.).

5) Mr. Exercise wanted to know whether or not customers continued to use their equipment after they purchased it. He contacted an SRS of his customers who had purchased an exercise machine during the past 18 months. His findings are summarized in the following table:

a) Identify the explanatory and response variables for this situation.

b) Construct a scatterplot to show the results.

c) Describe what the scatterplot shows. Be sure to address strength, context, outliers, form and direction (S.C.O.F.D.).

6) The following scatterplot shows the elevation and mean temperature for various locations in Nevada.

a) Identify the explanatory and response variables for this situation.

b) Describe what the scatterplot shows. Be sure to address strength, context, outliers, form and direction (S.C.O.F.D.).

#### Review Exercises

7) If two cards are drawn from a standard deck of playing cards, and laid face up on a table, what is the probability of getting two Queens?

8) A card is drawn from a standard deck. The card is put back, the deck is reshuffled, and another card is drawn. What is the probability of drawing two clubs?

9) A gum ball machine contains 14 pink gumballs, 7 blue, 9 white, and 11 green gumballs. A child buys two gumballs, one after the other. Find the following probabilities:

a) P(blue, then green)

b) P(neither is pink)