
# Scatter Plots

## Identify positive, negative, and no correlation relations

Creating Scatter Plots and Line Graphs

#### Objective

Here you will learn how to take raw or organized bivariate data and present it in a visual format with a scatter plot or line plot.

#### Concept

Scott’s teacher was reviewing the research that Scott had conducted regarding the best car stereo systems to buy for college students on a limited budget. Scott had two long columns of numbers indicating the comparison between sound quality (which Scott had summarized with a 10-point scale for each stereo), and cost rounded to the nearest dollar before tax.

The teacher commended Scott on the detailed research, but pointed out that the list of numbers was hard to interpret. He suggested that Scott plot the values on a scatter plot or line graph to see if there was a ‘sweet spot’ indicating the best compromise between quality and cost.

How should Scott decide which type of plot is best for his purpose? How would he go about taking the data from columnar form and converting it into the data visualization he decides to use?

#### Watch This

http://youtu.be/TZGOIeKp0fc MathMasterD – Line Plot Graph Tutorial Video

#### Guidance

Line plots, followed closely by scatter plots, are by far the most common methods of displaying bivariate data. By assigning one variable to each axis and plotting each point by its horizontal and vertical location, you can quickly and easily show the degree to which one set of data is influenced (or not influenced) by another.

There are two general types of bivariate data sets that are graphed on a line or scatter plot: observed (or experimental) data and calculated (or predicted) data.

• Calculated Data: To create a line or scatter plot of calculated data, you must first identify your two variables as either dependent or independent. An independent (or input) variable may also be referred to as the explanatory variable, and has values that are assigned to it. A dependent (or output) variable may also be called the response variable, and has values that are the result of computations performed on the input variable. By convention, the independent variable is plotted on the horizontal axis, and the dependent variable is plotted on the vertical axis.
• Observed Data: The most common reason to graph two sets of data on the same graph is to evaluate the level of statistical correlation. By plotting the two sets of data on separate axes of the same graph, we can see a visual representation of possible related changes in values between the two sets. As with calculated data, you should plot the values of the variable that you expect is the explanatory variable on the horizontal axis and the expected response variable on the vertical axis.

When graphing observed data, you do not always know which value is the input and which the output, or even if the two values are indeed dependent at all!  In later lessons, we will return to this concept to learn a number of methods to evaluate data and determine the degree of correlation between multiple value sets. For now, place the variable you think is most reasonably the input on the horizontal.

The first and most important step is to organize your data so that it is easy to see how a given input value relates to a given output value. By convention this is done with a ‘T’ chart or a two-column table, with the input value on the left and the output value on the right, or with two rows, with the input on the top and the output on the bottom.

Once you have the table constructed, start with the first pair of values and move across your horizontal axis to the first input value and up the vertical axis to the associated output value. Continue the process until all of your points have been graphed.

Once all of your points have been plotted, if you are creating a scatter plot, you’re done!  If you are creating a line plot, start at your minimum input value and connect the points as you move to the right on the input axis.
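As a minimal sketch of the steps above, the pairing of a ‘T’ chart into plottable points could look like this in Python. The function name `pair_values` and the sample numbers are illustrative assumptions, not part of the lesson.

```python
def pair_values(inputs, outputs):
    """Pair each input with its output, as in the rows of a 'T' chart."""
    if len(inputs) != len(outputs):
        raise ValueError("every input needs exactly one output")
    return list(zip(inputs, outputs))


# Hypothetical sample data, chosen only for illustration.
inputs = [4, 1, 3, 2]
outputs = [8, 2, 6, 5]

# A scatter plot uses the points exactly as paired.
scatter_points = pair_values(inputs, outputs)

# A line plot connects the same points starting from the minimum input,
# so the points are first put in order of increasing input.
line_points = sorted(scatter_points)

print(scatter_points)
print(line_points)
```

The only difference between the two plot types in this sketch is the ordering step: the scatter plot needs none, while the line plot needs the points ordered by input before they are connected.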

Example A

Construct a scatter plot from the given values.

| Input | 1 | 3 | 5 | 7 | 9 | 11 | 13 | 15 | 17 |
|---|---|---|---|---|---|---|---|---|---|
| Output | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 | 18 |

Solution: The data here is already organized into associated input and output values, so you simply need to create a graph with a horizontal and vertical axis on which to plot the points.

Notice that only the positive portion of each axis is drawn here, since every value in the table is positive.

Now we just plot the points from the table, starting with the first vertical pair: Input = 1, Output = 2. Incidentally, when describing a single point of bivariate data, the conventional method of writing it is in the form (input, output) or $(x, y)$. So our first point would be (1, 2), the second would be (3, 4) and so on.

Now we fill in the values on the graph, starting with (1, 2). Beginning at the lower-left corner, which represents (0, 0), move 1 unit to the right and 2 units up. The second point is 3 units to the right and 4 units up. Continue until all 9 points are graphed. Since the question asks specifically for a scatter plot, once the individual points are plotted, we are done.
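The plotting procedure in Example A can be imitated with a simple text grid. The following is only a sketch: the function name `text_scatter` is hypothetical, the grid of dots stands in for graph paper, and the approach assumes every value is a non-negative whole number, as in this table.

```python
# Values from the Example A table.
inputs = [1, 3, 5, 7, 9, 11, 13, 15, 17]
outputs = [2, 4, 6, 8, 10, 12, 14, 16, 18]
points = list(zip(inputs, outputs))  # (1, 2), (3, 4), and so on


def text_scatter(points):
    """Draw non-negative integer points on a grid whose lower-left corner is (0, 0)."""
    width = max(x for x, _ in points)
    height = max(y for _, y in points)
    # grid[y][x] is the cell x units to the right and y units up.
    grid = [["."] * (width + 1) for _ in range(height + 1)]
    for x, y in points:
        grid[y][x] = "*"
    # Emit the highest row first so that y increases upward on screen.
    return "\n".join("".join(row) for row in reversed(grid))


plot = text_scatter(points)
print(plot)
```

Each `*` marks one point from the table; because this is a scatter plot, nothing connects them.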

Example B

Romane loves jellybeans, and she eats an average of 20 each day. Worried about her weight, she decides to see if there is an obvious correlation between the number of jellybeans she eats and her weight. If she records the data below, which variable would be the input and which the output? Create a line plot from the data.

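| Increase or Decrease in # of Beans | Increase or Decrease in Weight |
|---|---|
| +2 | -3 |
| -3 | +5 |
| -7 | +3 |
| +5 | +5 |
| -12 | -4 |
| +18 | +6 |
| -14 | -2 |
| -15 | +4 |
| +17 | +3 |
| +0 | +3 |
| -4 | -4 |
| +3 | +6 |
| +9 | -1 |
| +10 | +3 |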

Solution: Since Romane can control the number of jellybeans she eats, that would be the input variable, and the increase or decrease in weight would be the output. If we create an  $(x, y)$ graph and plot the points, the result looks like this (note that this time the graph shows negative and positive values!):

Finally we connect the points from left to right, since the question specified a line graph:
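Connecting the points from left to right amounts to ordering them by input and joining each point to the next. The sketch below assumes the table in Example B reads as pairs of (change in beans, change in weight); the variable names are illustrative only.

```python
# Values from the Example B table, read row by row.
bean_change = [2, -3, -7, 5, -12, 18, -14, -15, 17, 0, -4, 3, 9, 10]
weight_change = [-3, 5, 3, 5, -4, 6, -2, 4, 3, 3, -4, 6, -1, 3]

# Order the points by the input (change in beans), smallest first.
points = sorted(zip(bean_change, weight_change))

# Each segment of the line graph joins one point to the next point on its right.
segments = list(zip(points, points[1:]))

for start, end in segments:
    print(start, "->", end)
```

With 14 points there are 13 segments, and the line starts at the point with the smallest input.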

Example C

Does more sleep consistently improve math grades?

Organize the data below by creating a ‘T’ table for the  $x$ and  $y$ values, then graph the data as either a scatter plot or line graph, whichever is most appropriate.

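| Day | Math Homework Score (out of 20 points) | Day | Hours of Sleep (the night before) |
|---|---|---|---|
| Day 1 | 11 | Day 1 | 7 |
| Day 2 | 19 | Day 2 | 8 |
| Day 3 | 9.5 | Day 3 | 6.5 |
| Day 4 | 11 | Day 4 | 7 |
| Day 5 | 15 | Day 5 | 5 |
| Day 6 | 6 | Day 6 | 3 |
| Day 7 | 11.5 | Day 7 | 8.5 |
| Day 8 | 18.5 | Day 8 | 7.5 |
| Day 9 | 14 | Day 9 | 6 |
| Day 10 | 18 | Day 10 | 8 |
| Day 11 | 15.5 | Day 11 | 5.5 |
| Day 12 | 15.5 | Day 12 | 4.5 |
| Day 13 | 12 | Day 13 | 9 |
| Day 14 | 15.5 | Day 14 | 5.5 |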

Solution: The data as given is organized by day in each table. To answer the question “Does more sleep consistently improve math grades?”, we need to pair each night’s hours of sleep with the next day’s grade. In this case, which day it is does not matter as much as the hours of sleep the night before, so we can pull the ‘Day’ column out of each table (being careful not to change the order of the values!) and make a new table with only the corresponding hours of sleep and scores. This gives us:

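| Hours of Sleep | Math Score |
|---|---|
| 7 | 11 |
| 8 | 19 |
| 6.5 | 9.5 |
| 7 | 11 |
| 5 | 15 |
| 3 | 6 |
| 8.5 | 11.5 |
| 7.5 | 18.5 |
| 6 | 14 |
| 8 | 18 |
| 5.5 | 15.5 |
| 4.5 | 15.5 |
| 9 | 12 |
| 5.5 | 15.5 |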

Since the question is asking about the correlation between sleep the night before and grade the next day, the sleep becomes the input variable (or independent variable) and score becomes the output variable (dependent variable). Plotting the points on an $(x, y)$ graph yields:

Given the significant scattering of points as we move left to right, it is appropriate to maintain the scatter plot layout.
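The reorganization step in Example C, dropping the ‘Day’ column while keeping each night’s sleep matched to the right score, can be sketched as follows. Matching on the day number, rather than relying on the two lists being in the same order, is a design choice of this sketch and not something the lesson prescribes; the variable names are illustrative.

```python
# (day, score) and (day, hours) pairs from the two tables in Example C.
math_scores = [(1, 11), (2, 19), (3, 9.5), (4, 11), (5, 15), (6, 6), (7, 11.5),
               (8, 18.5), (9, 14), (10, 18), (11, 15.5), (12, 15.5), (13, 12),
               (14, 15.5)]
hours_of_sleep = [(1, 7), (2, 8), (3, 6.5), (4, 7), (5, 5), (6, 3), (7, 8.5),
                  (8, 7.5), (9, 6), (10, 8), (11, 5.5), (12, 4.5), (13, 9),
                  (14, 5.5)]

# Look up each score by its day so the pairing cannot drift out of order.
score_by_day = dict(math_scores)

# Build the 'T' table: hours of sleep is the input, math score is the output.
t_table = [(hours, score_by_day[day]) for day, hours in hours_of_sleep]

for hours, score in t_table:
    print(hours, score)
```

The resulting pairs are the $(x, y)$ points of the scatter plot, with the day no longer part of the data.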

In later lessons we will discuss linear regression, the process of identifying a line of best fit. A line of fit is a line drawn through a scatter plot that indicates a trend that the data follows, and the line of best fit is the mathematically derived most accurate indicator of that trend.
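The lesson does not say how the line of best fit is derived. One common choice, assumed here purely for illustration, is the least-squares line, which can be computed directly from the data. The function name `least_squares_line` is hypothetical, and the data is taken from Example A.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares straight line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Values from the Example A table.
inputs = [1, 3, 5, 7, 9, 11, 13, 15, 17]
outputs = [2, 4, 6, 8, 10, 12, 14, 16, 18]

slope, intercept = least_squares_line(inputs, outputs)
print(slope, intercept)
```

Because every output in Example A is exactly one more than its input, the fitted line passes through every point; with scattered data such as Example C, the line would only approximate the trend.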

Concept Problem Revisited

Scott’s teacher was reviewing the research that Scott had conducted regarding the best car stereo systems to buy for college students on a limited budget. Scott had two long columns of numbers indicating the comparison between sound quality (which Scott had summarized with a 10-point scale for each stereo), and cost rounded to the nearest dollar before tax.

The teacher commended Scott on the detailed research, but pointed out that the list of numbers was hard to interpret. He suggested that Scott plot the values on a scatter plot or line graph to see if there was a ‘sweet spot’ indicating the best compromise between quality and cost. How should Scott decide which type of plot is best for his purpose? How would he go about taking the data from columnar form and converting it into the data visualization he decides to use?

Scott should view cost as the independent variable (the input) and sound quality as the dependent variable (the output). By creating an  $(x, y)$ graph and plotting each of the corresponding cost/quality values, he will generate a much clearer comparison for the audience of his report. If he notes a particularly clear correlation (all or most of the points in a line) between increasing cost and improved quality, he may wish to indicate a line of fit to further illustrate the trade-off. A graph of his data might look something like this:

#### Vocabulary

Bivariate data is data composed of two changing sets of values. Distance travelled over differing time intervals, or weight compared to height, or income related to age are all examples of bivariate data.

Dependent variables, also called output variables or response variables and commonly represented by ‘ $y$ ’, have values that depend on the value of another variable.

Independent variables, also called input variables or explanatory variables and commonly represented by ‘ $x$ ’, have values that are not determined by another variable.

A line of fit is a straight or continuously curved line representing the trend of changes in the comparison of two data sets (or one set of bivariate data). The line of best fit is the mathematically calculated most accurate depiction of the trend(s). Note that a line of fit need not be straight, or even continuous.

Linear regression is the process of identifying a line of fit or the line of best fit for a given set of bivariate data.

#### Guided Practice

1. Construct a scatter plot to represent the data from the chart below indicating the number of birds killed by planes each year.

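| Year | Planes Registered | Birds Killed |
|---|---|---|
| 1978 | 6 | 13 |
| 1979 | 4 | 12 |
| 1980 | 7 | 14 |
| 1981 | 3 | 11 |
| 1982 | 7 | 14 |
| 1983 | 6 | 13 |
| 1984 | 3 | 12 |
| 1985 | 4 | 11 |
| 1986 | 1 | 9 |
| 1987 | 4 | 12 |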

2. Construct a line graph to illustrate the data.

| Height Change | 4 | 2 | 8 | 13 | 16 |
|---|---|---|---|---|---|
| Weight Change | 3 | 3 | 3 | 4 | 3 |

3. Mike decided to see if the teenage drivers in his city were truly more likely to get into accidents, and he collected the data below. Graph the data appropriately for his study.

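| Year | Teen Drivers Registered | Auto Accidents |
|---|---|---|
| 1980 | 90 | 80 |
| 1981 | 70 | 90 |
| 1982 | 60 | 90 |
| 1983 | 20 | 100 |
| 1984 | 30 | 90 |
| 1985 | 100 | 90 |
| 1986 | 70 | 100 |
| 1987 | 60 | 100 |
| 1988 | 70 | 100 |
| 1989 | 90 | 90 |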

4. How do you determine which values to graph on the vertical axis and which on the horizontal?

5. Given the data below, which variable represents the explanatory variable? What is the related term for the other variable? How do you know which is which?

| Sneezes | 8 | 6 | 5 | 2 | 1 | 4 | 7 | 12 |
|---|---|---|---|---|---|---|---|---|
| Tissues | 18 | 13 | 7 | 5 | 1 | 9 | 19 | 43 |

Solutions:

1. The number of planes registered is the input, and the number of birds killed is the output.

2. Change in height is the input, and change in weight the output. Connect the points to create a line graph instead of a scatter plot.

3. The number of teens registering to drive is the input, and the number of reported accidents the output. Note that there is very little vertical change despite a significant horizontal change, which would indicate virtually no correlation between the two values.

4. The input value or the cause is the independent variable, and the output or the effect is the dependent variable. The independent variable goes on the horizontal and the dependent on the vertical.

5. Sneezes are the input (explanatory) variable, graphed on the horizontal ‘ $X$ ’ axis, since sneezing is the cause related to the effect of using a tissue. Tissues are the other variable, called the response (or dependent) variable.

#### Practice

1. Create a scatter plot of the data shown below. Describe the relationship that exists within the data.

| Child's Age | 3 | 6 | 9 | 12 | 15 |
|---|---|---|---|---|---|
| Annual Cost | 11,800 | 12,800 | 13,700 | 16,000 | 17,800 |

2. Create a scatter plot from the data in the table below.

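| $X$ | -3 | -3 | -2 | -1 | 0 | 0 | 0 | 1 | 1 | 1 | 2 | 3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $Y$ | 3 | 4 | 3 | 2 | 1 | 2 | 0 | 0 | -1 | -1 | -3 | -4 |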

3. Draw a reasonable line of fit.

4. What is the equation of the line of fit?

5. The data below shows the number of hours spent studying for a history quiz. Draw a scatter plot.

| Study in Hours | 4 | 3 | 6 | 2 | 1 | 5 | 4 |
|---|---|---|---|---|---|---|---|
| Grade in Percent | 85 | 78 | 93 | 71 | 61 | 91 | 76 |

6. Draw a reasonable line of fit.

7. What is the equation of the line of best fit?

8. Predict the grade for a student who studied 7 hours.

9. Could the line go on forever? Why or why not?

10. The table below shows the number of reported food poisoning cases at a local hospital. Create a scatter plot for the data.

| Year | 2005 | 2006 | 2007 | 2008 | 2009 |
|---|---|---|---|---|---|
| Cases | 38 | 26 | 19 | 15 | 17 |

11. What relationship, if any, exists in the data?

12. Draw a line of fit. Write the slope-intercept form of an equation for the line of fit.

13. The table shows the average and maximum lifespan of animals that are kept in captivity. Create a scatter plot to represent the data.

| Average | 13 | 26 | 16 | 9 | 36 | 41 | 42 | 21 |
|---|---|---|---|---|---|---|---|---|
| Maximum | 48 | 51 | 41 | 21 | 71 | 78 | 62 | 55 |

14. Draw a reasonable line of fit, and write the slope-intercept form of the equation.

15. Predict the maximum lifespan for an animal with an average age of 33 years.

### Vocabulary

Correlation is a statistical method used to determine if there is a connection or a relationship between two sets of data.

A line of best fit is a straight line drawn on a scatter plot such that the sums of the distances to the points on either side of the line are approximately equal and such that there are an equal number of points above and below the line.

A scatter plot is a plot of the dependent variable versus the independent variable and is used to investigate whether or not there is a relationship or connection between 2 sets of data.