Scatter Plots and Linear Correlation

Plot points and estimate the line that best represents them

Use a Scatterplot to Interpret Data

Have you ever had a coach? Well, being a track and field coach is a tough job. Take a look at this dilemma.

Mr. Watson is trying to evaluate his team and their strengths. He believes that there is a correlation between a student's speed and height. He is so sure of it that he gathered data to support his claim. For the students who run the 800 meters, he recorded the following heights and times.

Height    800-meter time
53        2.26.11
52        3.01.11
54        2.23.20
55        2.20.01
56        2.18.23
56        2.18.25

Mr. Watson took this data and created the following scatterplot.

Mr. Watson is sure that there is a positive correlation between speed and height.

Can he prove it with this scatterplot? Pay attention and you will learn how to use a scatterplot to interpret data.

Guidance

In the real world, many things are related to each other. For instance, the more you smoke, the lower your life expectancy. Or the more years you spend in college, the greater your income in the future. Many fields try to find relationships between two variables.

One tool that helps us accomplish this is the scatterplot.

A scatterplot is a type of graph where corresponding values from a set of data are placed as points on a coordinate plane. A relationship between the points is sometimes shown to be positive, negative, strong, or weak.

Sometimes a scatterplot shows that there is no relationship at all. Aside from finding relationships, scatterplots are useful in predicting values based on the relationship that was revealed.

Take a look at this scatterplot.

You can see that there is a relationship between the independent and dependent values of the chart. Let’s look at how we can examine these relationships.

As you may have guessed, the scatterplots are useful because their shapes may indicate a relationship between the variables. Consider the following: what happens to people’s heating bills as the temperature outside goes up? Or, what happens to the gasoline consumption in a vehicle as the miles traveled goes up?

For the first question, you might have thought that as the temperature outside goes up, people’s heating bills go down because they use their heaters less. As one variable goes up, the other goes down. How about the second question? You might imagine that as the miles traveled in a car go up, the amount of gasoline consumed also goes up. So, as one variable goes up, the other goes up, too.

We can say that these variables correlate or are connected together. When we look at a scatterplot, we can determine the different variables and their correlation.

In the situations above, the first illustrates a negative relationship: as one variable goes up, the other goes down. The second illustrates a positive relationship: as one variable goes up, the other goes up, too. Sometimes there is no relationship at all: as one variable goes up, the other may go up, go down, or stay the same, because the second variable is independent of the first. This happens often. For example, the number of blue cars on a given road and the number of accidents on that road have no relationship.

These three trends, positive, negative, and no relationship are evident on scatterplots. This is what they look like:

Positive Relationship

As the $x$-values increase, the $y$-values increase. Some points may not follow an exact pattern, but the overall trend (the general tendency or movement) is clearly from the lower left to the upper right of the plot.

Negative Relationship

In this case, as the $x$-values increase, the $y$-values decrease. You may argue that the slope is not as steep, which is true. However, the general tendency is evident. This graph moves from the upper left to the lower right.

No Relationship

At times, there is no relationship between variables. The scatterplots of these situations will show no trend. In other words, there seems to be no definite pattern with the points; you cannot see any particular direction that they take.
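The three trends can also be checked numerically. The sketch below is one possible approach, not part of the original lesson: it computes the Pearson correlation coefficient $r$ and labels the trend by its sign. The function names, the sample numbers, and the `cutoff` value of 0.3 are all assumptions chosen for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists.

    Assumes each list contains at least two different values.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def describe_relationship(xs, ys, cutoff=0.3):
    """Label the trend; `cutoff` is an arbitrary illustrative threshold."""
    r = pearson_r(xs, ys)
    if r >= cutoff:
        return "positive"
    if r <= -cutoff:
        return "negative"
    return "no clear relationship"

# Hypothetical data echoing the examples in the text
miles = [10, 20, 30, 40, 50]
gallons = [0.5, 0.9, 1.6, 2.0, 2.4]        # rises as miles rise
temps = [30, 40, 50, 60, 70]
heating_bill = [200, 170, 120, 90, 60]     # falls as temperature rises
blue_cars = [1, 2, 3, 4, 5]
accidents = [3, 1, 4, 1, 3]                # no consistent direction

print(describe_relationship(miles, gallons))
print(describe_relationship(temps, heating_bill))
print(describe_relationship(blue_cars, accidents))
```

A value of $r$ near 1 matches a plot that runs from the lower left to the upper right, a value near -1 matches one that runs from the upper left to the lower right, and a value near 0 matches a plot with no visible direction.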

Scatterplots are as useful for finding a relationship between variables as they are for making predictions. Here, we will make a trend line, or a line that best describes the data on a scatterplot, in order to estimate unknown outputs for given inputs.

A trend line is a straight line that best represents the points on a scatterplot. The trend line may go through some points but need not go through them all. The trend line is used to show the pattern of the data. This trend line may show a positive trend or a negative trend. However, if there is no relationship, then no trend line can be adequately drawn.

Your trend line is your best approximation so it may be different from others’.

The line on this graph is the trend line; it is the line that best describes the data. About half of the points should be on each side of the line.

You may notice that outliers are practically ignored when a trend line is drawn. This trend line goes from the lower left to the upper right and shows a positive relationship.

Notice that this trend line goes down and indicates a negative correlation or relationship. You can also see that the line extends beyond the plotted points. Therefore, we can use a chart like this one to predict values: it is likely that the trend will continue to go down.
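One common way to make "the line that best describes the data" precise is a least-squares fit. That is an assumption here, since the lesson only asks for a line drawn by eye. The sketch below uses made-up temperature and heating-bill numbers and hypothetical helper names (`fit_trend_line`, `predict`, `count_sides`) to fit such a line, count the points on each side of it, and estimate an unknown output.

```python
def fit_trend_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x):
    """Estimate the output for input x using the trend line."""
    return slope * x + intercept

def count_sides(xs, ys, slope, intercept):
    """Count points strictly above and strictly below the line."""
    above = sum(1 for x, y in zip(xs, ys) if y > predict(slope, intercept, x))
    below = sum(1 for x, y in zip(xs, ys) if y < predict(slope, intercept, x))
    return above, below

# Hypothetical data: outside temperature (x) vs. monthly heating bill (y)
temps = [30, 40, 50, 60, 70]
bills = [210, 160, 130, 80, 60]

slope, intercept = fit_trend_line(temps, bills)
above, below = count_sides(temps, bills, slope, intercept)

print("slope:", round(slope, 2), "intercept:", round(intercept, 2))
print("points above:", above, "points below:", below)
print("estimated bill at 45 degrees:", round(predict(slope, intercept, 45), 2))
```

With these made-up numbers the slope is negative, so the line runs from the upper left to the lower right, and the points split three above and two below, which is close to the "about half on each side" rule of thumb.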

Use what you have learned to answer these questions.

Example A

If the data does not follow a pattern, what kind of correlation will describe this scatterplot?

Solution: No correlation.

Example B

If the data goes up in a pattern from the bottom left to the top right, what kind of correlation will describe the data?

Solution: Positive correlation.

Example C

If the distance of a car increases as its speed increases, what kind of correlation will the data have?

Solution: Positive correlation.

Now let's go back to the dilemma from the beginning of the Concept.

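The trend of this data shows that as the height of the runner increases, the runner's time decreases, meaning that the runner is faster. Height and time have a negative correlation, which means that height and speed have a positive correlation. Therefore, the scatterplot supports Mr. Watson's claim that there is a positive correlation between height and speed.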
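Mr. Watson's six data points can also be checked with a short calculation. The sketch below rests on two assumptions that the lesson does not state: that a time such as 2.26.11 means 2 minutes, 26 seconds, and 11 hundredths, and that average speed is 800 meters divided by the time. The helper names `to_seconds` and `pearson_r` are hypothetical.

```python
from math import sqrt

def to_seconds(stamp):
    """Convert 'M.SS.hh' to seconds, assuming minutes.seconds.hundredths."""
    minutes, seconds, hundredths = (int(part) for part in stamp.split("."))
    return minutes * 60 + seconds + hundredths / 100

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Heights and 800-meter times from Mr. Watson's data
heights = [53, 52, 54, 55, 56, 56]
times = ["2.26.11", "3.01.11", "2.23.20", "2.20.01", "2.18.23", "2.18.25"]

seconds = [to_seconds(t) for t in times]
speeds = [800 / s for s in seconds]    # average meters per second

r_time = pearson_r(heights, seconds)   # height vs. time
r_speed = pearson_r(heights, speeds)   # height vs. speed

print("height vs. time: ", round(r_time, 2))
print("height vs. speed:", round(r_speed, 2))
```

Under these assumptions the height-versus-time coefficient is negative and the height-versus-speed coefficient is positive, which agrees with the direction of the trend in the scatterplot.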

Vocabulary

Scatterplot
a graph where corresponding values are placed on the coordinate plane and the relationship between the values can be determined.
Input Value
the $x$ value - it is the independent value
Output Value
the $y$ value - it is the dependent value
Positive Correlation
a scatterplot where the points plotted go up from left to right.
Negative Correlation
a scatterplot where the points plotted go down from left to right.
No Correlation
a scatterplot where there isn’t a clear relationship between the dependent and independent values.

Guided Practice

Here is one for you to try on your own.

What kind of relationship is shown by the data?

Solution

As one variable increases, the other variable increases as well.

This scatterplot shows a positive correlation in the data.

Practice

Directions: What type of relationship is shown in the following scatterplots?

1. If the data decreases as one variable increases, what type of relationship is shown?
2. Use the following table to make a scatter plot.

x    3    6    8    14   18   23   29   32   37
y    55   50   46   40   37   18   26   20   18

1. Draw a trend line.
2. Identify the type of relationship.

A zoologist studied the relationship between the kilometers from a lake and number of felines per 100 square kilometers. She found the following data:

Distance from Lake3 1 434.55.522.53.5865# of Felines 51028 6  5 88 6  6 024

1. Make a scatterplot that illustrates this data.
2. Draw a trend line.
3. What is the correlation?
4. Estimate the number of felines 1.5 kilometers from a lake.

Directions: Define the following terms.

1. Input value
2. Output value
3. Positive correlation
4. Negative correlation
5. No correlation
6. Data set

Vocabulary

bivariate
Bivariate data has two variables.
correlation
Correlation is a statistical method used to determine if there is a connection or a relationship between two sets of data.
curvilinear relationships
Non-linear relationships are called curvilinear relationships.
direct relationship
If the line on a line graph rises to the right, it indicates a direct relationship.
homogeneity
When a group is homogeneous, or possesses similar characteristics, the range of scores on either or both of the variables is restricted.
indirect relationship
If the line on a line graph falls to the right, it indicates an indirect relationship.
linear relationship
A linear relationship appears as a straight line either rising or falling as the independent variable values increase.
negative correlation
A negative correlation appears as a recognizable line with a negative slope.
non-linear relationship
A non-linear relationship may take the form of any number of curved lines but is not a straight line.
positive correlation
A positive correlation appears as a recognizable line with a positive slope.
scatter plot
A scatter plot is a plot of the dependent variable versus the independent variable and is used to investigate whether or not there is a relationship or connection between 2 sets of data.
Slope
Slope is a measure of the steepness of a line. A line can have positive, negative, zero (horizontal), or undefined (vertical) slope. The slope of a line can be found by calculating “rise over run” or “the change in the $y$ over the change in the $x$.” The symbol for slope is $m$.
strong correlation
Two variables with a strong correlation will appear as a number of points occurring in a clear and recognizable linear pattern.
trends
Trends in data sets or samples are indicators found by reviewing the data from a general or overall standpoint.
weak correlation
Two variables with a weak correlation will appear as a much more scattered field of points, with only a little indication of points falling into a line of any sort.
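
The "rise over run" description in the Slope entry can be written as a formula. As a small worked example, assume two hypothetical points $(x_1, y_1) = (2, 3)$ and $(x_2, y_2) = (6, 11)$ on a trend line:

$$m = \frac{y_2 - y_1}{x_2 - x_1} = \frac{11 - 3}{6 - 2} = \frac{8}{4} = 2$$

A positive result such as this one means the line rises to the right, which corresponds to a positive correlation. A negative result means the line falls to the right.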