From the FlexBook® textbook CK-12 Probability and Statistics - Advanced (Teachers Edition).

# 2.9: Regression and Correlation

Created by: CK-12

## Scatter Plots and Linear Correlation

Project: Finding Sets of Bivariate Data with Different Types of Relationships

In this project, students will explore bivariate data. They will consider possible relationships, create visual representations of the data, and gain experience with correlation coefficients, including their strengths and weaknesses as descriptors of a relationship. The students will also practice using Excel, a powerful and widely used tool for analyzing data.

Objective: To find four sets of bivariate data with different relationships, graph each set of data, and calculate each correlation coefficient.

Procedure:

1. Find or collect four sets of bivariate data each with at least fifteen pairs. Find one set with each type of relationship listed below. Cite the source of your data or describe the collection method.

• a positive linear correlation
• a negative linear correlation
• no correlation (or very close to none)
• a curvilinear relationship

2. Use Excel to make a scatter plot for each set of data. Label the axes and title each plot.

3. Use Excel to calculate the correlation coefficient for each set of data.
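
For checking an Excel result by hand, the correlation coefficient in step 3 can also be computed directly from its definition. The following is a minimal Python sketch, not part of the original project; the function name `pearson_r` and the five data pairs are invented for illustration (the project itself calls for at least fifteen pairs).

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products of the deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, invented for illustration only
hours_studied = [1, 2, 3, 4, 5]
test_score = [52, 60, 63, 71, 79]

r = pearson_r(hours_studied, test_score)
print(round(r, 3))  # close to 1, indicating a strong positive linear correlation
```

A value near $1$ or $-1$ indicates a strong linear relationship, and a value near $0$ indicates little or no linear relationship.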

Analysis: Include the following analysis, together with the work you did in Excel, in a written report and/or PowerPoint presentation.

1. Describe the strength of each relationship. Is it what you expected it would be?
2. For which sets of data is the correlation coefficient an accurate measure of the strength of the relationship? Discuss why it would or would not be an accurate descriptor in each case.
3. Do you believe there is a causal relationship between the two variables in each set of data? Why or why not?
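
Teachers may find it useful to have a concrete case ready for question 2. The sketch below, which uses invented data and a hypothetical helper `pearson_r`, shows a perfectly predictable curvilinear relationship ($y = x^2$ on values of $x$ symmetric about zero) whose correlation coefficient is zero. It illustrates that the correlation coefficient measures only the strength of a linear relationship.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented data: y is completely determined by x, but not linearly
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r_curve = pearson_r(xs, ys)
print(r_curve)  # zero: the positive and negative cross-products cancel
```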

## Least-Squares Regression

Project: Calculating and Analyzing the Least-Squares Regression Line (continued from project for the previous section)

Objective: To find, analyze, and use the regression line for the data sets found in the previous project.

Procedure:

1. Use Excel to calculate the slope and $y$-intercept of the least-squares regression line for the set of data with positive correlation in the project for the previous lesson.

2. Graph the least-squares regression line over the scatter plot made in the previous project.

3. Use Excel to calculate the residual for each point. Find the sum of the residuals. Is it what you expected it to be?

4. Make a residual scatter plot in Excel and use it to identify outliers.

5. Decide if you would like to eliminate any outliers from your set, and recalculate the equation of the least-squares regression line if necessary.

6. Use the least-squares regression line to make three predictions.

• For the first prediction, use a value of the predictor variable that is inside the range of data you collected, but for which you have no value. This is called interpolation.
• For the second prediction, use a value of the predictor variable that is above the range of data you collected. This is called extrapolation.
• For the third prediction, use a value of the predictor variable that is below the range of data you collected. This is also called extrapolation.

Do you think that interpolation or extrapolation is more accurate? Why?

7. Repeat the process for the set of data with negative correlation in the project for the previous lesson.

8. Consider the data with the curvilinear relationship of the previous project. Is it possible to apply a transformation to achieve linearity? Play around with the data and see what you can do.
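
The calculations in steps 1, 3, and 6 can be sketched outside Excel as follows. This is a minimal Python illustration under stated assumptions, not the project's required method: the function name `least_squares` and the five data pairs are invented, and the slope and intercept use the usual formulas $b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$ and $a = \bar{y} - b\bar{x}$.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data, invented for illustration only
xs = [1, 2, 3, 4, 5]
ys = [52, 60, 63, 71, 79]

slope, intercept = least_squares(xs, ys)

def predict(x):
    """Value of y predicted by the fitted line for a given x."""
    return intercept + slope * x

# Residual = actual value minus predicted value, one per data point
residuals = [y - predict(x) for x, y in zip(xs, ys)]
residual_sum = sum(residuals)

interpolated = predict(2.5)        # inside the observed range of x (1 to 5)
extrapolated_above = predict(8)    # above the observed range
extrapolated_below = predict(0)    # below the observed range

print(slope, intercept)
print(residuals, residual_sum)
print(interpolated, extrapolated_above, extrapolated_below)
```

For a least-squares line with an intercept, the residuals sum to zero apart from rounding error, which is the answer students should arrive at in step 3.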

Project: Hypothesis Testing and Confidence Intervals for the Regression Coefficient (continued from projects for the previous two sections)

Objective: To analyze the reliability of the regression coefficient calculated in the previous project with a hypothesis test, and to make a confidence interval for the regression coefficient.

Procedure:

1. Use the positively correlated data from the previous project and conduct a hypothesis test on the regression coefficient at the $0.05$ significance level. Follow the steps below.

• State the null and alternative hypotheses.
• Calculate the test statistic using Excel. Recall that the standard error of estimate is calculated as follows: $s_{y \cdot x} = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}}$, where $\hat y$ is the value of $y$ predicted by the least-squares regression line for each value of $x$, and $y$ is the actual value in the data.
• Find the critical value using the $t$ distribution with $n-2$ degrees of freedom.
• State the conclusion of the test and interpret the results in the context of the data.

2. Repeat the process using the negatively correlated data, and then again using the data with little to no correlation.

3. Construct a $95 \%$ confidence interval for the regression coefficient of the least-squares regression line for the positively correlated data by using the following formula.

$b \pm tS_b$, where $b$ is the regression coefficient calculated in the previous project, $S_b$ is the standard error of the regression coefficient, and $t$ is obtained from the $t$ distribution table for $\frac{\alpha}{2}$ area in the right tail of the $t$ distribution with $n - 2$ degrees of freedom.

4. Repeat using the negatively correlated data, and then again using the data with little to no correlation.

5. Analyze the results. Are these the outcomes you expected? Do they make sense? Why or why not?
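
The hypothesis test and confidence interval above can be sketched in Python as a cross-check on the Excel work. The sketch assumes the usual setup: the null hypothesis is that the population regression coefficient is zero, the standard error of the coefficient is $S_b = \frac{s_{y \cdot x}}{\sqrt{\sum (x - \bar{x})^2}}$, and the test statistic is $t = \frac{b}{S_b}$. The data are invented and deliberately small ($n = 5$, so $3$ degrees of freedom), and the critical value $3.182$ is read from a $t$ table rather than computed.

```python
import math

# Hypothetical data, invented for illustration only
xs = [1, 2, 3, 4, 5]
ys = [52, 60, 63, 71, 79]
n = len(xs)

mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

b = sxy / sxx              # regression coefficient (slope)
a = mean_y - b * mean_x    # y-intercept

# Standard error of estimate: sqrt of (sum of squared residuals / (n - 2))
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
s_yx = math.sqrt(sse / (n - 2))

# Standard error of the regression coefficient and the test statistic
s_b = s_yx / math.sqrt(sxx)
t_stat = b / s_b

# Critical value from a t table: 0.025 in the right tail, n - 2 = 3 degrees of freedom
T_CRITICAL = 3.182
reject_null = abs(t_stat) > T_CRITICAL

# 95% confidence interval for the regression coefficient: b ± t * S_b
ci_low = b - T_CRITICAL * s_b
ci_high = b + T_CRITICAL * s_b

print(t_stat, reject_null)
print(ci_low, ci_high)
```

With real project data of fifteen or more pairs, the critical value must be looked up again for the appropriate degrees of freedom.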

## Multiple Regression

Project: Calculating and Analyzing the Multiple Regression Equation for Student Collected Data Using Excel

Objective: To use Excel to calculate the multiple regression equation for data you have collected and to analyze the contribution of each variable to the relationship.

Procedure:

1. Think of a relationship for which you can gather data where one variable is determined by at least four predictor variables. Use a sample with at least fifteen observations, each recording the outcome variable and every predictor variable. Cite your source or describe your collection method.

2. Enter your data into Excel and use the Data Analysis tools to calculate the regression statistics.

Calculate the Multiple Regression Equation

3. Write the regression model and interpret the regression coefficients.

4. Use the test statistic for each predictor variable to decide if it should be used in the regression equation. Eliminate variables that do not contribute significantly to explaining the variance of the outcome variable, and recalculate the equation if necessary.
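
To see what the regression tool is calculating, the coefficients can be obtained by solving the normal equations $(X^T X)\,b = X^T y$. The following is a minimal sketch in plain Python under stated assumptions: the helper names are invented, the data are made up, and only two predictor variables and four observations are used to keep the example short (the project calls for at least four predictors and fifteen observations).

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x

def fit_multiple_regression(rows, ys):
    """Return [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 for the intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)

# Hypothetical data, invented for illustration: each row is (x1, x2)
rows = [(1, 2), (1, 4), (3, 2), (3, 4)]
ys = [1, 3, 4, 8]

coefficients = fit_multiple_regression(rows, ys)
print([round(c, 6) for c in coefficients])  # intercept, then one slope per predictor
```

Each slope estimates the change in the outcome variable for a one-unit increase in that predictor while the other predictors are held constant.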

Hypothesis Testing

5. State the null and alternative hypotheses for the $R$ value.

6. What is the $F$-statistic and the associated probability for your data?

7. State and interpret the results of your test.
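
As a rough check on the Excel output for steps 5 to 7, the $F$-statistic can be computed from $R^2$ using the standard relation $F = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)}$, where $k$ is the number of predictor variables and $n$ is the number of observations. The sketch below assumes the null hypothesis that all population regression coefficients are zero. The data and the fitted equation $\hat{y} = -4.5 + 2x_1 + 1.5x_2$ are invented for illustration, and the associated probability is left to Excel or an $F$ table.

```python
def r_squared_and_f(ys, fitted, k):
    """Return (R^2, F) given observed values, fitted values, and k predictors."""
    n = len(ys)
    mean_y = sum(ys) / n
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # unexplained variation
    sst = sum((y - mean_y) ** 2 for y in ys)             # total variation
    r2 = 1 - sse / sst
    f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))
    return r2, f_stat

# Hypothetical data, invented for illustration: each row is (x1, x2)
rows = [(1, 2), (1, 4), (3, 2), (3, 4)]
ys = [1, 3, 4, 8]

# Fitted values from the assumed least-squares equation y = -4.5 + 2*x1 + 1.5*x2
fitted = [-4.5 + 2 * x1 + 1.5 * x2 for x1, x2 in rows]

r2, f_stat = r_squared_and_f(ys, fitted, k=2)
print(round(r2, 4), round(f_stat, 4))
```

A large $F$-statistic with a small associated probability leads to rejecting the null hypothesis, but with so few observations in this toy example the test has very little power.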

Confidence Interval

8. Find the $95 \%$ confidence interval for the regression coefficient of each variable still in the regression equation.

Predict

9. Use your final regression equation to make some relevant predictions.

Feb 23, 2012

Aug 19, 2014