## Linear Regression Models: Testing the Null Hypothesis, Predicting Probability, and Building Confidence Intervals

Regression and Correlation

Correlation

Correlation measures the relationship between bivariate data. Scatterplots display these bivariate data sets and provide a visual representation of the relationship between variables.

Examining a scatterplot graph allows us to obtain some idea about the relationship between two variables.

• lower-left-to-upper-right pattern --> positive correlation
• upper-left-to-lower-right pattern --> negative correlation
• straight line --> perfect correlation
• no linear trend --> zero correlation or a near-zero correlation

Correlation Coefficients

While examining scatterplots gives us some idea about the relationship between two variables, we use a statistic called the correlation coefficient to give us a more precise measurement of the relationship between the two variables. The correlation coefficient is an index that describes the relationship and can take on values between -1.0 and +1.0, with a positive correlation coefficient indicating a positive correlation and a negative correlation coefficient indicating a negative correlation.

The absolute value of the coefficient indicates the magnitude, or the strength, of the relationship. The closer the absolute value of the coefficient is to 1, the stronger the relationship. For example, a correlation coefficient of 0.20 indicates that there is a weak linear relationship between the variables, while a coefficient of -0.90 indicates that there is a strong linear relationship.

The value of a perfect positive correlation is 1.0, while the value of a perfect negative correlation is -1.0.

When there is no linear relationship between two variables, the correlation coefficient is 0. Note: It is important to remember that a correlation coefficient of 0 indicates that there is no linear relationship, but there may still be a strong relationship between the two variables. For example, there could be a quadratic relationship between them.
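The points above can be illustrated numerically. The sketch below uses hypothetical data (not from the lesson) to compute Pearson's correlation coefficient for an upward trend, a downward trend, and a purely quadratic relationship; the quadratic case shows a near-zero coefficient despite a strong (non-linear) relationship.

```python
import numpy as np

# Hypothetical data sets illustrating correlation coefficients.
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 100)

strong_pos = 2 * x + rng.normal(0, 0.5, x.size)   # clear upward trend
strong_neg = -2 * x + rng.normal(0, 0.5, x.size)  # clear downward trend
quadratic = x**2                                  # strong, but not linear

for name, y in [("positive", strong_pos),
                ("negative", strong_neg),
                ("quadratic", quadratic)]:
    r = np.corrcoef(x, y)[0, 1]
    print(f"{name}: r = {r:+.2f}")
```

Because `x` is symmetric around zero, the quadratic case produces a correlation coefficient of essentially 0 even though `y` is completely determined by `x`.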

Calculating the Regression Line

Linear regression involves using data to calculate a line that best fits that data and then using that line to predict scores. In linear regression, we use one variable (the predictor variable) to predict the outcome of another (the outcome variable, or criterion variable).

Least squares regression is a method of fitting a line to the data so that the sum of the squared differences between the observations and the line is minimized. In the example below, you can see the calculated distances, or residuals, from each of the observations to the regression line.

As you can see, the regression line is a straight line that expresses the relationship between two variables. When predicting one score by using another, we use an equation such as the following, which is equivalent to the slope-intercept form of the equation for a straight line:

\begin{align*}Y = bX + a\end{align*}

where:

Y is the score that we are trying to predict.

b is the slope of the line.

a is the y-intercept, or the value of Y when the value of X is 0.

To calculate the line itself, we need to find the values for b (the regression coefficient) and a (the regression constant).

We use the following formula to calculate the regression coefficient:

\begin{align*}b &= \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}\\ \text{or}\\ b &= (r)\frac{s_Y}{s_X}\end{align*}

where:

r is the correlation between the variables X and Y.

s_Y is the standard deviation of the Y scores.

s_X is the standard deviation of the X scores.

We use the following formula to calculate the regression constant:

\begin{align*}a = \frac{\sum y - b\sum x}{n} = \bar{y} - b\bar{x}\end{align*}
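The two formulas above can be checked against each other in code. This sketch uses a small hypothetical data set (the values are illustrative, not from the lesson) to compute the slope b with the computational formula, confirm it agrees with the equivalent form b = r(s_Y/s_X), and then compute the constant a.

```python
import numpy as np

# Hypothetical data for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = x.size

# Computational formula for the regression coefficient (slope)
b = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) \
    / (n * np.sum(x**2) - np.sum(x)**2)

# Equivalent form using the correlation and the standard deviations
r = np.corrcoef(x, y)[0, 1]
b_alt = r * y.std(ddof=1) / x.std(ddof=1)

# Regression constant (intercept)
a = y.mean() - b * x.mean()

print(f"b = {b:.2f}, b_alt = {b_alt:.2f}, a = {a:.2f}")
```

Both slope formulas give the same value, as they must; the second form makes clear that the slope rescales the correlation by the spread of the two variables.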

Hypothesis Testing for Linear Relationships

In hypothesis testing of linear regression models, the null hypothesis to be tested is that the regression coefficient, β, equals zero. Our alternative hypothesis is that our regression coefficient does not equal zero.

\begin{align*}H_0: \ \beta &= 0\\ H_a: \ \beta &\neq 0\end{align*}

The test statistic for this hypothesis test is calculated as follows:

\begin{align*}t &= \frac{b-\beta}{s_b}\\ \text{where} \qquad s_b &= \frac{s}{\sqrt{\sum (x-\bar{x})^2}} = \frac{s}{\sqrt{SS_X}},\\ s &= \sqrt{\frac{SSE}{n-2}}, \text{ and}\\ SSE &= \text{sum of squared residual errors}\end{align*}
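The test statistic can be computed step by step. The sketch below uses a small hypothetical data set to fit the line, form the residuals and SSE, compute the standard error of the slope, and evaluate t under H0: β = 0. The critical value 3.182 is the standard two-tailed t value for α = 0.05 with n − 2 = 3 degrees of freedom, taken from a t table.

```python
import numpy as np

# Hypothetical data for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = x.size

# Fit the least squares line: slope b and intercept a
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
a = y.mean() - b * x.mean()

# Residuals and SSE (sum of squared residual errors)
residuals = y - (b * x + a)
SSE = np.sum(residuals**2)

# Standard error of the slope
s = np.sqrt(SSE / (n - 2))
s_b = s / np.sqrt(np.sum((x - x.mean())**2))

# Test statistic under H0: beta = 0
t = (b - 0) / s_b
print(f"t = {t:.2f}")

# Two-tailed critical value t_{0.025, df=3} = 3.182 (from a t table)
print("reject H0" if abs(t) > 3.182 else "fail to reject H0")
```

Here the slope is estimated very precisely relative to its standard error, so t is far beyond the critical value and we reject the null hypothesis of no linear relationship.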

Multiple Linear Regression

In multiple linear regression, scores for one variable are predicted using multiple predictor variables.

When predicting values using multiple regression, we first use the standard score form of the regression equation, which is shown below:

\begin{align*}\hat{Y} = \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_i X_i\end{align*}

where:

Ŷ is the predicted variable, or criterion variable.

β_i is the ith regression coefficient.

X_i is the ith predictor variable.
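A multiple regression can be fit by least squares directly. The sketch below uses simulated hypothetical data with two predictors; unlike the standard score form above (which has no intercept because the variables are standardized), it works with raw scores and so includes an intercept column in the design matrix.

```python
import numpy as np

# Simulated hypothetical data: Y depends on two predictors.
rng = np.random.default_rng(1)
n = 50
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
Y = 2.0 * X1 - 1.5 * X2 + 0.5 + rng.normal(0, 0.1, n)

# Design matrix: one column per predictor plus a column of ones
# for the intercept (raw score form).
A = np.column_stack([X1, X2, np.ones(n)])

# Least squares solution for the regression coefficients
coeffs, *_ = np.linalg.lstsq(A, Y, rcond=None)
b1, b2, a = coeffs
print(f"b1 = {b1:.2f}, b2 = {b2:.2f}, a = {a:.2f}")
```

With little noise in the simulated data, the estimated coefficients come out close to the true values 2.0, -1.5, and 0.5 used to generate Y.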
