
# 5.4: Predicting with Linear Models

Difficulty Level: At Grade Created by: CK-12

## Learning Objectives

• Interpolate using an equation.
• Extrapolate using an equation.
• Predict using an equation.

## Introduction

Katja’s sales figures were trending downward quickly at first, and she used a line of best fit to describe the numbers. But now they seem to be decreasing more slowly, and fitting the line less and less accurately. How can she make a more accurate prediction of what next week’s sales will be?

In the last lesson we saw how to find the equation of a line of best fit and how to use this equation to make predictions. The line of “best fit” is a good method if the relationship between the dependent and the independent variables is linear. In this section you will learn other methods that are useful even when the relationship isn’t linear.

## Linear Interpolation

We use linear interpolation to fill in gaps in our data—that is, to estimate values that fall in between the values we already know. To do this, we use a straight line to connect the known data points on either side of the unknown point, and use the equation of that line to estimate the value we are looking for.
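As a sketch of the general calculation, suppose the known points on either side are \begin{align*}(x_a, y_a)\end{align*} and \begin{align*}(x_b, y_b)\end{align*}. These labels are introduced here for illustration and are not part of the lesson's notation. The line through the two points gives this estimate for a value \begin{align*}x\end{align*} between \begin{align*}x_a\end{align*} and \begin{align*}x_b\end{align*}:

\begin{align*}m &= \frac{y_b-y_a}{x_b-x_a}\\ y &= y_a+m(x-x_a)\end{align*}

This is the same line you get by writing \begin{align*}y=mx+b\end{align*} and solving for \begin{align*}b\end{align*}; the point-slope form simply skips that step.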

Example 1

The following table shows the median ages of first marriage for men and women, as gathered by the U.S. Census Bureau.

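| Year | Median age of males | Median age of females |
|------|---------------------|-----------------------|
| 1890 | 26.1 | 22.0 |
| 1900 | 25.9 | 21.9 |
| 1910 | 25.1 | 21.6 |
| 1920 | 24.6 | 21.2 |
| 1930 | 24.3 | 21.3 |
| 1940 | 24.3 | 21.5 |
| 1950 | 22.8 | 20.3 |
| 1960 | 22.8 | 20.3 |
| 1970 | 23.2 | 20.8 |
| 1980 | 24.7 | 22.0 |
| 1990 | 26.1 | 23.9 |
| 2000 | 26.8 | 25.1 |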

Estimate the median age for the first marriage of a male in the year 1946.

Solution

We connect the two points on either side of 1946, which are (1940, 24.3) and (1950, 22.8), with a straight line and find its equation.

We find the equation by plugging in the two data points:

\begin{align*}m &= \frac{22.8-24.3}{1950-1940}=\frac{-1.5}{10}=-0.15\\ y &= -0.15x+b\\ 24.3 &= -0.15(1940)+b\\ b &= 315.3\end{align*}

Our equation is \begin{align*}y=-0.15x+315.3\end{align*}.

To estimate the median age of marriage of males in the year 1946, we plug \begin{align*}x = 1946\end{align*} into the equation we just found:

\begin{align*}y=-0.15(1946)+315.3=23.4\end{align*} years old
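The same calculation can be sketched in a few lines of Python. This is only an illustration of the steps above; the function names `line_through` and `estimate` are made up for this example and do not come from the lesson.

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the line through two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1  # solve y1 = slope * x1 + b for b
    return slope, intercept


def estimate(p1, p2, x):
    """Estimate y at x using the line through p1 and p2."""
    slope, intercept = line_through(p1, p2)
    return slope * x + intercept


# The two data points on either side of 1946 (median age of males).
m, b = line_through((1940, 24.3), (1950, 22.8))
age_1946 = estimate((1940, 24.3), (1950, 22.8), 1946)

print(round(m, 2), round(b, 1), round(age_1946, 1))
```

Rounded, the slope, intercept, and estimate should match the values worked out by hand: -0.15, 315.3, and 23.4.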

Example 2

The Centers for Disease Control and Prevention (CDC) collects information about the health of the American people and behaviors that might lead to poor health. The following table shows the percent of women who smoked during pregnancy.

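| Year | Percent of pregnant women smokers |
|------|-----------------------------------|
| 1990 | 18.4 |
| 1991 | 17.7 |
| 1992 | 16.9 |
| 1993 | 15.8 |
| 1994 | 14.6 |
| 1995 | 13.9 |
| 1996 | 13.6 |
| 2000 | 12.2 |
| 2002 | 11.4 |
| 2003 | 10.4 |
| 2004 | 10.2 |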

Estimate the percentage of pregnant women that were smoking in the year 1998.

Solution

We connect the two points on either side of 1998, which are (1996, 13.6) and (2000, 12.2), with a straight line and find its equation.

We find the equation by plugging in the two data points:

\begin{align*}m &= \frac{12.2-13.6}{2000-1996}=\frac{-1.4}{4}=-0.35\\ y &= -0.35x+b\\ 12.2 &= -0.35(2000)+b\\ b &= 712.2\end{align*}

Our equation is \begin{align*}y=-0.35x+712.2\end{align*}.

To estimate the percentage of pregnant women who smoked in the year 1998, we plug \begin{align*}x = 1998\end{align*} into the equation we just found:

\begin{align*}y=-0.35(1998)+712.2=12.9\%\end{align*}
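As a quick check on this arithmetic, here is the same estimate worked in point-slope form, starting from the 1996 data point. It is only a rearrangement of the calculation above, not a different method:

\begin{align*}y &= 13.6+\frac{12.2-13.6}{2000-1996}(1998-1996)\\ &= 13.6+(-0.35)(2)\\ &= 12.9\%\end{align*}

Because 1998 is exactly halfway between 1996 and 2000, the estimate is also the average of the two known values: \begin{align*}\frac{13.6+12.2}{2}=12.9\end{align*}.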

For non-linear data, linear interpolation is often not accurate enough for our purposes. If the points in the data set change by a large amount in the interval you’re interested in, then linear interpolation may not give a good estimate. In that case, it can be replaced by polynomial interpolation, which uses a curve instead of a straight line to estimate values between points. But that’s beyond the scope of this lesson.

## Linear Extrapolation

Linear extrapolation can help us estimate values that are outside the range of our data set. The strategy is similar to linear interpolation: we pick the two data points that are closest to the one we’re looking for, find the equation of the line between them, and use that equation to estimate the coordinates of the missing point.

Example 3

The winning times for the women’s 100 meter race are given in the following table. Estimate the winning time in the year 2010. Is this a good estimate?

| Winner | Country | Year | Time (seconds) |
|--------|---------|------|----------------|
| Mary Lines | UK | 1922 | 12.8 |
| Leni Schmidt | Germany | 1925 | 12.4 |
| Gertrud Gladitsch | Germany | 1927 | 12.1 |
| Tollien Schuurman | Netherlands | 1930 | 12.0 |
| Helen Stephens | USA | 1935 | 11.8 |
| Lulu Mae Hymes | USA | 1939 | 11.5 |
| Fanny Blankers-Koen | Netherlands | 1943 | 11.5 |
| Marjorie Jackson | Australia | 1952 | 11.4 |
| Vera Krepkina | Soviet Union | 1958 | 11.3 |
| Wyomia Tyus | USA | 1964 | 11.2 |
| Barbara Ferrell | USA | 1968 | 11.1 |
| Ellen Strophal | East Germany | 1972 | 11.0 |
| Inge Helten | West Germany | 1976 | 11.0 |
| Marlies Gohr | East Germany | 1982 | 10.9 |
| Florence Griffith Joyner | USA | 1988 | 10.5 |

Solution

We start by making a scatter plot of the data; then we connect the last two points on the graph and find the equation of the line.

\begin{align*}m &= \frac{10.5-10.9}{1988-1982}=\frac{-0.4}{6}=-0.067\\ y &= -0.067x+b\\ 10.5 &= -0.067(1988)+b\\ b &= 143.7\end{align*}

Our equation is \begin{align*}y=-0.067x+143.7\end{align*}.

The winning time in year 2010 is estimated to be:

\begin{align*}y=-0.067(2010)+143.7= 9.03\end{align*} seconds.

Unfortunately, this estimate isn’t very accurate. The example shows the weakness of linear extrapolation: it uses only two points instead of all the points, as the line of best fit does, so it gives less accurate results when the data points follow a linear pattern. In this particular example, the last data point clearly doesn’t fit the general trend of the data, so the slope of the extrapolation line is much steeper than it would be if we’d used a line of best fit. (As a historical note, the last data point corresponds to the winning time for Florence Griffith Joyner in 1988. After her race she was accused of using performance-enhancing drugs, but this was never proven. In addition, there was a question about the accuracy of the timing: some officials said that the tailwind was not accounted for in this race, even though all the other races of the day were affected by a strong wind.)
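To see the difference in slope, the following Python sketch compares the two-point extrapolation with a line of best fit for the same table. It assumes that "line of best fit" means the ordinary least-squares line, which is one common choice and may differ slightly from a line fitted by hand; the function names are made up for this example.

```python
# Winning times from the table in Example 3, as (year, seconds).
results = [
    (1922, 12.8), (1925, 12.4), (1927, 12.1), (1930, 12.0), (1935, 11.8),
    (1939, 11.5), (1943, 11.5), (1952, 11.4), (1958, 11.3), (1964, 11.2),
    (1968, 11.1), (1972, 11.0), (1976, 11.0), (1982, 10.9), (1988, 10.5),
]


def two_point_line(p1, p2):
    """Slope and intercept of the line through two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1


def least_squares_line(points):
    """Slope and intercept of the ordinary least-squares line."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Linear extrapolation uses only the last two points.
ext_m, ext_b = two_point_line(results[-2], results[-1])
# The best-fit line uses every point in the table.
fit_m, fit_b = least_squares_line(results)

extrapolated_2010 = ext_m * 2010 + ext_b
best_fit_2010 = fit_m * 2010 + fit_b

print("two-point slope:", round(ext_m, 3), "estimate:", round(extrapolated_2010, 2))
print("best-fit slope: ", round(fit_m, 3), "estimate:", round(best_fit_2010, 2))
```

The code keeps the unrounded slope, so its two-point estimate may differ in the last digit from the hand calculation, which rounds the slope to -0.067. The best-fit slope is shallower than the two-point slope, so the best-fit estimate for 2010 is a slower time.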

Here’s an example of a problem where linear extrapolation does work better than the line of best fit method.

Example 4

A cylinder is filled with water to a height of 73 centimeters. The water is drained through a hole in the bottom of the cylinder and measurements are taken at 2 second intervals. The following table shows the height of the water level in the cylinder at different times.

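| Time (seconds) | Water level (cm) |
|----------------|------------------|
| 0.0 | 73 |
| 2.0 | 63.9 |
| 4.0 | 55.5 |
| 6.0 | 47.2 |
| 8.0 | 40.0 |
| 10.0 | 33.4 |
| 12.0 | 27.4 |
| 14.0 | 21.9 |
| 16.0 | 17.1 |
| 18.0 | 12.9 |
| 20.0 | 9.4 |
| 22.0 | 6.3 |
| 24.0 | 3.9 |
| 26.0 | 2.0 |
| 28.0 | 0.7 |
| 30.0 | 0.1 |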

a) Find the water level at time 15 seconds.

b) Find the water level at time 27 seconds.

c) What would be the original height of the water in the cylinder if the water takes 5 extra seconds to drain? (Find the height at time of –5 seconds.)

Solution

Here’s what the line of best fit would look like for this data set:

Notice that the data points don’t really make a line, and so the line of best fit still isn’t a terribly good fit. Just a glance tells us that we’d estimate the water level at 15 seconds to be about 27 cm, which is more than the water level at 14 seconds. That’s clearly not possible! Similarly, at 27 seconds we’d estimate the water to have all drained out, which it clearly hasn’t yet.

So let’s see what happens if we use linear extrapolation and interpolation instead. First, here are the lines we’d use to interpolate between 14 and 16 seconds, and between 26 and 28 seconds.

a) The slope of the line between points (14, 21.9) and (16, 17.1) is \begin{align*}m=\frac{17.1-21.9}{16-14}=\frac{-4.8}{2}=-2.4\end{align*}. So \begin{align*}y=-2.4x+b \Rightarrow 21.9=-2.4(14)+b \Rightarrow b=55.5\end{align*}, and the equation is \begin{align*}y=-2.4x+55.5\end{align*}.

Plugging in \begin{align*}x = 15\end{align*} gives us \begin{align*}y=-2.4(15)+55.5= 19.5 \ cm\end{align*}.

b) The slope of the line between points (26, 2) and (28, 0.7) is \begin{align*}m=\frac{0.7-2}{28-26}=\frac{-1.3}{2}=-0.65\end{align*}, so \begin{align*}y=-0.65x+b \Rightarrow 2=-0.65(26)+b \Rightarrow b=18.9\end{align*}, and the equation is \begin{align*}y=-0.65x+18.9\end{align*}.

Plugging in \begin{align*}x = 27\end{align*}, we get \begin{align*}y=-0.65(27)+18.9= 1.35 \ cm\end{align*}.

c) Finally, we can use extrapolation to estimate the height of the water at -5 seconds. The slope of the line between points (0, 73) and (2, 63.9) is \begin{align*}m=\frac{63.9-73}{2-0}=\frac{-9.1}{2}=-4.55\end{align*}, so the equation of the line is \begin{align*}y=-4.55x+73\end{align*}.

Plugging in \begin{align*}x = -5\end{align*} gives us \begin{align*}y=-4.55(-5)+73 = 95.75 \ cm\end{align*}.
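All three parts follow the same recipe: pick the two nearest data points, then use the line through them. Here is a minimal Python sketch of that recipe applied to the water-level table. The helper name `estimate_from_table` is hypothetical, and the rule for choosing points (the bracketing pair inside the table, the two end points outside it) is one reasonable reading of the method described in this lesson.

```python
# Water levels from the table in Example 4, as (seconds, cm).
water = [
    (0.0, 73.0), (2.0, 63.9), (4.0, 55.5), (6.0, 47.2),
    (8.0, 40.0), (10.0, 33.4), (12.0, 27.4), (14.0, 21.9),
    (16.0, 17.1), (18.0, 12.9), (20.0, 9.4), (22.0, 6.3),
    (24.0, 3.9), (26.0, 2.0), (28.0, 0.7), (30.0, 0.1),
]


def estimate_from_table(table, x):
    """Interpolate inside the table's range, extrapolate outside it."""
    pts = sorted(table)
    if x <= pts[0][0]:
        # Before the first measurement: use the first two points.
        p1, p2 = pts[0], pts[1]
    elif x >= pts[-1][0]:
        # After the last measurement: use the last two points.
        p1, p2 = pts[-2], pts[-1]
    else:
        # Otherwise find the pair of neighbors that brackets x.
        for p1, p2 in zip(pts, pts[1:]):
            if p1[0] <= x <= p2[0]:
                break
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    return y1 + slope * (x - x1)


level_15 = estimate_from_table(water, 15)        # part a
level_27 = estimate_from_table(water, 27)        # part b
level_minus_5 = estimate_from_table(water, -5)   # part c

print(round(level_15, 2), round(level_27, 2), round(level_minus_5, 2))
```

Rounded to two decimal places, the three estimates should agree with the hand calculations for parts a, b, and c: 19.5 cm, 1.35 cm, and 95.75 cm.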

To make linear interpolation easier in the future, you might want to use the calculator at http://www.ajdesigner.com/phpinterpolation/linear_interpolation_equation.php. Plug in the coordinates of the first known data point in the blanks labeled \begin{align*}x_1\end{align*} and \begin{align*}y_1\end{align*}, and the coordinates of the second point in the blanks labeled \begin{align*}x_3\end{align*} and \begin{align*}y_3\end{align*}; then enter the \begin{align*}x\end{align*}-coordinate of the point in between in the blank labeled \begin{align*}x_2\end{align*}, and the \begin{align*}y\end{align*}-coordinate will be displayed below when you click “Calculate.”

## Review Questions

1. Use the data from Example 1 (Median age at first marriage) to estimate the age at marriage for females in 1946. Fit a line, by hand, to the data before 1970.
2. Use the data from Example 1 (Median age at first marriage) to estimate the age at marriage for females in 1984. Fit a line, by hand, to the data from 1970 on in order to estimate this accurately.
3. Use the data from Example 1 (Median age at first marriage) to estimate the age at marriage for males in 1995. Use linear interpolation between the 1990 and 2000 data points.
4. Use the data from Example 2 (Pregnant women and smoking) to estimate the percentage of pregnant smokers in 1997. Use linear interpolation between the 1996 and 2000 data points.
5. Use the data from Example 2 (Pregnant women and smoking) to estimate the percentage of pregnant smokers in 2006. Use linear extrapolation with the final two data points.
6. Use the data from Example 3 (Winning times) to estimate the winning time for the female 100-meter race in 1920. Use linear extrapolation because the first two or three data points have a different slope than the rest of the data.
7. The table below shows the highest temperature vs. the hours of daylight for the \begin{align*}15^{th}\end{align*} day of each month in the year 2006 in San Diego, California.

| Hours of daylight | High temperature (°F) |
|-------------------|-----------------------|
| 10.25 | 60 |
| 11.0 | 62 |
| 12 | 62 |
| 13 | 66 |
| 13.8 | 68 |
| 14.3 | 73 |
| 14 | 86 |
| 13.4 | 75 |
| 12.4 | 71 |
| 11.4 | 66 |
| 10.5 | 73 |
| 10 | 61 |

(a) What would be a better way to organize this table if you want to make the relationship between daylight hours and temperature easier to see?

(b) Estimate the high temperature for a day with 13.2 hours of daylight using linear interpolation.

(c) Estimate the high temperature for a day with 9 hours of daylight using linear extrapolation. Is the prediction accurate?

(d) Estimate the high temperature for a day with 9 hours of daylight using a line of best fit.

The table below lists life expectancies based on year of birth (US Census Bureau). Use it to answer questions 8-15.

| Birth year | Life expectancy in years |
|------------|--------------------------|
| 1930 | 59.7 |
| 1940 | 62.9 |
| 1950 | 68.2 |
| 1960 | 69.7 |
| 1970 | 70.8 |
| 1980 | 73.7 |
| 1990 | 75.4 |
| 2000 | 77 |

8. Make a scatter plot of the data.
9. Use a line of best fit to estimate the life expectancy of a person born in 1955.
10. Use linear interpolation to estimate the life expectancy of a person born in 1955.
11. Use a line of best fit to estimate the life expectancy of a person born in 1976.
12. Use linear interpolation to estimate the life expectancy of a person born in 1976.
13. Use a line of best fit to estimate the life expectancy of a person born in 2012.
14. Use linear extrapolation to estimate the life expectancy of a person born in 2012.
15. Which method gives better estimates for this data set? Why?

The table below lists the high temperature for the first day of each month of the year 2006 in San Diego, California (Weather Underground). Use it to answer questions 16-21.

| Month number | Temperature (°F) |
|--------------|------------------|
| 1 | 63 |
| 2 | 66 |
| 3 | 61 |
| 4 | 64 |
| 5 | 71 |
| 6 | 78 |
| 7 | 88 |
| 8 | 78 |
| 9 | 81 |
| 10 | 75 |
| 11 | 68 |
| 12 | 69 |

16. Draw a scatter plot of the data.
17. Use a line of best fit to estimate the temperature in the middle of the \begin{align*}4^{th}\end{align*} month (month 4.5).
18. Use linear interpolation to estimate the temperature in the middle of the \begin{align*}4^{th}\end{align*} month (month 4.5).
19. Use a line of best fit to estimate the temperature for month 13 (January 2007).
20. Use linear extrapolation to estimate the temperature for month 13 (January 2007).
21. Which method gives better estimates for this data set? Why?
22. Name a real-world situation where you might want to make predictions based on available data. Would linear extrapolation/interpolation or the best fit method be better to use in that situation? Why?

## Texas Instruments Resources

In the CK-12 Texas Instruments Algebra I FlexBook, there are graphing calculator activities designed to supplement the objectives for some of the lessons in this chapter. See http://www.ck12.org/flexr/chapter/9615.
