
5.6: Predicting with Linear Models

Difficulty Level: At Grade Created by: CK-12

Numerical information appears in all areas of life. You can find it in newspapers, in magazines, in journals, on the television, or on the Internet. In the last lesson, you saw how to find the equation of a line of best fit. Using a line of best fit is a good method if the relationship between the dependent and independent variables is linear. Not all data fits a straight line, though. This lesson will show other methods to help estimate data values. These methods are useful in both linear and non-linear relationships.

Linear Interpolation

Linear interpolation is useful when looking for a value between given data points. It can be considered as “filling in the gaps” of a table of data.

The strategy for linear interpolation is to use a straight line to connect the known data points on either side of the unknown point. Linear interpolation is often not accurate for non-linear data. If the points in the data set change by a large amount, linear interpolation may not give a good estimate.
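
The strategy above can be sketched in a few lines of Python. This is only an illustration: the function name `interpolate` and the sample points are made up for the sketch and do not come from the lesson's data.

```python
def interpolate(x0, y0, x1, y1, x):
    """Estimate y at x on the straight line joining (x0, y0) and (x1, y1)."""
    if x0 == x1:
        raise ValueError("the two known points need different x-values")
    if not min(x0, x1) <= x <= max(x0, x1):
        # Outside the two known points the same formula would be extrapolation.
        raise ValueError("x is not between the two known points")
    slope = (y1 - y0) / (x1 - x0)
    return y0 + slope * (x - x0)


# Made-up illustrative points: (2, 10) and (6, 18).
print(interpolate(2, 10, 6, 18, 3))  # 12.0
```

The check on `x` is a design choice for the sketch: it keeps the function honest about only "filling in gaps" between known points.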

Linear Extrapolation

Linear extrapolation can help us estimate values that are either higher or lower than the values in the data set. Think of this as “the long-term estimate” of the data.

The strategy for linear extrapolation is to use a subset of the data instead of the entire data set. This is especially true for non-linear data you will encounter in later chapters. For this type of data, it is sometimes useful to extrapolate using the last two or three data points in order to estimate a value higher than the data range.
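
As a rough sketch of this idea, the Python function below extends the line through the last two data points beyond the data range. The name `extrapolate_from_last_two` and the three sample points are hypothetical and chosen only to show a non-linear pattern.

```python
def extrapolate_from_last_two(points, x):
    """Extend the line through the two right-most points out to x."""
    # Sort by x-value so the "last two" points are the ones with the largest x.
    (x0, y0), (x1, y1) = sorted(points)[-2:]
    slope = (y1 - y0) / (x1 - x0)
    return y1 + slope * (x - x1)


# Made-up, clearly non-linear data: y doubles at each step.
points = [(1, 2.0), (2, 4.0), (3, 8.0)]
print(extrapolate_from_last_two(points, 4))  # 12.0
```

Using only the last two points follows the most recent trend and ignores the earlier part of the data, so the estimate is sensitive to those two values.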

Collecting and Organizing Data

Data can be collected through various means, including surveys or experiments.

A survey is a data collection method used to gather information about individuals’ opinions, beliefs, or habits.

The information collected by the U.S. Census Bureau and by the Centers for Disease Control and Prevention (CDC) is an example of data gathered using surveys. The U.S. Census Bureau collects information about many aspects of the U.S. population.

An experiment is a controlled test or investigation.

Let’s say we are interested in how the median age for first marriages has changed during the \begin{align*}20^{th}\end{align*} century. The U.S. Census provides the following information about the median age at first marriage for males and females. Below is the table of data and its corresponding scatter plot.

Year Median Age of Males Median Age of Females
1890 26.1 22.0
1900 25.9 21.9
1910 25.1 21.6
1920 24.6 21.2
1930 24.3 21.3
1940 24.3 21.5
1950 22.8 20.3
1960 22.8 20.3
1970 23.2 20.8
1980 24.7 22.0
1990 26.1 23.9
2000 26.8 25.1

Median Age of Males and Females at First Marriage by Year

Example: Estimate the median age for the first marriage of a male in the year 1946.

Solution: We will first use the method of interpolation because there is a “gap” needing to be filled. 1946 is between 1940 and 1950, so these are the data points we will use.

By connecting the two points, an equation can be found.

\begin{align*}\text{Slope} && m & = \frac{22.8 - 24.3}{1950 - 1940} = \frac{-1.5}{10}=-0.15\\ && y& =-0.15x+b\\ && 24.3 & = -0.15(1940)+b\\ && b & = 315.3\\ \text{Equation} && y&=-0.15x+315.3\end{align*}

To estimate the median age of marriage of males in year 1946, substitute \begin{align*}x=1946\end{align*} in the equation.

\begin{align*}y=-0.15(1946)+315.3=23.4 \ \text{years old}\end{align*}
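
The same steps (find the slope, solve for the intercept, substitute the year) can be checked with a short Python sketch. The helper name `line_through` is an assumption made for this illustration; the two data points are the 1940 and 1950 rows for males from the table.

```python
def line_through(p, q):
    """Return slope m and intercept b of the line through points p and q."""
    (x0, y0), (x1, y1) = p, q
    m = (y1 - y0) / (x1 - x0)
    b = y0 - m * x0
    return m, b


# Median age of males at first marriage in 1940 and 1950.
m, b = line_through((1940, 24.3), (1950, 22.8))
estimate = m * 1946 + b
print(round(m, 2), round(b, 1), round(estimate, 1))  # -0.15 315.3 23.4
```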

Example: The Centers for Disease Control and Prevention (CDC) reports the following percentages of pregnant women who smoked, organized by year. Estimate the percentage of pregnant women who smoked in 1998.

Percent of Pregnant Women Smokers by Year
Year Percent
1990 18.4
1991 17.7
1992 16.9
1993 15.8
1994 14.6
1995 13.9
1996 13.6
2000 12.2
2002 11.4
2003 10.4
2004 10.2

Percent of Pregnant Women Smokers by Year

Solution: We want to use the information close to 1998 to interpolate the data. We do this by connecting the points on either side of 1998 with a straight line and find the equation of that line.

\begin{align*}\text{Slope} && m&=\frac{12.2-13.6}{2000-1996}=\frac{-1.4}{4}=-0.35\\ && y& =-0.35x + b\\ && 12.2 & = -0.35(2000)+b\\ && b& =712.2\\ \text{Equation} && y& =-0.35x+712.2 \end{align*}

To estimate the percentage of pregnant women who smoked in 1998, substitute \begin{align*}x=1998\end{align*} into the equation.

\begin{align*}y=-0.35(1998)+712.2=12.9\%\end{align*}
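
The lookup step in this solution, finding the table rows on either side of the target year, can also be sketched in Python. The function name `interpolate_from_table` is hypothetical, and the sketch assumes the table is sorted by year, as it is here.

```python
from bisect import bisect_left

# (year, percent) pairs copied from the table; note the gap after 1996.
smokers = [
    (1990, 18.4), (1991, 17.7), (1992, 16.9), (1993, 15.8),
    (1994, 14.6), (1995, 13.9), (1996, 13.6), (2000, 12.2),
    (2002, 11.4), (2003, 10.4), (2004, 10.2),
]


def interpolate_from_table(table, x):
    """Interpolate linearly between the two rows that bracket x."""
    xs = [row[0] for row in table]
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x is outside the table; interpolation does not apply")
    i = bisect_left(xs, x)  # index of the first row with year >= x
    if xs[i] == x:
        return table[i][1]  # x is already a listed year
    (x0, y0), (x1, y1) = table[i - 1], table[i]
    return y0 + (y1 - y0) / (x1 - x0) * (x - x0)


print(round(interpolate_from_table(smokers, 1998), 1))  # 12.9
```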


Predicting Using an Equation

When linear interpolation and linear extrapolation do not produce accurate predictions, using the line of best fit (linear regression) may be the best choice. The “by hand” and calculator methods of determining the line of best fit were presented in the last lesson.

Example: The winning times for the women’s 100-meter race are given in the following table. Estimate the winning time in the year 2010. Is this a good estimate?

Winner Country Year Seconds
Mary Lines UK 1922 12.8
Leni Schmidt Germ. 1925 12.4
Gertrud Glasitsch Germ. 1927 12.1
Tollien Schuurman Neth. 1930 12.0
Helen Stephens USA 1935 11.8
Lulu Mae Hymes USA 1939 11.5
Fanny Blankers-Koen Neth. 1943 11.5
Marjorie Jackson Austr. 1952 11.4
Vera Krepkina Sov. 1958 11.3
Wyomia Tyus USA 1964 11.2
Barbara Ferrell USA 1968 11.1
Ellen Strophal E. Germ. 1972 11.0
Inge Helten W. Germ. 1975 11.0
Marlies Gohr E. Germ. 1982 10.9
Florence Griffith Joyner USA 1988 10.5

Solution: Start by making a scatter plot of the data. Connect the last two points on the graph and find the equation of the line.

Winning Times for the Women’s 100-meter Race by Year

Source: http://en.wikipedia.org/wiki/World_Record_progression_100_m_women

\begin{align*}\text{Slope} && m & =\frac{10.5 - 10.9}{1988-1982}=\frac{-0.4}{6}=-0.067\\ && y& =-0.067x+b\\ && 10.5& =-0.067(1988)+b\\ && b& =143.7\\ \text{Equation} && y & =-0.067x+143.7\end{align*}

The winning time in year 2010 is estimated to be: \begin{align*}y=-0.067(2010)+143.7=\underline{9.03 \ \text{seconds}}\end{align*}.

How accurate is this estimate? Probably not very, because 2010 is 22 years beyond the last data point in 1988. This example demonstrates the weakness of linear extrapolation. An estimate extrapolated from only two points is usually less reliable than one from the line of best fit, because it ignores the rest of the data. In this particular example, the last data point clearly does not fit the general trend of the data, so the slope of the extrapolation line is much steeper than it should be.
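
To see how much the choice of method matters here, the sketch below compares the two-point extrapolation with a line fitted to all fifteen data points. It assumes that "line of best fit" means the ordinary least-squares line, as a calculator's linear regression would report; the function names are made up for the illustration, and a line fitted by hand may give a slightly different result.

```python
# (year, seconds) pairs copied from the winning-times table.
times = [
    (1922, 12.8), (1925, 12.4), (1927, 12.1), (1930, 12.0), (1935, 11.8),
    (1939, 11.5), (1943, 11.5), (1952, 11.4), (1958, 11.3), (1964, 11.2),
    (1968, 11.1), (1972, 11.0), (1975, 11.0), (1982, 10.9), (1988, 10.5),
]


def least_squares_line(points):
    """Return slope m and intercept b of the ordinary least-squares line."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    m = sxy / sxx
    return m, mean_y - m * mean_x


def two_point_line(p, q):
    """Return slope m and intercept b of the line through p and q."""
    m = (q[1] - p[1]) / (q[0] - p[0])
    return m, p[1] - m * p[0]


m_fit, b_fit = least_squares_line(times)
m_two, b_two = two_point_line(times[-2], times[-1])

best_fit_2010 = m_fit * 2010 + b_fit
two_point_2010 = m_two * 2010 + b_two
print(round(two_point_2010, 2), round(best_fit_2010, 2))  # 9.03 9.96
```

The fitted line is much less steep than the line through the last two points, so its 2010 estimate is almost a full second slower. Both values are still extrapolations well beyond the data, so neither should be read as a firm prediction.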

As a historical note, the last data point corresponds to Florence Griffith Joyner's winning time in 1988. After her race, she was accused of using performance-enhancing drugs, but the accusation was never proven. There is also a question about the accuracy of the timing: some officials said that the tail wind was not accounted for in this race, even though all the other races of the day were affected by a strong wind.

Practice Set

Sample explanations for some of the practice exercises below are available in the following video. Note that the exercise numbers in the video do not always match the numbers in the exercise set below, but the exercises themselves are the same. CK-12 Basic Algebra: Predicting with Linear Models (11:46)

  1. What does it mean to interpolate the data? In which cases would this method be useful?
  2. How is interpolation different from extrapolation? In which cases would extrapolation be more beneficial?
  3. What was the problem with using the extrapolation method to come up with an equation for the women’s 100-meter winning times?
  4. Use the Winning Times data and determine an equation for the line of best fit.
  5. Use the Median Age at First Marriage data to estimate the age at marriage for females in 1946. Fit a line, by hand, to the data before 1970.
  6. Use the Median Age at First Marriage data to estimate the age at marriage for females in 1984. Fit a line, by hand, to the data from 1970 on in order to estimate this accurately.
  7. Use the Median Age at First Marriage data to estimate the age at marriage for males in 1995. Use linear interpolation between the 1990 and 2000 data points.
  8. Use the data from Pregnant Women and Smoking to estimate the percent of pregnant smokers in 1997. Use linear interpolation between the 1996 and 2000 data points.
  9. Use the data from Pregnant Women and Smoking to estimate the percent of pregnant smokers in 2006. Use linear extrapolation with the final two data points.
  10. Use the Winning Times data to estimate the winning time for the female 100-meter race in 1920. Use linear extrapolation because the first two or three data points have a different slope than the rest of the data.
  11. The table below shows the highest temperature vs. the hours of daylight for the \begin{align*}15^{th}\end{align*} day of each month in the year 2006 in San Diego, California. Using linear interpolation, estimate the high temperature for a day with 13.2 hours of daylight.
Hours of daylight High temperature \begin{align*}(^\circ \text{F})\end{align*}
10.25 60
11.0 62
12 62
13 66
13.8 68
14.3 73
14 86
13.4 75
12.4 71
11.4 66
10.5 73
10 61
  12. Use the table above to estimate the high temperature for a day with 9 hours of daylight using linear extrapolation. Is the prediction accurate? Find the answer using the line of best fit.
