7.1: Line Graphs and Scatter Plots
Learning Objectives
- Represent data that has a linear pattern on a graph.
- Represent data using a broken-line graph.
- Understand the difference between continuous data and discrete data as it applies to a line graph.
- Represent data that has no definite pattern as a scatter plot.
- Draw a line of best fit on a scatter plot.
- Use technology to create both line graphs and scatter plots.
Before you continue to explore the concept of representing data graphically, it is very important to understand the meaning of some basic terms that will often be used in this lesson. The first such definition is that of a variable. In statistics, a variable is simply a characteristic that is being studied. This characteristic assumes different values for different elements, or members, of the population, whether it is the entire population or a sample. The value of the variable is referred to as an observation, or a measurement. A collection of these observations of the variable is a data set.
Variables can be quantitative or qualitative. A quantitative variable is one that can be measured numerically. Some examples of a quantitative variable are wages, prices, weights, numbers of vehicles, and numbers of goals. All of these examples can be expressed numerically. A quantitative variable can be classified as discrete or continuous. A discrete variable is one whose values can be counted; it cannot take any of the values between 2 consecutive values of a data set. An example of a discrete variable is the number of goals scored by a team during a hockey game. A continuous variable is one that can assume any value in an interval, including all the values between 2 consecutive numbers of a data set. An example of a continuous variable is the number of gallons of gasoline used during a trip to the beach.
A qualitative variable is one that cannot be measured numerically but can be placed in a category. Some examples of a qualitative variable are months of the year, hair color, color of cars, a person’s status, and favorite vacation spots. The following flow chart should help you to better understand the above terms.
Example 1
Select the best descriptions for the following variables and indicate your selections by marking an ‘\begin{align*}x\end{align*}’ in the appropriate boxes.
Variable | Quantitative | Qualitative | Discrete | Continuous |
---|---|---|---|---|
Number of members in a family | | | | |
A person’s marital status | | | | |
Length of a person’s arm | | | | |
Color of cars | | | | |
Number of errors on a math test | | | | |
Solution:
Variable | Quantitative | Qualitative | Discrete | Continuous |
---|---|---|---|---|
Number of members in a family | \begin{align*}x\end{align*} | | \begin{align*}x\end{align*} | |
A person’s marital status | | \begin{align*}x\end{align*} | | |
Length of a person’s arm | \begin{align*}x\end{align*} | | | \begin{align*}x\end{align*} |
Color of cars | | \begin{align*}x\end{align*} | | |
Number of errors on a math test | \begin{align*}x\end{align*} | | \begin{align*}x\end{align*} | |
Variables can also be classified as dependent or independent. When there is a linear relationship between 2 variables, the values of one variable depend upon the values of the other variable. In a linear relation, the values of \begin{align*}y\end{align*} depend upon the values of \begin{align*}x\end{align*}. Therefore, the dependent variable is represented by the values that are plotted on the \begin{align*}y\end{align*}-axis, and the independent variable is represented by the values that are plotted on the \begin{align*}x\end{align*}-axis.
Example 2
Sally works at the local ballpark stadium selling lemonade. She is paid $15.00 each time she works, plus $0.75 for each glass of lemonade she sells. Create a table of values to represent Sally’s earnings if she sells 8 glasses of lemonade. Use this table of values to represent her earnings on a graph.
Solution:
The first step is to write an equation to represent her earnings and then to use this equation to create a table of values.
\begin{align*}y=0.75x+15\end{align*}, where \begin{align*}y\end{align*} represents her earnings and \begin{align*}x\end{align*} represents the number of glasses of lemonade she sells.
Number of Glasses of Lemonade | Earnings |
---|---|
0 | $15.00 |
1 | $15.75 |
2 | $16.50 |
3 | $17.25 |
4 | $18.00 |
5 | $18.75 |
6 | $19.50 |
7 | $20.25 |
8 | $21.00 |
The dependent variable is the money earned, and the independent variable is the number of glasses of lemonade sold. Therefore, money is on the \begin{align*}y\end{align*}-axis, and the number of glasses of lemonade is on the \begin{align*}x\end{align*}-axis.
From the table of values, Sally will earn $21.00 if she sells 8 glasses of lemonade.
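The table of values can also be generated from the equation. The following is a minimal Python sketch, not part of the original lesson; the names `BASE_PAY`, `PAY_PER_GLASS`, and `earnings` are chosen here for illustration, while the rates come from the problem statement.

```python
# Illustrative names; the dollar amounts come from Example 2.
BASE_PAY = 15.00       # dollars paid each time Sally works
PAY_PER_GLASS = 0.75   # dollars for each glass of lemonade sold


def earnings(glasses):
    """Return Sally's earnings, y = 0.75x + 15, for x glasses sold."""
    return PAY_PER_GLASS * glasses + BASE_PAY


# Glasses are counted in whole numbers, so the table uses x = 0, 1, ..., 8.
table = [(x, earnings(x)) for x in range(9)]

for x, y in table:
    print(f"{x} | ${y:.2f}")
```

The last row printed corresponds to 8 glasses and earnings of $21.00, which agrees with the table.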
Now that the points have been plotted, the decision has to be made as to whether or not to join them. Between every 2 points plotted on the graph are an infinite number of values. If these values are meaningful to the problem, then the plotted points can be joined. This type of data is called continuous data. If the values between the 2 plotted points are not meaningful to the problem, then the points should not be joined. This type of data is called discrete data. Since glasses of lemonade are represented by whole numbers, and since fractions or decimals are not appropriate values, the points between 2 consecutive values are not meaningful in this problem. Therefore, the points should not be joined. The data is discrete.
Now it is time to revisit the problem presented in the introduction.
The local arena is trying to attract as many participants as possible to attend the community’s “Skate for Scoliosis” event. Participants pay a fee of $10.00 for registering, and, in addition, the arena will donate $3.00 for each hour a participant skates, up to a maximum of 6 hours. Create a table of values and draw a graph to represent a participant who skates for the entire 6 hours. How much money can a participant raise for the community if he/she skates for the maximum length of time?
Solution:
The equation for this scenario is \begin{align*}y=3x+10\end{align*}, where \begin{align*}y\end{align*} represents the money made by the participant, and \begin{align*}x\end{align*} represents the number of hours the participant skates.
Number of Hours Skating | Money Earned |
---|---|
0 | $10.00 |
1 | $13.00 |
2 | $16.00 |
3 | $19.00 |
4 | $22.00 |
5 | $25.00 |
6 | $28.00 |
The dependent variable is the money made, and the independent variable is the number of hours the participant skated. Therefore, money is on the \begin{align*}y\end{align*}-axis, and time is on the \begin{align*}x\end{align*}-axis as shown below:
A participant who skates for the entire 6 hours can make $28.00 for the "Skate for Scoliosis" event. The points are joined, because the fractions and decimals between 2 consecutive points are meaningful for this problem. A participant could skate for 30 minutes, and the arena would pay that skater $1.50 for the time skating. The data is continuous.
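Because the data is continuous, the equation can be evaluated at fractional hours as well as whole hours. The sketch below is one way to express this in Python; the names `money_raised`, `REGISTRATION_FEE`, `DONATION_PER_HOUR`, and `MAX_HOURS` are illustrative, and rejecting times outside 0 to 6 hours is an assumption about how the 6-hour maximum should be handled.

```python
# Illustrative names; the dollar amounts come from the problem statement.
REGISTRATION_FEE = 10.00   # dollars paid by each participant
DONATION_PER_HOUR = 3.00   # dollars donated by the arena per hour skated
MAX_HOURS = 6              # the arena donates for at most 6 hours


def money_raised(hours):
    """Return y = 3x + 10 for x hours skated, where 0 <= x <= 6."""
    if not 0 <= hours <= MAX_HOURS:
        # Assumption: times outside the stated range are treated as invalid.
        raise ValueError("hours must be between 0 and 6")
    return DONATION_PER_HOUR * hours + REGISTRATION_FEE


print(money_raised(6))                       # the full 6 hours
print(money_raised(0.5) - REGISTRATION_FEE)  # arena's donation for 30 minutes
```

Any value between 2 whole-hour points gives a meaningful result here, which is why the plotted points are joined.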
Linear graphs are important in statistics when several data sets are used to represent information about a single topic. An example would be data sets that represent different plans available for cell phone users. These data sets can be plotted on the same grid. The resulting graph will show intersection points for the plans. Each intersection point marks the value at which 2 plans cost the same. An observer can easily interpret the graph to decide which plan is best, and when. If the observer is trying to choose a plan to use, the choice can be made easier by seeing a graphical representation of the data.
Example 3
The following graph represents 3 plans that are available to customers interested in hiring a maintenance company to tend to their lawn. Using the graph, explain when it would be best to use each plan for lawn maintenance.
Solution:
From the graph, the base fee that is charged for each plan is obvious. These values are found on the \begin{align*}y\end{align*}-axis. Plan A charges a base fee of $200.00, Plan C charges a base fee of $100.00, and Plan B charges a base fee of $50.00. The cost per hour can be calculated by using the values of the intersection points and the base fee in the equation \begin{align*}y=mx+b\end{align*} and solving for \begin{align*}m\end{align*}. Plan B is the best plan to choose if the lawn maintenance takes less than 12.5 hours. At 12.5 hours, Plan B and Plan C both cost $150.00 for lawn maintenance. After 12.5 hours, Plan C is the best deal, until 50 hours of lawn maintenance is needed. At 50 hours, Plan A and Plan C both cost $300.00 for lawn maintenance. For more than 50 hours of lawn maintenance, Plan A is the best plan. All of the above information was obvious from the graph and would enhance the decision-making process for any interested client.
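The step of solving \begin{align*}y=mx+b\end{align*} for \begin{align*}m\end{align*} can be sketched in Python. This is an illustration only: it uses the base fees and the 2 intersection points quoted in the solution, and the names `hourly_rate`, `cost`, and `cheapest_plan` are hypothetical.

```python
# Base fees (the y-intercepts) quoted in the solution.
BASE_FEE = {"A": 200.00, "B": 50.00, "C": 100.00}


def hourly_rate(base_fee, hours, total_cost):
    """Solve y = mx + b for m, given b and one point (hours, total_cost)."""
    return (total_cost - base_fee) / hours


# Plans B and C meet at (12.5, 150); Plans A and C meet at (50, 300).
RATE = {
    "B": hourly_rate(BASE_FEE["B"], 12.5, 150.00),
    "C": hourly_rate(BASE_FEE["C"], 12.5, 150.00),
    "A": hourly_rate(BASE_FEE["A"], 50, 300.00),
}


def cost(plan, hours):
    """Return the total cost y = mx + b of a plan for the given hours."""
    return RATE[plan] * hours + BASE_FEE[plan]


def cheapest_plan(hours):
    """Return the plan with the lowest total cost for the given hours."""
    return min(BASE_FEE, key=lambda plan: cost(plan, hours))


print(RATE)
for hours in (10, 30, 60):
    print(hours, cheapest_plan(hours))
```

Under these values, the hourly rates work out to $8.00 for Plan B, $4.00 for Plan C, and $2.00 for Plan A, and the cheapest plan at 10, 30, and 60 hours agrees with the reading of the graph.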
The above graphs represent linear functions, and are called linear (line) graphs. Each of these graphs has a defined slope that remains constant when the line is plotted. A variation of this graph is a broken-line graph. This type of line graph is used when it is necessary to show change over time. Line segments are used to join the values, but the graph as a whole has no single, constant slope. However, the points are meaningful, and they all represent an important part of the graph. Usually a broken-line graph is given to you, and you must interpret the given information from the graph.
Example 4
The following graph is an example of a broken-line graph, and it represents the time of a round-trip journey, driving from home to a popular campground and back.
a) How far is it from home to the picnic park?
b) How far is it from the picnic park to the campground?
c) At what 2 places did the car stop?
d) How long was the car stopped at the campground?
e) When does the car arrive at the picnic park?
f) How long did it take for the return trip?
g) What was the speed of the car from home to the picnic park?
h) What was the speed of the car from the campground to home?
Solution:
a) It is 40 miles from home to the picnic park.
b) It is 60 miles from the picnic park to the campground.
c) The car stopped at the picnic park and at the campground.
d) The car was stopped at the campground for 15 minutes.
e) The car arrived at the picnic park at 11:00 am.
f) The return trip took 1 hour.
g) The speed of the car from home to the picnic park was 40 mi/h.
h) The speed of the car from the campground to home was 100 mi/h.
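The speed in part h can be checked from the answers to parts a, b, and f. The following short calculation is a sketch that assumes the car returns home along the same route it took to the campground:

\begin{align*}\text{distance from campground to home} &= 40 + 60 = 100 \ \text{miles}\\ \text{speed} &= \frac{\text{distance}}{\text{time}} = \frac{100 \ \text{miles}}{1 \ \text{hour}} = 100 \ \text{mi/h}\end{align*}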
Example 5
Sam decides to spend some time with his friend Aaron. He hops on his bike and starts off to Aaron’s house, but on his way, he gets a flat tire and must walk the remaining distance. Once he arrives at Aaron’s house, they repair the flat tire, play some poker, and then Sam returns home. On his way home, Sam decides to stop at the mall to buy a book on how to play poker. The following graph represents Sam’s adventure:
a) How far is it from Sam’s house to Aaron’s house?
b) How far is it from Aaron’s house to the mall?
c) At what time did Sam have a flat tire?
d) How long did Sam stay at Aaron’s house?
e) At what speed did Sam travel from Aaron’s house to the mall and then from the mall to home?
Solution:
a) It is 25 km from Sam’s house to Aaron’s house.
b) It is 15 km from Aaron’s house to the mall.
c) Sam had a flat tire at 10:00 am.
d) Sam stayed at Aaron’s house for 1 hour.
e) Sam traveled at a speed of 30 km/h from Aaron’s house to the mall and then at a speed of 40 km/h from the mall to home.
Often, when real-world data is plotted, the result is a linear pattern. The general direction of the data can be seen, but the data points do not all fall on a line. This type of graph is called a scatter plot. A scatter plot is often used to investigate whether or not there is a relationship or connection between 2 sets of data. The data is plotted on a graph such that one quantity is plotted on the \begin{align*}x\end{align*}-axis and one quantity is plotted on the \begin{align*}y\end{align*}-axis. The quantity that is plotted on the \begin{align*}x\end{align*}-axis is the independent variable, and the quantity that is plotted on the \begin{align*}y\end{align*}-axis is the dependent variable. If a relationship does exist between the 2 sets of data, it will be easy to see if the data is plotted on a scatter plot.
The following scatter plot shows the price of peaches and the number sold:
The connection is obvious: when the price of peaches was high, the sales were low, but when the price was low, the sales were high.
The following scatter plot shows the sales of a weekly newspaper and the temperature:
There is no connection between the number of newspapers sold and the temperature.
Another term used to describe 2 sets of data that have a connection or a relationship is correlation. The correlation between 2 sets of data can be positive or negative, and it can be strong or weak. The following scatter plots will help to enhance this concept.
If you look at the 2 sketches that represent a positive correlation, you will notice that the points are around a line that slopes upward to the right. When the correlation is negative, the line slopes downward to the right. The 2 sketches that show a strong correlation have points that are bunched together and appear to be close to a line that is in the middle of the points. When the correlation is weak, the points are more scattered and not as concentrated.
In the example of newspaper sales and temperature, there was no connection between the 2 data sets. The following sketches represent some other possible outcomes when there is no correlation between data sets:
Example 6
Plot the following points on a scatter plot, with \begin{align*}m\end{align*} as the independent variable and \begin{align*}n\end{align*} as the dependent variable. Number both axes from 0 to 20. If a correlation exists between the values of \begin{align*}m\end{align*} and \begin{align*}n\end{align*}, describe the correlation (strong negative, weak positive, etc.).
\begin{align*}m\end{align*} | 4 | 9 | 13 | 16 | 17 | 6 | 7 | 18 | 10 |
---|---|---|---|---|---|---|---|---|---|
\begin{align*}n\end{align*} | 5 | 3 | 11 | 18 | 6 | 11 | 18 | 12 | 16 |
Solution:
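A description read from a scatter plot can be checked numerically. This lesson does not define a numeric measure of correlation, so the sketch below brings in one as an assumption: Pearson's correlation coefficient \begin{align*}r\end{align*}, which lies between \begin{align*}-1\end{align*} and 1, is close to 0 when the linear relationship is weak, and has the same sign as the direction of the correlation. The function name `pearson_r` is illustrative.

```python
from math import sqrt

# The 9 points from Example 6.
m = [4, 9, 13, 16, 17, 6, 7, 18, 10]   # independent variable
n = [5, 3, 11, 18, 6, 11, 18, 12, 16]  # dependent variable


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for paired data."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson_r(m, n)
print(f"r = {r:.2f}")
```

For these points the coefficient is a small positive number, which suggests that any positive correlation between \begin{align*}m\end{align*} and \begin{align*}n\end{align*} is weak at most.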
Example 7
Describe the correlation, if any, in the following scatter plot:
Solution:
In the above scatter plot, there is a strong positive correlation.
You now know that a scatter plot can have either a positive or a negative correlation. When a correlation exists on a scatter plot, a line of best fit can be drawn on the graph. The line of best fit must be drawn so that the sums of the distances to the points on either side of the line are approximately equal and so that there are roughly as many points above the line as below it. Using a clear plastic ruler makes it easier to meet all of these conditions when drawing the line. Another useful tool is a stick of spaghetti, since it can be easily rolled and moved on the graph until you are satisfied with its location. The edge of the spaghetti can be traced to produce the line of best fit. A line of best fit can be used to make estimations from the graph, but you must remember that the line of best fit is simply a sketch of where the line should appear on the graph. As a result, any values that you read from this line are not very accurate; they are more of a ballpark figure.
Example 8
The following table consists of the marks achieved by 9 students on chemistry and math tests:
Student | A | B | C | D | E | F | G | H | I |
---|---|---|---|---|---|---|---|---|---|
Chemistry Marks | 49 | 46 | 35 | 58 | 51 | 56 | 54 | 46 | 53 |
Math Marks | 29 | 23 | 10 | 41 | 38 | 36 | 31 | 24 | ? |
Plot the above marks on a scatter plot, with the chemistry marks on the \begin{align*}x\end{align*}-axis and the math marks on the \begin{align*}y\end{align*}-axis. Draw a line of best fit, and use this line to estimate the mark that Student I would have made in math had he or she taken the test.
Solution:
If Student I had taken the math test, his or her mark would have been between 32 and 37.
Scatter plots and lines of best fit can also be drawn by using technology. The TI-83 can both graph a scatter plot and insert the line of best fit onto it.
Example 9
Using the data from Example 8, create a scatter plot and draw a line of best fit with the TI-83.
Student | A | B | C | D | E | F | G | H | I |
---|---|---|---|---|---|---|---|---|---|
Chemistry Marks | 49 | 46 | 35 | 58 | 51 | 56 | 54 | 46 | 53 |
Math Marks | 29 | 23 | 10 | 41 | 38 | 36 | 31 | 24 | ? |
Solution:
The calculator can now be used to determine a linear regression equation for the given values. The equation can be entered into the calculator, and the line will be plotted on the scatter plot.
From the line of best fit, the calculated value for Student I's math test mark was 33.6. Remember that the mark that you estimated was between 32 and 37.
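The same kind of calculation can be reproduced away from the calculator. The sketch below assumes that the linear regression is the usual least-squares fit and uses the 8 students who wrote both tests; the names `least_squares`, `estimate`, and `rounded_estimate` are illustrative.

```python
# Marks for Students A to H, who wrote both tests.
chemistry = [49, 46, 35, 58, 51, 56, 54, 46]
math_marks = [29, 23, 10, 41, 38, 36, 31, 24]


def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line y = mx + b."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = least_squares(chemistry, math_marks)

# Student I scored 53 in chemistry.
estimate = slope * 53 + intercept

# The same estimate with the coefficients rounded to 1 decimal place.
rounded_estimate = round(slope, 1) * 53 + round(intercept, 1)

print(f"y = {slope:.4f}x + {intercept:.4f}")
print(f"estimate = {estimate:.1f}, rounded-coefficient estimate = {rounded_estimate:.1f}")
```

With unrounded coefficients, this sketch gives an estimate of about 33.7. Rounding the slope and intercept to 1 decimal place first gives 33.6, which matches the value quoted above. Both fall within the range of 32 to 37 estimated by hand.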
Lesson Summary
In this lesson, you learned how to represent data by graphing a straight line of the form \begin{align*}y=mx+b\end{align*}, and also by using a scatter plot and a line of best fit. Interpreting a broken-line graph was also presented in this lesson. You learned about correlation as it applies to a scatter plot and how to describe the correlation of a scatter plot. You also learned how to draw a line of best fit on a scatter plot and to use this line to make estimates from the graph. The final topic that was demonstrated in the lesson was how to use the TI-83 calculator to produce a scatter plot and how to graph a line of best fit by using linear regression.
Points to Consider
- Can any of these graphs be used for comparing data?
- Can the equation for the line of best fit be used to calculate values?
- Is any other graphical representation of data used for estimations?