Have you ever had trouble finding clothes that fit? Well, take a look at this dilemma.
“I can’t believe it!” Jacob exclaimed, trying on his new long-sleeved shirt for the track team.
“What’s the matter?” his friend Mattias asked.
“This shirt doesn’t fit, and this always happens to me. I am going to figure out why!” Jacob said, taking off the shirt. Once again, the sleeves were too short.
After Jacob’s anger had subsided, he started to think about this question. Was he the only one with this problem? Jacob decided to find out by measuring his peers’ heights and arm lengths. He measured in inches and created a table like this:
Now that Jacob has collected his data, he needs to create a display. Which one should he create? Think about this question throughout this Concept. At the end, you will help Jacob create the appropriate display for his data.
Data can exist in many forms. A frequent goal of collecting data is to draw conclusions from it, and the best conclusions match the trends the data shows. Depending on the data you have, certain types of displays are more appropriate or more effective than others, so we must choose displays that present the data logically. Of course, in a world so full of data, it must be collected and organized carefully to support sound decision-making.
Sometimes two people look at the same graph and draw completely different conclusions. Graphs can show us many things, but the conclusions we draw from them are often more a matter of opinion. Part of the purpose of a graph is to help us make inferences, and those inferences must be based on the data.
Take a look at this situation.
Some scientists from the EPA were studying the amount of dissolved oxygen in a lake over several weeks. They created this graph from the data they collected.
They studied the graph and came up with the following conclusions:
- The amount of dissolved oxygen fluctuated over the 5 weeks.
- The average amount of dissolved oxygen has been about 110 parts per million over 5 weeks.
- The dissolved oxygen in Week 6 will be about 60 parts per million.
Do you agree with their conclusions?
The data clearly supports the first two conclusions. However, it does not convincingly support the prediction about Week 6 (Conclusion 3). The level of dissolved oxygen does fluctuate, rising slightly over time and then falling, but that is not enough evidence to be sure that the Week 6 level will drop even lower.
Here is another one.
The boss at an office surveyed his staff about their lunch preferences because he wanted to treat the office to lunch for the holidays. His data is shown below.
He ponders the following conclusions:
- A lot of people like Chinese food.
- Nobody likes Italian food.
- If I order sandwiches, then 80% of the staff will be unhappy.
- If I get some pizzas and some Chinese food, the majority will have their preference.
Do you agree with his conclusions?
According to the graph, Conclusion 1 is supported because the largest number of people selected Chinese food.
The graph does not support Conclusion 2. Perhaps Italian food was not an option on the survey. Also, just because you have a preference does not mean you don’t like the other choices.
For the same reason, Conclusion 3 is not supported, either. Just because you may prefer Chinese food does not mean you do not like sandwiches.
Conclusion 4 is supported because 25% prefer pizza and 35% prefer Chinese food. Together that makes 60%, which is a majority, so most people will have their preference.
There are many ways to display data, so how do you know which display is best for a given set of data? Some choices are simply a matter of preference, but most types of data have displays that suit them best.
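Conclusion 4 rests on simple addition: the two preferences it covers account for 25% and 35% of the staff, and a majority means more than half. The Python sketch below redoes that check; the helper name `is_majority` is hypothetical, chosen only for this illustration.

```python
def is_majority(*shares_percent: float) -> bool:
    """Return True if the combined percentages are more than half (50%) of the whole."""
    return sum(shares_percent) > 50

# The two preferences covered by ordering both foods in Conclusion 4.
combined = 25 + 35
print(combined)             # 60
print(is_majority(25, 35))  # True
```

Because a majority must be *more* than half, a combined share of exactly 50% would not count.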
Types of Data
Two major types of data are categorical data and numerical data. Categorical data is data in which the independent variable is a name, not a number. For example, you may collect data for the months May, June, July, and August, or you may tally people as males and females. Sometimes numbers are used as category names. For example, players on a team are given numbers on their shirts. Those numbers only identify who is who, so it would not make sense to perform mathematical operations on them. Generally, categorical data is simply tallied.
The second type of data is numerical. Numerical data is a measurement of some characteristic. Examples of numerical data include time, height, weight, length, volume, density, and force. Anything that can be measured with a numerical system is numerical data.
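Since categorical data is simply tallied, counting is all the arithmetic it needs. Here is a minimal Python sketch using the standard library's `collections.Counter`; the responses and jersey numbers are made-up placeholders, not data from this lesson.

```python
from collections import Counter

# Hypothetical survey responses: each answer is a category name, not a measurement.
responses = ["May", "June", "June", "July", "August", "July", "June"]

# Tally how many times each category appears.
tally = Counter(responses)
print(tally["June"])         # 3
print(tally.most_common(1))  # [('June', 3)]

# Jersey numbers are labels, so they are stored as text:
# counting them makes sense, adding them does not.
jerseys = ["7", "12", "7", "23"]
print(Counter(jerseys)["7"])  # 2
```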
Types of Displays
We can use different data displays depending on the data, including line graphs, scatterplots, circle graphs, bar graphs, stem-and-leaf plots, box-and-whisker plots, and histograms.
There are many more types of displays of data, but let’s stick with these for now. Although there are few exact rules about data displays, each type of display has certain instances for which it is ideal. Also, there are instances where certain displays are inappropriate. Furthermore, the best display of data depends on what information you hope to get from it.
Line graphs are generally used to show change over time.
Scatterplots are used to show a trend or a relationship (correlation) between two variables.
Circle graphs are best for data that represents one whole, or 100% of something.
Bar graphs are excellent for categorical data.
Stem-and-leaf plots are useful to represent ranges and can be used to illustrate ranges of two variables.
Box-and-whisker plots are used to show how spread out data is and where the bulk of the data lies.
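The guidelines above can be restated as a simple lookup table. The Python sketch below does that; the goal phrases used as keys and the function name `suggest_display` are hypothetical labels chosen for this illustration, not standard terms.

```python
# A hypothetical lookup restating the guidelines above.
# The goal phrases are illustrative names chosen for this sketch.
BEST_DISPLAY = {
    "change over time": "line graph",
    "relationship between two variables": "scatterplot",
    "parts of one whole": "circle graph",
    "categorical data": "bar graph",
    "ranges of values": "stem-and-leaf plot",
    "spread and bulk of data": "box-and-whisker plot",
}

def suggest_display(goal: str) -> str:
    """Return the display suggested above for a goal, or a reminder that no single rule applies."""
    return BEST_DISPLAY.get(goal.lower(), "no single rule; decide based on the data")

print(suggest_display("Change over time"))  # line graph
```

A table like this only captures rules of thumb. As the text notes, the best display also depends on what information you hope to get from the data.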
Write down each example of a data display and the best use for each.
Answer each question about different data displays.
Which data display is best for categorical data?
Solution: Bar graph
Which data display is best for showing how data changes over time?
Solution: Line graph
If I had data that was in the 10s, 20s, 40s, 50s, and 60s, which display would be best?
Solution: Stem-and-leaf plot
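A stem-and-leaf plot suits data grouped by tens because each tens digit becomes a stem and each ones digit becomes a leaf. The Python sketch below builds one, assuming non-negative whole numbers; the function names and the sample values are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group non-negative whole numbers by tens digit (stem), listing ones digits (leaves) in order."""
    plot = defaultdict(list)
    for v in sorted(values):
        plot[v // 10].append(v % 10)
    return dict(plot)

def print_plot(plot):
    """Print every stem from smallest to largest, including empty stems such as the 30s."""
    for stem in range(min(plot), max(plot) + 1):
        leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
        print(f"{stem} | {leaves}".rstrip())

# Hypothetical values in the 10s, 20s, 40s, 50s, and 60s
data = [12, 17, 25, 41, 44, 48, 53, 60, 66]
print_plot(stem_and_leaf(data))
```

Printing the empty stem for the 30s keeps the gap in the data visible, which is part of what makes this display useful for ranges.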
Now let's go back to the dilemma from the beginning of the Concept.
Because both variables are numerical, Jacob can use a scatterplot to compare them at once and see whether there is a relationship. Here are the results.
Jacob’s measurements were (62, 27). His measurements are slightly different from those of a typical student, which may explain why his shirts don’t seem to fit quite right.
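The lesson reads the relationship off the scatterplot by eye. One common way to put a number on it, a technique the lesson itself does not use, is the Pearson correlation coefficient r: it is near +1 for a strong positive relationship, near -1 for a strong negative one, and near 0 when there is no straight-line relationship. The plain Python sketch below computes it. Jacob's table is not reproduced here, so the heights and arm lengths are hypothetical placeholders.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 values each")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of each point's deviations from the two means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # If every value in a list is the same, r is undefined.
        raise ValueError("r is undefined when a list has no variation")
    return cov / sqrt(var_x * var_y)

# Hypothetical heights and arm lengths in inches (not Jacob's actual data)
heights = [58, 60, 63, 65, 68]
arms = [23, 24, 25, 26, 28]
print(round(pearson_r(heights, arms), 2))
```

A value of r close to 1 would suggest that taller students tend to have longer arms. It does not, by itself, explain why one student's point, like Jacob's, sits away from the rest.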
- Numerical Data: any data that is measured in numbers.
- Categorical Data: data that is assigned a name and not a number.
Here is one for you to try on your own.
A local animal shelter tallied its animals so that children visiting on a field trip could see how many of each kind it had. Here are the results.
When the children looked at the bar graph, they shouted out:
Bobby: “Nobody likes dogs!”
Lisa: “I didn’t know that hamsters and rats were the same thing!”
Miguel: “Everyone must have taken all the turtles!”
Mona: “They must have mostly food for dogs and cats!”
How could a teacher respond? Were the conclusions of the children accurate?
The teacher responded to their comments patiently:
“Bobby, just because they have a lot of dogs doesn’t mean people don’t like them. Since dogs are the most common pet, it makes sense that there would be more dogs at the shelter.”
“Lisa, it looks like hamsters and rats were tallied in the same category, maybe because they are kept in the same cage or given similar food. This bar graph does not mean they are the same, though.”
“Miguel, since turtles are a less common pet, the shelter probably has fewer turtles. It doesn’t mean that a lot of turtles were already taken.”
“Yes, Mona. With so many more dogs and cats than other animals, it does make sense that the shelter needs much more food for them. Also, they are generally bigger than the other animals, so they eat more, too. Don’t they?”
Once again, a data display was used to make connections. The children used the graph from the animal shelter to draw conclusions.
Directions: Answer each question about data displays.
- What is considered numerical data?
- What is considered categorical data?
- If you were looking for a relationship between two values, would you use a scatterplot or a line graph?
- If there were a relationship in the data, would you have a positive correlation or a negative correlation?
- The words positive correlation and negative correlation are associated with which type of data display?
- If you had an outlier, would you have a scatterplot or a box-and-whisker plot?
- What is a quartile?
- Which type of data display is a quartile associated with?
- If you were watching a trend over time, would you use a line graph or a scatterplot?
- If you were comparing two trends and their results, which data display would make the most sense?
- What is the mean?
- What is the median?
- What is the mode?
Directions: Answer each question.
Do the conclusions fit the graph? Explain your reasoning.
- Conclusion 1: Rats are the most feared creature. Conclusion 2: Rats are the most dangerous creature. Conclusion 3: Nobody is afraid of bats.
- Conclusion 1: Prices have increased every year for 10 years. Conclusion 2: Prices of gasoline increased more rapidly after 2000. Conclusion 3: Prices will be even higher in 2008.