10.7: Collecting Data
Introduction
The Team Tally
A group of boys decided to practice their times for the 400 meters one day after school. Five of the boys on the track team gathered together and all decided to practice. They figured out that the best way to do it was to have each person start in turns. One person would run, then the next and then the next until everyone had run. Carla came too so that she could run the stopwatch and record the times.
All of the boys ran and at the end of it they asked Carla for the times. Although five boys ran, she had only recorded the times of three boys.
“Why did you do that?” Marco asked.
“I only recorded the first three to cross the finish line,” Carla explained. “At the race, those will be the three winners.”
“But that isn’t accurate!” Marco exclaimed.
Who is correct? In this lesson you will learn about collecting data. By the end of the lesson, you will be able to determine and create an argument that supports Carla or Marco as correct.
What You Will Learn
By the end of this lesson you will be able to complete the following skills.
- Classify sampling methods as random, stratified random, systematic, convenience and self-selected.
- Identify and discuss potential bias in sampling methods.
- Identify and compare potential bias in survey questions.
- Compare, analyze and interpret population survey data and sample survey data.
Teaching Time
I. Classify Sampling Methods as Random, Stratified Random, Systematic, Convenience and Self-Selected
A common tool that people use to gather information about a population is a survey. A population can refer to any group about which information is desired. In other words, if you want to know about the opinions of students in your school, then the population is the entire school. The population could be an entire city, country, or even a group of plants, animals, or things. As you can imagine, a population can be very large. For this reason, it is often impractical or even impossible to get information on every single member of the population. To save time and resources, we may survey only a small percentage of the population. To make sure that the information we gather from this small percentage is representative of the entire population, we use techniques to choose the members that we survey.
Of course, sometimes people hope to see a certain outcome. They desire to prove their hypothesis correct for one reason or another. For this reason, we have to watch out for potential bias in a survey. When surveys are carried out appropriately, the information that they tell us can be most useful.
Let’s think about an example.
Example
Every year at Monica’s middle school, a vote for the student with the best smile takes place. This student is honored by having his or her face on the front of the yearbook. Monica wants to win. She’s competing against four other students who are in different classes: Winthrop, Penelope, Giovanna, and Oscar. A week before the official vote, she wants to take a survey to see for whom other students will be voting. There are 2000 students in her school, though, and she knows that she cannot ask everyone. She’ll have to take a sample, a small part of the population, and assume that the information they give her is true for the rest of the school. She plans to ask 50 people. In order to choose the 50 people that she asks, she can use a variety of methods.
To help Monica, let’s think about the different types of samples that can be collected.
Sampling Methods
Random Sampling: Monica can take a random sample by giving each student in the school an equal opportunity of being chosen. Like a lottery machine that uses ping-pong balls to choose winning numbers, Monica would have to select students completely randomly. For example, if she had a list of students, she could mix up the names and choose the first 50 with her eyes closed.
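As a rough sketch of the lottery idea, the short Python program below draws a random sample of 50 from a list of 2000 students. The ID numbers 1 through 2000 and the fixed seed are assumptions made only for illustration; Monica's school would have its own list of names.

```python
import random

# Hypothetical roster: ID numbers 1..2000 stand in for the 2000 students.
students = list(range(1, 2001))

# A fixed seed only makes this example repeatable; a real drawing would not fix it.
rng = random.Random(7)

# rng.sample picks 50 different students, each equally likely to be chosen.
sample = rng.sample(students, 50)

print(len(sample), "students chosen, for example:", sorted(sample)[:5])
```

Because every ID has the same chance of being drawn, no class or grade is favored on purpose.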
Stratified Random: In this method Monica would make sure that she selected an equal number of students in certain strata. Strata means layers or levels. In her school, students are in levels by grade. In this case, strata can also refer to gender and educational program. If she randomly chooses 25 girls and 25 boys, she is using a stratified random method.
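One possible way to picture a stratified random sample is to split the roster into strata and then draw randomly inside each one. In the sketch below, the roster of 1000 girls and 1000 boys is invented for illustration, since the lesson does not say how many girls and boys attend the school. The helper name `stratified_sample` is also just a label chosen for this example.

```python
import random

rng = random.Random(11)  # fixed seed so the example is repeatable

# Hypothetical strata: the real numbers of girls and boys are not given in the lesson.
strata = {
    "girls": [f"G{i}" for i in range(1, 1001)],
    "boys": [f"B{i}" for i in range(1, 1001)],
}

def stratified_sample(strata, per_stratum, rng):
    """Randomly choose the same number of members from every stratum."""
    chosen = {}
    for name, members in strata.items():
        chosen[name] = rng.sample(members, per_stratum)
    return chosen

sample = stratified_sample(strata, 25, rng)
for name, members in sample.items():
    print(name, len(members))
```

The random draw happens separately within each stratum, so the sample is guaranteed to contain 25 girls and 25 boys.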
Systematic Sampling: Monica may also decide to sample students in a systematic manner. That is, she could stand at the front of the school or in the lunch line and sample every 40th student (2000 ÷ 50 = 40). This way, she would get approximately 50 students, assuming that they all attend school on the given day and they all come through the same door or go to lunch. If they do not, then she may not get a representative sample; the sample would be biased.
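With 2000 students and a goal of 50, a systematic sample takes one student out of every 2000 ÷ 50 = 40. The sketch below shows that arithmetic in Python. The arrival order (students numbered 1 to 2000 as they walk in) and the random starting position are assumptions for illustration only.

```python
import random

population_size = 2000   # students in Monica's school
sample_size = 50         # students Monica plans to ask

# The sampling interval: one student out of every `step` students.
step = population_size // sample_size   # 2000 // 50 = 40

# Hypothetical arrival order: students numbered 1..2000 as they walk in.
arrivals = list(range(1, population_size + 1))

# Start at a random position within the first interval, then take every 40th student.
start = random.Random(3).randrange(step)
sample = arrivals[start::step]

print("interval:", step, "sample size:", len(sample))
```

This only works as planned if every student actually passes Monica's counting spot, which is the weakness the paragraph above points out.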
Convenience Sampling: If Monica finds these methods too difficult, she could go with an easier route—ask the first 50 students that walk by her homeroom, for example. This is called convenience sampling; it is the method that is easiest because it selects the members of the population to whom the sampler has the most access. The problem is that this method does not ensure a representative portion of the entire population. If she is in advanced biology, for example, and asks only students in that room or only students who walk by that room, they may have different views than students in the rest of the school. Perhaps the fact that they share a class or grade level with her will make them more likely to vote for her. That may not be true of the rest of the school.
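To see how far off a convenience sample could be, consider the small calculation below. Every number in it is invented for illustration (the lesson gives no vote counts): it supposes that 30 of the 50 students in and around Monica's class would vote for her, but only 370 of the other 1950 students would.

```python
# All numbers here are invented for illustration; the lesson gives no vote counts.
# Each entry: (group, number of students, number who plan to vote for Monica)
groups = [
    ("near Monica's class", 50, 30),
    ("rest of the school", 1950, 370),
]

total_students = sum(size for _, size, _ in groups)
total_for_monica = sum(votes for _, _, votes in groups)

# Convenience sample: only the 50 students Monica can reach most easily.
_, sample_size, sample_votes = groups[0]

sample_share = sample_votes / sample_size
true_share = total_for_monica / total_students

print(f"convenience sample: {sample_share:.0%}, whole school: {true_share:.0%}")
```

Under these made-up numbers, the convenience sample would suggest much more support than Monica has in the whole school.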
Self-Selected: There are many people who like to be asked. They like that their opinion counts, they like to express their points of view, they like to participate, they like to help, or they like to influence an outcome. For whatever reason, some people may choose to participate if you give them the chance. If Monica walks around with a sign that says, “Tell me who you’re voting for,” she allows members of the population to select themselves. You may have seen taste tests in local malls, for example, that have signs reading: “Choose your favorite soda.” People who participate are self-selecting. As with convenience sampling, this may not be a representative portion of the population. People who choose not to participate may have different views from those who do participate.
Write each type of sampling down in your notebook with a definition of each.
II. Identify and Discuss Potential Bias in Sampling Methods
A key concern in sampling methods is getting a representative group of the entire population. This gives the study validity; it helps us to believe what the results of the study say. However, many sampling methods are biased—they give an unfair preference to a certain group or exclude a certain segment of the population—which gives us less confidence in the results.
Any sampling method that favors one group or gives a group a smaller likelihood of participating is biased. Let’s look at an example.
Example
A school polls parents about traffic congestion in the morning. They ask the parents in every 3rd car in the school drop-off area before the school bell rings to rate the traffic.
Let’s think about this method of sampling. Is it the best one? Is it biased?
If there is traffic congestion, some people may be arriving late. This sampling method is biased because it only includes people who arrive on time. Their opinions may be different from those of people who arrive late. Therefore, this is not the best way to gather a sample.
Example
A biologist measures the growth of plants but only samples plants near the entrance because she cannot reach plants in the middle of the greenhouse.
The plant growth may not be the same near the door as in other parts of the greenhouse. She used convenience sampling, which is not always the best choice for a sample.
III. Identify and Compare Potential Bias in Survey Questions
What is bias? Bias is when one group of people is targeted more than another group. This provides only a specific view of the situation. Survey questions can reveal bias in the survey itself. Sometimes the people who create surveys hope for certain results and create questions to steer the answers. At other times, there are inadvertent cultural biases based on religion, language, age, economic level, etc. We can learn to spot potential bias in survey questions by looking for questions that exclude a particular group or only include specific groups.
Can you find bias in the following questions?
Question | Possible Bias |
---|---|
1. When you visited the restroom, was the cleanliness a) bad, b) okay, c) good? | There is the assumption that the person visited the bathroom. |
2. At what time of the day do you usually use your swimming pool? | There is the assumption that the person has a swimming pool. |
3. Which do you think is the most powerful book in the Bible? | There is the assumption that people belong to a certain religion or know about the Bible. |
4. What was more important to the history of America, the Emancipation Proclamation or Women’s Suffrage? | There is the assumption that people are familiar with these issues from U.S. history and that they understand the words. |
5. Do you think you should go to church every Sunday? | There is the assumption that people believe in these ideas. |
If a person taking a survey feels that the options available for a question do not accurately represent his or her true response, a bias in the survey has occurred. The person may feel confused or frustrated. In some cases, people may not even understand the meaning of the question because their education or background did not prepare them, or because they do not speak the language in which the survey is written.
Finally, some people may not be willing to tell the truth, for one reason or another. If a person is asked to identify themselves and then reveal confidential or personal information, they may not answer truthfully. They may not even take the survey seriously and not answer in a sincere manner.
IV. Compare, Analyze and Interpret Population Survey Data and Sample Survey Data
When a survey is complete, there is still a lot of work to do. As you have seen in previous lessons, data can be analyzed using a great many choices of displays and statistical measures. From these data analyses, we hope to make some generalizations about the population at large. We also hope, at times, to make decisions based on the data.
Example
A number of children were surveyed about the amount of time that they watch TV and the amount of time that they spend studying. The study was completed at a charter school that specializes in college preparation for first-generation Americans. Three students from each class were randomly chosen to participate in the survey. Their results are shown in the table below:
TV | 3.5 | 3.5 | 3.5 | 5 | 3 | 1 | 1 | 0 | 0 |
---|---|---|---|---|---|---|---|---|---|
Studying | 2 | 1.5 | 2.5 | 1 | 3.5 | 4.5 | 5 | 5 | 1 |
TV | 1 | 2 | 2 | 2 | 1.5 | 0 | 0.5 | 4 | 4 |
Studying | 3.5 | 7 | 6 | 5.5 | 5 | 6 | 4 | 0.5 | 1 |
TV | 4 | 6 | 3 | 4 | 3 | 6 | 6.5 | 1 | 1 |
Studying | 1.5 | 1 | 4 | 2 | 4.5 | 0 | 0.5 | 7 | 1 |
Clearly, this data is difficult to interpret in this form. Because the school is looking for a relationship between TV time and studying time, a scatterplot is an excellent display of the data.
They drew the following conclusions:
- Montoya Charter School students who watch more than 2 hours of TV do not study.
- Children in the United States watch too much TV.
- Students who do not study enough will get low grades.
- TV is causing students to be less interested in school.
What is wrong with their conclusions based on the data?
- The data shows that many students who watch more than 2 hours of TV do study although generally fewer hours.
- This sample was only taken at a charter school that serves a specific population. You cannot generalize this data to other populations like the entire United States.
- There is no data in this study that relates studying time to grades.
- A scatterplot does not imply causation, only correlation; the variables are shown to have a negative relationship, but that does not mean that one causes the other. If you take television time away from students, it does not mean that they will necessarily study more or be more interested in school.
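The points above can be checked against the table with a short Python sketch. It uses the 27 pairs of values from the table (assumed to be hours) and computes the Pearson correlation coefficient, a measure this lesson does not introduce; it is used here only as one common way to put a number on a negative relationship. The function name `pearson_r` is a label chosen for this example.

```python
import math

# (TV hours, studying hours) for the 27 surveyed students, copied from the table.
tv = [3.5, 3.5, 3.5, 5, 3, 1, 1, 0, 0,
      1, 2, 2, 2, 1.5, 0, 0.5, 4, 4,
      4, 6, 3, 4, 3, 6, 6.5, 1, 1]
study = [2, 1.5, 2.5, 1, 3.5, 4.5, 5, 5, 1,
         3.5, 7, 6, 5.5, 5, 6, 4, 0.5, 1,
         1.5, 1, 4, 2, 4.5, 0, 0.5, 7, 1]

def pearson_r(xs, ys):
    """Correlation coefficient: negative when one value tends to fall as the other rises."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(tv, study)

# Check the first conclusion: do students who watch more than 2 hours really not study?
heavy_viewers = [(t, s) for t, s in zip(tv, study) if t > 2]
still_study = [pair for pair in heavy_viewers if pair[1] > 0]

print(f"r = {r:.2f}")
print(f"{len(still_study)} of {len(heavy_viewers)} students who watch more than 2 hours of TV still study")
```

A negative value of `r` matches the downward trend in the data, but it says nothing about whether TV time causes the change in studying time.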
Now let’s go back to the problem from the introduction.
Real-Life Example Completed
The Team Tally
Here is the problem from the introduction. Reread it and then write whether Carla or Marco is correct and why.
A group of boys decided to practice their times for the 400 meters one day after school. Five of the boys on the track team gathered together and all decided to practice. They figured out that the best way to do it was to have each person start in turns. One person would run, then the next and then the next until everyone had run. Carla came too so that she could run the stopwatch and record the times.
All of the boys ran and at the end of it they asked Carla for the times. Although five boys ran, she had only recorded the times of three boys.
“Why did you do that?” Marco asked.
“I only recorded the first three to cross the finish line,” Carla explained. “At the race, those will be the three winners.”
“But that isn’t accurate!” Marco exclaimed.
Be sure to explain why in your answer.
Solution to Real-Life Example
Marco is correct on this one. The boys all started at different times. Therefore, the order in which they crossed the finish line does not help in determining who was fastest. You have to calculate each runner’s time to figure this out. Time, not order, is what makes the difference here.
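The difference between the two methods can be sketched with a few made-up stopwatch readings. The story gives no actual times, so the runner labels and every number below are hypothetical; they are chosen only to show that finishing order and running time can disagree when runners start at different moments.

```python
# Hypothetical stopwatch readings, in seconds after Carla starts the watch.
# Each tuple is (start, finish); the runners go one after another, as in the story.
runs = {
    "Runner A": (0, 75),
    "Runner B": (80, 148),
    "Runner C": (150, 212),
    "Runner D": (215, 286),
    "Runner E": (290, 350),
}

# Carla's method: the first three runners to cross the finish line.
first_three_across = sorted(runs, key=lambda name: runs[name][1])[:3]

# Marco's method: the three shortest running times (finish minus start).
elapsed = {name: finish - start for name, (start, finish) in runs.items()}
fastest_three = sorted(elapsed, key=elapsed.get)[:3]

print("First across:", first_three_across)
print("Fastest:", fastest_three)
```

With these invented readings, the last runner to start has the shortest time but would be left out by Carla's method.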
Vocabulary
Here are the vocabulary words that are found in this lesson.
- Survey
- a method of gathering information about a population.
- Random Sampling
- every member of the population has an equal chance of being chosen.
- Stratified Sampling
- the population is divided into levels (strata), and an equal number of members is chosen at random from each level.
- Systematic Sampling
- members are chosen according to a fixed rule, such as every 10th person on a list.
- Convenience Sampling
- the sample is made up of the members of the population that are easiest for the sampler to reach.
- Self-Selected Sampling
- members of the population choose for themselves whether to participate.
Time to Practice
Directions: Define each of the following terms.
- Survey
- Random sample
- Stratified sample
- Systematic sample
- Convenience sample
- Self-selected sample
Directions: Match the sampling method with the example.
- A mother asks everyone in her office building about the best restaurant in town. - a. Random Sampling
- A police traffic stop pulls over every car to check for proper insurance. - b. Stratified Sampling
- A phone company uses a computer to choose customers for a satisfaction survey. Five percent of the customers in each region are chosen randomly. - c. Systematic Sampling
- People call a phone number given on a receipt at a restaurant to answer questions about cleanliness and service. Each person who calls gets entered into a drawing. - d. Convenience Sampling
- A cattle herder checks for Mad Cow Disease by drawing blood from 30 cows that he chose by drawing their ID numbers from a hat. - e. Self-Selected Sampling
- For each of the sampling methods in numbers 1-5 above, would you consider them biased or unbiased? Explain your answer.
Directions: Why are the following survey questions biased?
- How old is your spouse?
- How many times do you go to the park each month? a) 1-2, b) 3-5, c) 6-10, or d) more than 10
- Which is more important, the First or Second Amendment of the Constitution?
- Don’t you agree that equality for all Americans is important?
Directions: A stratified random survey is conducted regarding the best city for the next Winter Olympics. The results of 500 surveys are shown in the table below. Use the data to answer the following questions.
- Use the data in the table to create a data display. Why did you choose this data display?
- Then, analyze the data. What tendency do you see?
- What conclusions can be drawn? What further research could be done on this topic?
Group | Beijing | Chicago | Buenos Aires | Paris |
---|---|---|---|---|
Women | 15% | 12% | 10% | 63% |
Men | 13% | 53% | 12% | 22% |