Methods for Reducing Bias in Sampling
If your statistics teacher wants to choose a student at random for a special prize, he or she could simply place the names of all the students in the class in a hat, mix them up, and choose one. More scientifically, your teacher could assign each student in the class a number from 1 to 25 (assuming there are 25 students in the class) and then use a computer or calculator to generate a random number to choose one student. This would be a simple random sample of size 1.
Cluster sampling is when a naturally occurring group is selected at random, and then either all of that group, or randomly selected individuals from that group, are used for the sample. If we sample in stages, for example by dividing a chosen cluster into smaller subgroups and selecting at random again within it, this is referred to as multi-stage sampling.
To survey student opinions or study their performance, we could choose 5 schools at random from your state and then use an SRS (simple random sample) from each school. If we wanted a national survey of urban schools, we might first choose 5 major urban areas from around the country at random, and then select 5 schools at random from each of these cities. This would be both cluster and multi-stage sampling. Cluster sampling is often done by selecting a particular block or street at random from within a town or city. It is also used at large public gatherings or rallies. If officials take a picture of a small, representative area of the crowd and count the individuals in just that area, they can use that count to estimate the total crowd in attendance.
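The national survey of urban schools described above can be sketched in Python. This is only an illustration under stated assumptions: the city names, school names, rosters, and the function name `multistage_sample` are hypothetical, and a third stage (a simple random sample of 20 students per school) is added to show how further stages would work.

```python
import random

def multistage_sample(schools_by_city, n_cities, n_schools, n_students, seed=None):
    """Stage 1: pick cities. Stage 2: pick schools in each chosen city.
    Stage 3: take a simple random sample of students in each chosen school."""
    rng = random.Random(seed)
    chosen = {}
    for city in rng.sample(sorted(schools_by_city), n_cities):
        schools = schools_by_city[city]
        for school in rng.sample(sorted(schools), n_schools):
            roster = schools[school]
            chosen[(city, school)] = rng.sample(roster, n_students)
    return chosen

# Hypothetical population: 8 urban areas, 10 schools each, 200 students per school.
schools_by_city = {
    f"city{c}": {
        f"city{c}-school{s}": [f"c{c}s{s}-student{i}" for i in range(200)]
        for s in range(10)
    }
    for c in range(8)
}

sample = multistage_sample(schools_by_city, n_cities=5, n_schools=5,
                           n_students=20, seed=1)
print(len(sample))  # 25 schools in total: 5 cities x 5 schools
```

Because every stage uses random selection, no city, school, or student is favored by the person running the survey.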
In stratified sampling, the population is divided into groups, called strata (the singular term is 'stratum'), that have some meaningful relationship. Very often, such groups respond differently from one another to a survey. In order to help reflect the population, we stratify to ensure that each group's opinions are represented in the sample.
We often stratify by gender or race in order to make sure that the often divergent views of these different groups are represented. In a survey of high school students, we might choose to stratify by school to be sure that the opinions of different communities are included. If each school has an approximately equal number of students, then we could simply choose to take an SRS of size 25 from each school. If the numbers in each stratum are different, then it would be more appropriate to choose a fixed total sample size (100 students, for example) and take a number from each school proportionate to that school's share of the total enrollment.
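A short Python sketch may make the proportional approach concrete. The school names and enrollments below are hypothetical, as are the function names `proportional_allocation` and `stratified_sample`; the rounding rule (largest remainder) is one reasonable choice, not the only one.

```python
import random

def proportional_allocation(stratum_sizes, total_sample):
    """Split total_sample across strata in proportion to stratum size."""
    population = sum(stratum_sizes.values())
    exact = {k: total_sample * n / population for k, n in stratum_sizes.items()}
    alloc = {k: int(v) for k, v in exact.items()}  # round down first
    # Hand out any leftover places to the strata with the largest remainders,
    # so that the allocations add up to total_sample.
    leftover = total_sample - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc

def stratified_sample(rosters, total_sample, seed=None):
    """Take an SRS of the allocated size from each stratum."""
    rng = random.Random(seed)
    sizes = {k: len(v) for k, v in rosters.items()}
    alloc = proportional_allocation(sizes, total_sample)
    return {k: rng.sample(rosters[k], alloc[k]) for k in rosters}

# Hypothetical enrollments: students are numbered within each school.
rosters = {
    "North": list(range(1, 501)),   # 500 students
    "South": list(range(1, 301)),   # 300 students
    "East": list(range(1, 201)),    # 200 students
}

allocation = proportional_allocation({k: len(v) for k, v in rosters.items()}, 100)
print(allocation)  # {'North': 50, 'South': 30, 'East': 20}

sample = stratified_sample(rosters, 100, seed=3)
```

With a total sample of 100, the school with half of the students supplies half of the sample.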
Technology Note: Generating Random Numbers on the TI-83/84 Calculator
Your graphing calculator has a random number generator. Press [MATH] and move over to the PRB menu, which stands for probability. (Note: Instead of pressing the right arrow three times, you can just use the left arrow once!) Choose '1:rand' for the random number generator and press [ENTER] twice to produce a random number between 0 and 1. Press [ENTER] a few more times to see more results.
It is important that you understand that a calculator or computer does not produce truly random numbers. When you choose the 'rand' function, the calculator has been programmed to return a ten-digit decimal that, using a very complicated mathematical formula, simulates randomness. Each digit, in theory, is equally likely to occur in any of the individual decimal places. What this means in practice is that if you had the patience (and the time!) to generate a million of these on your calculator and keep track of the frequencies in a table, you would find there would be an approximately equal number of each digit. However, two brand-new calculators will give the exact same sequences of random numbers! This is because the function that simulates randomness has to start at some number, called a seed value. All the calculators are programmed from the factory (or when the memory is reset) to use a seed value of zero. If you want to be sure that your sequence of random numbers is different from everyone else's, you need to seed your random number function using a number different from theirs. Type a unique sequence of digits on the home screen, press [STO], enter the 'rand' function, and press [ENTER]. As long as the number you chose to seed the function is different from everyone else's, you will get different results.
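The same behavior can be observed on a computer. The sketch below uses Python's built-in `random` module, which uses a different formula from the calculator, so the numbers themselves will not match a TI-83/84; the function name `first_draws` and the seed 12345 are arbitrary choices for illustration.

```python
import random
from collections import Counter

def first_draws(seed, count=5):
    """Return the first few numbers produced by a generator with the given seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(count)]

same_a = first_draws(0)
same_b = first_draws(0)         # same seed, like two brand-new calculators
different = first_draws(12345)  # a different seed

print(same_a == same_b)     # True: identical seeds give identical sequences
print(same_a == different)  # False: a different seed gives a different sequence

# Tally the first decimal digit of 100,000 generated numbers.
rng = random.Random(2024)
digit_counts = Counter(int(rng.random() * 10) for _ in range(100_000))
for digit in range(10):
    print(digit, digit_counts[digit])  # each count should be close to 10,000
```

The tally illustrates the claim about frequencies: over many draws, each digit appears about equally often.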
Now, back to our example. If we want to choose a student at random between 1 and 25, we need to generate a random integer between 1 and 25. To do this, press [MATH][PRB] and choose the 'randInt(' function.
The syntax for this command is as follows:
'randInt(starting value, ending value, number of random integers)'
The default for the last field is 1, so if you need only a single random integer, you can enter 'randInt(1,25)'.
In this example, the student chosen would be student number 7. If we wanted to choose 5 students at random, we could enter 'randInt(1,25,5)'.
However, because the probability of any number being chosen each time is independent of all other times, it is possible that the same student could get chosen twice, as student number 10 did in our example.
What we can do in this case is ignore any repeated numbers. Since student number 10 has already been chosen, we will ignore the second 10. Press [ENTER] again to generate 5 new random numbers, and choose the first one that is not in your original set.
In this example, student number 4 has also already been chosen, so we would select student number 14 as our fifth student.
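The "ignore repeats" procedure can also be written as a short Python sketch. The function name `choose_students` is hypothetical, and Python's `random.randint` plays the role of 'randInt(' here; like the calculator command, it includes both endpoints.

```python
import random

def choose_students(class_size, how_many, seed=None):
    """Draw random student numbers one at a time, skipping any repeats."""
    if how_many > class_size:
        raise ValueError("cannot choose more students than are in the class")
    rng = random.Random(seed)
    chosen = []
    while len(chosen) < how_many:
        candidate = rng.randint(1, class_size)  # integer from 1 to class_size, inclusive
        if candidate not in chosen:             # ignore a number already chosen
            chosen.append(candidate)
    return chosen

students = choose_students(class_size=25, how_many=5, seed=7)
print(students)  # five different student numbers between 1 and 25
```

Python also offers `random.sample(range(1, 26), 5)`, which selects 5 different numbers from 1 to 25 in a single step.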
In San Francisco, there are 5 Math Circle math clubs, each with a different number of students. If you wanted to do a study to determine whether these clubs improve the students' math performance, how would you design the study to reduce bias?
If you took an SRS of all students, you might get many students from one club. This might bias your results, depending on how different the clubs are from each other. In order to avoid bias from the differences among the clubs, you should take a stratified random sample of students, where the clubs are the strata. If one club has one tenth of the students in the total population of students in all math clubs, then approximately one tenth of your sample should come from that club.
For questions 1-5, an amusement park wants to know if its new ride, The Pukeinator, is too scary. Explain the type(s) of bias most evident in each sampling technique and/or what sampling method is most evident. Be sure to justify your choice.
- The first 30 riders on a particular day are asked their opinions of the ride.
- The name of a color is selected at random, and only riders wearing that particular color are asked their opinion of the ride.
- A flier is passed out inviting interested riders to complete a survey about the ride at 5 pm that evening.
- Every 12th teenager exiting the ride is asked in front of his friends: “You didn’t think that ride was scary, did you?”
- Five riders are selected at random during each hour of the day, from 9 AM until closing at 5 PM.
For questions 6-10, there are 35 students taking statistics in your school, and you want to choose some of them for a survey about their impressions of the course. Assume the students are assigned numbers from 1 to 35. Use your calculator to select a simple random sample of the size specified, and list which students are chosen for the sample. Make sure to start with a different random seed each time.
- An SRS of 10 students. (Seed your random number generator with the number 10 before starting.)
- An SRS of 6 students. (Seed your random number generator with a different number before starting.)
- An SRS of 5 students. (Seed your random number generator with a different number before starting.)
- An SRS of 11 students. (Seed your random number generator with a different number before starting.)
- An SRS of 3 students. (Seed your random number generator with a different number before starting.)
To view the Review answers, open this PDF file and look for section 6.2.