# Sampling and Bias

## Sampling issues and bias

Undercoverage

In 1936, a well-known and highly respected magazine called the Literary Digest announced the result of a poll it had conducted on who would be elected president.  In prior election years, the magazine had been remarkably accurate in predicting the winner.  This time, it predicted that Republican Alfred Landon, Governor of Kansas, would win by a wide margin (57% vs. 43%) over the incumbent Democrat, President Franklin D. Roosevelt.

Unfortunately for the Literary Digest, when the results of the actual election came in, Roosevelt was the victor by a landslide: 62% vs. 38%!  Obviously there was a serious problem with the poll, given that its prediction of Roosevelt's share missed the actual result by nearly 20 percentage points.  The irony is that the poll was also one of the most ambitious surveys of its type ever conducted.  Nearly 10 million people chosen from telephone books, club memberships, magazine subscriptions, and other sources had been mailed a survey card, and approximately 2.5 million people responded.

The error was almost entirely due to sample bias, specifically undercoverage of the less-wealthy, Democratic-leaning segments of the population.  What caused the bias, and how could the magazine have improved the accuracy of its poll?

After we discuss undercoverage and self-selection bias, and work a few examples, we will return to this question.  Can you figure out the answer on your own before then?

### Sample Bias

There are many different types of sample bias, any of which can skew the results of an experiment or survey. Undercoverage is one common type, referring to a sample with too few examples of one or more segments of the population it is meant to represent.  In some cases, particularly where the under-represented group is quite small in comparison to the others in the entire population, undercoverage may not have much of an effect.  However, if the undercovered segment is significant enough, the results of the sample may not accurately estimate the characteristic of the population.
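The difference between undercovering a small group and undercovering a large one can be sketched with a little arithmetic. The example below is only an illustration: the two groups, their "yes" rates, and the population and sample shares are all made-up numbers, and the function names are hypothetical.

```python
def overall_rate(share_b, rate_a, rate_b):
    """Overall 'yes' rate when group B is `share_b` of the people counted and group A is the rest."""
    return (1 - share_b) * rate_a + share_b * rate_b


def undercoverage_bias(pop_share_b, sample_share_b, rate_a, rate_b):
    """Expected sample estimate minus the true population value."""
    true_value = overall_rate(pop_share_b, rate_a, rate_b)
    sample_value = overall_rate(sample_share_b, rate_a, rate_b)
    return sample_value - true_value


# Assumed for illustration: 40% of group A and 70% of group B answer "yes".
RATE_A, RATE_B = 0.40, 0.70

# Case 1: group B is small (4% of the population) and the sample holds only 1%.
small_group_bias = undercoverage_bias(0.04, 0.01, RATE_A, RATE_B)

# Case 2: group B is large (50% of the population) and the sample holds only 20%.
large_group_bias = undercoverage_bias(0.50, 0.20, RATE_A, RATE_B)

print(f"small group undercovered: bias = {small_group_bias:+.3f}")
print(f"large group undercovered: bias = {large_group_bias:+.3f}")
```

Under these assumed numbers, undercovering the small group shifts the estimate by less than one percentage point, while undercovering the large group shifts it by about nine.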

Self-selection is related to undercoverage, and can actually be the cause of it.  Self-selection refers to the practice of asking people to submit responses on their own, rather than collecting the answers from them.  The problem with self-selection is that it limits the respondents to those with the time and inclination to respond (the resulting distortion is known as non-response bias), which reduces the overall sample size and also skews it toward the type of person who believes in the value of taking time to respond to polls!
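A small simulation can make the effect of self-selection concrete. This is a minimal sketch under invented assumptions: half of a hypothetical population is satisfied with some service, and dissatisfied people are assumed to be four times as likely to send in a response. None of these numbers come from a real survey.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

POPULATION_SIZE = 100_000
P_SATISFIED = 0.50            # assumed true share of satisfied people
P_RESPOND_SATISFIED = 0.10    # assumed: satisfied people rarely bother to reply
P_RESPOND_UNSATISFIED = 0.40  # assumed: dissatisfied people reply more often

# True means "satisfied", False means "dissatisfied".
population = [rng.random() < P_SATISFIED for _ in range(POPULATION_SIZE)]

# Each person decides on their own whether to respond.
respondents = []
for satisfied in population:
    p_respond = P_RESPOND_SATISFIED if satisfied else P_RESPOND_UNSATISFIED
    if rng.random() < p_respond:
        respondents.append(satisfied)

true_share = sum(population) / len(population)
respondent_share = sum(respondents) / len(respondents)

print(f"responses received: {len(respondents)} of {POPULATION_SIZE}")
print(f"satisfied share in the population:  {true_share:.2f}")
print(f"satisfied share among respondents: {respondent_share:.2f}")
```

Even though tens of thousands of responses come back, the respondents look far less satisfied than the population as a whole, because the decision to respond is tied to the opinion being measured.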

#### Understanding Bias

You are assisting with a study that aims to determine how satisfied families are with school communication when the student speaks a second language at home.  The plan is to send a questionnaire home to the students' parents, asking for their opinion.

What kind(s) of bias is this survey method particularly prone to?  How might they be addressed?

This method of sampling is liable to result in both non-response and undercoverage bias.  Non-response bias is an issue any time a sample population is expected to submit a questionnaire, as your results are going to include more input from the type of person who is willing and able to complete and submit your survey.  In this case, undercoverage is a particular problem, since the population most affected by the study is also unusually liable to misinterpret the questions or the reason for them due to the language barrier.

One possible solution might be a phone survey conducted by a native speaker of the target language(s).

#### Recognizing Types of Bias

What type(s) of bias do the experiments below suggest?

a. An experiment to determine the danger of mixing household chemicals is conducted by collecting samples of chemicals found under the experimenter’s sink.

Undercoverage bias – This experiment is a prime example of the problems associated with convenience sampling. Since the only chemicals used were the ones conveniently found in one location, the results cannot be assumed to apply to chemicals found under other sinks.

b. Mall shoppers are asked to fill out and return a form rating their shopping experiences at each of the 26 stores to identify the most popular stores in each of 4 categories.

Non-response bias – Since the results are dependent on the shoppers turning in a response form on their own, the results will be biased toward a specific type of personality, and will not reflect a true cross-section of shoppers' experiences.

c. A study of the average grades of mathematics students polls 16 Algebra I students, 14 Geometry students, 7 Calculus students, and 19 Statistics students.

Undercoverage – The study includes only about half as many Calculus students as students from each of the other subjects.
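The counts in part (c) can be checked with a few lines of arithmetic. This sketch only compares the groups within the sample; whether Calculus students are truly undercovered also depends on actual enrollment in each course, which the problem does not give.

```python
# Number of students polled in each course, taken from part (c).
counts = {"Algebra I": 16, "Geometry": 14, "Calculus": 7, "Statistics": 19}

total = sum(counts.values())

# Share of the whole sample that each course contributes.
shares = {course: n / total for course, n in counts.items()}

# Calculus count relative to each of the other courses.
calculus_ratio = {
    course: counts["Calculus"] / n
    for course, n in counts.items()
    if course != "Calculus"
}

for course, share in shares.items():
    print(f"{course:<10} {share:.1%} of the sample")
for course, ratio in calculus_ratio.items():
    print(f"Calculus has {ratio:.2f} times as many students as {course}")
```

Calculus students make up one eighth of the 56 students polled, and there are at most half as many of them as students from any other course.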

#### Identifying Bias

There is a commonly referenced story about the difficulties of marketing products internationally, related to the Chevy Nova automobile.  According to the story, the Chevrolet motor company lost millions over an attempt to sell the popular U.S. vehicle in Mexico without noting that “No-Va” means “No-Go” in Spanish!

The truth is that the story is just an urban myth, and that the Nova sold well in Latin America, but the caution is valid nonetheless.  If the situation had occurred as described, what sort of bias might have been the culprit in Chevy’s market research that could have led to the misunderstanding?

It is certainly reasonable to suspect that undercoverage might have been a contributing factor here.  Any studies or market research that Chevy conducted in the United States about the popularity of the name “Nova” would have included far more native English speakers than Spanish speakers.

#### Earlier Problem Revisited

In 1936, the Literary Digest predicted that Republican Alfred Landon, Governor of Kansas, would win the presidential race by a wide margin (57% vs. 43%) over the incumbent Democrat, President Franklin D. Roosevelt. When the results of the actual election came in, Roosevelt was the victor by a landslide: 62% vs. 38%! The error was almost entirely due to sample bias, specifically undercoverage of the less-wealthy, Democratic-leaning segments of the population.

What caused the bias, and how could the magazine have improved the accuracy of its poll?

The bias was caused by the magazine’s method of sampling.  Choosing the voters from telephone listings (remember that phones were much more of a luxury in 1936!), club memberships, and magazine subscriptions resulted in a bias toward the wealthier members of the population.  Perhaps a door-to-door poll in some of the lower-income areas of the country would have provided some valuable insight.  At a minimum, the magazine could have issued a statement regarding the possible bias in the survey due to the limited range of incomes targeted.

Ironically, the uncommonly large size of the sample did nothing to correct the bias.  A larger sample reduces random sampling error, but it cannot fix a selection method that systematically leaves out part of the population: the huge number of responses from the wealthier demographic simply overshadowed the limited number of other responses.  A much smaller study drawn from a more balanced cross-section of voters would likely have produced a better estimate.
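To see why sample size alone cannot rescue a biased selection method, consider the following simulation. It is only a sketch: the share of voters reachable through the sampling frame and the support levels inside and outside that frame are invented for illustration and are not the actual 1936 figures.

```python
import random

rng = random.Random(1936)  # fixed seed so the run is repeatable

# Assumed for illustration only (not historical data):
FRAME_SHARE = 0.30       # share of voters reachable through the sampling frame
SUPPORT_IN_FRAME = 0.40  # candidate's support among voters in the frame
SUPPORT_OUTSIDE = 0.70   # candidate's support among everyone else

TRUE_SUPPORT = FRAME_SHARE * SUPPORT_IN_FRAME + (1 - FRAME_SHARE) * SUPPORT_OUTSIDE


def poll_frame_only(n):
    """Poll n voters, all of them drawn from inside the sampling frame."""
    votes = sum(rng.random() < SUPPORT_IN_FRAME for _ in range(n))
    return votes / n


def poll_everyone(n):
    """Poll n voters drawn at random from the whole population."""
    votes = 0
    for _ in range(n):
        in_frame = rng.random() < FRAME_SHARE
        support = SUPPORT_IN_FRAME if in_frame else SUPPORT_OUTSIDE
        votes += rng.random() < support
    return votes / n


big_biased = poll_frame_only(200_000)  # very large, but frame-only
small_random = poll_everyone(1_000)    # small, but covers everyone

print(f"true support:         {TRUE_SUPPORT:.3f}")
print(f"large biased sample:  {big_biased:.3f}")
print(f"small random sample:  {small_random:.3f}")
```

Under these assumptions the large frame-only poll settles very precisely on the wrong answer, while the small random poll lands much closer to the true value despite having 200 times fewer respondents.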

### Examples

#### Example 1

If a sample of 100 high school students indicated that 78% thought the most important class in a high school curriculum was “Woodworking”, what might you suspect about the chosen sample?

It would certainly appear that the sample was not a likely cross section of the average public school. It is a good bet that the female population was undercovered during the sample selection process.

#### Example 2

If a study posted results indicating that only 1% of polled students liked football, what bias is likely to have affected the sample selection?

Obviously the athletic students were undercovered in this sample. Maybe this study was conducted using the students who weren't polled during the study referenced in Example 1!

#### Example 3

Suppose “Super-Sugar” cola company indicated that every person polled who preferred “Super-Sugar Cola” over all other brands of soda was a multi-millionaire. What type(s) of sample selection bias would you suspect that might prevent you from running right out to buy a case of “Super-Sugar” so you could become a multi-millionaire?

This is an example of “cherry-picking”, a sampling technique where only very specific people are polled to ensure a particular appearance for the results. If “Super-Sugar Cola” only sampled multi-millionaires, then any person who preferred their drink would be a multi-millionaire. Obviously this method would also create an undercoverage bias, since the less-wealthy soda drinkers were not included in the sample.

### Review

Discuss how undercoverage could be a source of bias in each of the following surveys:

1. A poll showed that 85% of respondents believe that teens make better drivers than adults.

2. The U.S. census of 1980 states that 32,194 Americans are 100 years old or older. However, Social Security figures show only 15,258 adults of this advanced age (Los Angeles Times, Dec. 4, 1983).

3. In a census in Russia, 1.4 million more women than men reported that they were married (U.S. News & World Report, Aug. 30, 1976).

4. To find out how important the clothes of a vice-presidential candidate might be, researchers ran a survey shortly after the 1984 Democratic convention in three locations: the Wall Street area of New York City, State Street in Chicago, and Crown Center in downtown Kansas City. The 347 respondents were shown pictures of women wearing three outfits, and the pictures did not show the women's faces. Then the respondents were asked several questions about how the outfits affected their impression of the model's competence to serve in public office (Los Angeles Times, Aug. 3, 1984). 310 respondents indicated that the color and fit of the outfit was important in creating feelings of competence.

5. One year after the Detroit race riots of 1967, interviewers asked a sample of residents in Detroit if they felt they could trust most of their neighbors, some of their neighbors, or none at all. In one sample, 35% answered “most”; in another sample, only 7% answered “most”.

6. In a comment on deregulation of banking, “[the head of California's Security Pacific Bank] reckons the higher interest accounts, and all the other new financial services, are designed for the most affluent 15% to 20% of Security Pacific Bank's customers. By extension--as 2 million customers are surely a sample of the general population--the new world of deregulated finance benefits the top-earning 15% to 20% of U.S. households” (Los Angeles Times, Dec. 4, 1983).

In the following scenarios, identify if we are dealing with a sampling or a nonsampling error. In each case, be as specific as possible about the source of error. Would this type of error result in bias?

7. In a telephone survey that randomly selects participants, we try to contact a person five times and he/she never picks up the phone.

8. An interviewer chooses people on the street to interview regarding their preference for walking vs. driving.

9. The police department of Lexington would like to know more about people’s opinion about their police force. They send an officer in uniform to randomly selected households, but many of the selected households refuse to participate.

10. A survey asks the question “Do you agree with the U.S. Supreme Court’s decision that corporations are allowed to spend huge amounts of money to sway elections in their favor?”

11. In a survey that would like to measure the overall health of college students, including the prevalence of sexually transmitted diseases, some participants are not willing to admit that they have contracted such a disease.

12. In Fayette County, 53.8% of registered voters are registered as Democrats. However, in a SRS of 200 registered voters, only 45% of them are registered as Democrats.

13. An interviewer enters all the information into a database during the interview, and accidentally records that a person has 22 children, instead of 2.

### Vocabulary

bias

Bias is a systematic tendency for the results of a study to differ from the true value for the population, caused by the way the sample is selected or the data are collected.

census

A census is an official enumeration of the entire population, with details as to age, sex, occupation, etc.

convenience sampling

Convenience sampling refers to the process of choosing a sample based on members who are easily accessible.

incorrect response bias

When an individual intentionally responds to a survey with an untruthful answer, this is called incorrect response bias.

incorrect sampling frame

Incorrect sampling frame occurs when the group from which you choose your sample does not include everyone in the population, or at least units that reflect the full diversity of the population.

judgment sampling

Judgment sampling is a type of sampling that occurs when the investigator has already made an assumption about a characteristic of the population and selects the sample accordingly.

margin of error

The margin of error is found by multiplying the standard error of the mean by the z-score that corresponds to the chosen confidence level.

non-response bias

Non-response bias is commonly caused by self-selection. Subjects who have a reason not to respond, which may be unrelated to the actual study, are not included, and this skews the results.

questionnaire bias

Questionnaire bias occurs when the way in which the question is asked influences the response given by the individual.

sample

A sample is a specified part of a population, intended to represent the population as a whole.

sampling error (random variation)

Sampling error occurs whenever a sample is used instead of the entire population, where we have to accept that our results are merely estimates, and therefore, have some chance of being incorrect.

self-selection

Self-selection is a sampling method in which subjects decide for themselves whether to take part, for example by choosing whether to return a survey.

undercoverage

Undercoverage describes a sample with too few members of a given group or demographic.

voluntary response bias

Voluntary response bias occurs when sample members are self-selected volunteers.
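The margin of error entry in the vocabulary above can be sketched in a few lines of Python. This is a minimal illustration that assumes a two-sided confidence interval for a sample mean with a known standard deviation; the function name and the example values (a standard deviation of 10 and a sample of 100) are made up.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(std_dev, n, confidence=0.95):
    """Margin of error for a sample mean: z-score times the standard error."""
    # Two-sided z-score for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = std_dev / sqrt(n)
    return z * standard_error


# Assumed example values: standard deviation 10, sample size 100.
example = margin_of_error(std_dev=10, n=100)
larger_sample = margin_of_error(std_dev=10, n=400)

print(f"margin of error with n = 100: {example:.2f}")
print(f"margin of error with n = 400: {larger_sample:.2f}")
```

Note that the margin of error describes only sampling error (random variation). It shrinks as the sample grows, but it says nothing about bias from undercoverage or self-selection.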