6.2: Experimental Design

Difficulty Level: At Grade Created by: CK-12

Learning Objectives

  • Identify the important characteristics of an experiment.
  • Distinguish between confounding and lurking variables.
  • Use a random number generator to randomly assign experimental units to treatment groups.
  • Identify experimental situations in which blocking is necessary or appropriate and create a blocking scheme for such experiments.
  • Identify experimental situations in which a matched pairs design is necessary or appropriate and explain how such a design could be implemented.
  • Identify the reasons for and the advantages of blind experiments.
  • Distinguish between correlation and causation.


A recent study published by the Royal Society of Britain\begin{align*}^1\end{align*} concluded that there is a relationship between the nutritional habits of mothers around the time of conception and the gender of their child. The study found that women who ate more calories and had a higher intake of essential nutrients and vitamins were more likely to conceive sons. As we learned in the first chapter, this study provides useful evidence of an association between these two variables, but it is an observational study. It is possible that there is another variable that is actually responsible for the gender differences observed. In order to be able to convincingly conclude that there is a cause and effect relationship between a mother’s diet and the gender of her child, we must perform a controlled statistical experiment. This lesson will cover the basic elements of designing a proper statistical experiment.

Confounding and Lurking Variables

In an observational study such as the Royal Society’s connecting gender and a mother’s diet, it is possible that a third variable that was not observed is causing a change in both the explanatory and response variables. A variable that is not included in a study but may still have an effect on the other variables involved is called a lurking variable. For example, perhaps the mother’s exercise habits caused both her increased consumption of calories and her increased likelihood of having a male child. A slightly different type of additional variable is called a confounding variable. Confounding variables are variables that are observed in the study, but whose effects on the response variable cannot be distinguished from one another. The study also mentions that the habit of skipping breakfast could possibly depress glucose levels and lead to a decreased chance of sustaining a viable male embryo. In an observational study, it is impossible to determine whether it is nutritional habits in general or the act of skipping breakfast that causes a change in gender birth rates. A well-designed statistical experiment has the potential to isolate the effects of these intertwined variables, but there is still no guarantee that we will ever be able to determine whether one of these variables or some other factor causes a change in gender birth rates.

Observational studies, and the public’s appetite for finding simplified cause and effect relationships between easily observable factors, are especially prone to confounding. The phrase often used by statisticians is that “Correlation (association) does not imply causation.” For example, another recent study published by the Norwegian Institute of Public Health\begin{align*}^2\end{align*} found that first-time mothers who had a Caesarian section were less likely to have a second child. While the trauma associated with the procedure may cause some women to be more reluctant to have a second child, there is no medical consequence of a Caesarian section that directly causes a woman to be less able to have a child. The \begin{align*}600,000\end{align*} first-time births over a \begin{align*}30\end{align*} year time span that were examined are so diverse that a number of underlying causes might be contributing to this result.

Experiments: Treatments, Randomization, and Replication

There are three elements that are essential to any statistical experiment that can earn the title of a randomized clinical trial. The first is that a treatment must be imposed on the subjects of the experiment. In the example of the British study on gender, we would have to prescribe different diets to different women who were attempting to become pregnant, rather than simply observing them or having them record the details of their diets during this time, as was done for the study. The next element is that the treatments imposed must be randomly assigned. Random assignment helps to eliminate other confounding variables. Just as randomization helps to create a representative sample in a survey, if we randomly assign treatments to the subjects, we can increase the likelihood that the treatment groups are equally representative of the population. The other essential element of an experiment is replication. The conditions of a well-designed experiment can be replicated by other researchers so that the results can be independently confirmed.

To design an experiment similar to the British study, we would need to use valid sampling techniques to select a representative sample of women who were attempting to conceive (this might be difficult to accomplish!). The women might then be randomly assigned to one of three groups in which their diets would be strictly controlled. The first group would be required to skip breakfast, the second group would be put on a high calorie, nutrition-rich diet, and the third group would be put on a low calorie, low nutrition diet. This brings up some ethical concerns. An experiment that imposes a treatment which could cause direct harm to the subjects is morally objectionable and should be avoided. Since skipping breakfast could actually harm the development of the child, it should not be part of an experiment.

It would be important to closely monitor the women for successful conception to be sure that once a viable embryo is established, the mother returns to a properly nutritious pre-natal diet. The gender of the child would eventually be determined and the results between the three groups would be compared for differences.


Let’s say that your statistics teacher read somewhere that classical music has a positive effect on learning. To impose a treatment in this scenario, she decides to have students listen to an MP3 player very softly playing Mozart string quartets while they sleep for a week prior to administering a unit test. To help minimize the possibility that some other unknown factor might influence student performance on the test, she randomly assigns the class into two groups of students. One group will listen to the music; the other group will not. When the treatment of interest is withheld from one of the groups, that group is usually referred to as the control group. By randomly assigning subjects to these two groups, we can help improve the chances that each group is representative of the class as a whole.
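The random split into a music group and a control group can be carried out with a random number generator. The following is a minimal sketch in Python, assuming a hypothetical class of 20 students; the function name and the student labels are illustrative and not part of the lesson.

```python
import random

def assign_treatment_and_control(students, seed=None):
    """Randomly split a roster into a treatment group and a control group."""
    rng = random.Random(seed)   # seeded generator so the split can be reproduced
    shuffled = list(students)   # copy so the original roster is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

roster = [f"student_{i}" for i in range(1, 21)]  # hypothetical class of 20
groups = assign_treatment_and_control(roster, seed=1)
print("Listens to music:", groups["treatment"])
print("Control group:   ", groups["control"])
```

Shuffling the whole roster and cutting it in half gives every student the same chance of landing in either group, which is the property random assignment is meant to provide.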

Placebos and Blind Experiments

In medical studies, the treatment group is usually receiving some experimental medication or treatment that has the potential to offer a new cure or improvement for some medical condition. This would mean that the control group would not receive the treatment or medication. Many studies and experiments have shown that the expectations of participants can influence the outcomes. This is especially true in clinical medication studies, in which participants who believe they are receiving a potentially promising new treatment tend to improve. To help minimize these expectations, researchers usually will not tell participants in a medical study whether they are receiving a new treatment. In order to help isolate the effects of personal expectations, the control group is typically given a placebo (pronounced Pluh-see-bo). The placebo group would think they are receiving the new medication, but they would in fact be given medication with no active ingredient in it. Because neither group would know whether they are receiving the treatment or the placebo, any change that might result from the expectation of treatment (this is called the placebo effect) should theoretically occur equally in both groups (provided they are randomly assigned).

When the subjects in an experiment do not know which treatment they are receiving, it is called a blind experiment. For example, if you wanted to do an experiment to see if people preferred a brand name bottled water to a generic brand, you would most likely need to conceal the identity of the type of water. A participant might expect the brand name water to taste better than a generic brand, which would alter the results.

Sometimes the expectations or prejudices of the researchers conducting the study could affect their ability to objectively report the results, or could cause them to unknowingly give clues to the subjects that would affect the results. To avoid this problem, it is possible to design the experiment so the researcher also does not know which individuals have been given the treatment or placebo. This is called a double-blind experiment. Because drug trials are often conducted or funded by companies that have a financial interest in the success of the drug, double-blind experiments help to avoid any appearance of influencing the results and are considered the “gold standard” of medical research.
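One way to picture double-blinding is as a bookkeeping problem: subjects and researchers see only a neutral code, while the key linking codes to treatments is held by someone outside the experiment. The sketch below is an illustration under that assumption; the four-digit kit codes, the function name, and the participant labels are all hypothetical.

```python
import random

def blinded_allocation(participants, seed=None):
    """Randomly assign half of the participants to each arm and hide the arm behind a kit code."""
    rng = random.Random(seed)
    order = list(participants)
    rng.shuffle(order)
    half = len(order) // 2
    arm_of = {p: "treatment" if i < half else "placebo" for i, p in enumerate(order)}

    # Distinct random codes, drawn independently of the arm, so a code reveals nothing.
    codes = rng.sample(range(1000, 10000), len(order))
    kit_for = dict(zip(participants, codes))                         # seen by subjects and researchers
    unblinding_key = {kit_for[p]: arm_of[p] for p in participants}   # held by a third party
    return kit_for, unblinding_key

participants = [f"participant_{i}" for i in range(1, 9)]  # hypothetical volunteers
kit_for, unblinding_key = blinded_allocation(participants, seed=11)
print(kit_for)  # contains only codes, not treatment names
```

Only after the results have been recorded would the key be used to match each code to its treatment.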


Blocking

Blocking in an experiment serves a similar purpose to stratification in a survey. If we believe men and women might have different opinions about an issue, we must be sure those opinions are properly represented in the sample. The terminology comes from agriculture. In testing different yields for different varieties of crops, researchers would need to plant crops in large fields, or blocks, that could contain variations in conditions such as soil quality, sunlight exposure, and drainage. It is even possible that a crop’s position within a block could affect its yield. If there is a sub-group in the population that might respond differently to an imposed treatment, our results could be confounded. Let’s say we want to study the effects of listening to classical music on student success in statistics class. It is possible that boys and girls respond differently to the treatment. So if we were to design an experiment to investigate the effect of listening to classical music, we would want to be sure that boys and girls are assigned equally to the treatment group (listening to classical music) and the control group (not listening to classical music). This procedure would be referred to as blocking on gender. In this manner, any differences that may occur between boys and girls would occur equally under both conditions, and we would be more likely to be able to conclude that differences in student performance were due to the imposed treatment. In blocking, you should attempt to create blocks that are homogeneous (the same) for the trait on which you are blocking.

For example, in your garden, you would like to know which of two varieties of tomato plants will have the best yield. There is room in your garden to plant four plants, two of each variety. Because the sun is coming predominately from one direction, it is possible that plants closer to the sun would perform better and shade the other plants. So it would be a good idea to block on sun exposure by creating two blocks, one sunny and one not.

You would randomly assign one plant from each variety to each block. Then within each block, randomly assign the variety to one of the two positions.

This type of design is called a randomized block design.
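The tomato layout described above can be sketched in code. This is a minimal illustration, assuming each block holds exactly as many positions as there are varieties (or a multiple of that number); the position labels, variety names, and function name are hypothetical.

```python
import random

def randomized_block_design(blocks, treatments, seed=None):
    """Within each block, randomly assign the treatments to that block's units.

    blocks: dict mapping block name -> list of units (here, planting positions)
    treatments: list of treatments to balance inside every block
    """
    rng = random.Random(seed)
    plan = {}
    for block_name, units in blocks.items():
        if len(units) % len(treatments) != 0:
            raise ValueError(f"block {block_name!r} cannot be split evenly among treatments")
        copies = len(units) // len(treatments)
        labels = list(treatments) * copies  # every treatment appears equally often in the block
        rng.shuffle(labels)                 # randomization happens separately inside each block
        plan[block_name] = dict(zip(units, labels))
    return plan

garden = {
    "sunny": ["sunny position 1", "sunny position 2"],
    "shady": ["shady position 1", "shady position 2"],
}
plan = randomized_block_design(garden, ["variety A", "variety B"], seed=7)
for block, layout in plan.items():
    print(block, layout)
```

Because the shuffle is done block by block, each variety is guaranteed one sunny and one shady plant, and chance only decides which position it takes within the block.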

Matched Pairs Design

A matched pairs design is a type of randomized block design in which there are two treatments to apply. For example, let’s say we were interested in the effectiveness of two different types of running shoes. We might search for volunteers among regular runners using the database of registered participants in a local distance run. After personal interviews, a sample of \begin{align*}50\end{align*} runners who run a similar distance and pace (average speed) on roadways on a regular basis is chosen. Because we feel that the weight of the runners will directly affect the life of the shoe, we decide to block on weight. In a matched pairs design, we could list the weights of all \begin{align*}50\end{align*} runners in order and then create \begin{align*}25\end{align*} matched pairs by grouping the weights two at a time. Within each pair, one runner would be randomly assigned shoe \begin{align*}A\end{align*} and the other would be given shoe \begin{align*}B\end{align*}. After a sufficient length of time, the amount of wear on the shoes would be compared.

In the previous example, there may be some potential confounding influences. Things such as running style, foot shape, height, or gender may also cause shoes to wear out more quickly or more slowly. It would be more effective to compare the wear of each shoe on each runner. This is a special type of matched pairs design in which each experimental unit becomes their own matched pair. Because the matched pair is in fact two different observations of the same subject, it is called a repeated measures design. Each runner would use shoe \begin{align*}A\end{align*} and shoe \begin{align*}B\end{align*} for equal periods of time and then the wear of the shoes for each individual would be compared. Randomization still could be important. Let’s say that we have each runner use each shoe type for a period of \begin{align*}3\;\mathrm{months}\end{align*}. It is possible that the weather during those three months could influence the amount of wear on the shoe. To minimize this, we would randomly assign half the subjects shoe \begin{align*}A\end{align*}, with the other half receiving shoe \begin{align*}B\end{align*}, and then switch after the first \begin{align*}3\;\mathrm{months}\end{align*}.
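The random choice of which shoe each runner wears first might look like the following sketch. The runner labels and the function name are placeholders introduced for illustration only.

```python
import random

def randomize_shoe_order(runners, seed=None):
    """Randomly pick half of the runners to wear shoe A first; the rest start with shoe B."""
    rng = random.Random(seed)
    shuffled = list(runners)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    a_first = set(shuffled[:half])
    # Every runner still wears both shoes; only the order differs.
    return {r: ("A", "B") if r in a_first else ("B", "A") for r in runners}

runners = [f"runner_{i}" for i in range(1, 51)]  # the 50 runners, with placeholder names
schedule = randomize_shoe_order(runners, seed=42)
print(schedule["runner_1"])  # (shoe for first 3 months, shoe for second 3 months)
```

Splitting the order evenly means that any effect of the weather in a given three-month period falls on both shoe types equally.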

Lesson Summary

The important elements of a statistical experiment are randomness, imposed treatments, and replication. Together, these elements are the only effective method for establishing meaningful cause and effect relationships. An experiment attempts to isolate, or control, other potential variables that may contribute to changes in the response variable. If these other variables are known quantities but are difficult, or impossible, to distinguish from the other explanatory variables, they are called confounding variables. If there is an additional explanatory variable affecting the response variable that was not considered in an experiment, it is called a lurking variable. A treatment is the term used to refer to a condition imposed on the subjects in an experiment. An experiment will have at least two treatments. When trying to test the effectiveness of a particular treatment, it is often effective to withhold that treatment from a group of randomly chosen subjects. This is called a control group. If the subjects are aware of the conditions of their treatment, they may have preconceived expectations that could affect the outcome. Especially in medical experiments, the psychological effect of believing you are receiving a potentially effective treatment can lead to different results. This phenomenon is called the placebo effect. An inactive treatment given to participants in a clinical trial who are led to believe they are receiving the new treatment is called a placebo. If the participants are not aware of the treatment they are receiving, it is called a blind experiment. When neither the participant nor the researcher is aware of which subjects are receiving the treatment and which subjects are receiving a placebo, it is called a double-blind experiment.

Blocking is a technique used to control the potential confounding of variables. It is similar to the idea of stratification in sampling. In a randomized block design, the researcher creates blocks of subjects that exhibit similar traits which might cause different responses to the treatment and then randomly assigns the different treatments within each block. A matched pairs design is a special type of design used when there are two treatments. The researcher creates blocks of size two on some similar characteristic and then randomly assigns one subject from each pair to each treatment. A repeated measures design is a special matched pairs experiment in which each subject becomes its own matched pair by receiving both treatments so that the results can be compared.

Points to Consider

  1. What are some other ways that researchers design more complicated experiments?
  2. When one treatment seems to result in a notable difference, how do we know if that difference is statistically significant?
  3. How can the selection of samples for an experiment affect the validity of the conclusions?

Review Questions

  1. As part of an effort to study the effect of intelligence on survival mechanisms, scientists recently compared a group of fruit flies intentionally bred for intelligence with ordinary flies of the same species. When released together in an environment with high competition for food, the ordinary flies survived at a significantly higher rate than the intelligent flies.
    1. Identify the population of interest and the treatments.
    2. Based on the information given, is this an observational study or an experiment?
    3. Based on the information given in this problem, can you conclude definitively that intelligence decreases survival among animals?
  2. In order to find out which brand of cola students in your school prefer, you set up an experiment where each person will taste the two brands of cola and you will record their preference.
    1. How would you characterize the design of this study?
    2. If you poured each student a small cup from the original bottles, what threat might that pose to your results? Explain what you would do to avoid this problem and identify the statistical term for your solution.
    3. Let’s say that one of the two colas leaves a bitter aftertaste. What threat might this pose to your results? Explain how you could use randomness to solve this problem.
  3. You would like to know if the color of the ink used for a difficult math test affects the stress level of the test taker. The response variable you will use to measure stress is pulse rate. Half the students will be given a test with black ink, and the other half will be given the same test with red ink. Students will be told that this test will have a major impact on their grade in the class. At a point during the test, you will ask the students to stop for a moment and measure their pulse rate. You measure the at-rest pulse rate of all the students in your class.

Here are those pulse rates in beats per minute:

Student Number    At Rest Pulse Rate
1                 46
2                 72
3                 64
4                 66
5                 82
6                 44
7                 56
8                 76
9                 60
10                62
11                54
12                76

(a) Using a matched pairs design, identify the students (by number) that you would place in each pair.

(b) Seed the random number generator on your calculator using \begin{align*}623\end{align*}. Use your calculator to randomly assign each student to a treatment. Explain how you made your assignments.

(c) Identify any potential lurking variables in this experiment.

(d) Explain how you could redesign this experiment as a repeated measures design.

  4. A recent British study was attempting to show that a high fat diet was effective in treating epilepsy in children. According to the New York Times, this involved, " \begin{align*}\ldots 145\end{align*} children ages \begin{align*}2\end{align*} to \begin{align*}16\end{align*} who had never tried the diet, who were having at least seven seizures a week and who had failed to respond to at least two anticonvulsant drugs."\begin{align*}^1\end{align*}
    1. What is the population in this example?
    2. One group began the diet right away; another group waited three months to start it. In the first group, \begin{align*}38\%\end{align*} of the children experienced a \begin{align*}50\%\end{align*} reduction in seizure rates, and in the second group, only \begin{align*}6\%\end{align*} saw a similar reduction. What information would you need to be able to conclude that this was a valid experiment?
    3. Identify the treatment and control groups in this experiment.
    4. What conclusion could you make from the reported results of this experiment?
  5. Researchers want to know how chemically fertilized and treated grass compares to grass using only organic fertilizer. They also believe that the height at which the grass is cut will affect the growth of the lawn. To test this, grass will be cut at three different heights, \begin{align*}1\;\mathrm{inch}\end{align*}, \begin{align*}2\;\mathrm{inches}\end{align*}, and \begin{align*}4\;\mathrm{inches}\end{align*}. A lawn area of existing healthy grass will be divided up into plots for the experiment. Assume that the soil, sun, and drainage for the test areas are uniform. Explain how you would implement a randomized block design to test the different effects of fertilizer and grass height. Draw a diagram that shows the plots and the assigned treatments.

Review Answers

    1. The population is all fruit flies of this species. The treatment is breeding for intelligence. The other treatment is really a control group. The second group of flies were not bred for any special quality.
    2. By the strict definition, this is an observational study as the subjects (fruit flies) are not randomly assigned to the treatment. A group of fruit flies was selectively bred for intelligence.
    3. Because the treatments were not randomly assigned the results are susceptible to lurking variables. It is possible that some other trait not observed in the population of intelligent fruit flies led to their lower survival rate. It is also questionable to generalize the behavior of fruit flies to the larger population of all animals. We have no guarantee that other animals will not behave differently than fruit flies. Without reading the study completely, it is difficult to determine how many of these concerns were addressed by the scientists performing the study. You can read more at:


    1. This is a repeated measures design. Each student becomes their own matched pair as they are sampling both colas.
    2. Students may have a preconceived idea of which cola they prefer for many possible reasons. You could have the colas already poured into identical unmarked cups, or hide the label of the bottle. This would be an example of a blind experiment.
    3. It is possible that the taste of the first cola might affect the taste of the second. In general, the order in which they taste the colas could affect their perception in a number of ways. To control for this, we could randomly assign the order in which the colas are sampled. Assign one of the colas to be \begin{align*}1\end{align*} and the other to be \begin{align*}2\end{align*}, then use your calculator to choose \begin{align*}1\end{align*} or \begin{align*}2\end{align*} randomly for each subject. If the student is given the two cups and given the option of choosing which one to drink first, we could randomly assign the position of each cup (right or left).
  3. (a) Because students with lower pulses may react differently than students with higher pulses, we will block by pulse rate. Place the students in order from lowest to highest pulse rate, then take them two at a time.

Pair Number    Students
1              6, 1
2              11, 7
3              9, 10
4              3, 4
5              2, 8
6              12, 5

(b) The calculator would generate the following \begin{align*}6\end{align*} random ones and twos.

Using the order in which the students appear in the table as their number (1 or 2), the students could be assigned by placing the chosen student for each pair into treatment 1, and the remaining student into treatment 2:

\begin{align*}& \text{Treatment}\ 1\ (\text{black ink}) & & 6, 11, 9, 3, 8, 5\\ & \text{Treatment}\ 2\ (\text{red ink}) & & 1, 7, 10, 4, 2, 12\end{align*}

(c) It is possible that different students react to test taking and other situations differently, and that this may not affect their pulse directly. Some students might be better test takers than others. The level of mathematics ability or previous success on the subject matter being tested could also affect the stress level. Perhaps amount of sleep, diet, and amount of exercise may also be lurking variables.

(d) A repeated measures design would help control for individual differences in pulse rate. Each student would have to take both a black ink and red ink test. A second test would have to be carefully designed that was similar to the first, but with different color ink. If you just gave the students the same test twice, their stress level might be significantly lower when they take it the second time.
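The pairing in part (a) and the random assignment in part (b) can also be sketched in code, using the pulse rates from the question. This is an illustration only: Python's random number generator will not reproduce the sequence a calculator produces from the same seed, so the ink assignments it prints may differ from those listed in part (b). The function names are hypothetical.

```python
import random

# At-rest pulse rates (beats per minute) keyed by student number, from the question.
pulse = {1: 46, 2: 72, 3: 64, 4: 66, 5: 82, 6: 44,
         7: 56, 8: 76, 9: 60, 10: 62, 11: 54, 12: 76}

def matched_pairs(measurements):
    """Sort subjects by the blocking variable and pair them off two at a time."""
    # Ties in pulse rate are broken by student number.
    ordered = sorted(measurements, key=lambda s: (measurements[s], s))
    return [(ordered[i], ordered[i + 1]) for i in range(0, len(ordered), 2)]

def assign_within_pairs(pairs, seed=None):
    """Randomly choose one member of each pair for black ink; the other gets red ink."""
    rng = random.Random(seed)
    black, red = [], []
    for pair in pairs:
        chosen = rng.randint(0, 1)  # 0 picks the first student in the pair, 1 the second
        black.append(pair[chosen])
        red.append(pair[1 - chosen])
    return black, red

pairs = matched_pairs(pulse)
print(pairs)  # [(6, 1), (11, 7), (9, 10), (3, 4), (2, 8), (12, 5)]
black_ink, red_ink = assign_within_pairs(pairs, seed=623)
print("Black ink:", black_ink)
print("Red ink:  ", red_ink)
```

The pairs printed match the table in part (a); only the choice within each pair depends on the random number generator.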

    1. The population is children with epilepsy who have not responded to other traditional medications.
    2. We need assurances that the children were randomly assigned to the treatment and control groups.
    3. The treatment is starting on the high fat diet immediately, the control group is the group who started the diet \begin{align*}3\;\mathrm{months}\end{align*} later. Notice in this case, researchers did not completely withhold the treatment from the control group for ethical reasons. This treatment has already shown some effectiveness in non-clinical trials.
    4. We would conclude that the high fat diet is effective in treating seizures among children with epilepsy who do not respond to traditional medication.
  5. We will need at least \begin{align*}6\end{align*} plots to impose the various treatments, which are:

Organic fertilizer, \begin{align*}1\;\mathrm{inch}\end{align*}
Chemical fertilizer, \begin{align*}1\;\mathrm{inch}\end{align*}
Organic fertilizer, \begin{align*}2\;\mathrm{inches}\end{align*}
Chemical fertilizer, \begin{align*}2\;\mathrm{inches}\end{align*}
Organic fertilizer, \begin{align*}4\;\mathrm{inches}\end{align*}
Chemical fertilizer, \begin{align*}4\;\mathrm{inches}\end{align*}

Assign the plots numbers from \begin{align*}1\end{align*} to \begin{align*}6\end{align*}.

\begin{align*}& 1 && 2 && 3 \\ & 4 && 5 && 6\end{align*}

Then randomly generate a number from \begin{align*}1\end{align*} to \begin{align*}6\end{align*}, without replacement, until all six treatments are assigned to a plot. In this example, the random number generator was seeded with \begin{align*}625\end{align*}, repeated digits were ignored, and the assignments were as follows:

Organic fertilizer, \begin{align*}1\;\mathrm{inch}\end{align*}: PLOT 6
Chemical fertilizer, \begin{align*}1\;\mathrm{inch}\end{align*}: PLOT 2
Organic fertilizer, \begin{align*}2\;\mathrm{inches}\end{align*}: PLOT 1
Chemical fertilizer, \begin{align*}2\;\mathrm{inches}\end{align*}: PLOT 5
Organic fertilizer, \begin{align*}4\;\mathrm{inches}\end{align*}: PLOT 4
Chemical fertilizer, \begin{align*}4\;\mathrm{inches}\end{align*}: PLOT 3
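Drawing plot numbers without replacement is equivalent to shuffling the list of plots. The sketch below illustrates this for the six fertilizer and cutting-height combinations; it uses Python's generator rather than a calculator, so seeding it with 625 is not expected to reproduce the assignments listed above. The function name is hypothetical.

```python
import itertools
import random

fertilizers = ["Organic", "Chemical"]
heights_in = [1, 2, 4]  # cutting heights in inches
treatments = list(itertools.product(fertilizers, heights_in))  # 2 x 3 = 6 combinations

def assign_plots(treatments, seed=None):
    """Give each treatment combination its own randomly chosen plot number."""
    rng = random.Random(seed)
    plots = list(range(1, len(treatments) + 1))
    rng.shuffle(plots)  # same effect as drawing plot numbers without replacement
    return dict(zip(treatments, plots))

layout = assign_plots(treatments, seed=625)
for (fertilizer, height), plot in layout.items():
    print(f"{fertilizer} fertilizer, {height} inch cut -> plot {plot}")
```

Shuffling avoids the need to discard repeated digits, since each plot number can appear only once.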
