7.1: Sampling Distribution

Difficulty Level: At Grade Created by: CK-12
Learning Objectives

  • Understand the inferential relationship between a sampling distribution and a population parameter.
  • Graph a frequency distribution of a mean using a data set.
  • Understand the relationship between a sample size and the distribution of the sample means.
  • Understand the sampling error.


Have you ever wondered how the mean or average amount of money in a population is determined? It would be impossible to contact \begin{align*}100\%\end{align*} of the population, so there must be a statistical way to estimate the mean number of dollars in the population.

Suppose, more simply, that we are interested in the mean number of dollars that are in the pockets of ten people on a busy street corner. The diagram below reveals the amount of money that each person in a group of ten has in his/her pocket. We will investigate this scenario later in the lesson.

Sampling Distribution

In previous chapters, you have examined methods that are good for exploration and description of data. In this section we will discuss how collecting data by random sample helps us to draw more rigorous conclusions about the data.

The ultimate purpose of sampling is to select a set of units or elements from a population in such a way that the sample represents the parameters of the total population from which the elements were selected. Random sampling is one special type of what is called probability sampling. One reason for using random sampling is that it removes the danger of a researcher, whether consciously or unconsciously, being biased when selecting cases. In addition, random selection allows us to use tools from probability theory that provide the basis for estimating the characteristics of the population as well as for estimating the accuracy of samples.

Probability theory is the branch of mathematics that provides the tools researchers need to make statistical conclusions about sets of data based on samples. Probability theory also helps statisticians estimate the parameters of a population. A parameter is the summary description of a given variable in a population. Some examples of parameters of a population are the distribution of ages within that population, or the distribution of income levels. When researchers generalize from a sample, they’re using sample observations to estimate population parameters. Probability theory enables them both to make these estimates and to judge how likely it is that the estimates accurately represent the actual parameters in the population.

Probability theory accomplishes this by way of the concept of sampling distributions. A single sample selected from a population will give an estimate of the population parameter. Other samples would give the same or slightly different estimates. Probability theory helps us understand how to make estimates of the actual population parameters based on such samples.

It is now time to examine an example of a sampling distribution to see how this all works. In the scenario that was presented in the introduction to this lesson, the assumption was made that in a group of ten people, one person had no money, another had \begin{align*}\$1.00\end{align*}, another had \begin{align*}\$2.00\end{align*}, and so on up to the person who had \begin{align*}\$9.00\end{align*}.

The purpose of the task is to determine the average amount of money in this population. If you total the money of the ten people, you will find that the sum is \begin{align*}\$45.00\end{align*}, thus yielding a mean of \begin{align*}\$4.50\end{align*}. To complete the task of determining the mean number of dollars of this population, it is necessary to select random samples from the population and to use the means of these samples to estimate the mean of the whole population. To start, suppose you were to randomly select a sample of only one person from the ten. The ten possible samples are represented in the diagram that shows the dollar bills possessed by each sample. Since samples of one are being taken, they also represent the “means” you would get as estimates of the population. The graph below shows the results:

The distribution of the dots on the graph is called the sampling distribution. As can be concluded, selecting a sample of one is not very good since the group’s mean can be estimated to be anywhere from \begin{align*}\$0.00\end{align*} to \begin{align*}\$9.00\end{align*} and the true mean of \begin{align*}\$4.50\end{align*} could be missed by quite a bit.

What happens if we take samples of two? In other words, from a population of \begin{align*}10\end{align*}, in how many ways can two be selected if the order of the two does not matter? The sample size is now \begin{align*}2\end{align*} and these are being randomly selected from our population. This is referred to in mathematics as a combination and can be readily obtained by using the graphing calculator.
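For readers without a graphing calculator, the same count can be worked by hand. The following is a sketch using the standard combination formula; the notation \begin{align*}\binom{10}{2}\end{align*} ("10 choose 2") is an assumption of this example and is not used elsewhere in the lesson.

\begin{align*}\binom{10}{2} = \frac{10!}{2! \cdot (10 - 2)!} = \frac{10 \cdot 9}{2 \cdot 1} = 45\end{align*}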

Increasing the sample size has improved your estimates. There are now \begin{align*}45\end{align*} possible samples: \begin{align*}[\$0, \$1], [\$0, \$2], \ldots, [\$7, \$8], [\$8, \$9]\end{align*}. Some of these samples produce the same means. For example, \begin{align*}[\$0, \$6], [\$1, \$5]\end{align*} and \begin{align*}[\$2, \$4]\end{align*} all produce means of \begin{align*}\$3.00\end{align*}. The three dots above the \begin{align*}\$3.00\end{align*} mean represent these three samples. In addition, the \begin{align*}45\end{align*} means are not evenly distributed, as they were when the sample size was one. Instead they are more clustered around the true mean of \begin{align*}\$4.50\end{align*}. \begin{align*}[\$0, \$1]\end{align*} and \begin{align*}[\$8, \$9]\end{align*} are the only two samples whose means deviate by as much as \begin{align*}\$4.00\end{align*}. Five of the samples yield the true mean of \begin{align*}\$4.50\end{align*} and another eight deviate by only \begin{align*}50 \;\mathrm{cents}\end{align*} (plus or minus).
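The counts quoted for samples of two can be checked by listing every sample. The following is a minimal sketch in Python (a language this lesson does not otherwise use); the variable names are illustrative, and the text histogram merely stands in for the dot plot.

```python
from collections import Counter
from itertools import combinations

# Population from the lesson: ten people holding $0 through $9.
population = list(range(10))

# Every possible sample of two people, with order ignored.
samples = list(combinations(population, 2))

# Tally the mean of each sample to form the sampling distribution.
mean_counts = Counter(sum(sample) / 2 for sample in samples)

print("number of samples:", len(samples))
for sample_mean in sorted(mean_counts):
    # One star per sample that produces this mean.
    print(f"${sample_mean:.2f}  " + "*" * mean_counts[sample_mean])
```

Each star plays the role of one dot above the corresponding mean on the graph.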

If three are randomly selected from the population of \begin{align*}10\end{align*}, there are \begin{align*}120\end{align*} samples.

Here is a screen shot from the graphing calculator for the results of randomly selecting \begin{align*}1, 2\end{align*} and \begin{align*}3\end{align*} from the population of \begin{align*}10\end{align*}. The \begin{align*}10, 45\end{align*} and \begin{align*}120\end{align*} represent the total number of possible samples as the sample size increases from \begin{align*}1\end{align*} to \begin{align*}2\end{align*} to \begin{align*}3\end{align*}.

From the above graphs, it is clear that increasing the size of the sample chosen from the population of size \begin{align*}10\end{align*} resulted in a distinct improvement in the distribution of estimates of the mean. If a sample of size \begin{align*}10\end{align*} were selected, there would be only one possible sample, and it would yield the true mean of \begin{align*}\$4.50\end{align*}. The sampling distribution of the sample means is approximately normal because it has the bell shape of the normal curve.
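One way to see this improvement numerically is to summarize the sample means for each sample size. The sketch below assumes Python and uses the standard deviation of the sample means as a measure of spread, a choice made for this example rather than one taken from the lesson's graphs.

```python
from itertools import combinations
from statistics import mean, pstdev

# Population from the lesson: ten people holding $0 through $9.
population = list(range(10))

def sample_means(sample_size):
    """Return the mean of every possible sample of the given size."""
    return [
        sum(sample) / sample_size
        for sample in combinations(population, sample_size)
    ]

summary = {}
for size in (1, 2, 3, 10):
    means = sample_means(size)
    # (number of samples, smallest mean, largest mean, mean of means, spread)
    summary[size] = (len(means), min(means), max(means), mean(means), pstdev(means))
    count, low, high, center, spread = summary[size]
    print(f"size {size:2d}: {count:3d} samples, means from ${low:.2f} to ${high:.2f}, "
          f"centered at ${center:.2f}, spread ${spread:.2f}")
```

For every sample size the means are centered on the true mean, while the spread shrinks as the sample size grows.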

Now that you have been introduced to the sampling distribution and how the sample size affects the distribution of the sample mean, it is time to investigate a more realistic sampling situation. Assume you want to study the student population of a university to determine approval or disapproval of a student dress code proposed by the administration. The study population will be the \begin{align*}18,000\end{align*} students that attend the school. The elements will be the individual students. A random sample of \begin{align*}100\end{align*} students will be selected for the purpose of estimating the attitude of the entire student body. Attitudes toward the dress code will be the variable under consideration. For simplicity's sake, assume that the attitude variable has two attributes: approve and disapprove. As you know from the last chapter, in a scenario such as this, a variable with two attributes is called binomial.

The following figure shows the range of possible sample study results. The horizontal axis presents all possible values of the parameter in question. It represents the range from \begin{align*}0 \;\mathrm{percent}\end{align*} to \begin{align*}100 \;\mathrm{percent}\end{align*} of students approving of the dress code. The number \begin{align*}50\end{align*} on the axis represents the midpoint, where \begin{align*}50 \;\mathrm{percent}\end{align*} of the students approve of the dress code and \begin{align*}50 \;\mathrm{percent}\end{align*} disapprove. Since the sample size is \begin{align*}100\end{align*}, a result at this midpoint means that \begin{align*}50\end{align*} of the sampled students approve and the other \begin{align*}50\end{align*} disapprove.

To randomly select the sample of \begin{align*}100\end{align*}, every student is presented with a number (from \begin{align*}1\end{align*} to \begin{align*}18,000\end{align*}) and the sample is randomly selected from a drum containing all of the numbers.

Each member of the sample is then asked whether they approve or disapprove of the dress code. If this procedure gives \begin{align*}48\end{align*} students who approve of the code and \begin{align*}52\end{align*} who disapprove, the result is recorded on the horizontal axis by placing a dot at \begin{align*}48\%\end{align*}. This percentage describes the variable and is called a statistic.

Let’s assume that the process was repeated again and this resulted in \begin{align*}52\end{align*} students approving the dress code. A third sample resulted in \begin{align*}51\end{align*} students approving the dress code.

In the figure above, the three different sample statistics representing the percentages of students who approved the dress code are shown. The three random samples chosen from the population give estimates of the parameter that exists in the total population. In particular, each of the random samples gives an estimate of the percentage of students in the total student body of \begin{align*}18,000\end{align*} that approve of the dress code. Assume for simplicity that the true value for the entire population is \begin{align*}50\%\end{align*}. Then these estimates are close to the true value. To pin down the true value more precisely, it would be necessary to continue choosing samples of \begin{align*}100\end{align*} students and to record all of the results in a summary graph.

By increasing the number of samples of \begin{align*}100\end{align*}, the range of estimates provided by the sampling process has increased. It looks as if the problem in attempting to guess the parameter in the population has also become more complicated. However, probability theory provides an explanation of these results.
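The repeated sampling described here can be imitated with a short simulation. This is only a sketch under stated assumptions: the true approval rate is taken to be 50 percent, as the lesson assumes, while the number of repeated samples (1,000) and the random seed are hypothetical choices made for this example.

```python
import random

random.seed(7)  # arbitrary fixed seed so the run is repeatable

POPULATION_SIZE = 18_000
SAMPLE_SIZE = 100
TRUE_APPROVAL = 0.50   # assumed population parameter, as in the lesson
NUM_SAMPLES = 1_000    # hypothetical number of repeated samples

# Build the student body: True means approve, False means disapprove.
num_approving = int(POPULATION_SIZE * TRUE_APPROVAL)
students = [True] * num_approving + [False] * (POPULATION_SIZE - num_approving)

def approval_percentage(sample):
    """Percentage of the sampled students who approve."""
    return 100 * sum(sample) / len(sample)

# Draw many random samples (each without replacement) and record each statistic.
estimates = [
    approval_percentage(random.sample(students, SAMPLE_SIZE))
    for _ in range(NUM_SAMPLES)
]

average_estimate = sum(estimates) / len(estimates)
within_5_points = sum(45 <= e <= 55 for e in estimates) / NUM_SAMPLES

print(f"lowest estimate:  {min(estimates):.0f}%")
print(f"highest estimate: {max(estimates):.0f}%")
print(f"average estimate: {average_estimate:.1f}%")
print(f"share of estimates from 45% to 55%: {within_5_points:.2f}")
```

Individual estimates wander, but taken together they pile up around the assumed parameter.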

First, the sample statistics resulting from the samples are distributed around the population parameter. Although there is a wide range of estimates, more of them lie close to the \begin{align*}50\%\end{align*} area of the graph. Therefore, the true value is likely to be in the vicinity of \begin{align*}50\%\end{align*}. In addition, probability theory gives a formula for estimating how closely the sample statistics are clustered around the true value. In other words, it is possible to estimate the sampling error – the degree of error expected for a given sample design. The formula \begin{align*} s = \sqrt {\frac {P \cdot Q}{n}}\end{align*} contains three factors: the parameters (\begin{align*}P\end{align*} and \begin{align*}Q\end{align*}), the sample size \begin{align*}(n)\end{align*}, and the standard error \begin{align*}(s)\end{align*}

The symbols \begin{align*}P\end{align*} and \begin{align*}Q\end{align*} in the formula equal the population parameters for the binomial: If \begin{align*}60 \;\mathrm{percent}\end{align*} of the student body approves of the dress code and \begin{align*}40\%\end{align*} disapprove, \begin{align*}P\end{align*} and \begin{align*}Q\end{align*} are \begin{align*}60\%\end{align*} and \begin{align*}40\%\end{align*} respectively, or \begin{align*}0.6\end{align*} and \begin{align*}0.4\end{align*}. Note that \begin{align*}Q = 1 - P\end{align*} and \begin{align*}P = 1 - Q\end{align*}. The square root of the product of \begin{align*}P\end{align*} and \begin{align*}Q\end{align*} is actually the population standard deviation. The symbol \begin{align*}n\end{align*} equals the number of cases in each sample, and \begin{align*}s\end{align*} is the standard error.

If the assumption is made that the true population parameter is \begin{align*}50 \;\mathrm{percent}\end{align*} approving the dress code and \begin{align*}50 \;\mathrm{percent}\end{align*} disapproving the dress code while selecting samples of \begin{align*}100\end{align*}, the standard error obtained from the formula equals \begin{align*}5 \;\mathrm{percent}\end{align*} or \begin{align*}.05\end{align*}.

\begin{align*}Q & = 1 - P & & P = 1 - Q\\ Q & = 1 - 0.50 & & P = 1 - 0.50\\ Q & = 0.50 & & P = 0.50\end{align*}

\begin{align*}s = \sqrt{\frac{P \cdot Q}{n}} & & s = \sqrt{\frac{(0.50) \cdot (0.50)}{100}} = 0.05 \ \ \text{or} \ \ 5\%\end{align*}

\begin{align*}\sigma & = \sqrt {P \cdot Q} \\ \sigma & = \sqrt {(0.50) \cdot (0.50)}\\ \sigma & = 0.50 \ \text{or}\ 50\% \longrightarrow & \text{This is the population standard deviation, which here}\\ && \text{happens to equal the assumed value of } P.\end{align*}

This indicates how tightly the sample estimates are distributed around the population parameter. In this case, the standard error is the standard deviation of the sampling distribution.
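The two calculations above can be reproduced in a few lines. This is a minimal sketch in Python; the function names `population_sd` and `standard_error` are illustrative and do not come from the lesson.

```python
import math

def population_sd(p):
    """Standard deviation of a two-attribute (binomial) population: sqrt(P * Q)."""
    q = 1 - p
    return math.sqrt(p * q)

def standard_error(p, n):
    """Standard error of a sample proportion: sqrt(P * Q / n)."""
    q = 1 - p
    return math.sqrt(p * q / n)

sigma = population_sd(0.50)
s = standard_error(0.50, 100)

print(f"population standard deviation: {sigma:.2f}")
print(f"standard error for n = 100:    {s:.2f}")
```

Note that the standard error is simply the population standard deviation divided by the square root of the sample size.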

Probability theory indicates that certain proportions of the sample estimates will fall within defined increments, each equal to one standard error, from the population parameter. Approximately \begin{align*}34 \;\mathrm{percent}\end{align*} of the sample estimates will fall within one standard error increment above the population parameter and another \begin{align*}34 \;\mathrm{percent}\end{align*} will fall within one standard error increment below the population parameter. In the above example, you have calculated the standard error increment to be \begin{align*}5 \;\mathrm{percent}\end{align*}, so you know that about \begin{align*}34\%\end{align*} of the samples will yield estimates of student approval between \begin{align*}50\%\end{align*} (the population parameter) and \begin{align*}55\%\end{align*} (one standard error increment above). Likewise, another \begin{align*}34\%\end{align*} of the samples will give estimates between \begin{align*}50\%\end{align*} and \begin{align*}45\%\end{align*} (one standard error increment below the parameter). Therefore, you know that about \begin{align*}68\%\end{align*} of the samples will give estimates within \begin{align*}\pm 5 \;\mathrm{percent}\end{align*} of the parameter. In addition, probability theory says that about \begin{align*}95\%\end{align*} of the samples will fall within \begin{align*}\pm\end{align*} two standard errors of the true value and about \begin{align*}99.7\%\end{align*} will fall within \begin{align*}\pm\end{align*} three standard errors. With reference to this example, you can say that only about three samples out of one thousand would give an estimate below \begin{align*}35 \;\mathrm{percent}\end{align*} or above \begin{align*}65 \;\mathrm{percent}\end{align*} approval.
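These proportions come from the normal curve and can be computed directly. The sketch below assumes that the sampling distribution is well approximated by a normal distribution and uses Python's `math.erf`; the helper name `share_within` is hypothetical.

```python
import math

def share_within(k):
    """Share of a normal distribution within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

parameter = 50.0       # assumed true approval percentage
standard_error = 5.0   # percentage points, from the lesson's calculation

bands = {}
for k in (1, 2, 3):
    low = parameter - k * standard_error
    high = parameter + k * standard_error
    bands[k] = (low, high, share_within(k))
    print(f"within {k} standard error(s): {low:.0f}% to {high:.0f}% "
          f"covers {100 * bands[k][2]:.2f}% of samples")
```

The share outside three standard errors is what remains after subtracting the last figure from 100 percent.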

The size of the standard error is a function of the population parameter and the sample size. From the formula \begin{align*} s = \sqrt {\frac {P \cdot Q}{n}}\end{align*}, it is clear that the standard error increases as the quantity \begin{align*}P\end{align*} times \begin{align*}Q\end{align*} increases. Referring back to our example, the maximum value of \begin{align*}P\end{align*} times \begin{align*}Q\end{align*} occurs when there is an even split in the population: if \begin{align*}P = 0.5\end{align*}, then \begin{align*}P \times Q = 0.25\end{align*}; if \begin{align*}P = 0.6\end{align*}, then \begin{align*}P \times Q = 0.24\end{align*}; if \begin{align*}P = 0.8\end{align*}, then \begin{align*}P \times Q = 0.16\end{align*}. If \begin{align*}P\end{align*} is either \begin{align*}0.0\end{align*} or \begin{align*}1.0\end{align*} (none or all of the student body approve of the dress code), then the standard error will be \begin{align*}0\end{align*}. This means that there is no variation and every sample will give the same estimate.
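A short table makes the effect of the population split easier to see. This sketch assumes Python and holds the sample size fixed at 100, as in the lesson's example; the ten-point steps for the approval percentage are an arbitrary choice.

```python
import math

SAMPLE_SIZE = 100  # fixed sample size, as in the lesson's example

products = {}
for percent in range(0, 101, 10):
    p = percent / 100
    q = 1 - p
    products[percent] = p * q
    # Standard error for this split of the population.
    se = math.sqrt(p * q / SAMPLE_SIZE)
    print(f"P = {p:.1f}  P x Q = {p * q:.2f}  standard error = {se:.3f}")

# The split with the largest P x Q, and therefore the largest standard error.
largest = max(products, key=products.get)
print("P x Q is largest when P =", largest / 100)
```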

The standard error is also a function of the sample size. As the sample size increases, the standard error decreases, so the two are inversely related. As the sample size increases, the sample estimates cluster closer to the true value. Finally, because of the square root in the formula, the standard error is inversely proportional to the square root of the sample size: the standard error is reduced by one-half if the sample size is quadrupled.

\begin{align*} s & = \sqrt {\frac {P \cdot Q}{n}}\\ s & = \sqrt {\frac {(0.50) \cdot (0.50)}{400}} = 0.025 \ \ \ \text{or}\ \ \ 2.5\%\end{align*}
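The same halving repeats each time the sample size is quadrupled. The sketch below assumes Python and an even split in the population; the sample size of 1,600 is added only for illustration and does not appear in the lesson.

```python
import math

def standard_error(p, n):
    """Standard error of a sample proportion: sqrt(P * Q / n)."""
    return math.sqrt(p * (1 - p) / n)

# Each sample size is four times the previous one.
errors = {n: standard_error(0.50, n) for n in (100, 400, 1600)}

for n, s in errors.items():
    print(f"n = {n:4d}  standard error = {100 * s:.2f}%")
```

Halving the standard error therefore requires four times as many cases, which is why large gains in precision are costly.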

Lesson Summary

In this lesson we have learned about probability sampling, which is the key sampling method used in controlled survey research. In the example presented above, the elements were chosen for study from a population on the basis of random selection. The sample size had a direct effect on the distribution of estimates of the mean. The larger the sample size, the more closely the estimates cluster around the true value and the more nearly normal the sampling distribution becomes.

Points to Consider

  • Does the mean of the sampling distribution equal the mean of the population?
  • If the sampling distribution is normally distributed, is the population normally distributed?
  • Are there any restrictions on the size of the sample that is used to estimate the parameters of a population?
  • Are there any other components of sampling error estimates?

Review Questions

The following activity could be done in the classroom with the students working in pairs or small groups. Before doing the activity, students could put their pennies into a jar and save them as a class with the teacher also contributing. In a class of \begin{align*}30\end{align*} students, groups of \begin{align*}5\end{align*} students could work together and the various tasks could be divided among those in the group.

  1. If you had \begin{align*}100 \;\mathrm{pennies}\end{align*} and were asked to record the age of each penny, predict the shape of the distribution. (The age of a penny is the current year minus the date on the coin.)
  2. Construct a histogram of the ages of your pennies.
  3. Calculate the mean of the ages of the pennies.
  4. Have each student in the group randomly select a sample of \begin{align*}5 \;\mathrm{pennies}\end{align*} from the \begin{align*}100 \;\mathrm{coins}\end{align*} and calculate the mean of the five ages on the chosen coins. The mean is then to be recorded on a number line. Have the students repeat this process until all of the coins have been chosen. How does the mean of the samples compare to the mean of the population (\begin{align*}100\end{align*} ages)?
  5. Repeat step \begin{align*}4\end{align*} using a sample size of \begin{align*}10 \;\mathrm{pennies}\end{align*}. (As before, allow the students to work in groups)
  6. What is happening to the shape of the sampling distribution of the sample means?

Review Answers

  1. Many students may guess normal, but in reality the distribution is likely to be skewed toward the older pennies. (Remember that this means there are more newer pennies.)
  2. The histogram will probably show the distribution skewed toward the older ages.
  3. Answers will vary
  4. The mean of the sampling distribution should be the same as the mean of the population.
  5. .
  6. The shape of the sampling distribution becomes approximately normal as the sample size increases.

Note: This activity would work very well done with an entire class. Each student could use \begin{align*}20 \;\mathrm{coins}\end{align*} and the sample means could be an accumulation of sample means from each student.
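If no coins are at hand, the activity can be rehearsed with a simulation. The sketch below uses Python and entirely made-up penny ages (drawn from an exponential distribution so that newer coins are more common); the seed, the average age of 8 years, and the cap at 50 years are all hypothetical choices, not data from the lesson.

```python
import random

random.seed(2024)  # arbitrary fixed seed so the run is repeatable

# Hypothetical penny ages in years: made-up data, skewed toward newer coins.
ages = [min(int(random.expovariate(1 / 8)), 50) for _ in range(100)]
population_mean = sum(ages) / len(ages)

def disjoint_sample_means(values, sample_size):
    """Shuffle the coins, deal them into samples until all are used, and
    return the mean age of each sample."""
    shuffled = values[:]
    random.shuffle(shuffled)
    return [
        sum(shuffled[i:i + sample_size]) / sample_size
        for i in range(0, len(shuffled), sample_size)
    ]

means_of_5 = disjoint_sample_means(ages, 5)
means_of_10 = disjoint_sample_means(ages, 10)

print(f"population mean age: {population_mean:.2f}")
for label, means in (("5", means_of_5), ("10", means_of_10)):
    mean_of_means = sum(means) / len(means)
    print(f"samples of {label}: {len(means)} samples, mean of sample means = "
          f"{mean_of_means:.2f}, range {min(means):.1f} to {max(means):.1f}")
```

Because every coin is used exactly once in equal-sized samples, the mean of the sample means matches the population mean, as in the answer to question 4.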


Element
A unit that may be selected in a sample. These units comprise a population.
Normal Distribution
A useful and common probability distribution that has a symmetrical, upside-down U-shape or bell shape.
Parameter
The summary description of a given variable in a population.
Population
The entire set of the elements in a study.
Random Selection
In sampling, a method of choosing representative elements where each element has an equal chance of selection, independent of any other event in the selection process.
Sample
The set of units selected for study from a population.
Sampling Distribution
The distribution of a sample statistic, such as a sample mean, that is the result of probability sampling.
Sampling Error
The degree of error to be expected for a given probability sample design. The value of the sampling error shows how closely the sample statistics cluster around the true value of the population parameter.
Statistic
The summary description of a variable in a sample.
