7.3: Confidence Intervals
Learning Objectives
- Calculate the mean of a sample as a point estimate of the population mean.
- Construct a confidence interval for a population mean based on a sample mean.
- Calculate a sample proportion as a point estimate of the population proportion.
- Construct a confidence interval for a population proportion based on a sample proportion.
- Calculate the margin of error for a point estimate as a function of sample mean or proportion and size.
- Understand the logic of confidence intervals, as well as the meaning of confidence level and confidence intervals.
Introduction
The objective of inferential statistics is to use sample data to increase knowledge about the entire population. In this lesson, we will examine how to use samples to make estimates about the populations from which they came. We will also see how to determine how wide these estimates should be and how confident we should be about them.
Confidence Intervals
Sampling distributions are the connecting link between the collection of data by unbiased random sampling and the process of drawing conclusions from the collected data. Results obtained from a survey can be reported as a point estimate. For example, a single sample mean is a point estimate, because this single number is used as a plausible value of the population mean. Keep in mind that some error is associated with this estimate: the true population mean may be larger or smaller than the sample mean. An alternative to reporting a point estimate is identifying a range of possible values the parameter might take, controlling the probability that the parameter is not lower than the lowest value in this range and not higher than the largest value. This range of possible values is known as a confidence interval. Associated with each confidence interval is a confidence level. This level indicates the level of assurance you have that the resulting confidence interval encloses the unknown population mean.
In a normal distribution, we know that 95% of the data will fall within two standard deviations of the mean. Another way of stating this is to say that we are confident that in 95% of samples taken, the sample statistics are within plus or minus two standard errors of the population parameter. As the confidence interval for a given statistic increases in length, the confidence level increases.
The selection of a confidence level for an interval determines the probability that the confidence interval produced will contain the true parameter value. Common choices for the confidence level are 90%, 95%, and 99%. These levels correspond to percentages of the area under the normal density curve. For example, a 95% confidence interval covers 95% of the normal curve, so the probability of observing a value outside of this area is 5%. Because the normal curve is symmetric, half of the 5% is in the left tail of the curve, and the other half is in the right tail of the curve. This means that 2.5% is in each tail.
The graph shown above was made using a TI-83 graphing calculator and shows a normal distribution curve for a set of data for which \begin{align*}\mu=50\end{align*} and \begin{align*}\sigma=12\end{align*}. A 95% confidence interval for the standard normal distribution, then, is the interval (\begin{align*}-1.96\end{align*}, 1.96), since 95% of the area under the curve falls within this interval. The \begin{align*}\pm 1.96\end{align*} are the \begin{align*}z\end{align*}-scores that enclose the given area under the curve. For a normal distribution, the margin of error is the amount that is added to and subtracted from the mean to construct the confidence interval. For a 95% confidence interval, the margin of error is \begin{align*}1.96\sigma\end{align*}. (Note that previously we said that 95% of the data in a normal distribution falls within \begin{align*}\pm 2\end{align*} standard deviations of the mean. This was just an estimate, and for the remainder of this textbook, we'll assume that 95% of the data actually falls within \begin{align*}\pm 1.96\end{align*} standard deviations of the mean.)
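The \begin{align*}z\end{align*}-scores that enclose a given central area can also be checked numerically. The short Python sketch below is an illustration rather than part of the lesson's calculator workflow. It assumes Python 3.8 or later, whose standard library provides `statistics.NormalDist`, and the function name `critical_z` is our own choice.

```python
from statistics import NormalDist

def critical_z(confidence_level):
    """Return the positive z-score that encloses the given central area."""
    alpha = 1 - confidence_level          # total area left in the two tails
    # Half of alpha sits in the right tail, so look up the (1 - alpha/2) quantile
    # of the standard normal distribution.
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {critical_z(level):.3f}")
```

For the 95% level, the value returned rounds to 1.96, which matches the interval (\begin{align*}-1.96\end{align*}, 1.96) described above.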
The following is the derivation of the confidence interval for the population mean, \begin{align*}\mu\end{align*}. In it, \begin{align*}z_{\frac{\alpha}{2}}\end{align*} refers to the positive \begin{align*}z\end{align*}-score for a particular confidence level. The Central Limit Theorem tells us that, for a sufficiently large sample, the distribution of \begin{align*}\bar{x}\end{align*} is approximately normal, with a mean of \begin{align*}\mu\end{align*} and a standard deviation of \begin{align*}\frac{\sigma}{\sqrt{n}}\end{align*}. Consider the following:
\begin{align*}-z_{\frac{\alpha}{2}} < \frac{\bar{x}-\mu}{\frac{\sigma}{\sqrt{n}}} < z_{\frac{\alpha}{2}}\end{align*}
All values are known except for \begin{align*}\mu\end{align*}. Solving for this parameter, we have:
\begin{align*}&-\bar{x} - z_{\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}} < -\mu<z_{\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}}-\bar{x}\\ &\bar{x} + z_{\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}} > \mu > -z_{\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}}+\bar{x}\\ &\bar{x} + z_{\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}} > \mu > \bar{x} - z_{\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}}\\ &\bar{x} - z_{\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}}\end{align*}
Another way to express this is: \begin{align*}\bar{x} \pm z_{\frac{\alpha}{2}}\left ( \frac{\sigma}{\sqrt{n}} \right )\end{align*}.
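The formula above translates directly into a few lines of code. The following Python sketch is one possible way to write it. The function name `mean_confidence_interval` and the numbers passed to it (a sample mean of 100, \begin{align*}\sigma=15\end{align*}, and \begin{align*}n=36\end{align*}) are hypothetical values chosen only to make the arithmetic easy to follow.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(sample_mean, sigma, n, confidence_level=0.95):
    """Confidence interval for the population mean when sigma is known."""
    # Positive z-score that leaves (1 - confidence_level) / 2 in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin_of_error = z * sigma / sqrt(n)
    return sample_mean - margin_of_error, sample_mean + margin_of_error

# Hypothetical values, used only for illustration.
low, high = mean_confidence_interval(sample_mean=100, sigma=15, n=36)
print(f"95% confidence interval: {low:.1f} to {high:.1f}")
```

Here the margin of error is \begin{align*}1.96 \left ( \frac{15}{\sqrt{36}} \right ) \approx 4.9\end{align*}, so the interval runs from about 95.1 to 104.9.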
On the Web
http://tinyurl.com/27syj3x This simulates confidence intervals for the mean of the population.
Example: Jenny randomly selected 60 muffins of a particular brand and had those muffins analyzed for the number of grams of fat that they each contained. Rather than reporting the sample mean (point estimate), she reported the confidence interval. Jenny reported that the mean number of grams of fat per muffin is between 10.3 grams and 12.1 grams with 95% confidence.
In this example, the population mean is unknown. This number is fixed, not variable, and the sample means are variable, because the samples are random. If this is the case, does the confidence interval enclose this unknown true mean? Random samples lead to the formation of confidence intervals, some of which contain the fixed population mean and some of which do not. The most common mistake made by persons interpreting a confidence interval is claiming that once the interval has been constructed, there is a 95% probability that the population mean is found within the confidence interval. Even though the population mean is unknown, once the confidence interval is constructed, either the mean is within the confidence interval, or it is not. Hence, any probability statement about this particular confidence interval is inappropriate. In the above example, the confidence interval is from 10.3 to 12.1, and Jenny is using a 95% confidence level. The appropriate statement should refer to the method used to produce the confidence interval. Jenny should have stated that the method that produced the interval from 10.3 to 12.1 has a 0.95 probability of enclosing the population mean. This means that if she repeated this procedure 100 times, about 95 of the intervals produced would be expected to contain the population mean. The probability is attributed to the method, not to any particular confidence interval. The following diagram demonstrates how the confidence interval provides a range of plausible values for the population mean and that this interval may or may not capture the true population mean.
Example: The following questions are to be answered with reference to the above diagram.
a) Were all four sample means within \begin{align*}1.96 \frac{\sigma}{\sqrt{n}}\end{align*}, or \begin{align*}1.96\sigma_{\bar{x}}\end{align*}, of the population mean? Explain.
b) Did all four confidence intervals capture the population mean? Explain.
c) In general, what percentage of \begin{align*}\bar{x}'s\end{align*} should be within \begin{align*}1.96 \frac{\sigma}{\sqrt{n}}\end{align*} of the population mean?
d) In general, what percentage of the confidence intervals should contain the population mean?
a) The sample mean, \begin{align*}\bar{x}\end{align*}, for Sample 3 was not within \begin{align*}1.96\frac{\sigma}{\sqrt{n}}\end{align*} of the population mean. It did not fall within the vertical lines to the left and right of the population mean.
b) The confidence interval for Sample 3 did not enclose the population mean. This interval was just to the left of the population mean, which is denoted with the vertical line found in the middle of the sampling distribution of the sample means.
c) 95%
d) 95%
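The long-run interpretation behind answers (c) and (d) can be checked by simulation. The Python sketch below draws many random samples from a population whose mean we choose ourselves, builds a 95% confidence interval from each sample, and counts how often the interval encloses that mean. The population values (mean 11, standard deviation 2), the number of trials, and the function name `coverage_rate` are all assumptions made for illustration.

```python
import random
from math import sqrt

def coverage_rate(mu, sigma, n, trials, z=1.96, seed=1):
    """Fraction of simulated 95% intervals that enclose the true mean mu."""
    rng = random.Random(seed)             # fixed seed so runs are repeatable
    margin = z * sigma / sqrt(n)          # sigma is known, so every interval has the same width
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        if xbar - margin < mu < xbar + margin:
            hits += 1
    return hits / trials

# Hypothetical population: mean 11, standard deviation 2, samples of size 60.
rate = coverage_rate(mu=11, sigma=2, n=60, trials=4000)
print(f"Share of intervals enclosing the mean: {rate:.3f}")
```

The share printed should be close to 0.95, though it will rarely equal 0.95 exactly. Any single interval either encloses the mean or does not. The 95% describes how the method behaves over many samples.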
When the sample size is large \begin{align*}(n>30)\end{align*}, the confidence interval for the population mean is calculated as shown below:
\begin{align*}\bar{x}\pm z_{\frac{\alpha}{2}} \left ( \frac{\sigma}{\sqrt{n}} \right )\end{align*}, where \begin{align*}z_{\frac{\alpha}{2}}\end{align*} is 1.96 for a 95% confidence interval, 1.645 for a 90% confidence interval, and 2.576 for a 99% confidence interval.
Example: Julianne collects four samples of size 60 from a known population with a population standard deviation of 19 and a population mean of 110. Using the four samples, she calculates the four sample means to be:
\begin{align*}107 \qquad 112 \qquad 109 \qquad 115\end{align*}
a) For each sample, determine the 90% confidence interval.
b) Do all four confidence intervals enclose the population mean? Explain.
a) \begin{align*}&\bar{x} \pm z\frac{\sigma}{\sqrt{n}} && \bar{x} \pm z\frac{\sigma}{\sqrt{n}} && \bar{x} \pm z\frac{\sigma}{\sqrt{n}}\\ &107 \pm (1.645)(\frac{19}{\sqrt{60}}) && 112 \pm (1.645)(\frac{19}{\sqrt{60}}) && 109 \pm (1.645)(\frac{19}{\sqrt{60}})\\ &107 \pm 4.04 && 112 \pm 4.04 && 109 \pm 4.04\\ &\text{from} \ 102.96 \ \text{to} \ 111.04 && \text{from} \ 107.96 \ \text{to} \ 116.04 && \text{from} \ 104.96 \ \text{to} \ 113.04\end{align*}
\begin{align*}&\bar{x} \pm z\frac{\sigma}{\sqrt{n}}\\ &115 \pm (1.645)(\frac{19}{\sqrt{60}})\\ &115 \pm 4.04\\ &\text{from} \ 110.96 \ \text{to} \ 119.04\end{align*}
b) Three of the confidence intervals enclose the population mean. The interval from 110.96 to 119.04 does not enclose the population mean.
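The same calculation can be reproduced in code. The Python sketch below uses the values given in Julianne's example; the variable names are our own.

```python
from math import sqrt

SIGMA, N, MU = 19, 60, 110       # values given in the example
Z_90 = 1.645                     # z-score for a 90% confidence interval
sample_means = [107, 112, 109, 115]

margin = Z_90 * SIGMA / sqrt(N)  # the same margin of error applies to every sample
intervals = [(xbar - margin, xbar + margin) for xbar in sample_means]
encloses = [low < MU < high for low, high in intervals]

for (low, high), ok in zip(intervals, encloses):
    print(f"{low:.2f} to {high:.2f}  encloses {MU}: {ok}")
```

The margin of error comes out to about 4.04 for each sample, and only the interval built from the sample mean of 115 fails to enclose the population mean of 110.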
Technology Note: Simulation of Random Samples and Formation of Confidence Intervals on the TI-83/84 Calculator
Now it is time to use a graphing calculator to simulate the collection of three samples of sizes 30, 60, and 90, respectively. The three sample means will be calculated, as well as the three 95% confidence intervals. The samples will be collected from a population that displays a normal distribution, with a population standard deviation of 108 and a population mean of 2130. First, store the three samples in L1, L2, and L3, respectively, as shown below:
Store 'randNorm\begin{align*}(\mu,\sigma,n)\end{align*}' in L1. The sample size is \begin{align*}n=30\end{align*}.
Store 'randNorm\begin{align*}(\mu,\sigma,n)\end{align*}' in L2. The sample size is \begin{align*}n=60\end{align*}.
Store 'randNorm\begin{align*}(\mu,\sigma,n)\end{align*}' in L3. The sample size is \begin{align*}n=90\end{align*}.
The lists of numbers can be viewed by pressing [STAT][ENTER]. The next step is to calculate the mean of each of these samples.
To do this, first press [2ND][LIST] and go to the MATH menu. Next, select the 'mean(' command and press [2ND][L1][ENTER]. Repeat this process for L2 and L3.
Note that your confidence intervals will be different from the ones calculated below, because the random numbers generated by your calculator will be different, and thus, your means will be different. For us, the means of L1, L2, and L3 were 1309.6, 1171.1, and 1077.1, respectively, so the confidence intervals are as follows:
\begin{align*}& \bar{x} \pm z\frac{\sigma}{\sqrt{n}} && \bar{x} \pm z\frac{\sigma}{\sqrt{n}} && \bar{x} \pm z\frac{\sigma}{\sqrt{n}}\\ & 1309.6 \pm (1.96)(\frac{108}{\sqrt{30}}) && 1171.1 \pm (1.96)(\frac{108}{\sqrt{60}}) && 1077.1 \pm (1.96)(\frac{108}{\sqrt{90}})\\ & 1309.6 \pm 38.65 && 1171.1 \pm 27.33 && 1077.1 \pm 22.31\\ & \text{from} \ 1270.95 \ \text{to} \ 1348.25 && \text{from} \ 1143.77 \ \text{to} \ 1198.43 && \text{from} \ 1054.79 \ \text{to} \ 1099.41\end{align*}
As was expected, the value of \begin{align*}\bar{x}\end{align*} varied from one sample to the next. The other evident fact was that as the sample size increased, the length of the confidence interval decreased. This is because with the increase in sample size, you have more information, and thus, your estimate is more precise, which leads to a narrower confidence interval.
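For readers without a graphing calculator, a similar simulation can be sketched in Python. The code below is an assumption-laden stand-in for the calculator steps, not a transcription of them: it uses `random.gauss` to draw from a normal population with a mean of 2130 and a standard deviation of 108, and the seed is arbitrary.

```python
import random
from math import sqrt

MU, SIGMA, Z_95 = 2130, 108, 1.96
rng = random.Random(0)   # arbitrary seed; other seeds give different sample means

margins = {}
for n in (30, 60, 90):
    sample = [rng.gauss(MU, SIGMA) for _ in range(n)]
    xbar = sum(sample) / n
    margins[n] = Z_95 * SIGMA / sqrt(n)   # margin of error shrinks as n grows
    print(f"n = {n}: mean = {xbar:.1f}, "
          f"interval = {xbar - margins[n]:.2f} to {xbar + margins[n]:.2f}")
```

The sample means change with the seed, but the margins of error depend only on \begin{align*}\sigma\end{align*} and \begin{align*}n\end{align*}: roughly 38.65, 27.33, and 22.31 for samples of size 30, 60, and 90.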
In all of the examples shown above, you calculated the confidence intervals for the population mean using the formula \begin{align*}\bar{x} \pm z_{\frac{\alpha}{2}} \left ( \frac{\sigma}{\sqrt{n}} \right )\end{align*}. However, to use this formula, the population standard deviation \begin{align*}\sigma\end{align*} had to be known. If this value is unknown, and if the sample size is large \begin{align*}(n>30)\end{align*}, the population standard deviation can be replaced with the sample standard deviation. Thus, the formula \begin{align*}\bar{x} \pm z_{\frac{\alpha}{2}} \left ( \frac{s_x}{\sqrt{n}} \right )\end{align*} can be used as an interval estimator, or confidence interval. This formula is valid only for simple random samples. Since \begin{align*}z_{\frac{\alpha}{2}} \left ( \frac{s_x}{\sqrt{n}} \right )\end{align*} is the margin of error, a confidence interval can be thought of simply as: \begin{align*}\bar{x} \pm\end{align*} the margin of error.
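When \begin{align*}\sigma\end{align*} is unknown, the sample standard deviation can be computed from the data itself. The Python sketch below shows one way this might look. The data are 50 simulated exam scores invented for illustration, and the function name `mean_interval_from_sample` is hypothetical.

```python
import random
from math import sqrt
from statistics import mean, stdev

def mean_interval_from_sample(sample, z=1.96):
    """Large-sample interval that uses the sample standard deviation s_x."""
    n = len(sample)
    if n <= 30:
        raise ValueError("this z-based interval assumes n > 30")
    xbar = mean(sample)
    margin_of_error = z * stdev(sample) / sqrt(n)   # stdev divides by n - 1
    return xbar - margin_of_error, xbar + margin_of_error

# Hypothetical data: 50 simulated exam scores.
rng = random.Random(42)
scores = [rng.gauss(65, 10) for _ in range(50)]
low, high = mean_interval_from_sample(scores)
print(f"95% confidence interval: {low:.1f} to {high:.1f}")
```

The function refuses samples of 30 or fewer because the substitution of \begin{align*}s_x\end{align*} for \begin{align*}\sigma\end{align*} described above is stated only for large samples.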
Example: A committee set up to field-test questions from a provincial exam randomly selected grade 12 students to answer the test questions. The answers were graded, and the sample mean and sample standard deviation were calculated. Based on the results, the committee predicted that on the same exam, 9 times out of 10, grade 12 students would have an average score within 3% of 65%.
a) Are you dealing with a 90%, 95%, or 99% confidence level?
b) What is the margin of error?
c) Calculate the confidence interval.
d) Explain the meaning of the confidence interval.
a) You are dealing with a 90% confidence level. This is indicated by 9 times out of 10.
b) The margin of error is 3%.
c) The confidence interval is \begin{align*}\bar{x} \pm\end{align*} the margin of error, or 62% to 68%.
d) There is a 0.90 probability that the method used to produce this interval from 62% to 68% results in a confidence interval that encloses the population mean (the true mean score for this provincial exam).
Confidence Intervals for Hypotheses about Population Proportions
In estimating a parameter, we can use a point estimate or an interval estimate. The point estimate for the population proportion, \begin{align*}p\end{align*}, is \begin{align*}\hat{p}\end{align*}. We can also find interval estimates for this parameter. These intervals are based on the sampling distributions of \begin{align*}\hat{p}\end{align*}.
If we are interested in finding an interval estimate for the population proportion, the following two conditions must be satisfied:
- We must have a random sample.
- The sample size must be large enough (\begin{align*}n\hat{p}>10\end{align*} and \begin{align*}n(1-\hat{p})>10\end{align*}) that we can use the normal distribution as an approximation to the binomial distribution.
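The sample-size condition is easy to check before any interval is computed. The Python sketch below is a minimal illustration; the function name `normal_approximation_ok` and the survey counts are hypothetical.

```python
def normal_approximation_ok(successes, n):
    """Check that n * p_hat > 10 and n * (1 - p_hat) > 10."""
    p_hat = successes / n
    return n * p_hat > 10 and n * (1 - p_hat) > 10

# Hypothetical survey results.
print(normal_approximation_ok(successes=45, n=150))  # 45 successes and 105 failures
print(normal_approximation_ok(successes=4, n=50))    # only 4 successes
```

The first survey satisfies both inequalities, while the second does not, because it contains only 4 successes.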
\begin{align*}\sqrt{\frac{p(1-p)}{n}}\end{align*} is the standard deviation of the distribution of sample proportions. The distribution of sample proportions is as follows:
Since we do not know the value of \begin{align*}p\end{align*}, we must replace it with \begin{align*}\hat{p}\end{align*}. We then have the standard error of the sample proportions, \begin{align*}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\end{align*}. If we are interested in a 95% confidence interval, then, using the normal approximation, we are saying that we want the difference between the sample proportion and the population proportion to be within 1.96 standard errors.
That is, we want the following:
\begin{align*}&-1.96 \ \text{standard errors} < \hat{p}-p<1.96 \ \text{standard errors}\\ &-\hat{p}-1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < - p < -\hat{p} + 1.96 \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\\ &\hat{p} + 1.96 \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} > p > \hat{p}-1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\\ &\hat{p} - 1.96 \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < p < \hat{p}+1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\end{align*}
This is a 95% confidence interval for the population proportion. If we generalize for any confidence level, the confidence interval is as follows:
\begin{align*}\hat{p}-z_{\frac{\alpha}{2}}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < p < \hat{p} + z_{\frac{\alpha}{2}} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\end{align*}
In other words, the confidence interval is \begin{align*}\hat{p} \pm z_{\frac{\alpha}{2}} \left ( \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \right )\end{align*}. Remember that \begin{align*}z_{\frac{\alpha}{2}}\end{align*} refers to the positive \begin{align*}z\end{align*}-score for a particular confidence interval. Also, \begin{align*}\hat{p}\end{align*} is the sample proportion, and \begin{align*}n\end{align*} is the sample size. As before, the margin of error is \begin{align*}z_{\frac{\alpha}{2}} \left ( \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \right )\end{align*}, and the confidence interval is \begin{align*}\hat{p}\pm\end{align*} the margin of error.
Example: A congressman is trying to decide whether to vote for a bill that would legalize gay marriage. He will decide to vote for the bill only if 70 percent of his constituents favor the bill. In a survey of 300 randomly selected voters, 224 (74.7%) indicated they would favor the bill. The congressman decides that he wants an estimate of the proportion of voters in the population who are likely to favor the bill. Construct a confidence interval for this population proportion.
Our sample proportion is 0.747, and our standard error of the proportion is 0.0251. We will construct a 95% confidence interval for the population proportion. Under the normal curve, 95% of the area is between \begin{align*}z = -1.96\end{align*} and \begin{align*}z=1.96\end{align*}. Thus, the confidence interval for this proportion would be:
\begin{align*}& 0.747 \pm (1.96)(0.0251)\\ & 0.698 < p < 0.796\end{align*}
With respect to the population proportion, we are 95% confident that the interval from 0.698 to 0.796 contains the population proportion. The population proportion is either in this interval, or it is not. When we say that this is a 95% confidence interval, we mean that if we took 100 samples, all of size \begin{align*}n\end{align*}, and constructed 95% confidence intervals for each of these samples, we would expect about 95 out of the 100 confidence intervals to capture the population proportion, \begin{align*}p\end{align*}.
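The survey calculation can also be carried out in code. The Python sketch below applies the formula \begin{align*}\hat{p} \pm z_{\frac{\alpha}{2}} \left ( \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \right )\end{align*} to the 224 favorable responses out of 300; the function name `proportion_confidence_interval` is our own.

```python
from math import sqrt

def proportion_confidence_interval(successes, n, z=1.96):
    """Interval for a population proportion: p_hat +/- z * standard error."""
    p_hat = successes / n
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    margin_of_error = z * standard_error
    return p_hat - margin_of_error, p_hat + margin_of_error

low, high = proportion_confidence_interval(successes=224, n=300)
print(f"95% confidence interval: {low:.3f} to {high:.3f}")
```

Because the code carries the unrounded proportion 224/300 through every step, its endpoints may differ from a hand calculation in the third decimal place. Such differences come from rounding, not from a different method.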
Example: A large grocery store has been recording data regarding the number of shoppers that use savings coupons at its outlet. Last year, it was reported that 77% of all shoppers used coupons, and 19 times out of 20, these results were considered to be accurate within 2.9%.
a) Are you dealing with a 90%, 95%, or 99% confidence level?
b) What is the margin of error?
c) Calculate the confidence interval.
d) Explain the meaning of the confidence interval.
a) The statement 19 times out of 20 indicates that you are dealing with a 95% confidence interval.
b) The results were accurate within 2.9%, so the margin of error is 0.029.
c) The confidence interval is simply \begin{align*}\hat{p} \pm\end{align*} the margin of error.
\begin{align*}77\%-2.9\%=74.1\% \qquad 77\%+2.9\%=79.9\%\end{align*}
Thus, the confidence interval is from 0.741 to 0.799.
d) The 95% confidence interval from 0.741 to 0.799 for the population proportion is an interval calculated from a sample by a method that has a 0.95 probability of capturing the population proportion.
On the Web
http://tinyurl.com/27syj3x This simulates confidence intervals for the population proportion.
http://tinyurl.com/28z97lr Explore how changing the confidence level and/or the sample size affects the length of the confidence interval.
Lesson Summary
In this lesson, you learned that a sample mean is known as a point estimate, because this single number is used as a plausible value of the population mean. In addition to reporting a point estimate, you discovered how to calculate an interval of reasonable values based on the sample data. This interval estimator of the population mean is called the confidence interval. You can calculate this interval for the population mean by using the formula \begin{align*}\bar{x}\pm z_{\frac{\alpha}{2}} \left ( \frac{\sigma}{\sqrt{n}} \right )\end{align*}. The value of \begin{align*}z_{\frac{\alpha}{2}}\end{align*} is different for each confidence interval of 90%, 95%, and 99%. You also learned that the probability is attributed to the method used to calculate the confidence interval.
In addition, you learned that you calculate the confidence interval for a population proportion by using the formula \begin{align*}\hat{p} \pm z_{\frac{\alpha}{2}} \left ( \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \right )\end{align*}.
Points to Consider
- Does replacing \begin{align*}\sigma\end{align*} with \begin{align*}s\end{align*} change your chance of capturing the unknown population mean?
- Is there a way to increase the chance of capturing the unknown population mean?
Multimedia Links
For an explanation of the concept of confidence intervals (17.0), see kbower50, What are Confidence Intervals? (3:24).
For a description of the formula used to find confidence intervals for the mean (17.0), see mathguyzero, Statistics Confidence Interval Definition and Formula (1:26).
For an interactive demonstration of the relationship between margin of error, sample size, and confidence intervals (17.0), see wolframmathematica, Confidence Intervals: Confidence Level, Sample Size, and Margin of Error (0:16).
For an explanation on finding the sample size for a particular margin of error (17.0), see statslectures, Calculating Required Sample Size to Estimate Population Mean (2:18).
Review Questions
- In a local teaching district, a technology grant is available to teachers in order to install a cluster of four computers in their classrooms. From the 6,250 teachers in the district, 250 were randomly selected and asked if they felt that computers were an essential teaching tool for their classroom. Of those selected, 142 teachers felt that computers were an essential teaching tool.
- Calculate a 99% confidence interval for the proportion of teachers who felt that computers are an essential teaching tool.
- How could the survey be changed to narrow the confidence interval but to maintain the 99% confidence level?
- Josie followed the guidelines presented to her and conducted a binomial experiment. She did 300 trials and reported a sample proportion of 0.61.
- Calculate the 90%, 95%, and 99% confidence intervals for this sample.
- What did you notice about the confidence intervals as the confidence level increased? Offer an explanation for your findings.
- If the population proportion were 0.58, would all three confidence intervals enclose it? Explain.
Keywords
- Central Limit Theorem
- The distribution of the sample mean will approach a normal distribution when the sample size increases.
- Confidence interval
- Range of possible values the parameter might take.
- Confidence level
- The probability that the method used to calculate the confidence interval will produce an interval that will enclose the population parameter.
- Margin of error
- The amount that is added to and subtracted from the point estimate (the sample mean or the sample proportion) to construct the confidence interval.
- Parameter
- Numerical descriptive measure of a population.
- Point estimate
- A single number, such as a sample mean or a sample proportion, that is used as a plausible value of a population parameter.
- Sample means
- The means calculated from random samples of a population. When the sample size is large, the sampling distribution of the sample means is approximately normal.
- Sample proportion
- The proportion of a sample that has the characteristic of interest. For example, if 48 of 100 sampled students approve of the dress code and 52 disapprove, the sample proportion is 48%.
- Sampling distributions
- The sampling distribution is the probability distribution of the statistic.
- Standard error
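- The standard deviation of a sampling distribution, such as \begin{align*}\frac{\sigma}{\sqrt{n}}\end{align*} for the sample mean. The standard error is a function of the sample size: as the sample size increases, the standard error decreases, so the bigger the sample size, the more closely the sample statistics will be clustered around the true value.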