7.4: Confidence Intervals
Learning Objectives
- Calculate the point estimate of a sample to estimate a population proportion.
- Construct a confidence interval for a population proportion based on a sample.
- Calculate the margin of error for proportions as a function of sample proportion and size.
- Understand the logic of confidence intervals as well as the meaning of confidence level and confidence intervals.
Introduction
The objective of inferential statistics is to use sample data to increase knowledge about the corresponding entire population. Sampling distributions are the connecting link between the collection of data by unbiased random sampling and the process of drawing conclusions from the collected data. Results obtained from a survey can be reported as a point estimate. For example, a single sample mean is called a point estimate because this single number is used as a plausible value of the population mean. Some error is associated with this estimate: the true population mean may be larger or smaller than the sample mean. An alternative to reporting a point estimate is identifying a range of possible values that \begin{align*}\mu\end{align*} might take, controlling the probability that \begin{align*}\mu\end{align*} is not lower than the lowest value in this range and not higher than the largest value. This range of possible values is known as a confidence interval. Associated with each confidence interval is a confidence level. This level indicates the level of assurance you have that the resulting confidence interval encloses the unknown population mean.
In a normal distribution, about \begin{align*}68\;\mathrm{percent}\end{align*} of the values fall within one standard deviation of the mean. Applied to a sampling distribution, this means that about \begin{align*}68\;\mathrm{percent}\end{align*} of sample statistics fall within one standard error of the population parameter, so any single random sample has about a \begin{align*}68\;\mathrm{percent}\end{align*} chance of producing a statistic within that range. Likewise, in about \begin{align*}95\;\mathrm{percent}\end{align*} of samples, the sample statistic is within plus or minus two standard errors of the population parameter. As the confidence interval is widened for a given statistic, the confidence level increases.
The selection of a confidence level for an interval determines the probability that the confidence interval produced will contain the true parameter value. Common choices for the confidence level are \begin{align*}90\%, 95\%\end{align*} and \begin{align*}99\%\end{align*}. These levels correspond to percentages of the area under the normal density curve. For example, a \begin{align*}95\%\end{align*} confidence interval covers the central \begin{align*}95\%\end{align*} of the normal curve, so the probability of observing a value outside of this area is \begin{align*}5\%\end{align*}. Because the normal curve is symmetric, half of this remaining area is in the left tail of the curve, and the other half is in the right tail. This means that \begin{align*}2.5\%\end{align*} of the area is in each tail.
This graph was made using the TI-83 and shows a normal distribution curve for a set of data that has a mean of \begin{align*}(\mu = 50)\end{align*} and a standard deviation of \begin{align*}(\sigma = 12)\end{align*}. A \begin{align*}95\%\end{align*} confidence interval for the standard normal distribution, then, is the interval \begin{align*}(-1.96, 1.96)\end{align*}, since \begin{align*}95\%\end{align*} of the area under the curve falls within this interval. The values \begin{align*}\pm 1.96\end{align*} are the \begin{align*}z-\end{align*}scores that enclose the given area under the curve. For a normal distribution, the margin of error is the amount that is added to and subtracted from the mean to construct the confidence interval. For a \begin{align*}95\%\end{align*} confidence interval, the margin of error equals \begin{align*}1.96 \ \sigma\end{align*}.
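The link between a confidence level and its \begin{align*}z-\end{align*}score can be checked numerically. The following is a minimal sketch, assuming Python 3.8 or later and its standard `statistics.NormalDist` class; the function name `critical_z` is introduced here for illustration only.

```python
from statistics import NormalDist

def critical_z(confidence_level):
    """Return z such that the central `confidence_level` area of the
    standard normal curve lies between -z and +z."""
    tail_area = (1 - confidence_level) / 2   # area left over in each tail
    return NormalDist().inv_cdf(1 - tail_area)

# The three confidence levels discussed in this lesson.
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {critical_z(level):.3f}")
```

For the \begin{align*}95\%\end{align*} level, each tail holds \begin{align*}2.5\%\end{align*} of the area, so the function looks up the \begin{align*}z-\end{align*}score with \begin{align*}97.5\%\end{align*} of the area to its left.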
The following example will help you to understand these terms and their meaning.
Example:
Jenny randomly selected \begin{align*}60\end{align*} muffins from one company line and had those muffins analyzed for the number of grams of fat that they each contained. Rather than reporting the sample mean (point estimate), she reported the confidence interval (interval estimator). Jenny reported that the mean number of grams of fat per muffin is between \begin{align*}10.3 \;\mathrm{grams}\end{align*} and \begin{align*}12.1 \;\mathrm{grams}\end{align*} with \begin{align*}95\%\end{align*} confidence.
In this example, the population mean is unknown. This number is fixed, not variable, whereas the sample means are variable because the samples are random. If this is the case, does the confidence interval enclose this unknown true mean? Random samples lead to the formation of confidence intervals, some of which contain the fixed population mean and some of which do not. The most common mistake made by persons interpreting a confidence interval is claiming that once the interval has been constructed there is a \begin{align*}95\%\end{align*} probability that the population mean is found within the confidence interval. Even though the population mean is unknown, once the confidence interval is constructed, either the mean is within the confidence interval or it is not. Hence, any probability statement about this particular confidence interval is inappropriate. In the above example, the confidence interval is from \begin{align*}10.3\end{align*} to \begin{align*}12.1\end{align*} and Jenny is using a \begin{align*}95\%\end{align*} confidence level. The appropriate statement should refer to the method used to produce the confidence interval. Jenny should have stated that the method that produced the interval from \begin{align*}10.3\end{align*} to \begin{align*}12.1\end{align*} has a \begin{align*}0.95\end{align*} probability of enclosing the population mean. This does not mean that there is a \begin{align*}0.95\end{align*} probability that the population mean falls in the interval from \begin{align*}10.3\end{align*} to \begin{align*}12.1\end{align*}. The probability is attributed to the method, not to any particular confidence interval. The following diagram demonstrates how the confidence interval provides a range of plausible values for the population mean and that this interval may capture the true population mean. If you formed \begin{align*}100\end{align*} intervals in this manner, about \begin{align*}95\end{align*} of them would be expected to contain the population mean.
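The idea that the probability belongs to the method can be illustrated with a small simulation. The sketch below is not part of Jenny's study: the population mean of \begin{align*}11.2\end{align*} and standard deviation of \begin{align*}1.8\end{align*} are hypothetical values chosen only so that samples can be drawn, and the function name `coverage_rate` is an assumption made for this illustration.

```python
import random
from math import sqrt

def coverage_rate(mu, sigma, n, z, trials, seed=0):
    """Draw `trials` random samples of size n from a normal population and
    return the fraction of intervals x-bar +/- z*sigma/sqrt(n) that
    enclose the fixed population mean mu."""
    rng = random.Random(seed)        # seeded so the run is repeatable
    margin = z * sigma / sqrt(n)     # the same margin of error for every sample
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n       # the sample mean varies from sample to sample
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / trials

# Hypothetical population: mu = 11.2 grams, sigma = 1.8 grams, samples of 60.
rate = coverage_rate(mu=11.2, sigma=1.8, n=60, z=1.96, trials=2000)
print(f"Fraction of intervals that captured the mean: {rate:.3f}")
```

Each individual interval either contains the population mean or it does not; only the long-run fraction of intervals that capture it should be close to \begin{align*}0.95\end{align*}.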
Example:
The following questions are to be answered with reference to the above diagram.
a) Were all four sample means within \begin{align*}1.96 \frac{\sigma} {\sqrt{n}}\end{align*}, or \begin{align*}1.96 \sigma_{\bar{x}}\end{align*}, of the population mean? Explain.
b) Did all four confidence intervals capture the population mean? Explain.
c) In general, what percentage of \begin{align*}\bar {x}'\mathrm{s}\end{align*} should be within \begin{align*}1.96 \frac{\sigma} {\sqrt{n}}\end{align*} of the population mean?
d) In general, what percentage of the confidence intervals should contain the population mean?
Solution:
a) The sample mean \begin{align*}(\bar {x})\end{align*} for Sample 3 is not within \begin{align*}1.96 \frac{\sigma} {\sqrt{n}}\end{align*} of the population mean. It does not fall within the two vertical lines on the left and right of the sampling distribution of the sample mean.
b) The confidence interval for Sample 3 does not enclose the population mean. This interval is just to the left of the population mean \begin{align*}(\mu)\end{align*}, which is labeled as the vertical line found in the middle of the sampling distribution of the sample mean.
c) \begin{align*}95\%\end{align*}
d) \begin{align*}95\%\end{align*}
When the sample size is large \begin{align*}(n \ge 30)\end{align*}, the confidence interval for the population mean is calculated as shown below:
\begin{align*}\bar{x} \pm z \frac{\sigma} {\sqrt{n}}\end{align*} where \begin{align*}z\end{align*} is \begin{align*}1.96\end{align*} for a \begin{align*}95\%\end{align*} confidence interval, \begin{align*}1.645\end{align*} for a \begin{align*}90\%\end{align*} confidence interval, and \begin{align*}2.576\end{align*} for a \begin{align*}99\%\end{align*} confidence interval.
Example:
Julianne collects four samples of size \begin{align*}60\end{align*} from a known population with a population standard deviation of \begin{align*}19\end{align*} and a population mean of \begin{align*}110\end{align*}. Using the four samples, she calculates the four sample means to be:
\begin{align*}& 107 & & 112 & & 109 & & 115\end{align*}
a) For each sample, determine the \begin{align*}90\%\end{align*} confidence interval.
b) Do all four confidence intervals enclose the population mean? Explain.
Solution:
a)
\begin{align*}& \bar{x} \pm z \frac{\sigma} {\sqrt{n}} & & \bar{x} \pm z \frac{\sigma} {\sqrt{n}} & & \bar{x} \pm z \frac{\sigma} {\sqrt{n}}\\ & 107 \pm 1.645 \frac{19} {\sqrt{60}} & & 112 \pm 1.645 \frac{19} {\sqrt{60}} & & 109 \pm 1.645 \frac{19} {\sqrt{60}}\\ & 107 \pm 4.04 & & 112 \pm 4.04 & & 109 \pm 4.04\\ & \text{from}\ 102.96\ \text{to}\ 111.04 & & \text{from}\ 107.96\ \text{to} \ 116.04 & & \text{from}\ 104.96\ \text{to} \ 113.04\end{align*}
\begin{align*}& \bar{x} \pm z \frac{\sigma} {\sqrt{n}}\\ & 115 \pm 1.645 \frac{19} {\sqrt{60}}\\ & 115 \pm 4.04\\ & \text{from}\ 110.96 \ \text{to} \ 119.04\end{align*}
b) Three of the confidence intervals enclose the population mean of \begin{align*}110\end{align*}. The interval from \begin{align*}110.96\end{align*} to \begin{align*}119.04\end{align*} does not enclose the population mean.
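The arithmetic in this example can be reproduced with a few lines of code. This is a minimal sketch using the sample means, \begin{align*}\sigma = 19\end{align*}, \begin{align*}n = 60\end{align*} and \begin{align*}z = 1.645\end{align*} from Julianne's example; the function name `mean_interval` is introduced here for illustration.

```python
from math import sqrt

def mean_interval(xbar, sigma, n, z):
    """Return (low, high) for the interval x-bar +/- z * sigma / sqrt(n)."""
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

population_mean = 110
for xbar in (107, 112, 109, 115):
    low, high = mean_interval(xbar, sigma=19, n=60, z=1.645)
    captured = low <= population_mean <= high
    print(f"x-bar = {xbar}: from {low:.2f} to {high:.2f}, captures 110: {captured}")
```

Only the interval built from the sample mean of \begin{align*}115\end{align*} fails to capture the population mean, which agrees with the answer to part b).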
Example:
Now it is time to use the graphing calculator to simulate the collection of three samples of different sizes: \begin{align*}30, 60\end{align*} and \begin{align*}90\end{align*}, respectively. The three sample means will be calculated as well as the three \begin{align*}95\%\end{align*} confidence intervals. The samples will be collected from a population that displays a normal distribution with a population standard deviation of \begin{align*}108\end{align*} and a population mean of \begin{align*}1230\end{align*}.
randNorm\begin{align*}(\mu, \sigma, n)\end{align*} store in \begin{align*}L_1\end{align*} Sample size \begin{align*}= 30\end{align*}
randNorm\begin{align*}(\mu, \sigma, n)\end{align*} store in \begin{align*}L_2\end{align*} Sample size \begin{align*}= 60\end{align*}
randNorm\begin{align*}(\mu, \sigma, n)\end{align*} store in \begin{align*}L_3\end{align*} Sample size \begin{align*}= 90\end{align*}
The lists of numbers can be viewed by pressing [Stat] and then [Enter]. The next step is to calculate the mean of each of these samples.
[List] \begin{align*}\rightarrow\end{align*} [Math] \begin{align*}\rightarrow\end{align*} mean\begin{align*}(L_1)\end{align*} gives \begin{align*}1309.6\end{align*}. Repeat this for \begin{align*}L_2\end{align*}, which gives \begin{align*}1171.1\end{align*}, and for \begin{align*}L_3\end{align*}, which gives \begin{align*}1077.1\end{align*}.
The three confidence intervals are:
\begin{align*}& \bar{x} \pm z \frac{\sigma} {\sqrt{n}} & & \bar{x} \pm z \frac{\sigma} {\sqrt{n}} & & \bar{x} \pm z \frac{\sigma} {\sqrt{n}}\\ & 1309.6 \pm 1.96 \frac{108} {\sqrt{30}} & & 1171.1 \pm 1.96 \frac{108} {\sqrt{60}} & & 1077.1 \pm 1.96 \frac{108} {\sqrt{90}}\\ & 1309.6 \pm 38.65 & & 1171.1 \pm 27.33 & & 1077.1 \pm 22.31\\ & \text{from}\ 1270.95 \ \text{to} \ 1348.25 & & \text{from}\ 1143.77 \ \text{to} \ 1198.43 & & \text{from}\ 1054.79 \ \text{to} \ 1099.41\end{align*}
As was expected, the value of \begin{align*}\bar {x}\end{align*} varied from one sample to the next. It was also evident that as the sample size increased, the confidence interval became narrower, because the margin of error \begin{align*}z \frac{\sigma} {\sqrt{n}}\end{align*} decreases as \begin{align*}n\end{align*} increases.
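The effect of sample size on the width of the interval can be tabulated directly. The sketch below uses \begin{align*}\sigma = 108\end{align*} and \begin{align*}z = 1.96\end{align*} from the example; the extra sample size of \begin{align*}360\end{align*} is not in the example and is added only to show that quadrupling \begin{align*}n\end{align*} halves the margin of error.

```python
from math import sqrt

def margin_of_error(sigma, n, z=1.96):
    """Margin of error z * sigma / sqrt(n) for a mean with known sigma."""
    return z * sigma / sqrt(n)

# 30, 60 and 90 come from the example; 360 is an added illustration.
for n in (30, 60, 90, 360):
    print(f"n = {n:3d}: margin of error = {margin_of_error(108, n):.2f}")
```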
In all of the examples shown above, you calculated the confidence intervals for the population mean using the formula \begin{align*}\bar {x} \pm z \frac{\sigma} {\sqrt{n}}\end{align*}. However, to use this formula, the population standard deviation \begin{align*}(\sigma)\end{align*} had to be known in order to calculate the interval. If this value is unknown and if the sample size is large \begin{align*}(n \ge 30)\end{align*}, the population standard deviation can be replaced with the sample standard deviation. Thus, the formula \begin{align*}\bar {x} \pm z \frac{S_x} {\sqrt{n}}\end{align*} can be used as an interval estimator. An interval estimator of the population mean is called a confidence interval. This formula is valid only for simple random samples. Since \begin{align*}z \frac{S_x} {\sqrt{n}}\end{align*} is actually the margin of error, a confidence interval can be thought of simply as: \begin{align*}\bar{x} \pm\end{align*} the margin of error.
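When only raw sample data are available, the sample standard deviation can be computed and substituted as described above. The following is a minimal sketch: the list `scores` is hypothetical data generated for illustration, and the function name `mean_interval_from_sample` is an assumption, not something defined in this lesson.

```python
import random
from math import sqrt
from statistics import mean, stdev

def mean_interval_from_sample(sample, z=1.96):
    """Return (low, high) for x-bar +/- z * s / sqrt(n), where s is the
    sample standard deviation standing in for the unknown sigma."""
    n = len(sample)
    if n < 30:
        raise ValueError("this z-based sketch assumes a large sample (n >= 30)")
    xbar = mean(sample)
    s = stdev(sample)                # sample standard deviation (divides by n - 1)
    margin = z * s / sqrt(n)         # the margin of error
    return xbar - margin, xbar + margin

# Hypothetical data: 40 simulated exam scores, for illustration only.
rng = random.Random(1)
scores = [rng.gauss(65, 10) for _ in range(40)]
low, high = mean_interval_from_sample(scores)
print(f"95% confidence interval: from {low:.1f} to {high:.1f}")
```

The check on \begin{align*}n\end{align*} mirrors the large-sample condition \begin{align*}(n \ge 30)\end{align*} stated above.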
Example:
A committee set up to field-test questions from a provincial exam randomly selected Grade \begin{align*}12\end{align*} students to answer the test questions. The answers were graded and the sample mean and sample standard deviation were calculated. Based on the results, the committee predicted that on the same exam, Grade \begin{align*}12\end{align*} students would score an average grade of \begin{align*}65\%\end{align*} with accuracy within \begin{align*}3\%\end{align*}, \begin{align*}9 \;\mathrm{times}\end{align*} out of \begin{align*}10\end{align*}.
a) Are you dealing with a \begin{align*}90\%, 95\%\end{align*} or \begin{align*}99\%\end{align*} confidence level?
b) What is the margin of error?
c) Calculate the confidence interval.
d) Explain the meaning of the confidence interval.
Solution:
a) You are dealing with a \begin{align*}90\%\end{align*} confidence level. This is indicated by \begin{align*}9 \;\mathrm{times}\end{align*} out of \begin{align*}10\end{align*}.
b) The margin of error is \begin{align*}3\%\end{align*}.
c) The confidence interval is \begin{align*}\bar {x} \pm\end{align*} the margin of error which is \begin{align*}62\%\end{align*} to \begin{align*}68\%\end{align*}.
d) There is a \begin{align*}0.90\end{align*} probability that the method used to produce this interval from \begin{align*}62\%\end{align*} to \begin{align*}68\%\end{align*} results in a confidence interval that encloses the population mean (the true mean score for this provincial exam).
The calculation of a confidence interval for a population proportion is similar to that explained above for a population mean. With large sample sizes \begin{align*}(n \ge 30)\end{align*}, the sampling distribution of the sample proportions is approximately normal. From this statement you can say that \begin{align*}95\%\end{align*} of the sample proportions from a population are within two standard deviations (more accurately \begin{align*}1.96\end{align*} standard deviations) of the population proportion. This is shown in the diagram below:
Therefore, if a single sample proportion is within \begin{align*}1.96 \sqrt{\frac{p(1 - p)} {n}}\end{align*} of the population proportion, then the interval \begin{align*}\hat {p} - 1.96 \sqrt{\frac{p (1 - p)} {n}}\end{align*} to \begin{align*}\hat {p} + 1.96 \sqrt{\frac{p(1 - p)} {n}}\end{align*} will capture the population proportion. This will happen for \begin{align*}95\%\end{align*} of all possible samples. If you look at the above formulas, you should notice that the population proportion \begin{align*}(p)\end{align*} and the sample proportion \begin{align*}(\hat {p})\end{align*} are both used to calculate the confidence interval. However, in real-life situations, the population proportion is seldom known. Therefore,\begin{align*} (p)\end{align*} is most often replaced with \begin{align*}(\hat {p})\end{align*} in the formulas above so that they now become:
\begin{align*}\hat {p} - 1.96 \sqrt{\frac{\hat {p} (1 - \hat {p})} {n}}\end{align*} and \begin{align*}\hat {p} + 1.96 \sqrt{\frac{\hat {p} (1 - \hat {p})} {n}}\end{align*} or in a more standard form \begin{align*}\hat {p} \pm z \sqrt{\frac{\hat {p}(1 - \hat {p})} {n}}\end{align*}. There are two restrictions that apply to this formula: 1) \begin{align*}np \ge 5\end{align*} and 2) \begin{align*}n (1 - p) \ge 5\end{align*}.
As before, the margin of error is \begin{align*}z \sqrt{\frac{\hat {p}(1 - \hat {p})} {n}}\end{align*} and the confidence interval is \begin{align*}\hat {p} \pm\end{align*} the margin of error.
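The formula for a proportion can be sketched in code as follows. The counts used (\begin{align*}120\end{align*} successes in \begin{align*}200\end{align*} trials) are hypothetical, the function name `proportion_interval` is introduced for illustration, and the two restrictions are checked with \begin{align*}\hat {p}\end{align*} standing in for the unknown \begin{align*}p\end{align*}, which is an assumption of this sketch.

```python
from math import sqrt

def proportion_interval(successes, n, z=1.96):
    """Return (p_hat, margin, low, high) for p_hat +/- z*sqrt(p_hat(1-p_hat)/n)."""
    p_hat = successes / n            # sample proportion x / n
    # Restrictions from the text, checked with p_hat in place of the unknown p.
    if n * p_hat < 5 or n * (1 - p_hat) < 5:
        raise ValueError("need n*p >= 5 and n*(1 - p) >= 5")
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin, p_hat - margin, p_hat + margin

# Hypothetical survey: 120 successes out of 200 trials, 95% confidence.
p_hat, margin, low, high = proportion_interval(120, 200)
print(f"p-hat = {p_hat:.3f}, margin of error = {margin:.3f}")
print(f"95% confidence interval: from {low:.3f} to {high:.3f}")
```

Passing a different value of `z`, such as \begin{align*}1.645\end{align*}, produces the interval for another confidence level from the same sample.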
Example:
A large grocery store has been recording data regarding the number of shoppers that use savings coupons at their outlet. Last year it was reported that \begin{align*}77\%\end{align*} of all shoppers used coupons, and these results were considered accurate within \begin{align*}2.9\%\end{align*}, \begin{align*}19 \;\mathrm{times}\end{align*} out of \begin{align*}20\end{align*}.
a) Are you dealing with a \begin{align*}90\%, 95\%\end{align*} or \begin{align*}99\%\end{align*} confidence level?
b) What is the margin of error?
c) Calculate the confidence interval.
d) Explain the meaning of the confidence interval.
Solution:
a) The statement \begin{align*}19 \;\mathrm{times}\end{align*} out of \begin{align*}20\end{align*} indicates that you are dealing with a \begin{align*}95\%\end{align*} confidence level.
b) The results were accurate within \begin{align*}2.9\%\end{align*}, so the margin of error is \begin{align*}2.9\%\end{align*}.
c) The confidence interval is simply \begin{align*}\hat {p} \pm\end{align*} the margin of error.
\begin{align*}& 77\% - 2.9\% = 74.1\% && 77\% + 2.9\% = 79.9\%\end{align*}
The confidence interval is from \begin{align*}74.1\%\end{align*} to \begin{align*}79.9\%\end{align*}.
d) The \begin{align*}95\%\end{align*} confidence interval from \begin{align*}74.1\%\end{align*} to \begin{align*}79.9\%\end{align*} for the population proportion is an interval calculated from a sample by a method that has a \begin{align*}0.95\end{align*} probability of capturing the population proportion.
Lesson Summary
In this lesson you learned that a sample mean is known as a point estimate because this single number is used as a plausible value of the population mean. In addition to reporting a point estimate, you discovered how to calculate an interval of reasonable values based on the sample data. This interval estimator of the population mean is called the confidence interval. You can calculate this interval for the population mean by using the formula \begin{align*}\bar {x} \pm z \frac{\sigma} {\sqrt{n}}\end{align*}. The value of \begin{align*}z\end{align*} is different for each of the confidence levels \begin{align*}90\%, 95\%\end{align*} and \begin{align*}99\%\end{align*}. You also learned that the probability is attributed to the method used to calculate the confidence interval.
Points to Consider
- Does replacing \begin{align*}\sigma\end{align*} with \begin{align*}s\end{align*} change your chance of capturing the unknown population mean?
- Is there a way to increase the chance of capturing the unknown population mean?
Review Questions
- In a local teaching district a technology grant is available to teachers in order to install a cluster of four computers in their classrooms. From the \begin{align*}6250\end{align*} teachers in the district, \begin{align*}250\end{align*} were randomly selected and asked if they felt that computers were an essential teaching tool for their classroom. Of those selected, \begin{align*}142\end{align*} teachers felt that computers were an essential teaching tool.
- Calculate a \begin{align*}99\%\end{align*} confidence interval for the proportion of teachers who felt that computers are an essential teaching tool.
- How could the survey be changed to narrow the confidence interval while maintaining the \begin{align*}99\%\end{align*} confidence level?
- Josie followed the guidelines and conducted a binomial experiment. She did \begin{align*}300\end{align*} trials and reported a sample proportion of \begin{align*}0.61\end{align*}.
- Calculate the \begin{align*}90\%, 95\%\end{align*} and \begin{align*}99\%\end{align*} confidence intervals for this sample.
- What did you notice about the confidence intervals as the confidence level increased? Offer an explanation for your findings.
- If the population proportion were \begin{align*}0.58\end{align*}, would all three confidence intervals enclose it? Explain.
Review Answers
- \begin{align*}\hat {p} & = \frac{x} {n} && \hat {p} \pm z \sqrt{\frac{\hat {p}(1 - \hat{p})} {n}}\\ \hat {p} & = \frac{142} {250} && 0.568 \pm 2.576 \sqrt{\frac{0.568(1 - 0.568)} {250}} \\ \hat {p} & = 0.568 && 0.568 \pm 0.081\end{align*} The interval is from \begin{align*}0.487\end{align*} to \begin{align*}0.649\end{align*} OR from \begin{align*}48.7\%\end{align*} to \begin{align*}64.9\%\end{align*}
- The \begin{align*}99\%\end{align*} confidence interval could be narrowed by increasing the sample size from \begin{align*}250\end{align*} to a larger number.
- \begin{align*}& \hat {p} \pm z \sqrt{\frac{\hat {p}(1 - \hat{p})} {n}} && \hat {p} \pm z \sqrt{\frac{\hat {p}(1 - \hat{p})} {n}} && \hat {p} \pm z \sqrt{\frac{\hat {p}(1 - \hat{p})} {n}}\\ & 0.61 \pm 1.645 \sqrt{\frac{0.61(1 - 0.61)} {300}} && 0.61 \pm 1.96 \sqrt{\frac{0.61(1 - 0.61)} {300}} && 0.61 \pm 2.576 \sqrt{\frac{0.61(1 - 0.61)} {300}}\\ & \text{from}\ 0.564 \ \text{to}\ 0.656 && \text{from}\ 0.555 \ \text{to}\ 0.665 && \text{from}\ 0.537 \ \text{to}\ 0.683\end{align*}
- The confidence interval got wider as the confidence level increased. To increase the probability of enclosing the population proportion a wider confidence interval must be chosen.
- Yes, all three confidence intervals would capture the population proportion if it were \begin{align*}0.58\end{align*}.
Vocabulary
- Binomial Experiment
- A type of survey or experiment in which there is a fixed number of trials that have one of only two outcomes. The probability of success for any trial is equal to the population proportion and remains the same for every trial. The outcomes for each trial are independent of one another and the binomial random variable, \begin{align*}x\end{align*}, is the number of successes observed in \begin{align*}n\end{align*} trials.
- Confidence Interval
- An interval of plausible values for a population parameter. Any of the values in the interval could be used to define a population for which the defined sample statistic would be a likely outcome.
- Confidence Level
- The probability that the method used to calculate the confidence interval will produce an interval that will enclose the population parameter.
- Interval Estimator
- Another name for a confidence interval
- Point Estimate
- A single number, such as a single sample mean, that is used as a plausible value of the population mean. This can also be another single value that represents a population parameter.
- Population Proportion
- A fraction of the population that possesses a certain characteristic or probability of an event occurring. The characteristic or event is called a success.
- Sample Proportion
- The ratio of successes \begin{align*}x\end{align*} to sample size \begin{align*}n\end{align*}.