The objective of inferential statistics is to use sample data to increase knowledge about the entire population. In this Concept, we will examine how to use samples to make estimates about the populations from which they came. We will also see how to determine how wide these estimates should be and how confident we should be about them.
For an explanation of the concept of confidence intervals (17.0), see kbower50, What are Confidence Intervals? (3:24).
For a description of the formula used to find confidence intervals for the mean (17.0), see mathguyzero, Statistics Confidence Interval Definition and Formula (1:26).
For an interactive demonstration of the relationship between margin of error, sample size, and confidence intervals (17.0), see wolframmathematica, Confidence Intervals: Confidence Level, Sample Size, and Margin of Error (0:16).
For an explanation of how to find the sample size for a particular margin of error (17.0), see statslectures, Calculating Required Sample Size to Estimate Population Mean (2:18).
Sampling distributions are the connecting link between the collection of data by unbiased random sampling and the process of drawing conclusions from the collected data. Results obtained from a survey can be reported as a point estimate. For example, a single sample mean is a point estimate, because this single number is used as a plausible value of the population mean. Keep in mind that some error is associated with this estimate: the true population mean may be larger or smaller than the sample mean. An alternative to reporting a point estimate is to identify a range of possible values for the parameter, constructed so that we control the probability that the parameter lies between the lowest and the highest value in the range. This range of possible values is known as a confidence interval. Associated with each confidence interval is a confidence level. This level indicates the level of assurance you have that the resulting confidence interval encloses the unknown population mean.
In a normal distribution, we know that about 95% of the data will fall within two standard deviations of the mean. Another way of stating this is to say that in about 95% of samples taken, the sample statistic is within plus or minus two standard errors of the population parameter. For a fixed sample size, as the confidence interval for a given statistic increases in length, the confidence level increases.
The selection of a confidence level for an interval determines the probability that the confidence interval produced will contain the true parameter value. Common choices for the confidence level are 90%, 95%, and 99%. These levels correspond to percentages of the area under the normal density curve. For example, a 95% confidence interval covers 95% of the normal curve, so the probability of observing a value outside of this area is 5%. Because the normal curve is symmetric, half of the 5% is in the left tail of the curve, and the other half is in the right tail of the curve. This means that 2.5% is in each tail.
The graph shown above was made using a TI-83 graphing calculator and shows a normal distribution curve for a set of data with mean $\mu$ and standard deviation $\sigma$. A 95% confidence interval for the standard normal distribution, then, is the interval $(-1.96, 1.96)$, since 95% of the area under the curve falls within this interval. The values $\pm 1.96$ are the $z$-scores that enclose the given area under the curve. For a normal distribution, the margin of error is the amount that is added to and subtracted from the mean to construct the confidence interval. For a 95% confidence interval, the margin of error is $1.96\sigma$. (Note that previously we said that 95% of the data in a normal distribution falls within $\pm 2$ standard deviations of the mean. This was just an estimate, and for the remainder of this textbook, we'll assume that 95% of the data actually falls within $\pm 1.96$ standard deviations of the mean.)
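The $z$-scores that enclose a given central area can be checked numerically. The following is a minimal sketch using Python's standard library; the helper name `critical_z` is made up for this illustration and is not part of the text.

```python
from statistics import NormalDist

def critical_z(confidence_level):
    """Return the positive z-score enclosing the given central area
    under the standard normal curve (hypothetical helper)."""
    tail_area = (1 - confidence_level) / 2      # area left in each tail
    return NormalDist().inv_cdf(1 - tail_area)  # z with that area to its right

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {critical_z(level):.3f}")
```

For a 95% level, each tail holds 2.5% of the area, so the function looks up the $z$-score with 97.5% of the area to its left. The values it returns agree, to two decimal places, with the multipliers 1.645, 1.96, and 2.58 used in this Concept.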
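The following is the derivation of the confidence interval for the population mean, $\mu$. In it, $z_{\alpha/2}$ refers to the positive $z$-score for a particular confidence interval. The Central Limit Theorem tells us that the distribution of $\bar{x}$ is normal, with a mean of $\mu$ and a standard deviation of $\frac{\sigma}{\sqrt{n}}$. Consider the following:
$$-z_{\alpha/2} < \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}$$
All values are known except for $\mu$. Solving for this parameter, we have:
$$-\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < -\mu < -\bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
$$\bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} > \mu > \bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
Another way to express this is: $\bar{x} \pm z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$.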
On the Web
http://tinyurl.com/27syj3x This simulates confidence intervals for the mean of the population.
Jenny randomly selected 60 muffins of a particular brand and had those muffins analyzed for the number of grams of fat that they each contained. Rather than reporting the sample mean (point estimate), she reported the confidence interval. Jenny reported that the mean number of grams of fat per muffin is between 10.3 grams and 12.1 grams with 95% confidence.
In this example, the population mean is unknown. This number is fixed, not variable; the sample means are variable, because the samples are random. Does the confidence interval enclose this unknown true mean? Random samples lead to the formation of confidence intervals, some of which contain the fixed population mean and some of which do not. The most common mistake made by persons interpreting a confidence interval is claiming that once the interval has been constructed, there is a 95% probability that the population mean is found within the confidence interval. Even though the population mean is unknown, once the confidence interval is constructed, either the mean is within the confidence interval, or it is not. Hence, any probability statement about this particular confidence interval is inappropriate. In the above example, the confidence interval is from 10.3 to 12.1, and Jenny is using a 95% confidence level. The appropriate statement should refer to the method used to produce the confidence interval: Jenny should have stated that the method that produced the interval from 10.3 to 12.1 has a 0.95 probability of enclosing the population mean. This means that if she repeated this procedure 100 times, about 95 of the intervals produced would be expected to contain the population mean. The probability is attributed to the method, not to any particular confidence interval. The following diagram demonstrates how the confidence interval provides a range of plausible values for the population mean and that this interval may or may not capture the true population mean.
The following questions are to be answered with reference to the above diagram.
a) Were all four sample means within $1.96\frac{\sigma}{\sqrt{n}}$, or $1.96\sigma_{\bar{x}}$, of the population mean? Explain.
b) Did all four confidence intervals capture the population mean? Explain.
c) In general, what percentage of sample means should be within $1.96\frac{\sigma}{\sqrt{n}}$ of the population mean?
d) In general, what percentage of the confidence intervals should contain the population mean?
a) The sample mean, $\bar{x}$, for Sample 3 was not within $1.96\frac{\sigma}{\sqrt{n}}$ of the population mean. It did not fall within the vertical lines to the left and right of the population mean.
b) The confidence interval for Sample 3 did not enclose the population mean. This interval was just to the left of the population mean, which is denoted with the vertical line found in the middle of the sampling distribution of the sample means.
c) In general, 95% of sample means should be within $1.96\frac{\sigma}{\sqrt{n}}$ of the population mean.
d) In general, 95% of the confidence intervals should contain the population mean.
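The idea that the 95% belongs to the method, not to any single interval, can be illustrated with a small simulation. The sketch below is an illustration only: the population values ($\mu = 100$, $\sigma = 15$), the sample size, the number of trials, and the function name `coverage_rate` are all hypothetical choices, not values from the text.

```python
import random
from math import sqrt

def coverage_rate(mu, sigma, n, z=1.96, trials=5000, seed=1):
    """Fraction of simulated intervals x_bar +/- z*sigma/sqrt(n) that contain mu."""
    rng = random.Random(seed)             # fixed seed so the run is repeatable
    margin = z * sigma / sqrt(n)          # margin of error (sigma assumed known)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1                     # this interval captured the true mean
    return hits / trials

# Hypothetical population: mean 100, standard deviation 15, samples of size 36
print(coverage_rate(mu=100, sigma=15, n=36))
```

Each simulated interval either contains the fixed population mean or it does not. Only the long-run fraction of intervals that capture it is close to 0.95.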
When the sample size is large $(n > 30)$, the confidence interval for the population mean is calculated as shown below:
$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$, where $z_{\alpha/2}$ is 1.96 for a 95% confidence interval, 1.645 for a 90% confidence interval, and 2.58 for a 99% confidence interval.
Julianne collects four samples of size 60 from a known population with a population standard deviation of 19 and a population mean of 110. Using the four samples, she calculates the four sample means to be:
a) For each sample, determine the 90% confidence interval.
b) Do all four confidence intervals enclose the population mean? Explain.
b) Three of the confidence intervals enclose the population mean. The interval from 110.96 to 119.04 does not enclose the population mean.
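The interval quoted in part (b) can be checked directly. The sketch below assumes a sample mean of 115, which is inferred here as the midpoint of the quoted interval from 110.96 to 119.04. The function name `z_interval` is a label chosen for this illustration.

```python
from math import sqrt

def z_interval(x_bar, sigma, n, z):
    """Confidence interval x_bar +/- z * sigma / sqrt(n) for a known sigma."""
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

# Sample mean 115 is inferred from the midpoint of the interval in part (b);
# sigma = 19, n = 60, and the 90% multiplier 1.645 come from the example.
lower, upper = z_interval(x_bar=115, sigma=19, n=60, z=1.645)
print(f"{lower:.2f} to {upper:.2f}")
print("encloses 110:", lower <= 110 <= upper)
```

The margin of error is about 4.04, so the lower endpoint lies above the population mean of 110, which is why this interval fails to enclose it.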
Technology Note: Simulation of Random Samples and Formation of Confidence Intervals on the TI-83/84 Calculator
Now it is time to use a graphing calculator to simulate the collection of three samples of sizes 30, 60, and 90, respectively. The three sample means will be calculated, as well as the three 95% confidence intervals. The samples will be collected from a population that displays a normal distribution, with a population standard deviation of 108 and a population mean of 2130. We will use the randNorm( function found in [MATH], under the PRB menu. First, store the three samples in L1, L2, and L3, respectively, as shown below:
Store 'randNorm(2130,108,30)' in L1. The sample size is $n = 30$.
Store 'randNorm(2130,108,60)' in L2. The sample size is $n = 60$.
Store 'randNorm(2130,108,90)' in L3. The sample size is $n = 90$.
The lists of numbers can be viewed by pressing [STAT][ENTER]. The next step is to calculate the mean of each of these samples.
To do this, first press [2ND][LIST] and go to the MATH menu. Next, select the 'mean(' command and press [2ND][L1][ENTER]. Repeat this process for L2 and L3.
Note that your confidence intervals will be different than the ones calculated below, because the random numbers generated by your calculator will be different, and thus, your means will be different. For us, the means of L1, L2, and L3 were 2139.1, 2119.2, and 2137.1, respectively, so the confidence intervals are as follows:
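The three 95% intervals can be recomputed from the sample means reported above. The following sketch uses the stated population standard deviation of 108; the function and variable names are chosen for illustration only.

```python
from math import sqrt

SIGMA = 108   # population standard deviation stated in the text
Z_95 = 1.96   # multiplier for a 95% confidence interval

# (sample size, sample mean) pairs reported in the text for L1, L2, and L3
samples = [(30, 2139.1), (60, 2119.2), (90, 2137.1)]

def interval(n, x_bar, sigma=SIGMA, z=Z_95):
    """Return (lower, upper) for x_bar +/- z * sigma / sqrt(n)."""
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

for n, x_bar in samples:
    lower, upper = interval(n, x_bar)
    print(f"n = {n}: {lower:.1f} to {upper:.1f} (width {upper - lower:.1f})")
```

With these means, the intervals are approximately 2100.5 to 2177.7, 2091.9 to 2146.5, and 2114.8 to 2159.4, each narrower than the one before.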
As was expected, the value of $\bar{x}$ varied from one sample to the next. It was also evident that as the sample size increased, the length of the confidence interval decreased. This is because with the increase in sample size, you have more information, and thus, your estimate is more precise, which leads to a narrower confidence interval.
In all of the examples shown above, you calculated the confidence intervals for the population mean using the formula $\bar{x} \pm z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$. However, to use this formula, the population standard deviation had to be known. If this value is unknown, and if the sample size is large $(n > 30)$, the population standard deviation can be replaced with the sample standard deviation, $s$. Thus, the formula $\bar{x} \pm z_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$ can be used as an interval estimator, or confidence interval. This formula is valid only for simple random samples. Since $z_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$ is the margin of error, a confidence interval can be thought of simply as: $\bar{x} \pm$ the margin of error.
A committee set up to field-test questions from a provincial exam randomly selected grade 12 students to answer the test questions. The answers were graded, and the sample mean and sample standard deviation were calculated. Based on the results, the committee predicted that on the same exam, 9 times out of 10, grade 12 students would have an average score within 3% of 65%.
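When only raw sample data are available, the sample standard deviation stands in for $\sigma$. The sketch below shows one way this might look. The data are simulated and entirely hypothetical (50 values drawn around 65), and `large_sample_interval` is a name invented for this illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

def large_sample_interval(data, z=1.96):
    """Interval x_bar +/- z * s / sqrt(n), intended for large samples (n > 30)."""
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)               # sample standard deviation (divides by n - 1)
    margin = z * s / sqrt(n)      # margin of error using s in place of sigma
    return x_bar - margin, x_bar + margin

# Hypothetical data: 50 simulated scores, not taken from the text
rng = random.Random(0)
scores = [rng.gauss(65, 10) for _ in range(50)]

lower, upper = large_sample_interval(scores)
print(f"{lower:.1f} to {upper:.1f}")
```

The interval is centered on the sample mean, and its half-width is the margin of error $z_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$.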
a) Are you dealing with a 90%, 95%, or 99% confidence level?
b) What is the margin of error?
c) Calculate the confidence interval.
d) Explain the meaning of the confidence interval.
a) You are dealing with a 90% confidence level. This is indicated by 9 times out of 10.
b) The margin of error is 3%.
c) The confidence interval is $\bar{x} \pm$ the margin of error, or $65\% \pm 3\%$, which is 62% to 68%.
d) There is a 0.90 probability that the method used to produce this interval from 62% to 68% results in a confidence interval that encloses the population mean (the true score for this provincial exam).
Confidence Intervals for Hypotheses about Population Proportions
Often statisticians are interested in making inferences about a population proportion. For example, when we look at election results, we often look at the proportion of people who vote and the proportion of those voters who choose a particular candidate. Typically, we report these proportions as percentages, and we would say something like “Approximately 68 percent of the population voted in this election and 48 percent of these voters voted for Barack Obama.”
In estimating a parameter, we can use a point estimate or an interval estimate. The point estimate for the population proportion, $p$, is the sample proportion, $\hat{p}$. We can also find interval estimates for this parameter. These intervals are based on the sampling distribution of $\hat{p}$.
If we are interested in finding an interval estimate for the population proportion, the following two conditions must be satisfied:
- We must have a random sample.
- The sample size must be large enough (both $np$ and $n(1 - p)$ sufficiently large) that we can use the normal distribution as an approximation to the binomial distribution.
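$\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$ is the standard deviation of the distribution of sample proportions. The distribution of sample proportions is as follows:
Since we do not know the value of $p$, we must replace it with $\hat{p}$. We then have the standard error of the sample proportions, $s_{\hat{p}} = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$. If we are interested in a 95% confidence interval, then, as with the mean, we want the difference between the sample proportion and the population proportion to be within 1.96 standard errors.
That is, we want the following:
$$-1.96\, s_{\hat{p}} < \hat{p} - p < 1.96\, s_{\hat{p}}$$
$$-\hat{p} - 1.96\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} < -p < -\hat{p} + 1.96\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$
$$\hat{p} + 1.96\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} > p > \hat{p} - 1.96\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$
$$\hat{p} - 1.96\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} < p < \hat{p} + 1.96\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$
This is a 95% confidence interval for the population proportion. If we generalize for any confidence level, the confidence interval is as follows:
$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} < p < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$
In other words, the confidence interval is $\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$. Remember that $z_{\alpha/2}$ refers to the positive $z$-score for a particular confidence interval. Also, $\hat{p}$ is the sample proportion, and $n$ is the sample size. As before, the margin of error is $z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$, and the confidence interval is $\hat{p} \pm$ the margin of error.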
A congressman is trying to decide whether to vote for a bill that would remove all speed limits on interstate highways. He will decide to vote for the bill only if 70 percent of his constituents favor the bill. In a survey of 300 randomly selected voters, 224 (74.6%) indicated they would favor the bill. The congressman decides that he wants an estimate of the proportion of voters in the population who are likely to favor the bill. Construct a confidence interval for this population proportion.
Our sample proportion is 0.746, and our standard error of the proportion is 0.0251. We will construct a 95% confidence interval for the population proportion. Under the normal curve, 95% of the area is between $z = -1.96$ and $z = 1.96$. Thus, the confidence interval for this proportion would be:
$$0.746 \pm (1.96)(0.0251)$$
$$0.697 < p < 0.795$$
With respect to the population proportion, we are 95% confident that the interval from 0.697 to 0.795 contains the population proportion. The population proportion is either in this interval, or it is not. When we say that this is a 95% confidence interval, we mean that if we took 100 samples, all of size $n$, and constructed 95% confidence intervals for each of these samples, we would expect about 95 out of the 100 confidence intervals we constructed to capture the population proportion, $p$.
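The arithmetic in this example can be reproduced in a few lines. The sketch below takes the sample proportion as 0.746, the value used in the text, and the function name `proportion_interval` is chosen for illustration.

```python
from math import sqrt

def proportion_interval(p_hat, n, z=1.96):
    """Return (standard error, lower, upper) for p_hat +/- z * sqrt(p_hat(1 - p_hat)/n)."""
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return standard_error, p_hat - margin, p_hat + margin

# p_hat = 0.746 is the sample proportion as given in the text; n = 300 voters
se, lower, upper = proportion_interval(p_hat=0.746, n=300)
print(f"SE = {se:.4f}, interval = {lower:.3f} to {upper:.3f}")
```

The standard error comes out to about 0.0251, and the endpoints round to 0.697 and 0.795, in agreement with the worked example.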
On the Web
http://tinyurl.com/27syj3x This simulates confidence intervals for the population proportion.
http://tinyurl.com/28z97lr Explore how changing the confidence level and/or the sample size affects the length of the confidence interval.
A large grocery store has been recording data regarding the number of shoppers that use savings coupons at its outlet. Last year, it was reported that 77% of all shoppers used coupons, and 19 times out of 20, these results were considered to be accurate within 2.9%.
a) Are you dealing with a 90%, 95%, or 99% confidence level?
b) What is the margin of error?
c) Calculate the confidence interval.
d) Explain the meaning of the confidence interval.
a) The statement 19 times out of 20 indicates that you are dealing with a 95% confidence interval.
b) The results were accurate within 2.9%, so the margin of error is 0.029.
c) The confidence interval is simply $\hat{p} \pm$ the margin of error, or $0.77 \pm 0.029$.
Thus, the confidence interval is from 0.741 to 0.799.
d) The 95% confidence interval from 0.741 to 0.799 for the population proportion is an interval calculated from a sample by a method that has a 0.95 probability of capturing the population proportion.
- In a local teaching district, a technology grant is available to teachers in order to install a cluster of four computers in their classrooms. From the 6,250 teachers in the district, 250 were randomly selected and asked if they felt that computers were an essential teaching tool for their classroom. Of those selected, 142 teachers felt that computers were an essential teaching tool.
- Calculate a 99% confidence interval for the proportion of teachers who felt that computers are an essential teaching tool.
- How could the survey be changed to narrow the confidence interval while maintaining the 99% confidence level?
- Josie followed the guidelines presented to her and conducted a binomial experiment. She did 300 trials and reported a sample proportion of 0.61.
- Calculate the 90%, 95%, and 99% confidence intervals for this sample.
- What did you notice about the confidence intervals as the confidence level increased? Offer an explanation for your findings.
- If the population proportion were 0.58, would all three confidence intervals enclose it? Explain.
- Does replacing $\sigma$ with $s$ change your chance of capturing the unknown population mean? Is there a way to increase the chance of capturing the unknown population mean?
- A study was conducted to determine the mean birth weight of a certain breed of kittens. Consider the birth weights of kittens to be normally distributed. A sample of 45 kittens was randomly selected from all kittens of this breed at a large veterinary hospital. The birth weight of each kitten in the sample was recorded. The sample mean was 3.56 ounces, and the sample standard deviation was 0.2 ounces. Set a 90% confidence interval on the mean birth weight of all kittens of this breed.
- In a study of seventh grade students, the mean number of hours per week that they watched television was 18.7 with a standard deviation of 4.5 hours. Assume the population has a normal distribution. Construct a 95% confidence interval for the mean number of hours of TV watched by seventh grade students.
- A random sample of 40 college students has mean annual earnings of $3,245 and a standard deviation of $567. Construct a 99% confidence interval for the population mean. Does the population have to follow a normal distribution? Explain.
- A random sample of 16 light bulbs has a mean life of 650 hours and a standard deviation of 32 hours. Assume the population has a normal distribution. Construct a 90% confidence interval for the population mean.
- A sample of 100 cans of peas showed an average weight of 14 ounces with a standard deviation of 0.7 ounces. Construct a 95% confidence interval for the mean of the population.
- What three factors affect the width of a confidence interval for a population mean? For each factor, indicate how an increase in the numerical value of the factor affects the interval width.
- For each of the following use the information given to calculate the standard error of the mean and find an approximate 90% confidence interval for the population mean:
- Suppose a random sample of 64 men has a mean foot length of 27.5 cm with a standard deviation of 2 cm.
- Calculate the standard error of the sample mean.
- Calculate an approximate 99% confidence interval for the mean foot length of men. Write a sentence that interprets this interval.
- For each combination of sample size and sample proportion find the approximate margin of error for the 90% confidence interval:
- Suppose a new cancer treatment is given to a sample of 300 patients. The treatment was successful for 210 of the patients. Assume that these patients are representative of the population of individuals who have this cancer.
- Calculate the sample proportion that was successfully treated.
- Determine a 90% confidence interval for the proportion successfully treated. Write a sentence that interprets this interval.
- Suppose a polling organization reports that the margin of error is 3% for a sample survey. Explain what this indicates about the possible difference between a percent determined from the survey data and the population value of the percent.
- A poll conducted in the United States November 8 – 15, 2010 asked “The Secretary of Transportation recently said that he may push Congress for a national ban on using a cell phone while driving. The ban would include hands-free cell phones. Do you think that a national ban on using a cell phone while driving is a good idea or a bad idea?” In the nationwide poll of registered voters 63% said they thought it was a good idea. The margin of error was reported as %. (source: www.pollingreport.com).
- Find a 95% confidence interval estimate of the percent of American voters who believe banning cell phones when driving is a good idea at the time of the poll.
- Write a sentence that interprets the interval computed in part (a).
- A Gallup Organization telephone poll of 511 adults, aged 18 and older, living in the continental United States found that 70% of Americans feel confident in the accuracy of their doctor's advice, and don't feel the need to check for a second opinion or do additional research. The margin of error for this survey was given as percentage points.
- Find a 95% confidence interval estimate of the percent of American adults who feel confident in the accuracy of their doctor’s advice and don’t feel the need to check for a second opinion.
- Based on the interval you found, is it reasonable to say that more than 65% of American voting adults have confidence in their doctor’s advice?
- Suppose 100 researchers each plan to independently gather data and construct a 95% confidence interval for a population mean. If $x$ is the number of those intervals that actually cover the population mean, then $x$ is a binomial random variable.
- What is a “success” for this random variable?
- What is the numerical value of the probability of success?
- What is the expected number of intervals that will cover their population means?
- In computing the confidence interval for a population mean, $\mu$, explain whether the interval would be wider, narrower, or neither as a result of each of the following changes:
- The level of confidence is changed from 85% to 90%.
- The sample size is tripled.
- A new random sample of the same size is taken and $\bar{x}$ is increased by 10.
- Calculate a 98% confidence interval for the proportion successfully treated in problem 13. Is this interval wider or narrower than the interval computed in problem 12?
- In a Gallup Youth Survey done in 2000, 501 randomly selected American teenagers were asked about how well they get along with their parents.
- According to the Gallup Organization, the margin of error for the poll was 5%. Verify that this figure is approximately correct.
- A survey result was that 54% of the sample said they get along “very well” with their parents. Using the reported margin of error, calculate a 90% confidence interval for the population proportion that gets along “very well” with their parents.
- Using the more exact margin of error, calculate a 90% confidence interval. Compare your answer to part (b).
- Determine the value of the multiplier that would be used to compute an 80% confidence interval for a population proportion.