Objective
Here you will learn about confidence intervals, which are ranges within which you expect a population parameter to fall with a stated level of confidence.
Concept
Suppose you were at a county fair and saw a large jar full of gumballs, maybe 1000 of them, with a sign that said “Guess the Number, Win a Prize!” If the rules of the game are that you could win a $10 prize by guessing within 200 gumballs either way, or a $50 prize by guessing within five gumballs either way, but you have to specify which prize you are trying for before submitting your guess, which would you choose?
Watch This
http://youtu.be/dNfpsVLaaEE statisticsfun – How to calculate Confidence Intervals and Margin of Error
Guidance
The general concept of confidence intervals is pretty intuitive: It is easier to predict that an unknown value will lie somewhere within a wide range, than to predict it will occur within a narrow range. In other words, if you are making an educated guess about an unknown number, you are more likely to be correct if you predict it will occur within a wider range. This idea is reflected in the concept question above, where the reward is greater if you guess within a smaller range, because the contest creator knows that your chance of guessing correctly is much less if you have to guess within a smaller range.
A confidence interval, centered on the mean of your sample, is the range of values that is expected to capture the population mean with a given level of confidence. A wider confidence interval is a greater range of values, resulting in a greater confidence level that the range will include the population mean. By convention, you will mostly be concerned with identifying the intervals associated with 90%, 95%, and 99% confidence levels.
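The z-scores that go with these conventional levels can be read from a table or computed. The following is a minimal sketch in Python, assuming the standard library's `statistics.NormalDist` (Python 3.8 or later); the helper name `z_for_confidence` is hypothetical and is introduced here only for illustration.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Two-sided critical z-score for a confidence level such as 0.95."""
    # Split the leftover probability (1 - level) evenly between both tails.
    upper_tail_cutoff = 1 - (1 - level) / 2
    return NormalDist().inv_cdf(upper_tail_cutoff)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence -> z = {z_for_confidence(level):.3f}")
# 90% confidence -> z = 1.645
# 95% confidence -> z = 1.960
# 99% confidence -> z = 2.576
```

The 95% value, about 1.96, is where the "2 standard deviations" rule of thumb used in this lesson comes from.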
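Calculate the confidence interval by combining the sample mean with the margin of error, found by multiplying the standard error of the mean by the z-score of the percent confidence level:
$$\bar{x} \pm z \cdot \frac{\sigma}{\sqrt{n}}$$
Here $\bar{x}$ is the sample mean, $z$ is the z-score for the chosen confidence level, $\sigma$ is the standard deviation, and $n$ is the sample size, so $\frac{\sigma}{\sqrt{n}}$ is the standard error of the mean.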
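The calculation just described can be sketched in Python. This is an illustration under stated assumptions, not a definitive implementation: it assumes the standard deviation is known, and the function name `confidence_interval` and the numbers in the usage line are hypothetical, not taken from the lesson.

```python
from math import sqrt
from statistics import NormalDist

def confidence_interval(sample_mean, sigma, n, level=0.95):
    """Return (low, high) for the population mean, given a known sigma."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # two-sided z-score
    standard_error = sigma / sqrt(n)               # standard error of the mean
    margin_of_error = z * standard_error
    return sample_mean - margin_of_error, sample_mean + margin_of_error

# Illustrative numbers only: sample mean 50, sigma 10, sample size 25.
low, high = confidence_interval(50, 10, 25, level=0.95)
print(f"95% CI: {low:.2f} to {high:.2f}")
# 95% CI: 46.08 to 53.92
```

Raising `level` increases the z-score and widens the interval, while raising `n` shrinks the standard error and narrows it.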
It is common, but incorrect, to assume that a confidence level indicates the probability that the mean of the population will occur within a given range of the mean of your sample. A 95% confidence interval means that if you took 100 samples, all of the same size, and formed 100 confidence intervals, you would expect about 95 of these intervals to capture the population mean.
The confidence level indicates the expected number of times out of 100 that an interval built this way around a sample mean will contain the mean of the population.
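One way to see this repeated-sampling interpretation is to simulate it. The sketch below assumes a hypothetical normal population (mean 65, standard deviation 3) and a known standard deviation; the function name `coverage` and all numeric values are illustrative assumptions.

```python
import random
from math import sqrt
from statistics import mean

def coverage(num_samples, n, mu, sigma, z=1.96, seed=0):
    """Fraction of simulated confidence intervals that capture mu."""
    rng = random.Random(seed)      # fixed seed so runs are repeatable
    margin = z * sigma / sqrt(n)   # same margin of error for every sample
    hits = 0
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = mean(sample)
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / num_samples

# Hypothetical population: mean 65, standard deviation 3, samples of 30.
print(coverage(num_samples=100, n=30, mu=65, sigma=3))
print(coverage(num_samples=10_000, n=30, mu=65, sigma=3))
```

With only 100 samples the fraction can land a few points above or below 0.95; with many more samples it should settle close to 0.95.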
Example A
Suppose you took 100 unbiased random samples of the heights of U.S. women (recall that height is normally distributed), each sample containing 30 women. What can you say about the means of the samples compared to the population mean?
Solution:
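Since height is normally distributed, the means of samples drawn from the population are normally distributed as well, so the Empirical Rule applies to them (remember the Empirical Rule?): approximately 95% of sample means will lie within two standard deviations of the population mean, where the standard deviation of the sample means is the standard error. That means that out of 100 samples, we can expect about 95 of them to have a mean within 2 standard errors of the population mean.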
Example B
Suppose the mean of the means of our 100 samples from Example A is 5′5″, in other words, $\bar{x}$ = 5′5″. Within what range of heights can we expect the population mean to be, with 95% confidence? Assume a standard deviation of 1.5″.
Solution:
Remember that since height is normally distributed, 95% of the values lie within 2 standard deviations of the mean. We need to identify that range of values.
- First we need to use $2\sigma$ to identify the range of values within 2 standard deviations of the sample mean. Since $\sigma$ = 1.5″, in this case we get 2 × 1.5″ = 3″ above and below $\bar{x}$.
- The interval then is 5′2″ to 5′8″, or three inches above and below the mean of 5′5″.
That range describes the spread of the values themselves. The margin of error for the mean uses the standard error instead, which here is 1.5″ / √100 = 0.15″, so the margin of error is 2 × 0.15″ = 0.3″. We can say that there is a 95% probability that the mean of our 100 samples would be within 0.3 inches either way of the population mean. Since the mean of our sample is 5′5″, we can say that the population mean is between 5′4.7″ and 5′5.3″ with 95% confidence.
Mathematically:
$$\bar{x} \pm 2 \cdot \frac{\sigma}{\sqrt{n}} = 5'5'' \pm 2 \cdot \frac{1.5''}{\sqrt{100}} = 5'5'' \pm 0.3''$$
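The arithmetic in this example can be checked with a short Python sketch. It follows the example's own numbers (a standard deviation of 1.5″, a count of 100, and the Empirical Rule value of 2 in place of z = 1.96); the helper `feet_inches` is a hypothetical formatting function added only for display.

```python
from math import sqrt

sigma = 1.5                 # standard deviation in inches (from Example B)
n = 100                     # count used in the example's arithmetic
mean_inches = 5 * 12 + 5    # 5'5" expressed in inches

# "2" is the Empirical Rule stand-in for the 95% z-score of about 1.96.
margin = 2 * sigma / sqrt(n)
low, high = mean_inches - margin, mean_inches + margin

def feet_inches(total_inches):
    """Format a height in inches as feet and inches, e.g. 5'4.7"."""
    feet, inches = divmod(total_inches, 12)
    return f"{int(feet)}'{inches:.1f}\""

print(f"margin of error: {margin:.1f} inches")
print(f"95% CI: {feet_inches(low)} to {feet_inches(high)}")
# margin of error: 0.3 inches
# 95% CI: 5'4.7" to 5'5.3"
```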
Example C
Suppose you plot the mean of each of your height samples on a graph, and draw a line each way from the mean of each sample to represent 2 standard deviations. If you were to do this for 50 of the samples, you might end up with an image like the one below.
(The image is a screen capture from the interactive applet at: http://bcs.whfreeman.com/ips5e/content/cat_010/applets/confidenceinterval.html .)
At the top of the image is a normal curve. Each of the lines below the curve has a length that represents a 95% confidence interval, centered on the mean (in red) of the sample.
- What is indicated by the lines that are all red in color?
- What value is indicated by the vertical red center line on each interval?
- What does the “percent hit” number mean? How would it change if you were to continue taking more and more samples of 60 each?
Solution:
- The lines that are colored entirely red belong to samples whose mean is more than 2 standard deviations away from the population mean. In other words, the 95% confidence intervals built around those sample means did not capture the population mean.
- The vertical red center line represents the mean of each sample.
- The “percent hit” number indicates the percentage of times that the population mean was included in the confidence interval of sample means. If you were to continue plotting sample means and confidence intervals, the percent hit would approach 95%. In fact, here is the same graph after 1000 sample runs:
Concept Problem Revisited
Suppose you were at a county fair and saw a large jar full of gumballs, maybe 1000 of them, with a sign that said “Guess the Number, Win a Prize!” If the rules of the game are that you could win a $10 prize by guessing within 200 gumballs either way, or a $50 prize by guessing within five gumballs either way, but you have to specify which prize you are trying for before submitting your guess, which would you choose?
This problem/question is meant to give you an intuitive feeling for the concept of a confidence interval or confidence level. It should be clear that you would have a greater level of confidence in trying for a $10 prize that you would win simply by guessing within +/- 20% of the number, than in trying for $50 by guessing within +/- 0.5% of the number!
Vocabulary
A confidence interval is the interval within which you expect to capture a specific value. The confidence interval width is dependent on the confidence level.
A confidence level is the probability value associated with a confidence interval.
Guided Practice
- Suppose you took 40 unbiased random samples of the number of candies in a $0.75 bag of candy from a particular factory. The factory states that the number of candies per bag is normally distributed. What can you say about the mean number of candies in your sample?
- Suppose the factory states that the number of candies per bag has $\sigma = 2$. If each sample includes data from 40 bags of candies ($n = 40$), what is the standard error of the mean ($\sigma_{\bar{x}}$)?
- If the sample mean is 38 candies, within what interval could we expect 99 out of each 100 samples to contain the population mean? What is that interval known as?
- What is the more common way to describe the fact that we “expect 99 out of every 100 samples to contain the population mean”?
Solutions:
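1. Since the population is normally distributed, the sample means are normally distributed as well, so we can state that the mean of the sample follows the Empirical Rule: about 95% of sample means lie within 2 standard errors of the population mean.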
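2. The standard error of the mean is calculated as $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$, so $\sigma_{\bar{x}} = \frac{2}{\sqrt{40}} \approx 0.32$.
3. The interval is called the confidence interval, and it is calculated as $\bar{x} \pm z \cdot \sigma_{\bar{x}}$. For a 99% confidence level, $z = 2.576$:
$$38 \pm 2.576 \times 0.32 \approx 38 \pm 0.82$$
Therefore, the confidence interval is approximately 37.18 to 38.82.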
4. Saying that you “expect 99 out of every 100 samples to contain the population mean” is the same as saying that the interval has a 99% confidence level.
Practice
1. What is a confidence interval?
2. What is the formula for calculating the confidence interval?
3. What is the difference between a confidence interval and a confidence level?
4. What is a margin of error?
5. How is the margin of error calculated?
6. What common misconception about confidence level is corrected by stating that a 99% confidence level means that 99 out of 100 samples are expected to contain the population mean?
7. If a population is known to have an approximately normal distribution, but the standard deviation is unknown, how can the population standard deviation be approximated?
8. If the sample mean is unknown, is it safe to use the population mean as the sample mean?
9. What Z-score corresponds to a 98% confidence interval?
10. What confidence level is associated with a Z-score of 2.576, assuming a two-tailed test?
11. Which confidence level would describe a wider confidence interval, 80% or 85%?
12. A factory produces bags of marbles for a toy store. The factory has previously calculated that the . If you were to sample 35 bags and calculate , within what range could you predict , with 98% confidence?
13. Interpret your results from question 12, in context.
14. The manager of a clothing store is attempting to estimate the mean number of customers that pass through her store each day. If the data from past estimates and other franchises suggests that , and the manager has collected the customer counts in the table below from a SRS (Simple Random Sample), what can the manager predict the range of customers to be, with 50% confidence?
Daily customer counts: 148, 298, 210, 213, 315, 129, 145, 148, 131, 281, 317
15. Interpret your answer from problem 14, in context.