- Calculate the z-score of the mean of a distribution of a random variable in problem situations.
- Understand the Central Limit Theorem and calculate a sampling distribution using the mean and standard deviation of a normally distributed random variable.
- Understand the relationship between the Central Limit Theorem and normal approximation of the binomial distribution.
In the previous lesson you learned that sampling is an important tool for determining the characteristics of a population. Although the parameters of the population (mean, standard deviation, etc.) were unknown, random sampling was used to yield reliable estimates of these values. The estimates were plotted on graphs to provide a visual representation of the distribution of the sample mean for various sample sizes. It is now time to define some properties of the sampling distribution of the sample mean and to examine what we can conclude about the entire population based on it.
All normal distributions have the same basic shape, so any normal distribution can be rescaled and recentered into one with a mean of zero and a standard deviation of one. This configuration is referred to as the standard normal distribution. In this distribution, the variable along the horizontal axis is called the z-score. The z-score is another measure of how an individual value compares with the rest of a population: it measures how many standard deviations a value is away from the mean. The z-score of a term $x$ in a population distribution whose mean is $\mu$ and whose standard deviation is $\sigma$ is given by:

$$z = \frac{x - \mu}{\sigma}$$

Since $\sigma$ is always positive, $z$ will be positive when $x$ is greater than $\mu$ and negative when $x$ is less than $\mu$. A z-score of zero means that the term has the same value as the mean. For the standard normal distribution, where $\mu = 0$, if we let $x = \sigma$, then $z = 1$. If we let $x = 2\sigma$, $z = 2$. Thus, a value of $z$ tells the number of standard deviations the given value of $x$ is above or below the mean.
Example: On a nationwide math test the mean was and the standard deviation was . If Robert scored , what was his z-score?
Example: On a college entrance exam, the mean was and the standard deviation was . If Helen’s z-score was , what was her exam mark?
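Both kinds of example can be checked with a few lines of Python. The sketch below assumes the usual definition $z = (x - \mu)/\sigma$ and its rearrangement $x = \mu + z\sigma$. The function names and the test figures (a mean of 70 and a standard deviation of 10) are hypothetical and are not the values from the examples above.

```python
def z_score(x, mean, std_dev):
    """Return how many standard deviations x lies above (+) or below (-) the mean."""
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    return (x - mean) / std_dev


def value_from_z(z, mean, std_dev):
    """Invert the z-score formula: recover the raw value from its z-score."""
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    return mean + z * std_dev


# Hypothetical test with mean 70 and standard deviation 10.
print(z_score(85, 70, 10))         # 1.5
print(value_from_z(-0.5, 70, 10))  # 65.0
```

The first call answers a question of the first kind (a raw mark is known and the z-score is wanted); the second answers the reverse question.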
Now you will see how z-scores are used to determine the probability of an event.
Suppose you were to toss . The following figure shows the histogram and the approximating normal curve for the experiment. The random variable represents the number of tails obtained.
The blue section of the graph represents the probability that exactly of the coins turned up tails. One way to determine this is by the following
Geometrically this probability represents the area of the blue shaded bar divided by the total area of the bars. The area of the shaded bar is approximately equal to the area under the normal curve from to .
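To see how close the bar area and the normal-curve area are, the following sketch compares the exact probability of a given number of tails with the area under the approximating normal curve across the width of that bar. It assumes fair coins and takes the bar for $k$ tails to run from $k - 0.5$ to $k + 0.5$. The values $n = 10$ and $k = 5$ are hypothetical, not the ones used in the figure.

```python
import math


def normal_cdf(x, mean=0.0, std_dev=1.0):
    """Area under the normal curve to the left of x."""
    z = (x - mean) / std_dev
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def exact_tails_probability(n, k):
    """Exact probability of k tails in n fair coin tosses."""
    return math.comb(n, k) / 2 ** n


def approx_tails_probability(n, k):
    """Normal-curve area over the histogram bar for k, from k - 0.5 to k + 0.5."""
    mean = n * 0.5
    std_dev = math.sqrt(n * 0.5 * 0.5)
    return normal_cdf(k + 0.5, mean, std_dev) - normal_cdf(k - 0.5, mean, std_dev)


n, k = 10, 5  # hypothetical values, not taken from the lesson
exact = exact_tails_probability(n, k)
approx = approx_tails_probability(n, k)
print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```

For these assumed values the two results differ by less than 0.01, which is why the area under the normal curve can stand in for the area of the bar.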
Since areas under normal curves correspond to the probability of an event occurring, a special normal distribution table is used to calculate the probabilities. This table can be found in any statistics book, but is seldom used today. Below is an example of a table of z-scores and a brief explanation of how it works.
As shown in the illustration below, the values inside the given table represent the areas under the standard normal curve for values between $z = 0$ and the relative z-score. For example, to determine the area under the curve between and , look in the intersecting cell for the row labeled and the column labeled . The area under the curve is . To determine the area between $z = 0$ and a negative value, use the symmetry of the curve: look in the intersecting cell of the row and column that sum to the absolute value of the number in question. For example, the area under the curve between and is equal to the area under the curve between and , so look at the cell on the row and the column (the area is ).
The graphing calculator will give greater accuracy in finding the proportion of values that lie between two specified values in a standard normal distribution.
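Using the TI-83 calculator for this operation is quite simple. Follow these steps:
- Press 2nd and then Vars to open the DISTR (distribution) menu.
- Scroll down to 2: normalcdf( and press Enter. The normalcdf( prompt appears on the home screen.
- Type in the lower and upper bounds, separated by a comma, close the parenthesis, and press Enter.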
The calculator has given an answer that is more accurate than that given in the chart. However, if the answer is rounded to the nearest ten-thousandth, then both answers would be the same. Using the calculator is a more efficient way to obtain the area that corresponds to a z-score, since a calculator is usually on hand.
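If no graphing calculator is available, the same area can be computed in software. The sketch below is one possible Python stand-in for the calculator operation, built on the standard library's error function. The function name `normal_cdf_between` is hypothetical, and its argument order (lower bound, upper bound, then optional mean and standard deviation) is simply chosen to resemble the calculator entry.

```python
import math


def normal_cdf_between(lower, upper, mean=0.0, std_dev=1.0):
    """Area under a normal curve between lower and upper."""
    def cdf(x):
        z = (x - mean) / std_dev
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return cdf(upper) - cdf(lower)


# Area between z = 0 and z = 1 on the standard normal curve.
print(round(normal_cdf_between(0, 1), 4))   # 0.3413
# Area within one standard deviation of the mean.
print(round(normal_cdf_between(-1, 1), 4))  # 0.6827
```

Rounded to four decimal places, these results match what a printed z-table gives for the same bounds.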
Example: For a normal distribution curve based on values of and , find the area between and .
Using the TI-83
The area for is and for is . Therefore the area between and is:
This means that the relative frequency of the values between and is .
Central Limit Theorem
The Central Limit Theorem is a very important theorem in statistics. It confirms what might be an intuitive truth to you: as you increase the size of the samples drawn from a population, the distribution of the sample means better approximates a normal distribution.
Before going any further, you should become familiar with (or reacquaint yourself with) the symbols that are commonly used when dealing with properties of the sampling distribution of the sample mean. These symbols are shown in the table below:
In the previous lesson, you discovered that the standard error is the standard deviation of the sampling distribution and that this value was calculated by using the formula $\sqrt{\frac{p(1-p)}{n}}$. By making a few substitutions, this formula can be rewritten using the symbols from the chart above. The formula can be expressed as the quotient of two radical expressions, $\frac{\sqrt{p(1-p)}}{\sqrt{n}}$. The square root of the product of the parameters $p$ and $(1-p)$ is actually the standard deviation of the population, $\sigma$. When this value is divided by the square root of the sample size, the result is the standard error, also known as the standard deviation of the sampling distribution, $\sigma_{\bar{x}}$. Therefore the standard error can be written as

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

A frequency distribution built from collected samples only approximates the true sampling distribution of the sample mean because a finite number of sample means were used. If, hypothetically, an infinite number of sample means were used, the resulting distribution would be the desired sampling distribution and the following would be true:

$$\mu_{\bar{x}} = \mu \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

The notation $\sigma_{\bar{x}}$ reminds you that this is the standard deviation of the sample mean and not the standard deviation of a single observation.
The Central Limit Theorem states the following:
- If samples of size $n$ are drawn at random from any population with a finite mean and standard deviation, then the sampling distribution of the sample mean $\bar{x}$ approximates a normal distribution as $n$ increases.
- The mean of this sampling distribution is the population mean: $\mu_{\bar{x}} = \mu$
- The standard deviation of the sample mean is the population standard deviation divided by the square root of the sample size: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
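These three statements can be checked with a small simulation. The sketch below draws many samples from a population that is deliberately not bell-shaped (uniform on the interval from 0 to 1, an assumption made only for illustration) and compares the mean and standard deviation of the resulting sample means with $\mu$ and $\sigma/\sqrt{n}$. The sample size, the number of samples, and the seed are all hypothetical choices.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 30      # n, hypothetical
NUM_SAMPLES = 5000    # finite stand-in for "infinitely many" samples

# Population: uniform on [0, 1], which is flat rather than bell-shaped.
POP_MEAN = 0.5
POP_STD_DEV = math.sqrt(1 / 12)

# One sample mean per simulated sample.
sample_means = [
    statistics.fmean(random.random() for _ in range(SAMPLE_SIZE))
    for _ in range(NUM_SAMPLES)
]

mean_of_means = statistics.fmean(sample_means)
std_dev_of_means = statistics.pstdev(sample_means)
predicted_std_dev = POP_STD_DEV / math.sqrt(SAMPLE_SIZE)

print(f"mean of sample means:    {mean_of_means:.4f} (population mean {POP_MEAN})")
print(f"std dev of sample means: {std_dev_of_means:.4f} (sigma / sqrt(n) = {predicted_std_dev:.4f})")
```

Because only a finite number of samples is drawn, the simulated values land close to, but not exactly on, the theoretical ones. Plotting a histogram of `sample_means` would show the bell shape described in the first statement.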
These properties of the sampling distribution of the mean can be applied to determining probabilities. When the sample size is large enough, the sampling distribution of the sample mean can be assumed to be approximately normal, even if the population is not normally distributed. Let’s see how these properties work. Suppose you wanted to answer the question, “What is the probability that a random sample of families in Canada will have an average of pets or fewer?” where the mean of the population is and the standard deviation of the population is .
For the sampling distribution and
Using technology, a sketch of this problem is
The shaded area shows the probability that the sample mean is less than .
The z-score for the value is
As shown above, the area under the standard normal curve to the left of (a z-score of ) is approximately . This value can also be determined by using the graphing calculator
The probability that the sample mean will be below is . In a random sample of families, it is almost certain that the average number of pets per family will be less than .
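The same chain of steps (standard error, then z-score, then area to the left) can be sketched in Python. The function name `prob_sample_mean_at_most` and the figures used in the call (a population mean of 100, a standard deviation of 15, and samples of size 25) are hypothetical and are not the values from the pets example.

```python
import math


def prob_sample_mean_at_most(x_bar, pop_mean, pop_std_dev, n):
    """P(sample mean <= x_bar) under the normal approximation from the CLT."""
    std_error = pop_std_dev / math.sqrt(n)   # standard deviation of the sample mean
    z = (x_bar - pop_mean) / std_error       # z-score of x_bar in the sampling distribution
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical numbers: population mean 100, standard deviation 15, samples of 25.
print(round(prob_sample_mean_at_most(103, 100, 15, 25), 4))  # 0.8413
```

Here the standard error is $15/\sqrt{25} = 3$, so a sample mean of 103 sits one standard error above the population mean, and the area to its left is about 0.8413.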
These three properties associated with the Central Limit Theorem are displayed in the diagram below:
The vertical axis now reads probability density rather than frequency. Frequency can only be used when you are dealing with a finite number of sample means, as it is the number of selections divided by the total number of sample means. Sampling distributions, on the other hand, are theoretical depictions of an infinite number of sample means, and probability density is the relative density of the selections from within this set.
A random sample of size is selected from a known population with a mean of and a standard deviation of . Samples of the same size are repeatedly collected allowing a sampling distribution of the sample mean to be drawn.
a) What is the expected shape of the resulting distribution?
b) Where is the sampling distribution of the sample mean centered?
c) What is the standard deviation of the sample mean?
The question indicates that an infinite number of samples of size are being collected from a known population, an infinite number of sample means are being calculated and then the sampling distribution of the sample mean is being studied. Therefore, an understanding of the Central Limit Theorem is necessary to answer the question.
a) The sampling distribution of the sample mean will be bell-shaped.
b) The sampling distribution of the sample mean will be centered about the population mean of
c) The standard deviation of the sample mean is the population standard deviation divided by the square root of the sample size, $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$.
A sample with a sample size of is taken from a known population where and . The following chart displays the collected data:
a) What is the population mean?
b) Determine the sample mean using technology.
c) What is the population standard deviation?
d) Using technology, determine the sample standard deviation.
e) If an infinite number of samples of size were collected from this population, what would be the value of the mean of the sample means?
f) If an infinite number of samples of size were collected from this population, what would be the value of the standard deviation of the sample means?
a) The population mean of was given in the question.
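b) The sample mean is and is determined by using 1-Var Stats on the TI-83.
c) The population standard deviation of was given in the question.
d) The sample standard deviation is and is determined by using 1-Var Stats on the TI-83.
e) The mean of the sample means equals the population mean, $\mu_{\bar{x}} = \mu$, a property of the Central Limit Theorem.
f) The standard deviation of the sample means equals the population standard deviation divided by the square root of the sample size, $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$, also a property of the Central Limit Theorem.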
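The calculator steps in this example have a direct counterpart in Python's standard `statistics` module. In the sketch below, the data list and the population parameters are hypothetical stand-ins, since the lesson's data chart is not reproduced here.

```python
import math
import statistics

# Hypothetical sample; the lesson's data chart is not reproduced here.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
POP_MEAN = 5.0       # assumed population mean
POP_STD_DEV = 2.0    # assumed population standard deviation

sample_mean = statistics.mean(sample)        # x-bar
sample_std_dev = statistics.stdev(sample)    # s, which divides by n - 1

n = len(sample)
mean_of_sample_means = POP_MEAN                        # mu_x-bar = mu
std_dev_of_sample_means = POP_STD_DEV / math.sqrt(n)   # sigma_x-bar = sigma / sqrt(n)

print("sample mean:", sample_mean)
print("sample standard deviation:", round(sample_std_dev, 4))
print("mean of sample means:", mean_of_sample_means)
print("standard deviation of sample means:", round(std_dev_of_sample_means, 4))
```

Note that `statistics.stdev` divides by $n - 1$ and so gives the sample standard deviation, while `statistics.pstdev` divides by $n$ and would be the choice for a full population.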
For approximately normal distributions, the mean and the standard deviation are used as the measure of center and spread. If these two values are known, the z-scores and the calculator can be used to find the percentage of values in any interval.
The Central Limit Theorem confirms the intuitive notion that, as the sample size grows, the distribution of the sample means begins to approximate a normal distribution.
Points to Consider
- What is a binomial experiment?
- What is the difference between a population proportion and a sample proportion?
- How does sample size affect the variation in sample results?
- The lifetimes of a certain type of calculator battery are normally distributed. The mean lifetime is days with a standard deviation of days. For a sample of new batteries, determine how many batteries will last
  - (a) between and days
  - (b) more than days
  - (c) less than days.
- A sample with a sample size of is taken from a known population where and . The following chart displays the collected data:
  - (a) What is the population mean?
  - (b) Determine the sample mean using technology.
  - (c) What is the population standard deviation?
  - (d) Using technology, determine the sample standard deviation.
  - (e) If an infinite number of samples of size were collected from this population, what would be the value of the mean of the sample means?
  - (f) If an infinite number of samples of size were collected from this population, what would be the value of the standard deviation of the sample means?
- (a) Using the graphing calculator, the area for is and for is . This means that of the batteries lasted between and days. Note: For , the area is to the left of the mean. However, the curve is symmetrical about the mean and the value of the area for is used and added to the area of . batteries will last between and days.
  (b) The area for is . This means that of the batteries lasted more than days. Note: For , the total area to the right of the mean is needed. Since the total area under the curve is one, the total area on either side of the mean is . This area must be added to the area . batteries will last more than days.
  (c) The area for is . This means that of the batteries lasted less than days. Note: Since the total area to the left of is required, the area for is subtracted from . batteries will last less than days.
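- (a) The population mean of was given in the question.
  (b) The sample mean is and is determined by using 1-Var Stats on the TI-83.
  (c) The population standard deviation of was given in the question.
  (d) The sample standard deviation is and is determined by using 1-Var Stats on the TI-83.
  (e) The mean of the sample means equals the population mean, $\mu_{\bar{x}} = \mu$, a property of the Central Limit Theorem.
  (f) The standard deviation of the sample means is $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$, also a property of the Central Limit Theorem.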
Central Limit Theorem
An important result in statistics, stating that the shape of the sampling distribution of the sample mean becomes more normal as the sample size $n$ increases.
Standard Normal Distribution
A normal distribution with a mean of zero and a standard deviation of one.
Z-Score
The variable along the horizontal axis of a standard normal distribution. It measures how many standard deviations a value lies above or below the mean.