- Calculate the z-score of a mean distribution of a random variable in problem situations.
- Understand the Central Limit Theorem and calculate a sampling distribution using the mean and standard deviation of a normally distributed random variable.
- Understand the relationship between the Central Limit Theorem and normal approximation of the binomial distribution.
In the previous lesson you learned that sampling is an important tool for determining the characteristics of a population. Although the parameters of the population (mean, standard deviation, etc.) were unknown, random sampling was used to yield reliable estimates of these values. The estimates were plotted on graphs to provide a visual representation of the distribution of the sample mean for various sample sizes. It is now time to define some properties of the sampling distribution of the sample mean and to examine what we can conclude about the entire population based on it.
All normal distributions have the same basic shape, and therefore rescaling and recentering can be used to change any normal distribution to one with a mean of zero and a standard deviation of one. This configuration is referred to as the standard normal distribution. In this distribution, the variable along the horizontal axis is called the z-score. The z-score is another measure of the performance of an individual score in a population: it measures how many standard deviations a score is away from the mean. The z-score of a term $x$ in a population distribution whose mean is $\mu$ and whose standard deviation is $\sigma$ is given by:
$$z = \frac{x - \mu}{\sigma}$$
Since $\sigma$ is always positive, $z$ will be positive when $x$ is greater than $\mu$ and negative when $x$ is less than $\mu$. A z-score of zero means that the term has the same value as the mean. For the standard normal distribution, where $\mu = 0$, if we let $x = \sigma$, then $z = 1$. If we let $x = 2\sigma$, then $z = 2$. Thus, a value of $z$ tells the number of standard deviations the given value of $x$ is above or below the mean.
Example: On a nationwide math test the mean was and the standard deviation was . If Robert scored , what was his z-score?
Example: On a college entrance exam, the mean was and the standard deviation was . If Helen’s z-score was , what was her exam mark?
Now you will see how z-scores are used to determine the probability of an event.
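The two examples above run the same formula in opposite directions: from a raw mark to a z-score, and from a z-score back to a raw mark. The following is a minimal sketch of both conversions. The function names and all numeric values are hypothetical, chosen only for illustration; they are not the values from the examples.

```python
def z_score(x, mu, sigma):
    """Return how many standard deviations x lies above (+) or below (-) mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


def raw_score(z, mu, sigma):
    """Invert the z-score formula: x = mu + z * sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return mu + z * sigma


# Hypothetical test: mean 70, standard deviation 10, a student scores 85.
print(z_score(85, 70, 10))        # 1.5

# Hypothetical exam: mean 500, standard deviation 100, a z-score of -2.
print(raw_score(-2.0, 500, 100))  # 300.0
```

A positive result means the mark is above the mean, and a negative result means it is below.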
Suppose you were to toss . The following figure shows the histogram and the approximating normal curve for the experiment. The random variable represents the number of tails obtained.
The blue section of the graph represents the probability that exactly of the coins turned up tails. One way to determine this is by the following
Geometrically this probability represents the area of the blue shaded bar divided by the total area of the bars. The area of the shaded bar is approximately equal to the area under the normal curve from to .
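To see how close the area under the normal curve comes to the area of a single histogram bar, the sketch below compares the exact binomial probability with the normal approximation. It assumes fair coins and assumes the bar for $k$ tails spans half a unit on either side of $k$. The number of coins and the number of tails are hypothetical values, not the ones from the figure.

```python
from math import comb
from statistics import NormalDist


def binomial_pmf(k, n, p=0.5):
    """Exact probability of k tails in n independent tosses."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_approx(k, n, p=0.5):
    """Area under the approximating normal curve from k - 0.5 to k + 0.5."""
    mu = n * p                          # mean of the binomial distribution
    sigma = (n * p * (1 - p)) ** 0.5    # its standard deviation
    dist = NormalDist(mu, sigma)
    return dist.cdf(k + 0.5) - dist.cdf(k - 0.5)


# Hypothetical experiment: 10 coins, exactly 5 tails.
n_coins, k_tails = 10, 5
exact = binomial_pmf(k_tails, n_coins)
approx = normal_approx(k_tails, n_coins)
print(round(exact, 4), round(approx, 4))
```

The two printed values should agree to about two decimal places, which is the sense in which the shaded bar is "approximately equal" to the area under the curve.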
Since areas under normal curves correspond to the probability of an event occurring, a special normal distribution table is used to calculate the probabilities. This table can be found in any statistics book, but is seldom used today. Below is an example of a table of z-scores and a brief explanation of how it works.
As shown in the illustration below, the values inside the given table represent the areas under the standard normal curve for values between $z = 0$ and the relative z-score. For example, to determine the area under the curve between and , look in the intersecting cell for the row labeled and the column labeled . The area under the curve is . To determine the area between $z = 0$ and a negative value, look in the intersecting cell of the row and column which sum to the absolute value of the number in question. For example, the area under the curve between and is equal to the area under the curve between and , so look at the cell on the row and the column (the area is ).
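The table lookup described above can be reproduced numerically. The sketch below computes the area between zero and a given z-score with Python's standard-library `statistics.NormalDist`, and uses the absolute value to handle negative z-scores by symmetry. The helper name `area_zero_to_z` and the z-value of 1.00 are illustrative assumptions, not entries taken from the table.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def area_zero_to_z(z):
    """Area under the standard normal curve between 0 and z.

    The curve is symmetric about 0, so a negative z gives the same
    area as its absolute value, just as in the table lookup.
    """
    return STANDARD_NORMAL.cdf(abs(z)) - 0.5


print(round(area_zero_to_z(1.00), 4))   # 0.3413
print(round(area_zero_to_z(-1.00), 4))  # 0.3413
```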
The graphing calculator will give greater accuracy in finding the proportion of values that lie between two specified values in a standard normal distribution.
Using the TI-83 calculator for this operation is quite simple. Follow these steps.
Press 2nd [VARS]. This opens the DISTR menu, which gives access to the distribution functions.
Scroll down to 2:normalcdf( and press ENTER.
This screen appears
Type in the lower bound and the upper bound separated by a comma, close the parenthesis, and press ENTER.
The calculator has given an answer that is more accurate than that given in the chart. However, if the answer is rounded to the nearest ten-thousandth, then both answers would be the same. Using the calculator is a more efficient method of obtaining the area, since you all have one on hand.
Example: For a normal distribution curve based on values of and , find the area between and .
Using the TI-83
The area for is and for is . Therefore the area between and is:
This means that the relative frequency of the values between and is .
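For readers without a TI-83, the same kind of calculation can be sketched in Python. The function below is a hypothetical stand-in that mimics the calculator's `normalcdf(lower, upper, mean, sd)` argument order. It is built on the standard-library `statistics.NormalDist`, and the numbers are illustrative rather than the ones from the example.

```python
from statistics import NormalDist


def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """Area under a normal curve between lower and upper.

    Computed as the difference of two cumulative areas,
    the same subtraction used in the worked example.
    """
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)


# Standard normal curve, between z = -1 and z = 1.
print(round(normalcdf(-1, 1), 4))            # 0.6827

# Hypothetical distribution with mean 70 and standard deviation 10:
# the same area, because 60 and 80 are one standard deviation from the mean.
print(round(normalcdf(60, 80, 70, 10), 4))   # 0.6827
```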
Central Limit Theorem
The Central Limit Theorem is a very important theorem in statistics. It basically confirms what might be an intuitive truth to you: that as you increase the size of the samples drawn from a population, the distribution of the sample means better approximates a normal distribution.
Before going any further, you should become familiar with (or reacquaint yourself with) the symbols that are commonly used when dealing with properties of the sampling distribution of the sample mean. These symbols are shown in the table below:
In the previous lesson, you discovered that the standard error is the standard deviation of the sampling distribution and that this value was calculated by using the formula $\sqrt{\frac{p(1-p)}{n}}$. By making a few substitutions, this formula can be rewritten using the symbols from the chart above. The formula can be expressed as the quotient of two radical expressions, $\frac{\sqrt{p(1-p)}}{\sqrt{n}}$. The square root of the product of the parameters $p$ and $(1-p)$ is actually the standard deviation of the population, $\sigma$. When this value is divided by the square root of the sample size, the result is the standard error, also known as the standard deviation of the sampling distribution, $\sigma_{\bar{x}}$. Therefore $\sqrt{\frac{p(1-p)}{n}}$ can be written as $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$.
This frequency distribution only approximates the true sampling distribution of the sample mean because a finite number of sample means were used. If, hypothetically, an infinite number of sample means were used, the resulting distribution would be the desired sampling distribution and the following would be true:
$$\mu_{\bar{x}} = \mu \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
The notation $\sigma_{\bar{x}}$ reminds you that this is the standard deviation of the sample mean and not the standard deviation of a single observation.
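The substitution described above can be checked numerically. The sketch below assumes that the formula from the previous lesson is the standard error of a proportion, $\sqrt{p(1-p)/n}$, and confirms that it gives the same number as $\sigma/\sqrt{n}$ when $\sigma = \sqrt{p(1-p)}$. The function names and the values of `p` and `n` are hypothetical.

```python
from math import isclose, sqrt


def standard_error_from_p(p, n):
    """Standard error written as a single radical (assumed earlier formula)."""
    return sqrt(p * (1 - p) / n)


def standard_error_from_sigma(sigma, n):
    """Standard error written as sigma divided by the square root of n."""
    return sigma / sqrt(n)


# Hypothetical values: p = 0.5 and samples of size 25.
p, n = 0.5, 25
sigma = sqrt(p * (1 - p))  # population standard deviation

se_radical = standard_error_from_p(p, n)
se_quotient = standard_error_from_sigma(sigma, n)
print(se_radical, se_quotient)           # both approximately 0.1
print(isclose(se_radical, se_quotient))  # True
```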
The Central Limit Theorem states the following:
- If samples of size $n$ are drawn at random from any population with a finite mean and standard deviation, then the sampling distribution of the sample mean $\bar{x}$ approximates a normal distribution as $n$ increases.
- The mean of this sampling distribution approximates the population mean as $n$ becomes large: $\mu_{\bar{x}} = \mu$
- The standard deviation of the sample mean is approximately equivalent to the following: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
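The three statements above can be illustrated with a small simulation. The sketch below draws many samples from a deliberately non-normal population and checks that the sample means center on $\mu$ with a spread close to $\sigma/\sqrt{n}$. The exponential population, the sample size of 40, the 2000 samples, and the fixed seed are all assumptions made for illustration.

```python
import random
from statistics import mean, pstdev


def sample_means(draw, n, num_samples, rng):
    """Return the means of num_samples random samples, each of size n."""
    return [mean(draw(rng) for _ in range(n)) for _ in range(num_samples)]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Skewed population: exponential with rate 1, so mu = 1 and sigma = 1.
mu, sigma, n = 1.0, 1.0, 40
means = sample_means(lambda r: r.expovariate(1.0), n, 2000, rng)

print(mean(means))      # close to mu = 1
print(pstdev(means))    # close to sigma / sqrt(n), about 0.158
print(sigma / n**0.5)   # theoretical standard error
```

Because only a finite number of sample means is simulated, the printed values approximate, rather than equal, the theoretical ones.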
These properties of the sampling distribution of the mean can be applied to determining probabilities. The sampling distribution of the sample mean can be assumed to be approximately normal, even if the population is not normally distributed. Now that it has been clarified that the sampling distribution of the mean is approximately normal, let’s see how these properties work. Suppose you wanted to answer the question, “What is the probability that a random sample of families in Canada will have an average of pets or fewer?” where the mean of the population is and the standard deviation of the population is .
For the sampling distribution and
Using technology, a sketch of this problem is
The shaded area shows the probability that the sample mean is less than .
The z-score for the value is
As shown above, the area under the standard normal curve to the left of (a z-score of ) is approximately . This value can also be determined by using the graphing calculator
The probability that the sample mean will be below is
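The whole calculation (standard error, then z-score, then area to the left) can be sketched as a single function. All of the numbers below are hypothetical placeholders for a pet-ownership style question and are not the values used in the lesson. The function name is likewise illustrative.

```python
from math import sqrt
from statistics import NormalDist


def prob_sample_mean_at_most(x_bar, mu, sigma, n):
    """Return (z, probability) for a sample mean of x_bar or less.

    Relies on the Central Limit Theorem: the sample mean is treated as
    approximately normal with mean mu and standard deviation sigma / sqrt(n).
    """
    standard_error = sigma / sqrt(n)
    z = (x_bar - mu) / standard_error
    return z, NormalDist().cdf(z)  # area to the left of z


# Hypothetical inputs: population mean 0.8 pets, standard deviation 1.2,
# a sample of 36 families, and a sample mean of 0.7 pets or fewer.
z, probability = prob_sample_mean_at_most(x_bar=0.7, mu=0.8, sigma=1.2, n=36)
print(round(z, 2), round(probability, 4))  # -0.5 0.3085
```

Note that the z-score here divides by the standard error $\sigma_{\bar{x}}$, not by the population standard deviation $\sigma$, because the question is about a sample mean rather than a single observation.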