The z-Score and the Central Limit Theorem | CK-12 Foundation
You are reading an older version of this FlexBook® textbook: CK-12 Probability and Statistics - Advanced.

7.2: The z-Score and the Central Limit Theorem

Difficulty Level: At Grade Created by: CK-12

Learning Objectives

  • Calculate the z-score of a value of a random variable, including a sample mean, in problem situations.
  • Understand the Central Limit Theorem and calculate a sampling distribution using the mean and standard deviation of a normally distributed random variable.
  • Understand the relationship between the Central Limit Theorem and normal approximation of the binomial distribution.

Introduction

In the previous lesson you learned that sampling is an important tool for determining the characteristics of a population. Although the parameters of the population (mean, standard deviation, etc.) were unknown, random sampling was used to yield reliable estimates of these values. The estimates were plotted on graphs to provide a visual representation of the distribution of the sample mean for various sample sizes. It is now time to define some properties of the sampling distribution of the sample mean and to examine what we can conclude about the entire population based on it.

All normal distributions have the same basic shape, so rescaling and recentering can be used to change any normal distribution into one with a mean of zero and a standard deviation of one. This configuration is referred to as the standard normal distribution. In this distribution, the variable along the horizontal axis is called the z-score. This score is another measure of the performance of an individual score in a population: the z-score measures how many standard deviations a score is away from the mean. The z-score of a term x in a population distribution whose mean is \mu and whose standard deviation is \sigma is given by:

 z = \frac {x - \mu}{\sigma}

Since \sigma is always positive, z will be positive when x is greater than \mu and negative when x is less than \mu. A z-score of zero means that the term has the same value as the mean. If we let x = \mu + \sigma, then z = 1; if we let x = \mu + 2\sigma, then z = 2. Thus, the value of z tells the number of standard deviations the given value of x is above or below the mean.

Example: On a nationwide math test the mean was 65 and the standard deviation was 10. If Robert scored 81, what was his z-score?

Solution:

 z & = \frac {x - \mu}{\sigma}\\z & = \frac{81 - 65}{10}\\z & = \frac{16}{10}\\z & = 1.6
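The z-score formula translates directly into a one-line function. This is a minimal Python sketch; the name `z_score` is our own illustrative choice, not part of any library.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Robert's score on the nationwide math test: mean 65, standard deviation 10
robert_z = z_score(81, 65, 10)
print(robert_z)  # 1.6
```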

Example: On a college entrance exam, the mean was 70 and the standard deviation was 8. If Helen’s z-score was -1.5, what was her exam mark?

Solution:

z & = \frac{x - \mu} {\sigma}\\\therefore z \cdot \sigma & = x - \mu\\x & = \mu + z \cdot \sigma\\x & = (70) + (-1.5)(8)\\x & = 58
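Solving the formula for x gives the reverse calculation. A minimal sketch, with the hypothetical helper name `score_from_z`:

```python
def score_from_z(z: float, mu: float, sigma: float) -> float:
    """Invert z = (x - mu) / sigma to recover the raw score x = mu + z * sigma."""
    return mu + z * sigma


# Helen's college entrance exam: mean 70, standard deviation 8, z-score -1.5
helen_mark = score_from_z(-1.5, 70, 8)
print(helen_mark)  # 58.0
```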

Now you will see how z-scores are used to determine the probability of an event.

Suppose you were to toss 8 coins 2560 times. The following figure shows the histogram and the approximating normal curve for the experiment. The random variable represents the number of tails obtained.

The blue section of the graph represents the probability that exactly 3 of the coins turned up tails. One way to determine this probability is as follows:

P (3 \;\mathrm{tails})  & = \frac{_8C_3} {2^8}\\P (3 \;\mathrm{tails})  & = \frac{56} {256}\\P (3 \;\mathrm{tails}) & \cong 0.2188

Geometrically this probability represents the area of the blue shaded bar divided by the total area of the bars. The area of the shaded bar is approximately equal to the area under the normal curve from 2.5 to 3.5.
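The bar-area argument can be checked numerically. This sketch assumes Python 3.8+ (for `math.comb` and `statistics.NormalDist`) and assumes the approximating curve uses the binomial mean np = 4 and standard deviation \sqrt{npq} = \sqrt{2}, which the text does not state explicitly.

```python
import math
from statistics import NormalDist

n, p = 8, 0.5  # 8 fair coins; "success" = tails

# Exact binomial probability of exactly 3 tails: C(8, 3) / 2^8
exact = math.comb(n, 3) / 2**n

# Normal curve with the binomial mean n*p and standard deviation sqrt(n*p*(1-p))
curve = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))

# Area under the curve from 2.5 to 3.5 approximates the area of the bar at 3
approx = curve.cdf(3.5) - curve.cdf(2.5)

print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```

The approximation comes out near 0.217, close to the exact value of 56/256.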

Since areas under normal curves correspond to the probability of an event occurring, a special normal distribution table is used to calculate these probabilities. This table can be found in most statistics books, although it is seldom used today. Below is an example of such a table of areas for z-scores, with a brief explanation of how it works.

As shown in the illustration below, the values inside the table represent the areas under the standard normal curve between 0 and the given z-score. For example, to determine the area under the curve between 0 and 2.36, look in the cell where the row labeled 2.3 and the column labeled 0.06 intersect. The area under the curve is 0.4909. To determine the area between 0 and a negative value, look in the cell whose row and column labels sum to the absolute value of the number in question. For example, the area under the curve between -1.3 and 0 is equal to the area under the curve between 0 and 1.3, so look at the cell in the 1.3 row and the 0.00 column (the area is 0.4032).

z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
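Each entry in the table can be reproduced from the standard normal cumulative distribution function, since the area between 0 and z equals \Phi(|z|) - 0.5. This is a sketch assuming Python 3.8+ (`statistics.NormalDist`); the helper name `table_area` is hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def table_area(z: float) -> float:
    """Area under the standard normal curve between 0 and z, as listed in the table.

    By symmetry the area between 0 and -z equals the area between 0 and z,
    so the absolute value is used (just as the table is read for negative z).
    """
    return STANDARD_NORMAL.cdf(abs(z)) - 0.5


for z in (2.36, -1.3, 0.8, 2.4):
    print(f"z = {z:5}: area = {table_area(z):.4f}")
```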

The graphing calculator will give greater accuracy in finding the proportion of values that lie between two specified values in a standard normal distribution.

Using the TI-83 calculator for this operation is quite simple. Follow these steps:

  1. Press 2nd VARS to access the DISTR (distribution) menu.
  2. Scroll down to 2: normalcdf( and press ENTER.
  3. Type in the bounds 0, 2.36) and press ENTER.

The calculator gives an answer that is more accurate than the one in the table. However, if the calculator's answer is rounded to the nearest ten-thousandth, both answers are the same. Since graphing calculators are readily at hand, using one is a more efficient way to find these areas than reading the table.
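Outside the classroom, the same area can be computed in software. The sketch below is a hypothetical Python helper whose argument order mirrors the TI-83 command normalcdf(lower, upper, \mu, \sigma); it assumes Python 3.8+ and is not a TI-83 emulator.

```python
from statistics import NormalDist


def normalcdf(lower: float, upper: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Area under a normal curve between lower and upper.

    The argument order mirrors the TI-83 command normalcdf(lower, upper, mu, sigma);
    mu and sigma default to the standard normal distribution.
    """
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)


area = normalcdf(0, 2.36)
print(f"{area:.7f}")  # more digits than the table provides
print(f"{area:.4f}")  # rounded to the nearest ten-thousandth: matches the table's 0.4909
```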

Example: For a normal distribution curve based on values of \sigma = 5 and \mu = 20, find the area between x = 24 and x = 32.

Solution:

& z = \frac{x - \mu} {\sigma} & & \text{and} & & z = \frac{x - \mu} {\sigma}\\& z = \frac{24 - 20} {5} & & \text{and} & & z = \frac{32 - 20} {5}\\& z = 0.8 & & \text{and} & & z = 2.4

Using the table (or the TI-83), the area between 0 and z = 0.8 is 0.2881 and the area between 0 and z = 2.4 is 0.4918. Since both z-scores are above the mean, the area between x = 24 and x = 32 is the difference:

0.4918 - 0.2881 = 0.2037

This means that the relative frequency of the values between x = 24 and x = 32 is 20.37\%.
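The same example can be worked either in the original x units or in z units. This Python 3.8+ sketch does both to show that standardizing does not change the area:

```python
from statistics import NormalDist

mu, sigma = 20, 5
dist = NormalDist(mu, sigma)

# Route 1: work directly in x units
area_x = dist.cdf(32) - dist.cdf(24)

# Route 2: standardize first, then use the standard normal curve
z_low, z_high = (24 - mu) / sigma, (32 - mu) / sigma  # 0.8 and 2.4
std = NormalDist()
area_z = std.cdf(z_high) - std.cdf(z_low)

print(f"{area_x:.4f} {area_z:.4f}")
```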

Central Limit Theorem

The Central Limit Theorem is a very important theorem in statistics. It confirms what might be an intuitive truth to you: as you increase the size of the samples drawn from a population, the distribution of the sample means more closely approximates a normal distribution.

Before going any further, you should become familiar with (or reacquaint yourself with) the symbols that are commonly used when dealing with properties of the sampling distribution of the sample mean. These symbols are shown in the table below:

                     Population Parameter   Sample Statistic   Sampling Distribution
Mean                 \mu                    \bar {x}           \mu_{\bar {x}}
Standard Deviation   \sigma                 s                  S_{\bar {x}} or \sigma_{\bar{x}}
Size                 N                      n

In the previous lesson, you discovered that the standard error is the standard deviation of the sampling distribution, and it was calculated using the formula s = \sqrt{\frac{P \cdot Q} {n}}. By making a few substitutions, this formula can be rewritten using the symbols from the chart above. First, express it as the quotient of two radical expressions, s = \frac{\sqrt{P \cdot Q}} {\sqrt{n}}. For a population made up of successes and failures in proportions P and Q, the square root of the product P \cdot Q is the standard deviation of the population (\sigma). Dividing this value by the square root of the sample size gives the standard error (s), also known as the standard deviation of the sampling distribution (S_{\bar {x}}). Therefore s = \sqrt{\frac{P \cdot Q} {n}} can be written as S_{\bar {x}} = \frac{\sigma} {\sqrt{n}}.

The frequency distribution from the previous lesson only approximates the true sampling distribution of the sample mean, because a finite number of sample means were used. If, hypothetically, an infinite number of sample means were used, the resulting distribution would be the desired sampling distribution and the following would be true:

\sigma_{\bar {x}} = \frac{\sigma} {\sqrt{n}}

The notation \sigma_{\bar {x}} reminds you that this is the standard deviation of the sample mean (\bar {x}) and not the standard deviation (\sigma) of a single observation.
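To see the substitution in numbers, the sketch below builds a hypothetical population of 100 members, 30 coded 1 (success) and 70 coded 0 (failure), so P = 0.3 and Q = 0.7. The sample size n = 50 is also an arbitrary illustrative choice. It uses the standard library's `statistics.pstdev` (population standard deviation).

```python
import math
from statistics import pstdev

# Hypothetical population: 30 successes coded 1, 70 failures coded 0
population = [1] * 30 + [0] * 70
P = population.count(1) / len(population)  # 0.3
Q = 1 - P                                  # 0.7
n = 50                                     # hypothetical sample size

sigma = pstdev(population)       # population standard deviation
print(sigma, math.sqrt(P * Q))   # both are sqrt(0.21), about 0.458

# The standard error written both ways gives the same number
se_from_proportions = math.sqrt(P * Q / n)
se_from_sigma = sigma / math.sqrt(n)
print(se_from_proportions, se_from_sigma)
```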

The Central Limit Theorem states the following:

  • If samples of size n are drawn at random from any population with a finite mean and standard deviation, then the sampling distribution of the sample mean (\bar {x}) approximates a normal distribution as n increases.
  • The mean of this sampling distribution equals the population mean:

\mu_{\bar {x}} = \mu

  • The standard deviation of the sample mean is given by the following formula:

\sigma_{\bar {x}} = \frac{\sigma} {\sqrt{n}}
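These properties can be checked empirically with a simulation. The sketch below assumes a hypothetical, strongly skewed population (an exponential distribution with mean 1 and standard deviation 1, drawn with Python's `random.expovariate`), repeatedly takes samples of size n = 30, and compares the mean and standard deviation of the sample means with \mu and \sigma / \sqrt{n}. The population, seed, and sample counts are arbitrary illustrative choices.

```python
import math
import random
from statistics import mean, pstdev

rng = random.Random(2024)  # fixed seed so the run is reproducible

# Hypothetical right-skewed population: exponential with rate 1,
# whose mean and standard deviation are both exactly 1.
MU, SIGMA = 1.0, 1.0
n = 30                  # size of each sample
num_samples = 10_000    # number of sample means collected

# Each entry is the mean of one random sample of size n
sample_means = [
    sum(rng.expovariate(1.0) for _ in range(n)) / n
    for _ in range(num_samples)
]

print(f"mean of sample means: {mean(sample_means):.3f}  (CLT: {MU})")
print(f"sd of sample means:   {pstdev(sample_means):.4f}  (CLT: {SIGMA / math.sqrt(n):.4f})")
```

The simulated values land close to 1 and 1/\sqrt{30} \approx 0.183, and a histogram of `sample_means` would look approximately bell-shaped even though the population itself is skewed.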

These properties of the sampling distribution of the mean can be applied to finding probabilities. Provided the sample size is large enough, the sampling distribution of the sample mean can be assumed to be approximately normal, even if the population is not normally distributed. With that established, let's see how these properties work. Suppose you wanted to answer the question, “What is the probability that a random sample of 20 families in Canada will have an average of 1.5 pets or fewer?” where the mean of the population is 0.8 pets and the standard deviation of the population is 1.2 pets.

For the sampling distribution \mu_{\bar {x}} = \mu = 0.8 and \sigma_{\bar {x}} = \frac{\sigma} {\sqrt{n}} = \frac{1.2} {\sqrt{20}} \approx 0.27

Using technology, a sketch of this problem can be drawn:

The shaded area shows the probability that the sample mean is less than 1.5.

The z-score for the value 1.5 is z = \frac{\bar {x} - \mu_{\bar {x}}} {\sigma_{\bar{x}}} \approx \frac{1.5 - 0.8} {0.27} \approx 2.6

From the table, the area between 0 and z = 2.6 is 0.4953, so the area under the standard normal curve to the left of z = 2.6 (a sample mean of 1.5) is approximately 0.5000 + 0.4953 = 0.9953. This value can also be determined by using the graphing calculator.

The probability that the sample mean will be 1.5 or below is approximately 0.9953. In a random sample of 20 families, it is almost certain that the average number of pets per family will be 1.5 or fewer.
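The pet calculation can be reproduced in a few lines. This Python 3.8+ sketch models the sample mean with `statistics.NormalDist`, as the Central Limit Theorem suggests, and keeps the standard error unrounded, so its z-score (about 2.61) and probability (about 0.995) differ slightly from hand calculations that round \sigma_{\bar{x}} to 0.27.

```python
import math
from statistics import NormalDist

mu, sigma, n = 0.8, 1.2, 20          # population mean, population sd, sample size
se = sigma / math.sqrt(n)            # sigma_xbar, about 0.268

sampling_dist = NormalDist(mu, se)   # CLT: the sample mean is approximately normal
p = sampling_dist.cdf(1.5)           # P(xbar <= 1.5)

print(f"standard error = {se:.4f}, z = {(1.5 - mu) / se:.2f}, P = {p:.4f}")
```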

These three properties associated with the Central Limit Theorem are displayed in the diagram below:

The vertical axis now reads probability density rather than frequency. Frequency (or relative frequency, the number of selections divided by the total number of sample means) can only be used when you are dealing with a finite number of sample means. Sampling distributions, on the other hand, are theoretical depictions of an infinite number of sample means, so probability density, which describes how densely the sample means are concentrated around each value, is used instead.

Example:

A random sample of size 40 is selected from a known population with a mean of 23.5 and a standard deviation of 4.3. Samples of the same size are repeatedly collected allowing a sampling distribution of the sample mean to be drawn.

a) What is the expected shape of the resulting distribution?

b) Where is the sampling distribution of the sample mean centered?

c) What is the standard deviation of the sample mean?

Solution:

The question indicates that an infinite number of samples of size 40 are being collected from a known population, an infinite number of sample means are being calculated and then the sampling distribution of the sample mean is being studied. Therefore, an understanding of the Central Limit Theorem is necessary to answer the question.

a) The sampling distribution of the sample mean will be approximately normal (bell-shaped).

b) The sampling distribution of the sample mean will be centered about the population mean of 23.5.

c) \sigma_{\bar{x}} & = \frac{\sigma} {\sqrt{n}}\\\sigma_{\bar{x}} & = \frac{4.3} {\sqrt{40}}\\\sigma_{\bar{x}} &  = 0.68
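Parts (b) and (c) are direct applications of the Central Limit Theorem formulas; a minimal Python check:

```python
import math

mu, sigma, n = 23.5, 4.3, 40

center = mu                        # the sampling distribution is centered at mu
sigma_xbar = sigma / math.sqrt(n)  # standard deviation of the sample mean

print(f"center = {center}, sigma_xbar = {sigma_xbar:.2f}")  # center = 23.5, sigma_xbar = 0.68
```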

Example:

A sample with a sample size of 40 is taken from a known population where \mu = 25 and \sigma = 4. The following chart displays the collected data:

24 23 30 17 24 22 23 21 29 25
26 25 29 28 29 29 32 22 27 28
24 32 21 29 30 18 21 24 30 24
25 26 25 27 26 25 27 24 24 25

a) What is the population mean?

b) Determine the sample mean using technology.

c) What is the population standard deviation?

d) Using technology, determine the sample standard deviation.

e) If an infinite number of samples of size 40 were collected from this population, what would be the value of the mean of the sample means?

f) If an infinite number of samples of size 40 were collected from this population, what would be the value of the standard deviation of the sample means?

Solution:

a) \mu = 25 The population mean of 25 was given in the question.

b) \bar {x} = 25.5 The sample mean is 25.5 and is determined by using 1-Var Stats on the TI-83.

c) \sigma = 4 The population standard deviation of 4 was given in the question.

d) S_x= 3.47 The sample standard deviation is 3.47 and is determined by using 1-Var Stats on the TI-83.

e) \mu_{\bar {x}} = 25 A property of the Central Limit Theorem.

f) \sigma_{\bar {x}} & = \frac{\sigma} {\sqrt{n}}\\\sigma_{\bar {x}} &  = \frac{4} {\sqrt{40}}\\\sigma_{\bar {x}} &  = 0.63 \ \ \ \text{Central Limit Theorem}
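Parts (b), (d), and (f) can also be computed in Python instead of on the TI-83. In this sketch, `statistics.stdev` uses n - 1 in the denominator, matching the sample standard deviation Sx that 1-Var Stats reports.

```python
import math
from statistics import mean, stdev

data = [
    24, 23, 30, 17, 24, 22, 23, 21, 29, 25,
    26, 25, 29, 28, 29, 29, 32, 22, 27, 28,
    24, 32, 21, 29, 30, 18, 21, 24, 30, 24,
    25, 26, 25, 27, 26, 25, 27, 24, 24, 25,
]
mu, sigma = 25, 4  # population values given in the question

x_bar = mean(data)                          # sample mean
s_x = stdev(data)                           # sample standard deviation (n - 1 denominator)
sigma_xbar = sigma / math.sqrt(len(data))   # CLT standard deviation of the sample mean

print(f"x-bar = {x_bar}, Sx = {s_x:.2f}, sigma_xbar = {sigma_xbar:.2f}")
# x-bar = 25.5, Sx = 3.47, sigma_xbar = 0.63
```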

Lesson Summary

For approximately normal distributions, the mean and the standard deviation are used as the measure of center and spread. If these two values are known, the z-scores and the calculator can be used to find the percentage of values in any interval.

The Central Limit Theorem confirms the intuitive notion that, with a large enough sample size, the distribution of the sample means will approximate a normal distribution.

Points to Consider

  • What is a binomial experiment?
  • What is the difference between a population proportion and a sample proportion?
  • How does sample size affect the variation in sample results?

Review Questions

  1. The lifetimes of a certain type of calculator battery are normally distributed. The mean lifetime is 400 days with a standard deviation of 50 days. For a sample of 6000 new batteries, determine how many batteries will last
    1. between 360 and 460 days
    2. more than 320 days
    3. less than 280 days.
  2. A sample with a sample size of 40 is taken from a known population where \mu = 25 and \sigma = 4. The following chart displays the collected data:
       24 23 30 17 24 22 23 21 29 25
       26 25 29 28 29 29 32 22 27 28
       24 32 21 29 30 18 21 24 30 24
       25 26 25 27 26 25 27 24 24 25
    1. What is the population mean?
    2. Determine the sample mean using technology.
    3. What is the population standard deviation?
    4. Using technology, determine the sample standard deviation.
    5. If an infinite number of samples of size 40 were collected from this population, what would be the value of the mean of the sample means?
    6. If an infinite number of samples of size 40 were collected from this population, what would be the value of the standard deviation of the sample means?

Review Answers

  1. (a)
       z & = \frac{x - \mu} {\sigma} && \text{and} && z = \frac{x - \mu} {\sigma} \\z & = \frac{360 - 400} {50} && \text{and} && z = \frac{460 - 400} {50} \\z & = -0.8 && \text{and} && z = 1.2
     Using the table or the graphing calculator, the area for z = -0.8 is 0.2881 and the area for z = 1.2 is 0.3849. For z = -0.8, the area lies to the left of the mean; because the curve is symmetric about the mean, the area for z = 0.8 is used and added to the area for z = 1.2:
       \text{Area} & = 0.2881 + 0.3849 = 0.6730 \\(0.6730)(6000) & \approx 4038
     This means that 67.3\% of the 6000 batteries, about 4038 batteries, will last between 360 and 460 days.
     (b)
       z & = \frac{x - \mu} {\sigma} \\z & = \frac{320 - 400} {50} \\z & = -1.6
     The area for z = 1.6 is 0.4452. The total area to the right of z = -1.6 is needed: the area between z = -1.6 and the mean plus the area to the right of the mean. Since the total area under the curve is one, the area on either side of the mean is 0.5000:
       0.4452 + 0.5000 & = 0.9452 \\(0.9452)(6000) & \approx 5671
     This means that 94.52\% of the 6000 batteries, about 5671 batteries, will last more than 320 days.
     (c)
       z & = \frac{x - \mu} {\sigma} \\z & = \frac{280 - 400} {50} \\z & = -2.4
     The area for z = 2.4 is 0.4918. Since the total area to the left of z = -2.4 is required, the area for z = 2.4 is subtracted from 0.5000:
       0.5000 - 0.4918 & = 0.0082 \\(0.0082)(6000) & \approx 49
     This means that 0.82\% of the 6000 batteries, about 49 batteries, will last less than 280 days.
  2.
    1. \mu = 25 The population mean of 25 was given in the question.
    2. \bar {x} = 25.5 The sample mean is 25.5 and is determined by using 1-Var Stats on the TI-83.
    3. \sigma = 4 The population standard deviation of 4 was given in the question.
    4. S_x= 3.47 The sample standard deviation is 3.47 and is determined by using 1-Var Stats on the TI-83.
    5. \mu_{\bar {x}} = 25 A property of the Central Limit Theorem.
    6. \sigma_{\bar {x}} & = \frac{\sigma} {\sqrt{n}} \\\sigma_{\bar {x}} & = \frac{4} {\sqrt{40}} \\\sigma_{\bar {x}} & = 0.63 \ \ \ \text{Central Limit Theorem}
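The battery counts in review question 1 can be checked without the table. This Python 3.8+ sketch uses `statistics.NormalDist` for the lifetime distribution and rounds each expected count to the nearest whole battery.

```python
from statistics import NormalDist

lifetimes = NormalDist(mu=400, sigma=50)  # battery lifetimes in days
batch = 6000

p_between = lifetimes.cdf(460) - lifetimes.cdf(360)  # (a) between 360 and 460 days
p_over_320 = 1 - lifetimes.cdf(320)                  # (b) more than 320 days
p_under_280 = lifetimes.cdf(280)                     # (c) less than 280 days

results = [("360-460 days", p_between), (">320 days", p_over_320), ("<280 days", p_under_280)]
for label, p in results:
    print(f"{label:>12}: proportion {p:.4f}, about {round(p * batch)} batteries")
```

The unrounded probabilities lead to the same whole-battery counts as the table: 4038, 5671, and 49.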

Vocabulary

Central Limit Theorem
An important result in statistics, stating that the shape of the sampling distribution of the sample mean becomes more normal as n increases.
Standard Normal Distribution
A normal distribution with a mean of zero and a standard deviation of one.
z-Score
The variable along the horizontal axis of the standard normal distribution; it measures how many standard deviations a value lies above or below the mean.

CK.MAT.ENG.SE.1.Prob-&-Stats-Adv.7.2
