Density Curve of the Normal Distribution
In this section, we will extend our investigation of normal distributions to density curves and learn several methods for calculating probabilities from the normal density curve.
A density curve is an idealized representation of a distribution in which the area under the curve is defined to be 1. Density curves need not be normal, but the normal density curve will be the most useful to us.
Inflection Points on a Normal Density Curve
We already know from the Empirical Rule that approximately 68% of the data in a normal distribution lies within 1 standard deviation of the mean. With a normal density curve, this means that about 68% of the total area under the curve is within z-scores of $\pm 1$. Look at the following three density curves:
Notice that the curves are spread increasingly wider. Lines have been drawn to show the points that are one standard deviation on either side of the mean. Look at where this happens on each density curve. Here is a normal distribution with an even larger standard deviation.
Is it possible to predict the standard deviation of this distribution by estimating the x-coordinate of a point on the density curve? Read on to find out!
You may have noticed that the density curve changes shape at two points in each of our examples. These are the points where the curve changes concavity. Starting from the mean and heading outward to the left and right, the curve is concave down. (It looks like a mountain, or '∩' shape.) After passing these points, the curve is concave up. (It looks like a valley, or '∪' shape.) The points at which the curve changes concavity are called the inflection points. On a normal density curve, these inflection points are always exactly one standard deviation away from the mean.
In this example, the standard deviation is 3 units. We can use this concept to estimate the standard deviation of a normally distributed data set.
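To check the claim that the inflection points lie one standard deviation from the mean, here is a minimal numerical sketch using Python's standard-library `statistics.NormalDist` (Python 3.8+). It estimates the second derivative of the density with a central finite difference and walks right from the mean until the sign flips from negative (concave down) to non-negative (concave up). The helper names, step sizes, and the mean of 0 are assumptions made for illustration; the standard deviation of 3 matches the example.

```python
from statistics import NormalDist

def second_derivative(f, x, h=1e-3):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

def right_inflection_point(mu, sigma, step=1e-3):
    """Walk right from the mean until the density switches from
    concave down (f'' < 0) to concave up (f'' >= 0)."""
    pdf = NormalDist(mu, sigma).pdf
    x = mu
    while second_derivative(pdf, x) < 0:
        x += step
    return x

mu, sigma = 0.0, 3.0  # sigma = 3 as in the example; mu = 0 is an assumption
x_right = right_inflection_point(mu, sigma)
print(f"right inflection point ~ {x_right:.2f}; mu + sigma = {mu + sigma:.2f}")
```

By symmetry, the left inflection point sits at the same distance on the other side, at $\mu - \sigma$.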
Drawing Density Curves
Estimate the standard deviation of the distribution represented by the following histogram.
This distribution is fairly normal, so we could draw a density curve to approximate it as follows:
Now estimate the inflection points as shown below:
It appears that the mean is about 0.5 and that the x-coordinates of the inflection points are about 0.45 and 0.55. Since each inflection point lies one standard deviation from the mean, this leads to an estimate of about 0.05 for the standard deviation.
The actual statistics for this distribution are as follows:
We can verify these figures by using the expectations from the Empirical Rule. In the following graph, we have highlighted the bins that are contained within one standard deviation of the mean.
If you estimate the relative frequencies from each bin, their total is remarkably close to 68%. Make sure to divide the relative frequencies from the bins on the ends by 2 when performing your calculation, since the one-standard-deviation boundaries cut through the middle of those bins.
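The arithmetic behind this estimate can be sketched in a few lines of Python. The mean sits midway between the two inflection points, and the standard deviation is half the distance between them. The sketch then uses the standard-library `statistics.NormalDist` to compute how much area a normal model with these estimates places within one standard deviation of the mean, which is the roughly 68% target the bin total should approach. The inflection-point readings are the eyeball estimates from the text, not measured data.

```python
from statistics import NormalDist

# Inflection points read off the sketch (eyeball estimates from the text)
left, right = 0.45, 0.55

mu_est = (left + right) / 2     # the mean lies midway between the inflection points
sigma_est = (right - left) / 2  # each inflection point is one SD from the mean

# Area within one SD of the mean under a normal model with these estimates
model = NormalDist(mu_est, sigma_est)
within_one_sd = model.cdf(mu_est + sigma_est) - model.cdf(mu_est - sigma_est)

print(round(mu_est, 3), round(sigma_est, 3))  # 0.5 0.05
print(round(within_one_sd, 4))                # 0.6827
```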
Calculating Density Curve Areas
While it is convenient to estimate areas under a normal curve using the Empirical Rule, we often need more precise methods to calculate these areas. Luckily, we can use formulas or technology to help us with the calculations.
All normal distributions have the same basic shape, and therefore, rescaling and re-centering can be used to change any normal distribution into one with a mean of 0 and a standard deviation of 1. This configuration is referred to as a standard normal distribution. In a standard normal distribution, the variable along the horizontal axis is the z-score. This score is another measure of the performance of an individual score in a population. To review, the z-score measures how many standard deviations a score is away from the mean. The z-score of the term $x$ in a population distribution whose mean is $\mu$ and whose standard deviation is $\sigma$ is given by $z = \frac{x - \mu}{\sigma}$. Since $\sigma$ is always positive, $z$ will be positive when $x$ is greater than $\mu$ and negative when $x$ is less than $\mu$. A z-score of 0 means that the term has the same value as the mean. The value of $z$ is the number of standard deviations the given value of $x$ is above or below the mean.
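The z-score formula translates directly into code. Below is a minimal sketch; the function name `z_score` is our own, not part of any library. It uses the mean of 70 and standard deviation of 8 from the college entrance exam example, with three sample scores chosen for illustration.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

print(z_score(82, 70, 8))  # 1.5  (above the mean)
print(z_score(70, 70, 8))  # 0.0  (equal to the mean)
print(z_score(62, 70, 8))  # -1.0 (below the mean)
```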
Using z-Scores to Calculate the Probability of an Event
On a college entrance exam, the mean was 70 and the standard deviation was 8. If Helen’s z-score was , what was her exam score?
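Solving $z = \frac{x - \mu}{\sigma}$ for $x$ gives $x = \mu + z\sigma$. The sketch below applies this formula to the exam's mean and standard deviation. Helen's actual z-score is not legible in this copy, so the value -1.5 is a hypothetical placeholder used only to illustrate the calculation.

```python
def score_from_z(z: float, mu: float, sigma: float) -> float:
    """Invert z = (x - mu) / sigma to recover the raw score x = mu + z * sigma."""
    return mu + z * sigma

z_placeholder = -1.5  # hypothetical value; substitute Helen's actual z-score
print(score_from_z(z_placeholder, mu=70, sigma=8))  # 58.0
```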
Now you will see how -scores are used to determine the probability of an event.
Suppose you were to toss 8 coins 256 times. The following figure shows the histogram and the approximating normal curve for the experiment. The random variable represents the number of tails obtained.
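The blue section of the graph represents the probability that exactly 3 of the coins turned up tails. One way to determine this is with the binomial formula: $P(\text{3 tails}) = \binom{8}{3}\left(\tfrac{1}{2}\right)^{8} = \frac{56}{256} \approx 0.2188$.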
Geometrically, this probability represents the area of the blue shaded bar divided by the total area of the bars. The area of the blue shaded bar is approximately equal to the area under the normal curve from 2.5 to 3.5.
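The sketch below compares the exact binomial probability of exactly 3 tails with the area under an approximating normal curve from 2.5 to 3.5. It assumes fair coins and uses the usual binomial mean $np$ and standard deviation $\sqrt{np(1-p)}$ for the approximating curve. The text does not state the curve's parameters, so treat these as an assumption.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 8, 0.5  # 8 fair coins; the random variable counts tails

# Exact binomial probability of exactly 3 tails
exact = comb(n, 3) * p**3 * (1 - p) ** (n - 3)

# Approximating normal curve with the binomial mean and standard deviation
curve = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

# Area under the curve from 2.5 to 3.5, i.e., under the bar centered at 3
approx = curve.cdf(3.5) - curve.cdf(2.5)

print(f"exact  = {exact:.4f}")  # 0.2188
print(f"approx = {approx:.4f}")
print(f"expected count in 256 tosses = {256 * exact:.0f}")  # 56
```

The two values differ by only about 0.001, which is why the bar's area can be replaced by the area under the curve.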
Since areas under normal curves correspond to the probability of an event occurring, a special normal distribution table is used to calculate the probabilities. This table can be found in any statistics book, but it is seldom used today.
The values inside the given table represent the areas under the standard normal curve between 0 and the corresponding z-score. For example, to determine the area under the curve between z-scores of 0 and 2.36, look in the intersecting cell for the row labeled 2.3 and the column labeled 0.06. The area under the curve is 0.4909. To determine the area between 0 and a negative value, look in the cell whose row and column labels sum to the absolute value of the number in question. For example, the area under the curve between -1.3 and 0 is equal to the area under the curve between 0 and 1.3, so look at the cell that is the intersection of the 1.3 row and the 0.00 column. (The area is 0.4032.)
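Software can reproduce these table lookups. In this sketch, `statistics.NormalDist().cdf(z)` gives the area to the left of z on the standard normal curve. Subtracting 0.5 (the area to the left of 0) and taking the absolute value therefore gives the area between 0 and z, which matches the kind of table described above. The helper name `area_from_zero` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def area_from_zero(z: float) -> float:
    """Area under the standard normal curve between 0 and z (as in the table)."""
    return abs(std_normal.cdf(z) - 0.5)

print(round(area_from_zero(2.36), 4))  # 0.4909
print(round(area_from_zero(-1.3), 4))  # 0.4032 (same as between 0 and 1.3)
```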
It is extremely important, especially when you first start with these calculations, that you get in the habit of relating each problem to the normal distribution by drawing a sketch of the situation. In this case, simply sketch a standard normal curve with the appropriate region shaded and labeled.
Using the Table
Find the probability of choosing a value that is greater than $z = -0.528$. Before even using the table, first draw a sketch and estimate the probability. This z-score is just below the mean, so the answer should be a little more than 0.5.
Next, read the table to find the probability for the data below this z-score. We must first round this z-score to -0.53, which slightly underestimates the area below it, but it is the best we can do using the table. The table returns a value of 0.2981 as the area below this z-score. Because the total area under the density curve is equal to 1, we can subtract this value from 1 to find the desired probability of about 0.7019.
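Here is a minimal sketch of the same calculation in Python with `statistics.NormalDist`. It shows both the table-style result, which uses the rounded z-score, and the result using the unrounded z-score. Because -0.53 lies slightly to the left of -0.528, the area below it is slightly smaller, and the final answer is therefore slightly larger.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal curve

table_answer = 1 - std_normal.cdf(-0.53)   # z rounded to two decimals, as for the table
exact_answer = 1 - std_normal.cdf(-0.528)  # unrounded z-score

print(f"{table_answer:.4f}")  # 0.7019
print(f"{exact_answer:.5f}")  # 0.70125
```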
What about values between two z-scores? While it is an interesting and worthwhile exercise to do this using a table, it is much simpler to use software or a graphing calculator.
For example, using a table of areas to the left of each z-score, the probability $P(-2.60 < z < 1.30)$ can be calculated as follows: $P(-2.60 < z < 1.30) = P(z < 1.30) - P(z < -2.60) \approx 0.9032 - 0.0047 = 0.8985$.
It can also be found using the TI-83/84 calculator. Use the 'normalcdf(-2.60, 1.30, 0, 1)' command, and the calculator will return the result 0.898538. The syntax for this command is 'normalcdf(min, max, μ, σ)'. When using this command, you do not need to standardize first; you can use the mean and standard deviation of the given distribution.
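For comparison, here is the same probability computed in Python with `statistics.NormalDist`, first on the standard normal curve and then on a non-standard normal curve. The mean of 100 and standard deviation of 15 in the second calculation are hypothetical. They are chosen so that the bounds 61 and 119.5 correspond to z-scores of -2.60 and 1.30, which illustrates why standardizing is unnecessary when the mean and standard deviation are supplied.

```python
from statistics import NormalDist

# Standard normal curve: P(-2.60 < z < 1.30)
std_normal = NormalDist()
p_standard = std_normal.cdf(1.30) - std_normal.cdf(-2.60)

# Hypothetical distribution with mean 100 and SD 15; 61 and 119.5 are
# 2.60 SDs below and 1.30 SDs above the mean, so no standardizing is needed
general = NormalDist(mu=100, sigma=15)
p_general = general.cdf(119.5) - general.cdf(61)

print(f"{p_standard:.6f}")  # 0.898538
print(f"{p_general:.6f}")   # 0.898538
```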
Technology Note: The 'normalcdf(' Command on the TI-83/84 Calculator
Your graphing calculator has already been programmed to calculate probabilities for a normal density curve using what is called a cumulative distribution function. The command you will use is found in the DISTR menu, which you can bring up by pressing [2ND][DISTR].
Press [2] to select the 'normalcdf(' command, which has a syntax of 'normalcdf(lower bound, upper bound, mean, standard deviation)'.
The command has been programmed so that if you do not specify a mean and standard deviation, it will default to the standard normal curve, with $\mu = 0$ and $\sigma = 1$.
For example, entering 'normalcdf(-1, 1)' will give the area within one standard deviation of the mean, which we already know to be approximately 0.68.
Try verifying the other values from the Empirical Rule.
'normalcdf(a, b, μ, σ)' is based on the cumulative normal distribution function. In other words, it gives the probability of an event occurring between $a$ and $b$, or the area under the probability density curve between the vertical lines $x = a$ and $x = b$, where the normal distribution has a mean of $\mu$ and a standard deviation of $\sigma$. If $\mu$ and $\sigma$ are not specified, it is assumed that $\mu = 0$ and $\sigma = 1$.
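The behavior of 'normalcdf(' can be mimicked in Python. The function below is a hypothetical stand-in, not part of any calculator or library API. It takes a lower bound, an upper bound, and an optional mean and standard deviation that default to 0 and 1, then subtracts two cumulative-distribution values from `statistics.NormalDist`. The loop checks the three Empirical Rule percentages.

```python
from statistics import NormalDist

def normalcdf(lower: float, upper: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Hypothetical Python stand-in for the TI-83/84 normalcdf( command:
    area under the normal curve with mean mu and standard deviation sigma
    between x = lower and x = upper. The defaults give the standard normal curve."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

# Empirical Rule check: area within 1, 2, and 3 standard deviations of the mean
for k in (1, 2, 3):
    print(k, round(normalcdf(-k, k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```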
Find the probability that .
The calculator command must have both an upper and a lower bound. Technically, though, the density curve does not have a lower bound, as it continues infinitely in both directions. We do know, however, that only a very small percentage of the data lies more than 3 standard deviations to the left of the mean. Use -3 as the lower bound and see what answer you get.
The answer is fairly accurate, but you must remember that there is still some area under the probability density curve, even though it is just a little, that we are leaving out if we stop at -3. If you look at the z-table, you can see that we are, in fact, leaving out about 0.0013. Next, try moving the lower bound even farther to the left.
Once the lower bound is far enough out, the answer is quite accurate. Since we cannot really capture all the data, entering a sufficiently small value should be enough for any reasonable degree of accuracy. A quick and easy way to handle this is to enter -99999 (or “a bunch of nines”). It really doesn’t matter exactly how many nines you enter; the difference between five and six nines will be beyond the accuracy that even your calculator can display.
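The effect of the lower bound can be seen numerically. This sketch uses `statistics.NormalDist` to compute the area between several lower bounds and the mean of a standard normal curve, along with how much of the true value of 0.5 each bound leaves out. A bound of -3 leaves out about 0.00135, while a bound of -99999 leaves out nothing that double-precision arithmetic can detect.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

left_out = {}
for lower in (-3, -4, -5, -99999):
    # Area between the lower bound and the mean; the full left half is 0.5
    area = std_normal.cdf(0) - std_normal.cdf(lower)
    left_out[lower] = 0.5 - area
    print(f"lower bound {lower:>6}: area = {area:.7f}, left out = {left_out[lower]:.7f}")
```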
In most practical problems involving normal distributions, the curve will not be the standard normal curve we have seen so far, with $\mu = 0$ and $\sigma = 1$. When using a z-table, you will first have to standardize the distribution by calculating the z-score(s).
A candy company sells small bags of candy and attempts to keep the number of pieces in each bag the same, though small differences due to random variation in the packaging process lead to different amounts in individual packages. A quality control expert from the company has determined that the number of pieces in each bag is normally distributed, with a mean of 57.3 and a standard deviation of 1.2. Endy opened a bag of candy and felt he was cheated: his bag contained only 55 candies. Does Endy have reason to complain?
To determine if Endy was cheated, first calculate the z-score for 55: $z = \frac{55 - 57.3}{1.2} \approx -1.92$.
Using a table, the probability of experiencing a value this low is approximately 0.0274. In other words, there is about a 3% chance that you would get a bag of candy with 55 or fewer pieces, so Endy should feel cheated.
Using a graphing calculator, the results would look as follows (the 'Ans' function has been used to avoid rounding off the z-score):
However, one of the advantages of using a calculator is that it is unnecessary to standardize. We can simply enter the mean and standard deviation from the original population distribution of candy, avoiding the -score calculation completely.
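Both routes can be sketched in Python with `statistics.NormalDist`. One standardizes first and uses the standard normal curve; the other uses the candy distribution's own mean and standard deviation directly. The two give the same probability. The small gap from the table answer of 0.0274 comes from rounding the z-score to -1.92 for the table.

```python
from statistics import NormalDist

mu, sigma, x = 57.3, 1.2, 55  # candy bag example

# Route 1: standardize first, then use the standard normal curve
z = (x - mu) / sigma
p_standardized = NormalDist().cdf(z)

# Route 2: use the original distribution directly (no z-score needed)
p_direct = NormalDist(mu, sigma).cdf(x)

print(f"z = {z:.4f}")                  # -1.9167
print(f"P(X <= 55) = {p_direct:.4f}")  # about 0.0276
```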
Find the probability that $z > -0.528$.
Right away, we are at an advantage using the calculator, because we do not have to round off the z-score. Enter the 'normalcdf(' command, using -0.528 as the lower bound and “a bunch of nines” as the upper bound. The nines represent a ridiculously large upper bound that will ensure that the unaccounted-for probability will be so small that it will be virtually undetectable.
Remember that because of rounding, our answer from the table was slightly too small, so when we subtracted it from 1, our final answer was slightly too large. The calculator answer of about 0.70125 is a more accurate approximation than the answer arrived at by using the table.
- Estimate the standard deviation of the following distribution.
- Calculate the probabilities using only the z-table. Show all your work.
- Show all work.
- Brielle’s statistics class took a quiz, and the results were normally distributed, with a mean of 85 and a standard deviation of 7. She wanted to calculate the percentage of the class that scored between 80 and 90. She used her calculator and was puzzled by the result. Here is a screenshot of her calculator:
Explain her mistake and the resulting answer on the calculator, and then calculate the correct answer.
- Which grade is better: a 78 on a test whose mean is 72 and standard deviation is 6.5, or an 83 on a test whose mean is 77 and standard deviation is 8.4? Justify your answer and draw sketches of each distribution.
- Teachers A and B have final exam scores that are approximately normally distributed, with the mean for Teacher A equal to 72 and the mean for Teacher B equal to 82. The standard deviation of Teacher A’s scores is 10, and the standard deviation of Teacher B’s scores is 5.
- With which teacher is a score of 90 more impressive? Support your answer with appropriate probability calculations and with a sketch.
- With which teacher is a score of 60 more discouraging? Again, support your answer with appropriate probability calculations and with a sketch.
- How do we calculate areas/probabilities for distributions that are not normal?
- How do we calculate z-scores, means, standard deviations, or actual values given a probability or area?
- For each of the (critical value for the standard normal), Find
- For a normal distribution, approximately what z-score would correspond to a data value equaling:
- The median
- The mean
- Find the value that satisfies each of the following probabilities for a standard normal random variable.
- Find the mean and the standard deviation of a normally distributed random variable , if and .
To view the Review answers, open this PDF file and look for section 5.2.