# 5.2: The Density Curve of the Normal Distribution

## Learning Objectives

- Identify the properties of a normal density curve, and the relationship between concavity and standard deviation.
- Convert between z-scores and areas under a normal probability curve.
- Calculate probabilities that correspond to left, right, and middle areas from a left-tail z-score table.
- Calculate probabilities that correspond to left, right, and middle areas using a graphing calculator.

## Introduction

In this section we will continue our investigation of normal distributions to include density curves and learn various methods for calculating probabilities from the normal density curve.

## Density Curves

A **density curve** is an idealized representation of a distribution in which the area under the curve is defined to be 1. Density curves need not be normal, but the **normal density curve** will be the most useful to us.

## Inflection Points on a Normal Density Curve

We already know from the empirical rule that approximately 68% of the data in a normal distribution lies within 1 standard deviation of the mean. In a density curve, this means that about 68% of the total area under the curve is within z-scores of ±1. Look at the following three density curves:

Notice that the curves are spread increasingly wider. Lines have been drawn to show the points one standard deviation on either side of the mean. Look at *where* this happens on each density curve. Here is a normal distribution with an even larger standard deviation.

Could you predict the standard deviation of this distribution from estimating the point on the density curve?

You may notice that the density curve changes shape at this point in each of our examples. In Calculus, we learn to call this shape-changing location an **inflection point**. It is the point where the curve changes **concavity**. Starting from the mean and heading outward to the left and right, the curve is concave down (it looks like a mountain, or ∩ shape). After passing this point, the curve is concave up (it looks like a valley, or ∪ shape). We will leave it to the Calculus students to prove it, but in a normal density curve, this inflection point is always exactly one standard deviation away from the mean.
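For readers who want to see where this comes from, here is a sketch of the calculus argument. It assumes the usual formula for a normal density with mean $\mu$ and standard deviation $\sigma$; the formula and these symbols are not given in this section.

$$
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / (2\sigma^2)}
$$

$$
f'(x) = -\frac{x-\mu}{\sigma^2}\, f(x), \qquad
f''(x) = \left[\frac{(x-\mu)^2}{\sigma^4} - \frac{1}{\sigma^2}\right] f(x)
$$

Because $f(x)$ is never zero, $f''(x) = 0$ only when $(x-\mu)^2 = \sigma^2$, that is, at $x = \mu \pm \sigma$. The second derivative is negative between these two points (concave down) and positive outside them (concave up).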

In this example, the standard deviation was . We can use these concepts to estimate the standard deviation of a normally distributed data set.

Can you estimate the standard deviation of the distribution represented by the following histogram?

This distribution is fairly normal, so we could draw a density curve to approximate it as follows.

Now estimate the inflection points:

It appears that the mean is about and the inflection points are and respectively. This would lead to an estimate of about for the standard deviation.

The actual statistics for this distribution are:

We can verify this using expectations from the empirical rule. In the following graph, we have highlighted the bins that are contained within one standard deviation of the mean.

If you estimate the relative frequencies from each bin, they total remarkably close to 68%.

## Calculating Density Curve Areas

While it is convenient to estimate areas using the empirical rule, we need more precise methods to calculate the areas for other values. In Calculus you study methods for calculating the area under a curve, but in statistics, we are not so concerned about the specific method used to calculate these areas. We will use formulas or technology to do the calculations for us.

### Z-Tables

Before software and graphing calculator technology were readily available, it was common to use tables to approximate the amount of area under a normal density curve between any two given z-scores. We have included two commonly used tables at the end of this lesson. Here are a few things you should know about reading these tables:

The values in these tables are all in terms of z-scores, or **standardized**, meaning that they correspond to a standard normal curve in which the mean is 0 and the standard deviation is 1. It is important to understand that the table shows the areas **below** the given z-score in the table. It is possible and often necessary to calculate the area **above** a z-score, or **between** two z-scores, as well. You could generate new tables to show these values, but it is just as easy to calculate them from the one table.

The values in these tables can represent areas under the density curve. For example, 0.5 means half of the area (because the area of the total density curve is 1). However, they are most frequently expressed as probabilities; e.g., 0.5 means the probability of a randomly chosen value from this distribution being in that region is 0.5, or a 50% chance.

z-scores must be rounded to the nearest hundredth to use the table.

Most z-score tables do not go much beyond 3 standard deviations away from the mean in either direction because, as you know, the probability of experiencing results that extreme in a normal distribution is very low.

Table 5.5 shows z-scores below the mean and Table 5.6 shows z-scores above the mean. To help you understand how to read the table, look at the top left entry of Table 5.6. It reads 0.5.

Think of the table as a stem-and-leaf plot, with the stems of the z-scores running down the left side of the table and the leaves across the top. The leaves represent the hundredths place of a z-score. So, this value represents a z-score of 0.00. This should make sense because we are talking about the actual mean.
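The stem-and-leaf reading of the table can be sketched in code. This is a minimal illustration, not part of the lesson's tables: the function name `table_position` and the sample z-scores are made up for the example, and the table entry itself is computed with Python's standard-library `statistics.NormalDist` rather than looked up.

```python
from statistics import NormalDist


def table_position(z):
    """Split a z-score into the row (stem) and column (leaf) of a z-table.

    The stem is the z-score truncated to the tenths place and the leaf is
    the hundredths digit, expressed as a decimal such as 0.07.
    """
    z = round(z, 2)                     # tables only go to hundredths
    sign = -1 if z < 0 else 1
    hundredths = round(abs(z) * 100)    # e.g. 1.37 becomes 137
    stem = sign * (hundredths // 10) / 10
    leaf = (hundredths % 10) / 100
    return stem, leaf


def table_entry(z):
    """Area below z on the standard normal curve, rounded like a 4-place table."""
    return round(NormalDist().cdf(round(z, 2)), 4)


# Hypothetical z-scores chosen only to illustrate the lookup.
print(table_position(0.0), table_entry(0.0))   # (0.0, 0.0) 0.5
print(table_position(1.37))                    # (1.3, 0.07)
print(table_position(-0.53))                   # (-0.5, 0.03)
```

For a negative z-score, the leaf is read the same way: the row gives the sign and the tenths, and the column gives the hundredths digit.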

Let’s look at another common value. In Table 5.6, find the z-score of 1 and read the associated probability value.

As we have already discovered, approximately 84% of the data is below this value (68% in the middle, and 16% in the tail). This corresponds to the probability in the table of 0.8413.

Now find the probability for a score of . It is often a good idea to estimate this value before using the table when you are first getting started. This score is between and . We know from the empirical rule that the probability for is approximately and similarly, for it is around , so we should expect to get a value somewhere between these two estimates.

Locate the stem and the leaf for on Table 5.5 and follow them across and down to the corresponding probability. The answer appears to be approximately , or approximately of the data in a standard normal curve is below a score of .

It is extremely important, especially when you first start with these calculations, that you get in the habit of relating it to the normal distribution by drawing a sketch of the situation. In this case, simply draw a sketch of a standard normal curve with the appropriate region shaded and labeled.

Let’s try an example in which we want to find the probability of choosing a value that is **greater than** . Before even using the table, draw a sketch and estimate the probability. This score is just below the mean, so the answer should be more than . The score of would be half way between and , but because there is more area concentrated around the mean, we could guess that there should be more than half of the of the area in this section. If we were to guess about , we would estimate an answer of between and .

First read the table to find the correct probability for the data **below** this score. We must first round this score to . This will slightly under-estimate the probability, but it is the best we can do using the table. The table returns a value of as the area below this score. Because the area under the density curve is equal to , we can subtract this value from to find the correct probability of about .

What about values *between* two scores? While it is an interesting and worthwhile exercise to do this using a table, it is so much simpler using software or a graphing calculator that we will leave this for one of the homework exercises.
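All three kinds of area (left, right, and middle) come from the same left-tail values. The sketch below mimics a table lookup under stated assumptions: it uses Python's standard-library `statistics.NormalDist` in place of the printed table, rounds each z-score to the nearest hundredth as the table requires, and uses z-scores of -1 and 1 purely as an illustration. The helper names are hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal curve: mean 0, standard deviation 1


def left_area(z):
    """Area below z, with z rounded to hundredths as a table lookup would be."""
    return std_normal.cdf(round(z, 2))


def right_area(z):
    """Area above z: the total area (1) minus the left-tail area."""
    return 1 - left_area(z)


def middle_area(z_low, z_high):
    """Area between two z-scores: the difference of two left-tail areas."""
    return left_area(z_high) - left_area(z_low)


print(round(left_area(1.0), 4))          # 0.8413
print(round(right_area(1.0), 4))         # 0.1587
print(round(middle_area(-1.0, 1.0), 4))  # 0.6827
```

With a printed table, the middle area is found the same way: look up both z-scores and subtract the smaller left-tail value from the larger one.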

### Using Graphing Calculators: The Normal CDF Command

Your graphing calculator has already been programmed to calculate probabilities for a normal density curve using what is called a **cumulative distribution function**, or **cdf**. This is found in the distributions menu above the VARS key.

Press **[2nd]** **[VARS]**, **[2]** to select the **normalcdf(** command. Its syntax is **normalcdf(lower bound, upper bound, mean, standard deviation)**.

The command has been programmed so that if you do not specify a mean and standard deviation, it will default to the standard normal curve with μ = 0 and σ = 1.

For example, entering **normalcdf(-1, 1)** will specify the area within one standard deviation of the mean, which we already know to be approximately 0.68.

Try to verify the other values from the empirical rule.
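If you do not have a calculator at hand, the same check can be sketched in Python. The function below is a hypothetical stand-in written for this example, not the calculator's own implementation. It takes the same four arguments as the calculator command and relies on the standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist


def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Area under a normal density curve between lower and upper.

    Like the calculator command, this defaults to the standard normal
    curve (mean 0, standard deviation 1) when no mean or sd is given.
    """
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)


# Areas within 1, 2, and 3 standard deviations of the mean.
for k in (1, 2, 3):
    print(k, round(normalcdf(-k, k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

These agree with the approximate 68%, 95%, and 99.7% figures from the empirical rule.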

**Summary:**

**Normalpdf** gives values of the **probability density function**. It gives the height of the density curve (the vertical distance to the graph) at any value of x. This height is not itself a probability. This is the function we graphed in Lesson 5.1.

**Normalcdf** gives values of the **cumulative distribution function**. It gives the probability of an event occurring between a lower bound and an upper bound (the area under the probability density function curve between two vertical lines).
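The difference between the two commands can be seen in a short sketch. It uses Python's standard-library `statistics.NormalDist` as an assumed stand-in for the calculator, and the narrow distribution with standard deviation 0.1 is a made-up example chosen to show that a curve's height can exceed 1 while an area never can.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# pdf: height of the density curve at a single x-value.
height_at_mean = std_normal.pdf(0)
print(round(height_at_mean, 4))    # 0.3989

# cdf: area under the curve to the left of an x-value.
area_below_mean = std_normal.cdf(0)
print(area_below_mean)             # 0.5

# A narrow curve (hypothetical sd of 0.1) is taller than 1 at its peak,
# so a height cannot be a probability. The area below the mean is still 0.5.
narrow = NormalDist(0, 0.1)
print(round(narrow.pdf(0), 4))     # 3.9894
print(narrow.cdf(0))               # 0.5
```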

Let’s look at the two examples we did in the last section using the table.

**Example:**

Find the probability for .

**Solution:**

The calculator command must have both an upper and lower bound. Technically though, the density curve does not have a lower bound as it continues infinitely in both directions. We do know however, that a very small percentage of the data is below standard deviations to the left of the mean. Use as the lower bound and see what answer you get.

The answer is accurate to the nearest , but remember that there really still is some data, no matter how little, that we are leaving out if we stop at . In fact, if you look at Table 1, you will see that about has been left out. Try going out to and .

Notice that if we use , the answer is as accurate as the one in the table. Since we cannot really capture “all” the data, entering a sufficiently small value should be enough for any reasonable degree of accuracy. A quick and easy way to handle this is to enter (or “a bunch of nines”). It really doesn’t matter exactly how many nines you enter. The difference between five and six nines will be beyond the accuracy that even your calculator can display.
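To see how little area is lost by cutting off the left tail, the sketch below computes the area lying below several candidate lower bounds on the standard normal curve. The particular bounds are illustrative assumptions, and Python's standard-library `statistics.NormalDist` stands in for the calculator.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def area_left_out(lower_bound):
    """Area below lower_bound, i.e. what is ignored by starting there."""
    return std_normal.cdf(lower_bound)


# Illustrative lower bounds, ending with "a bunch of nines".
for lower_bound in (-3, -4, -5, -99999):
    print(lower_bound, area_left_out(lower_bound))
# -3 leaves out about 0.00135
# -4 leaves out about 0.00003
# -5 leaves out about 0.0000003
# -99999 leaves out an amount too small to represent (prints 0.0)
```

Once the lower bound is far enough out, the neglected area is smaller than the number of decimal places a table or calculator display can show.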

**Example:**

Find the probability for .

**Solution:**

Right away we are at an advantage using the calculator because we do not have to round off the z-score. Enter a **normalcdf** command from the given z-score to a “bunch of nines”. This enormous upper bound ensures that the probability being left out is so small that it is virtually undetectable.

Remember that our answer from the table was slightly too small, so when we subtracted it from 1, the result was slightly too large. The calculator's answer is a more accurate approximation than the table value.

### Standardizing

In most practical problems involving normal distributions, the curve will not be standardized (μ = 0 and σ = 1). When using a table, you will have to first standardize the distribution by calculating the z-score(s).
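As a reminder, standardizing uses the usual z-score formula, where $x$ is the observed value, $\mu$ is the mean, and $\sigma$ is the standard deviation of the original distribution (the symbols are assumed here, since the section does not define them):

$$
z = \frac{x - \mu}{\sigma}
$$

The resulting z-score tells you how many standard deviations the value lies above (positive) or below (negative) the mean.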

**Example:**

A candy company sells small bags of candy and attempts to keep the number of pieces in each bag the same, though small differences due to random variation in the packaging process lead to different amounts in individual packages. A quality control expert from the company has determined that the number of pieces in each bag is normally distributed with a mean of and a standard deviation of . Endy opened a bag of candy and felt he was cheated. His bag contained only candies. Does Endy have reason to complain?

**Solution:**

Calculate the score for .

Using Table 5.5, the probability of experiencing a value this low is approximately . In other words, there is about a chance that you would get a bag of candy with or fewer pieces, so Endy should feel cheated.

Using the graphing calculator, the results would look as follows (the ANS function has been used to avoid rounding off the score):

However, the advantage of using the calculator is that it is unnecessary to standardize. We can simply enter the mean and standard deviation from the original population distribution of candy, avoiding the score calculation completely.
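The two routes (standardize first, or use the original mean and standard deviation directly) give the same area. The sketch below illustrates this with made-up numbers: a mean of 50, a standard deviation of 2, and an observed value of 46. These are hypothetical and are not the values from the candy example. Python's standard-library `statistics.NormalDist` stands in for the table and the calculator.

```python
import math
from statistics import NormalDist

# Hypothetical values for illustration only (not the candy data).
mean, sd = 50.0, 2.0
x = 46.0

# Route 1: standardize, then use the standard normal curve.
z = (x - mean) / sd
p_standardized = NormalDist().cdf(z)

# Route 2: skip the z-score and use the original distribution directly.
p_direct = NormalDist(mean, sd).cdf(x)

print(z)                               # -2.0
print(round(p_standardized, 4))        # 0.0228
print(round(p_direct, 4))              # 0.0228
print(math.isclose(p_standardized, p_direct))  # True
```

The direct route avoids rounding the z-score, which is why the calculator result can differ slightly from a table lookup.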

## Lesson Summary

A **density curve** is an idealized representation of a distribution in which the area under the curve is defined as 1, or in terms of percentages, 100% of the data. A **normal density curve** is simply a density curve for a normal distribution. Normal density curves have two **inflection points**, which are the points on the curve where it changes concavity. Remarkably, these points correspond to the points in the normal distribution that are exactly 1 standard deviation away from the mean. Applying the empirical rule tells us that the area under the normal density curve between these two points is approximately 0.68. This is most commonly thought of in terms of probability, e.g., the probability of choosing a value at random from this distribution and having it be within 1 standard deviation of the mean is about 0.68. Calculating other areas under the curve can be done using a **z-table** or using the **normalcdf** command on the TI-83/84. The table provides the area less than a particular z-score for the standard normal density curve. The calculator command allows you to specify two values, either standardized or not, and will calculate the area between those values.

## Points to Consider

- How do we calculate the areas/probabilities for distributions that are not normal?
- How do we calculate the z-scores, mean, standard deviation, or actual value given the probability or area?

## Tables

There are two tables here: Table 1 for z-scores less than 0 and Table 2 for z-scores greater than 0. The table entry for z is the probability of lying below z. Essentially, these tables list the area of the shaded region in the figure below for each value of z.

For example, to look up in the first table, find in the left hand column, then read across that row until you reach the value in the hundredths place to read off the value.

Using this same technique and the second table, you should find that .
