5.3: Applications of the Normal Distribution
Learning Objective
- Apply the characteristics of a normal distribution to solving problems.
Introduction
The normal distribution is the foundation for statistical inference and will be an essential part of many topics in later chapters. In the meantime, this section will cover some of the types of questions that can be answered using the properties of a normal distribution. The first examples deal with more theoretical questions that will help you master basic understandings and computational skills, while the later problems will provide examples with real data, or at least a real context.
Unknown Value Problems
If you understand the relationship between the area under a density curve and mean, standard deviation, and \begin{align*}z\end{align*}-scores, you should be able to solve problems in which you are provided all but one of these values and are asked to calculate the remaining value. In the last lesson, we found the probability that a variable is within a particular range, or the area under a density curve within that range. What if you are asked to find a value that gives a particular probability?
Example: Given the normally distributed random variable \begin{align*}X\end{align*}, with \begin{align*}\mu=35\end{align*} and \begin{align*}\sigma=7.4\end{align*}, what is the value of \begin{align*}X\end{align*} such that the probability of observing a value less than it is 80%?
As suggested before, it is important and helpful to sketch the distribution.
If we had to estimate an actual value first, we know from the Empirical Rule that about 84% of the data (50% below the mean plus about 34% between the mean and one standard deviation above it) lies below one standard deviation to the right of the mean.
\begin{align*}\mu + 1\sigma = 35+7.4 = 42.4\end{align*}
Therefore, we expect the answer to be slightly below this value.
When we were given a value of the variable and were asked to find the percentage or probability, we used a \begin{align*}z\end{align*}-table or the 'normalcdf(' command on a graphing calculator. But how do we find a value given the percentage? Again, the table has its limitations in this case, and graphing calculators and computer software are much more convenient and accurate. The command on the TI-83/84 calculator is 'invNorm('. You may have seen it already in the DISTR menu.
The syntax for this command is as follows:
'invNorm(percentage or probability to the left, mean, standard deviation)'
Make sure to enter the values in the correct order. For this example, entering 'invNorm(0.80, 35, 7.4)' returns approximately 41.23, which is slightly below our estimate of 42.4.
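As a cross-check away from the calculator, the same inverse lookup can be sketched in Python. This is only an illustration, assuming Python 3.8 or later, where the standard library's `statistics.NormalDist` offers an `inv_cdf` method that plays the role of 'invNorm('. The variable names are our own.

```python
from statistics import NormalDist

# Distribution from the example: mu = 35, sigma = 7.4
X = NormalDist(mu=35, sigma=7.4)

# Value with 80% of the distribution to its left (the role of 'invNorm(')
x_80 = X.inv_cdf(0.80)

# Empirical Rule estimate: about 84% lies below mu + 1 sigma
upper_estimate = X.mean + X.stdev

print(round(x_80, 2))          # about 41.23
print(x_80 < upper_estimate)   # True: slightly below 42.4
```

The result agrees with the sketch: the 80% value falls a little short of one standard deviation above the mean.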
Unknown Mean or Standard Deviation
Example: For a normally distributed random variable, \begin{align*}\sigma=4.5, \ x=20\end{align*}, and \begin{align*}p=0.05\end{align*}. Estimate \begin{align*}\mu\end{align*}.
To solve this problem, first draw a sketch:
Remember that about 95% of the data is within 2 standard deviations of the mean. This would leave 2.5% of the data in the lower tail, so our 5% value must be less than 2 standard deviations, or 9 units, from the mean.
Because we do not know the mean, we have to use the standard normal curve and calculate a \begin{align*}z\end{align*}-score using the 'invNorm(' command. The result, \begin{align*}-1.645\end{align*}, confirms the prediction that the value is less than 2 standard deviations from the mean.
Now, substitute the known quantities into the \begin{align*}z\end{align*}-score formula and solve for \begin{align*}\mu\end{align*} as follows:
\begin{align*}z & = \frac{x-\mu}{\sigma}\\ -1.645 & \approx \frac{20-\mu}{4.5}\\ (-1.645)(4.5) & \approx 20-\mu\\ -7.402-20 & \approx -\mu\\ -27.402 & \approx -\mu\\ \mu & \approx 27.402\end{align*}
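The algebra above can be checked with a short Python sketch. It assumes Python 3.8 or later and uses `statistics.NormalDist` from the standard library in place of 'invNorm('. The variable names are illustrative only.

```python
from statistics import NormalDist

sigma = 4.5   # known standard deviation
x = 20        # known value of the variable
p = 0.05      # probability of a value less than x

# z-score with 5% of the standard normal curve to its left
z = NormalDist().inv_cdf(p)

# Rearranging z = (x - mu) / sigma gives mu = x - z * sigma
mu = x - z * sigma

print(round(z, 3))    # about -1.645
print(round(mu, 3))   # about 27.402
```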
Example: For a normally-distributed random variable, \begin{align*}\mu=83, \ x=94\end{align*}, and \begin{align*}p=0.90\end{align*}. Find \begin{align*}\sigma\end{align*}.
Again, let’s first look at a sketch of the distribution.
Since about 97.5% of the data is below the value 2 standard deviations above the mean, it seems reasonable to estimate that the \begin{align*}x\end{align*} value is less than two standard deviations away from the mean and that \begin{align*}\sigma\end{align*} might be around 7 or 8.
Again, the first step to see if our prediction is right is to use 'invNorm(' to calculate the \begin{align*}z\end{align*}-score. Remember that since we are not entering a mean or standard deviation, the result is based on the assumption that \begin{align*}\mu=0\end{align*} and \begin{align*}\sigma=1\end{align*}. For \begin{align*}p=0.90\end{align*}, the result is approximately 1.282.
Now, use the \begin{align*}z\end{align*}-score formula and solve for \begin{align*}\sigma\end{align*} as follows:
\begin{align*}z & = \frac{x-\mu}{\sigma}\\ 1.282 & \approx \frac{94-83}{\sigma}\\ \sigma & \approx \frac{11}{1.282}\\ \sigma & \approx 8.583\end{align*}
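The same steps can be sketched in Python as a check on the arithmetic. This assumes Python 3.8 or later and the standard library's `statistics.NormalDist`. The names are our own, not part of any calculator syntax.

```python
from statistics import NormalDist

mu = 83    # known mean
x = 94     # known value of the variable
p = 0.90   # probability of a value less than x

# z-score with 90% of the standard normal curve to its left
z = NormalDist().inv_cdf(p)

# Rearranging z = (x - mu) / sigma gives sigma = (x - mu) / z
sigma = (x - mu) / z

print(round(z, 3))       # about 1.282
print(round(sigma, 3))   # about 8.583
```

Note that dividing 11 by the rounded value 1.282 gives about 8.580; the 8.583 shown above comes from carrying the unrounded \begin{align*}z\end{align*}-score through the calculation.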
Technology Note: Drawing a Distribution on the TI-83/84 Calculator
The TI-83/84 calculator will draw a distribution for you, but before doing so, we need to set an appropriate window (see screen below) and delete or turn off any functions or plots. Let’s use the last example and draw the shaded region below 94 under a normal curve with \begin{align*}\mu=83\end{align*} and \begin{align*}\sigma=8.583\end{align*}. Remember from the Empirical Rule that we probably want to show about 3 standard deviations away from 83 in either direction. If we use 9 as an estimate for \begin{align*}\sigma\end{align*}, then we should open our window 27 units above and below 83. The \begin{align*}y\end{align*} settings can be a bit tricky, but with a little practice, you will get used to determining the maximum percentage of area near the mean.
The reason that we went below the \begin{align*}x\end{align*}-axis is to leave room for the text, as you will see.
Now, press [2ND][DISTR] and arrow over to the DRAW menu.
Choose the 'ShadeNorm(' command. With this command, you enter the values just as if you were doing a 'normalcdf(' calculation. The syntax for the 'ShadeNorm(' command is as follows:
'ShadeNorm(lower bound, upper bound, mean, standard deviation)'
Enter the values shown in the following screenshot:
Next, press [ENTER] to see the result. It should appear as follows:
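The area that 'ShadeNorm(' shades can also be checked numerically. The following Python sketch assumes Python 3.8 or later. `shaded_area` is a hypothetical helper written for this illustration, taking its arguments in the same order as the calculator command, and a very large negative number stands in for "no lower bound".

```python
from statistics import NormalDist

mu, sigma = 83, 8.583

# Window: about 3 standard deviations either side, using 9 as a rough sigma
x_min, x_max = mu - 3 * 9, mu + 3 * 9

def shaded_area(lower, upper, mu, sigma):
    """Area under the normal curve between lower and upper (hypothetical helper)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

# Region below 94: a huge negative lower bound covers the whole left tail
area = shaded_area(-1e99, 94, mu, sigma)

print(x_min, x_max)     # 56 110
print(round(area, 3))   # about 0.9
```

The area of about 0.90 matches the probability given in the last example.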
Technology Note: The 'normalpdf(' Command on the TI-83/84 Calculator
You may have noticed that the first option in the DISTR menu is 'normalpdf(', which stands for a normal probability density function. It is the option you used in lesson 5.1 to draw the graph of a normal distribution. Many students wonder what this function is for and occasionally even use it by mistake to calculate what they think are cumulative probabilities, but this function is actually the mathematical formula for drawing a normal distribution. You can find this formula in the resources at the end of the lesson if you are interested. The numbers this function returns are not really useful to us statistically. The primary purpose for this function is to draw the normal curve.
To do this, first be sure to turn off any plots and clear out any functions. Then press [Y=], insert 'normalpdf(', enter 'X', and close the parentheses as shown. Because we did not specify a mean and standard deviation, the standard normal curve will be drawn. Finally, enter the following window settings, which are necessary to fit most of the curve on the screen (think about the Empirical Rule when deciding on settings), and press [GRAPH]. The normal curve below should appear on your screen.
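To see what kind of numbers 'normalpdf(' produces, here is a small Python sketch, assuming Python 3.8 or later and the standard library's `statistics.NormalDist`, whose `pdf` method evaluates the same density formula. The variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1

# Heights of the curve at whole-number z-scores from -3 to 3
for z in range(-3, 4):
    print(z, round(Z.pdf(z), 4))

peak = Z.pdf(0)   # tallest point of the curve, at the mean
print(round(peak, 4))   # about 0.3989
```

These values are heights of the curve, not probabilities. They are mainly useful for choosing a window: the curve peaks just under 0.4 and is nearly flat beyond 3 standard deviations from the mean.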
Normal Distributions with Real Data
Statistical work based on experiments, surveys, and samples most often relies on the normal distribution, as you will learn in greater detail in later chapters. Here are two examples to get you started.
Example: The Information Centre of the National Health Service in Britain collects and publishes a great deal of information and statistics on health issues affecting the population. One such comprehensive data set tracks information about the health of children\begin{align*}^1\end{align*}. According to its statistics, in 2006, the mean height of 12-year-old boys was 152.9 cm, with a standard deviation estimate of approximately 8.5 cm. (These are not the exact figures for the population, and in later chapters, we will learn how they are calculated and how accurate they may be, but for now, we will assume that they are a reasonable estimate of the true parameters.)
If 12-year-old Cecil is 158 cm, approximately what percentage of all 12-year-old boys in Britain is he taller than?
We first must assume that the height of 12-year-old boys in Britain is normally distributed, and this seems like a reasonable assumption to make. As always, draw a sketch and estimate a reasonable answer prior to calculating the percentage. In this case, let’s use the calculator to sketch the distribution and the shading. First decide on an appropriate window that includes about 3 standard deviations on either side of the mean. In this case, 3 standard deviations is about 25.5 cm, so add and subtract this value to/from the mean to find the horizontal extremes. Then enter the appropriate 'ShadeNorm(' command as shown:
From this data, we would estimate that Cecil is taller than about 73% of 12-year-old boys. We could also phrase our conclusion this way: the probability of a randomly selected British 12-year-old boy being shorter than Cecil is about 0.73. Often with data like this, we use percentiles. We would say that Cecil is in the \begin{align*}73^{\text{rd}}\end{align*} percentile for height among 12-year-old boys in Britain.
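The percentile can be verified with a brief Python sketch, assuming Python 3.8 or later and the standard library's `statistics.NormalDist`. Its `cdf` method plays the role of 'normalcdf(' with no lower bound, and the variable names are our own.

```python
from statistics import NormalDist

# Heights of 12-year-old boys, using the estimates from the example (cm)
heights = NormalDist(mu=152.9, sigma=8.5)
cecil = 158

# How many standard deviations above the mean Cecil is
z = (cecil - heights.mean) / heights.stdev

# Proportion of boys shorter than Cecil
percentile = heights.cdf(cecil)

print(round(z, 2))            # about 0.6
print(round(percentile, 2))   # about 0.73
```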
How tall would Cecil need to be in order to be in the top 1% of all 12-year-old boys in Britain?
Here is a sketch:
In this case, we are given the percentage, so we need to use the 'invNorm(' command as shown.
Our results indicate that Cecil would need to be about 173 cm tall to be in the top 1% of 12-year-old boys in Britain.
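As a check, the cutoff height can be sketched in Python, again assuming Python 3.8 or later and `statistics.NormalDist`. Being in the top 1% means that 99% of boys are shorter, so 0.99 is the probability to the left.

```python
from statistics import NormalDist

# Heights of 12-year-old boys, using the estimates from the example (cm)
heights = NormalDist(mu=152.9, sigma=8.5)

# Top 1% means 99% of the distribution lies to the left of the cutoff
cutoff = heights.inv_cdf(0.99)

print(round(cutoff, 1))   # about 172.7
```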
Example: Suppose that the distribution of the masses of female marine iguanas in Puerto Villamil in the Galapagos Islands is approximately normal, with a mean mass of 950 g and a standard deviation of 325 g. There are very few young marine iguanas in the populated areas of the islands, because feral cats tend to kill them. How rare is it that we would find a female marine iguana with a mass less than 400 g in this area?
Using a graphing calculator, we can approximate the probability of a female marine iguana having a mass of less than 400 grams as follows:
With a probability of approximately 0.045, or only about 5%, we could say it is rather unlikely that we would find an iguana this small.
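The same probability can be approximated in Python. This is a minimal sketch, assuming Python 3.8 or later and the standard library's `statistics.NormalDist`, with variable names chosen for illustration.

```python
from statistics import NormalDist

# Masses of female marine iguanas, from the example (grams)
masses = NormalDist(mu=950, sigma=325)

# Probability of a mass below 400 g
p_small = masses.cdf(400)

print(round(p_small, 3))   # about 0.045
```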
Lesson Summary
In order to find the percentage of data between two values (or the probability of a randomly chosen value being between those values) in a normal distribution, we can use the 'normalcdf(' command on the TI-83/84 calculator. When you know the percentage or probability, use the 'invNorm(' command to find a \begin{align*}z\end{align*}-score or value of the variable. In order to use these tools in real situations, we need to know that the distribution of the variable in question is approximately normal. When solving problems using normal probabilities, it helps to draw a sketch of the distribution and shade the appropriate region.
Point to Consider
- How do the probabilities of a standard normal curve apply to making decisions about unknown parameters for a population given a sample?
Multimedia Links
For an example of finding the probability between values in a normal distribution (4.0)(7.0), see EducatorVids, Statistics: Applications of the Normal Distribution (1:45).
For an example showing how to find the mean and standard deviation of a normal distribution (8.0), see ExamSolutions, Normal Distribution: Finding the Mean and Standard Deviation (6:01).
For the continuation of finding the mean and standard deviation of a normal distribution (8.0), see ExamSolutions, Normal Distribution: Finding the Mean and Standard Deviation (Part 2) (8:09).
Review Questions
- Which of the following intervals contains the middle 95% of the data in a standard normal distribution?
- \begin{align*}z<2\end{align*}
- \begin{align*}z \le 1.645\end{align*}
- \begin{align*}z \le 1.96\end{align*}
- \begin{align*}-1.645 \le z \le 1.645\end{align*}
- \begin{align*}-1.96 \le z \le 1.96\end{align*}
- For each of the following problems, \begin{align*}X\end{align*} is a continuous random variable with a normal distribution and the given mean and standard deviation. \begin{align*}P\end{align*} is the probability of a value of the distribution being less than \begin{align*}x\end{align*}. Find the missing value, then sketch and shade the distribution. \begin{align*}& \text{Mean} && \text{Standard deviation} && x && P\\ & 85 && 4.5 && \underline{\qquad} && 0.68\\ & \underline{\qquad} && 1 && 16 && 0.05\\ & 73 && \underline{\qquad} && 85 && 0.91\\ & 93 && 5 && \underline{\qquad} && 0.90\end{align*}
- What is the \begin{align*}z\end{align*}-score for the lower quartile in a standard normal distribution?
- The manufacturing process at a metal-parts factory produces some slight variation in the diameter of metal ball bearings. The quality control experts claim that the bearings produced have a mean diameter of 1.4 cm. If the diameter is more than 0.0035 cm too wide or too narrow, they will not work properly. In order to maintain its reliable reputation, the company wishes to ensure that no more than one-tenth of 1% of the bearings that are made are ineffective. What would the standard deviation of the manufactured bearings need to be in order to meet this goal?
- Suppose that the wrapper of a certain candy bar lists its weight as 2.13 ounces. Naturally, the weights of individual bars vary somewhat. Suppose that the weights of these candy bars vary according to a normal distribution, with \begin{align*}\mu=2.2\end{align*} ounces and \begin{align*}\sigma=0.04\end{align*} ounces.
- What proportion of the candy bars weigh less than the advertised weight?
- What proportion of the candy bars weigh between 2.2 and 2.3 ounces?
- A candy bar of what weight would be heavier than all but 1% of the candy bars out there?
- If the manufacturer wants to adjust the production process so that no more than 1 candy bar in 1000 weighs less than the advertised weight, what would the mean of the actual weights need to be? (Assume the standard deviation remains the same.)
- If the manufacturer wants to adjust the production process so that the mean remains at 2.2 ounces and no more than 1 candy bar in 1000 weighs less than the advertised weight, how small does the standard deviation of the weights need to be?
References
http://www.ic.nhs.uk/default.asp?sID=1198755531686
http://www.nytimes.com/2008/04/04/us/04poll.html
On the Web
http://davidmlane.com/hyperstat/A25726.html Contains the formula for the normal probability density function.
http://www.willamette.edu/~mjaneba/help/normalcurve.html Contains background on the normal distribution, including a picture of Carl Friedrich Gauss, a German mathematician who first used the function.
http://en.wikipedia.org/wiki/Normal_distribution Is highly mathematical.
Keywords
- Concave down
- Starting from the mean and heading outward to the left and right, the curve is concave down.
- Concave up
- After passing the inflection points, which lie one standard deviation on either side of the mean, the curve is concave up.
- Cumulative density function
- A function that gives the probability of a value being less than or equal to a given value. It is used to calculate probabilities for a normal density curve.
- Density curve
- A curve where the area under the curve equals exactly one.
- Empirical Rule
- States what percentages of data in a normal distribution lie within 1, 2, and 3 standard deviations of the mean.
- Inflection Points
- A point where the curve changes concavity (from concave up to concave down, or concave down to concave up).
- Normal distribution
- A continuous probability distribution that has a symmetric bell-shaped curve with a single peak.
- Normal probability plot
- A normal probability plot can also be used to determine normality.
- Normal quantile plot
- If we calculate the \begin{align*}z\end{align*}-scores for a data set and plot them against the actual values, we have what is called a normal probability plot, or a normal quantile plot. If the data set is normal, then this plot will be approximately linear.
- Probability density function
- The function whose graph is a density curve. The first option in the DISTR menu is 'normalpdf(', which stands for a normal probability density function.
- Standard normal curve
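- The density curve of the standard normal distribution. When the mean or standard deviation is unknown, we have to use the standard normal curve and calculate a \begin{align*}z\end{align*}-score using the 'invNorm(' command.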
- Standard normal distribution
- A normal distribution with \begin{align*}\mu = 0\end{align*} and \begin{align*}\sigma = 1\end{align*}.
- Standardize
- To convert the values of a normal distribution to \begin{align*}z\end{align*}-scores, so that the resulting distribution has \begin{align*}\mu = 0\end{align*} and \begin{align*}\sigma = 1\end{align*}. When using a \begin{align*}z\end{align*}-table, you will first have to standardize the distribution by calculating the \begin{align*}z\end{align*}-score(s).
- \begin{align*}z\end{align*}-score
- A measure of the number of standard deviations a particular data point is away from the mean.