# 8.2: Testing a Proportion Hypothesis

Created by: CK-12

## Learning Objectives

- Test a hypothesis about a population proportion using the normal approximation to the binomial distribution.
- Test a hypothesis about a population proportion using the \begin{align*}P\end{align*}-value.
- Test a hypothesis about a population proportion using confidence intervals.

## Introduction

For most hypotheses that we will study, we use the general formula for calculating the test statistic:

\begin{align*}\text{Test Statistic} = \frac{\text{Observed Sample Mean} - \text{Hypothesized Population Mean}}{\text{Standard Error}}\end{align*}

or, the familiar

\begin{align*}z = \frac{\bar{X} - \mu_0}{\sigma_{\bar{X}}}\end{align*}

This formula measures the size of the difference between the observed sample mean and the hypothesized population mean, in units of standard error. However, in statistics we often study or make inferences about **proportions** of a population. For example, election results are usually described by the proportion of people who vote and by the proportion of those voters who choose each candidate. We typically express these proportions as **percentages** and would say something like “Approximately \begin{align*}68\;\mathrm{percent}\end{align*} of the population voted in this election and \begin{align*}48\;\mathrm{percent}\end{align*} of these voters voted for Barack Obama.”

So how do we test hypotheses about proportions? We use the same process as when testing hypotheses about population means, but we base the analysis on **sample proportions**. This lesson shows how to test hypotheses about population proportions and how to construct confidence intervals around our results.

## Hypothesis Testing about Population Proportions Using the Normal Approximation to the Binomial Distribution

As mentioned, hypothesis tests about population proportions (often expressed as percentages) are very common. Questions like the following call for them:

- What percentage of graduating seniors will attend a four-year college?
- What proportion of voters will vote for John McCain?
- What percentage of people will choose Diet Pepsi over Diet Coke?

To test questions like these, we make hypotheses about population proportions. For example,

- \begin{align*}H_{0}\end{align*}: \begin{align*}35\;\mathrm{percent}\end{align*} of graduating seniors will attend a four-year college.
- \begin{align*}H_{0}\end{align*}: \begin{align*}42\;\mathrm{percent}\end{align*} of voters will vote for John McCain.
- \begin{align*}H_{0}\end{align*}: \begin{align*}26\;\mathrm{percent}\end{align*} of people will choose Diet Pepsi over Diet Coke.

While we can use similar methods to test these hypotheses, we do need to take several different factors into account. Because it is impractical to measure every member of the population, we follow a series of steps:

- Hypothesize a value for the population proportion \begin{align*}(P)\end{align*}, as we did above.
- Randomly select a sample.
- Use the sample proportion \begin{align*}(p)\end{align*} to test the stated hypothesis.

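Essentially, we use the sampling distribution of the sample proportion in the same way that we use the sampling distribution of the sample mean. So how do we account for the different sampling distribution of \begin{align*}p\end{align*}? We use the **binomial distribution**, which describes situations in which only two outcomes are possible (for example, voted for a candidate or didn’t vote for the candidate). However, when the sample size is large enough, we can use the normal distribution to approximate the binomial distribution. A common rule of thumb is that both the expected number of successes, \begin{align*}nP\end{align*}, and the expected number of failures, \begin{align*}n(1-P)\end{align*}, should be at least \begin{align*}10\end{align*} (some texts use \begin{align*}5\end{align*}), where \begin{align*}n\end{align*} is the sample size.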
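To see how closely the normal curve tracks the binomial at the sample sizes used in this lesson, here is a minimal Python sketch. The helper names are illustrative, the threshold of 10 is an assumed rule of thumb, and the figures (300 voters, \begin{align*}P = 0.70\end{align*}, 224 or more in favor) are borrowed from the congressman example later in this lesson.

```python
import math

def normal_approx_ok(n, P, threshold=10):
    """Rule of thumb (an assumption here): nP and nQ should both reach `threshold`."""
    Q = 1 - P
    return n * P >= threshold and n * Q >= threshold

def exact_upper_tail(k, n, P):
    """Exact binomial probability of k or more successes in n trials.
    Uses float arithmetic, so it is meant for moderate n (a few hundred)."""
    Q = 1 - P
    return sum(math.comb(n, x) * P**x * Q**(n - x) for x in range(k, n + 1))

def normal_upper_tail(k, n, P):
    """Normal approximation to the same tail (no continuity correction)."""
    mean = n * P
    sd = math.sqrt(n * P * (1 - P))
    z = (k - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))   # area to the right of z

n, P, k = 300, 0.70, 224
print(normal_approx_ok(n, P))                  # nP = 210 and nQ = 90
print(round(exact_upper_tail(k, n, P), 4))
print(round(normal_upper_tail(k, n, P), 4))
```

The two tail probabilities should be close (within about 0.01 here); applying a continuity correction (using 223.5 instead of 224) would usually bring them closer still.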

To describe the spread of this sampling distribution, we calculate its standard deviation, called the **standard error of the proportion**, which is defined as:

\begin{align*}{s_p} = \sqrt {\frac {PQ}{n}}\end{align*}

where:

\begin{align*}P =\end{align*} the hypothesized value of the proportion

\begin{align*}Q = 1 - P =\end{align*} the proportion *not* possessing the characteristic

\begin{align*}n =\end{align*} sample size

Let’s take a look at an example on how we would calculate the standard error of the proportion.

**Example:**

We want to test a hypothesis that \begin{align*}60\;\mathrm{percent}\end{align*} of the \begin{align*}400\end{align*} seniors graduating from a certain California high school will enroll in a two- or four-year college upon graduation. What would be our hypotheses and the standard error of the proportion?

**Solution:**

Since we want to test the proportion of graduating seniors and we think that proportion is around \begin{align*}60\;\mathrm{percent}\end{align*}, our hypotheses are:

\begin{align*}H_{0}: P = 0.60\\ H_{a}: P \neq 0.60\end{align*}

And the standard error would be:

\begin{align*}{s_p} = \sqrt {\frac {PQ}{n}} = \sqrt {\frac {0.60 \times 0.40}{400}} = 0.0245\end{align*}

Therefore, the sampling distribution of \begin{align*}p\end{align*} for this example has a mean equal to \begin{align*}0.60\end{align*} (the hypothesized value of \begin{align*}P\end{align*}) and a standard deviation of \begin{align*}0.0245.\end{align*} With this information, we can easily evaluate hypotheses using a standard formula.
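The standard-error calculation is easy to script. A minimal Python sketch, with an illustrative function name, that reproduces the figure for the graduating-seniors example:

```python
import math

def standard_error_of_proportion(P, n):
    """s_p = sqrt(P * Q / n), where Q = 1 - P is the proportion without the characteristic."""
    Q = 1 - P
    return math.sqrt(P * Q / n)

# Graduating-seniors example: hypothesized P = 0.60, n = 400.
s_p = standard_error_of_proportion(0.60, 400)
print(round(s_p, 4))  # 0.0245
```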

## Testing a Proportion Hypothesis Using the P-Value

We test proportion hypotheses with the same set of steps we use for hypotheses about population means:

- Determine and state the null and alternative hypotheses.
- Set the criterion for rejecting the null hypothesis.
- Calculate the test statistic.
- Interpret the results and decide whether to reject or fail to reject the null hypothesis.

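To test a proportion hypothesis, we adapt the test-statistic formula for a mean: the sample proportion replaces the sample mean, the hypothesized proportion replaces the hypothesized mean, and the standard error of the proportion replaces the standard error of the mean. Our formula for the test statistic of a proportion hypothesis is therefore: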

\begin{align*}z = \frac {p-P}{s_p}\end{align*}

where:

\begin{align*}p=\end{align*} the sample proportion

\begin{align*}P=\end{align*} the hypothesized population proportion

\begin{align*}s_p=\end{align*} the standard error of the proportion
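The four steps listed earlier map onto a short function. This is a minimal Python sketch, not a library routine: `proportion_z_test` is a hypothetical helper, it uses the standard library's `statistics.NormalDist` for critical values in place of a printed z-table, and the usage line plugs in the survey numbers from the congressman example that follows (224 of 300 voters, hypothesized \begin{align*}P = 0.70\end{align*}).

```python
import math
from statistics import NormalDist

def proportion_z_test(successes, n, P, alpha=0.05, alternative="two-sided"):
    """Large-sample z test for one proportion, decided with a critical value.

    alternative: "greater" (H_a: P > P0), "less" (H_a: P < P0), or "two-sided".
    Returns (z, critical value, whether to reject H0).
    """
    p = successes / n                      # sample proportion
    s_p = math.sqrt(P * (1 - P) / n)       # standard error computed under H0
    z = (p - P) / s_p                      # test statistic
    std_normal = NormalDist()              # mean 0, standard deviation 1
    if alternative == "greater":
        critical = std_normal.inv_cdf(1 - alpha)
        reject = z > critical
    elif alternative == "less":
        critical = std_normal.inv_cdf(alpha)
        reject = z < critical
    elif alternative == "two-sided":
        critical = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > critical
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, critical, reject

z, critical, reject = proportion_z_test(224, 300, 0.70, alternative="greater")
print(f"z = {z:.2f}, critical value = {critical:.3f}, reject H0: {reject}")
# z = 1.76, critical value = 1.645, reject H0: True
```

Computing the critical value with `inv_cdf` gives 1.645 for a one-tailed test at \begin{align*}\alpha = 0.05\end{align*}, which a z-table typically rounds to 1.64.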

**Example:**

A congressman is trying to decide whether to vote for a bill that would legalize gay marriage. He will vote for the bill only if more than \begin{align*}70\;\mathrm{percent}\end{align*} of his constituents favor it. In a survey of \begin{align*}300\end{align*} randomly selected voters, \begin{align*}224\ (\text{about } 74.7 \%)\end{align*} indicated that they would favor the bill. Should he vote for the bill or not?

**Solution:**

First, we develop our null and alternative hypotheses.

\begin{align*}H_{0}: P = 0.70\\ H_{a}: P > 0.70\end{align*}

Next, we set the criterion for rejecting the null hypothesis. We will use a significance level of \begin{align*}\alpha = 0.05\end{align*}, and since we are interested only in whether the proportion of constituents is *greater* than \begin{align*}0.70\end{align*}, we will use a one-tailed test. From the standard \begin{align*}z\end{align*}-table, the **critical value** for a one-tailed test at \begin{align*}\alpha = 0.05\end{align*} is \begin{align*}1.64\end{align*} (more precisely, \begin{align*}1.645\end{align*}).

To calculate the test statistic, we first find the standard error of the proportion.

\begin{align*}s_p = \sqrt{\frac{PQ}{n}} = \sqrt{\frac{0.70 \times 0.30}{300}} \approx 0.0265\end{align*}

After finding the standard error, we can calculate the standard \begin{align*}z\end{align*}-score needed to evaluate our hypothesis.

\begin{align*}z = \frac{p - P}{s_p} = \frac{0.7467 - 0.70}{0.0265} \approx 1.76\end{align*}

Since our test statistic of \begin{align*}1.76\end{align*} exceeds the critical value of \begin{align*}1.64\end{align*}, we *reject the null hypothesis.* At the \begin{align*}0.05\end{align*} significance level, the sample supports the conclusion that more than \begin{align*}70\;\mathrm{percent}\end{align*} of the congressman’s constituents favor the bill, so by his own criterion he should vote for it. Rejecting the null hypothesis does not prove that the true proportion exceeds \begin{align*}0.70\end{align*}; it means that a sample proportion at least this large would occur less than \begin{align*}5\;\mathrm{percent}\end{align*} of the time by chance if the true proportion were \begin{align*}0.70\end{align*}.
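The congressman’s test can also be decided with a P-value: the probability, assuming the null hypothesis is true, of a test statistic at least as extreme as the one observed. We reject the null hypothesis when the P-value is smaller than \begin{align*}\alpha\end{align*}. A minimal Python sketch using only the standard library (variable names are illustrative):

```python
import math
from statistics import NormalDist

successes, n, P, alpha = 224, 300, 0.70, 0.05

p = successes / n                      # sample proportion, about 0.747
s_p = math.sqrt(P * (1 - P) / n)       # standard error under H0, about 0.0265
z = (p - P) / s_p                      # test statistic, about 1.76

# One-tailed (H_a: P > 0.70): area under the standard normal curve to the right of z.
p_value = 1 - NormalDist().cdf(z)
print(f"z = {z:.2f}, P-value = {p_value:.3f}, reject H0: {p_value < alpha}")
# z = 1.76, P-value = 0.039, reject H0: True
```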

**Example:**

Admissions staff at a local university are conducting a survey to determine the proportion of incoming freshmen who will need financial aid. A survey on housing needs, financial aid, and academic interests is collected from \begin{align*}400\end{align*} of the incoming freshmen. The staff hypothesized that \begin{align*}30\;\mathrm{percent}\end{align*} of freshmen will need financial aid, and the survey indicated that \begin{align*}101\ (25.25 \%)\end{align*} would need financial aid. Is this an accurate guess?

**Solution:**

First, we develop our null and alternative hypotheses.

\begin{align*}H_{0}: P = 0.30\\ H_{a}: P \neq 0.30\end{align*}

Next, we set the criterion for rejecting the null hypothesis. We use \begin{align*}\alpha = 0.05\end{align*}; because the alternative hypothesis is two-sided, the critical values of the test statistic are \begin{align*}1.96\end{align*} standard errors above or below the mean, that is, \begin{align*}z = \pm 1.96\end{align*}.

To calculate the test statistic, we first find the standard error of the proportion.

\begin{align*}s_p = \sqrt{\frac{PQ}{n}} = \sqrt{\frac{0.30 \times 0.70}{400}} \approx 0.0229\end{align*}

After finding the standard error, we can calculate the standard \begin{align*}z\end{align*}-score needed to evaluate our hypothesis.

\begin{align*}z = \frac{p - P}{s_p} = \frac{0.2525 - 0.30}{0.0229} \approx -2.07\end{align*}

Since our test statistic of \begin{align*}-2.07\end{align*} falls below the lower critical value of \begin{align*}-1.96\end{align*}, we *can reject the null hypothesis.* This means that the proportion of freshmen needing financial aid differs significantly from \begin{align*}30\;\mathrm{percent}\end{align*}. Because the test statistic is negative, the data indicate, at the \begin{align*}0.05\end{align*} significance level, that fewer than \begin{align*}30\;\mathrm{percent}\end{align*} of incoming freshmen will need financial aid.
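For a two-tailed test like this one, the P-value counts both tails: it is twice the area beyond \begin{align*}|z|\end{align*}. A minimal Python sketch with the financial-aid numbers (101 of 400 students, hypothesized \begin{align*}P = 0.30\end{align*}); variable names are illustrative:

```python
import math
from statistics import NormalDist

successes, n, P, alpha = 101, 400, 0.30, 0.05

p = successes / n                      # sample proportion, 0.2525
s_p = math.sqrt(P * (1 - P) / n)       # standard error under H0, about 0.0229
z = (p - P) / s_p                      # test statistic, about -2.07

# Two-tailed (H_a: P != 0.30): double the area beyond |z| in one tail.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"z = {z:.2f}, P-value = {p_value:.3f}, reject H0: {p_value < alpha}")
# z = -2.07, P-value = 0.038, reject H0: True
```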

## Confidence Intervals for Hypotheses about Population Proportions

When making a decision, we would like to know how confident we can be in it. For example, when a congressman is deciding whether or not to vote for a bill, he would like to be able to say something to the effect of “I am \begin{align*}99 \%\end{align*} confident that \begin{align*}70\;\mathrm{percent}\end{align*} of my constituents will support this decision.” With statistical analysis, we can construct a **confidence interval** that specifies the level of confidence that we have in our results.

The confidence interval is a range of values that we are confident, but not certain, contains the population parameter that we are studying (most often this parameter is the mean).

We report a confidence interval with two pieces of information:

- The level of confidence (e.g., \begin{align*}95 \%, 99 \%\end{align*})
- The interval itself (e.g., \begin{align*}40.4\end{align*} to \begin{align*}45.6\end{align*}, or \begin{align*}102\end{align*} to \begin{align*}108\end{align*})

If we are estimating a population mean, the interval is built around the sample mean. If we are estimating a population proportion, it is built around the sample proportion.

A confidence interval is centered on the sample statistic, so it always contains that statistic; it may or may not contain the population parameter. A confidence interval statement would look something like:

- We are \begin{align*}95\;\mathrm{percent}\end{align*} confident that the interval from \begin{align*}34.2\end{align*} to \begin{align*}39.1\end{align*} contains the population mean.

- \begin{align*}2.10 < \mu < 2.90\end{align*}: We are \begin{align*}90\;\mathrm{percent}\end{align*} confident that this interval contains the population mean.

We *cannot* say that the probability is \begin{align*}95\;\mathrm{percent}\end{align*} that the interval contains the mean, since either the interval contains the mean or it does not. Therefore, when we talk about our confidence level, we say that we are “\begin{align*}X \%\end{align*} confident” that the specific interval contains the mean.
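The 95 percent describes the procedure rather than any single interval: if we drew many samples and built an interval from each, about 95 percent of those intervals would contain the true proportion. The small Python simulation below illustrates this. The true proportion (0.70), sample size (300), number of repetitions, and the use of the sample proportion inside the standard error are all assumptions chosen for illustration, not part of the lesson.

```python
import math
import random

def simulate_coverage(true_P, n, reps=2000, z_star=1.96, seed=1):
    """Fraction of intervals p ± z_star * sqrt(p(1 - p)/n) that contain true_P.
    With z_star = 1.96 these are 95% intervals."""
    rng = random.Random(seed)              # fixed seed so the run is repeatable
    hits = 0
    for _ in range(reps):
        # Draw one random sample of n yes/no answers and record the sample proportion.
        successes = sum(1 for _ in range(n) if rng.random() < true_P)
        p = successes / n
        s_p = math.sqrt(p * (1 - p) / n)
        if p - z_star * s_p <= true_P <= p + z_star * s_p:
            hits += 1
    return hits / reps

coverage = simulate_coverage(0.70, 300)
print(f"Share of intervals containing the true proportion: {coverage:.3f}")
```

Because the simulated share is itself a random estimate, expect it to land near, not exactly on, 0.95.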

**Example:**

In our example about the congressman voting on the gay marriage bill, the congressman decides that he wants an estimate of the proportion of voters in the population who are likely to support the bill. Construct a confidence interval for this population proportion.

**Solution:**

As a reminder, our sample proportion was \begin{align*}224/300 \approx 0.747\end{align*} and our standard error of the proportion was \begin{align*}0.0265.\end{align*} To correspond with \begin{align*}\alpha = 0.05,\end{align*} we will construct a \begin{align*}95 \%\end{align*} confidence interval for the population proportion. Under the normal curve, \begin{align*}95 \%\end{align*} of the area is between \begin{align*}z = -1.96\end{align*} and \begin{align*}z = +1.96.\end{align*} The confidence interval for this proportion would be:

\begin{align*}&CI_{95}: \\ &p \pm 1.96 \text{(standard error)}\\ &0.747 \pm (1.96) (0.0265)\end{align*}

So \begin{align*}0.695 < P < 0.799\end{align*}

We are \begin{align*}95 \%\end{align*} confident that the interval from \begin{align*}0.695\end{align*} to \begin{align*}0.799\end{align*} contains the population proportion. This means that we are \begin{align*}95 \%\end{align*} confident that between about \begin{align*}69.5\end{align*} and \begin{align*}80\;\mathrm{percent}\end{align*} of voters in the population support the bill. Note that a \begin{align*}95 \%\end{align*} interval matches a two-tailed test at \begin{align*}\alpha = 0.05\end{align*}; a one-tailed test at \begin{align*}\alpha = 0.05\end{align*} corresponds instead to a \begin{align*}90 \%\end{align*} interval, so this interval and the earlier one-tailed test need not agree exactly on whether \begin{align*}0.70\end{align*} is a plausible value.
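The interval arithmetic can be checked in a few lines. This Python sketch follows the lesson in reusing the standard error computed from the hypothesized \begin{align*}P = 0.70\end{align*}; many texts instead build the interval from the sample proportion, \begin{align*}\sqrt{p(1-p)/n}\end{align*}, which here gives a slightly narrower interval. `proportion_ci` is a hypothetical helper, and `NormalDist().inv_cdf` supplies the 1.96 multiplier.

```python
import math
from statistics import NormalDist

def proportion_ci(p, s_p, confidence=0.95):
    """Return (lower, upper) for p ± z* · s_p at the given confidence level."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    margin = z_star * s_p
    return p - margin, p + margin

p = 224 / 300                               # sample proportion, about 0.747
s_p = math.sqrt(0.70 * 0.30 / 300)          # standard error from the lesson, about 0.0265
lower, upper = proportion_ci(p, s_p)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")  # 95% CI: (0.695, 0.799)
```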

## Lesson Summary

1. In statistics, we also make inferences about proportions of a population. We use the same process as in testing hypotheses about population means, but we state hypotheses about population proportions and use sample proportions in the analysis.

2. To calculate the test statistic needed to evaluate the population proportion hypothesis, we must also calculate the standard error of the proportion which is defined as \begin{align*}s_p = \sqrt{\frac{PQ} {n}}\end{align*}

3. The formula for calculating the test statistic for a population proportion is

\begin{align*}z = \frac{p - P} {s_p} \end{align*}

where:

\begin{align*}p =\end{align*} the sample proportion

\begin{align*}P =\end{align*} the hypothesized population proportion

\begin{align*}s_p =\end{align*} the standard error of the proportion

4. We can construct something called the confidence interval that specifies the level of confidence that we have in our results. The confidence interval is a range of values that we are confident, but not certain, contains the population parameter that we are studying.

## Review Questions

- The test statistic helps us determine ___.
- True or false: In statistics, we are able to study and make inferences about proportions, or percentages, of a population.
- True or false: A confidence interval states the probability that the interval contains the mean. For example, a confidence interval of \begin{align*}95 \%\end{align*} would say that “This interval contains the mean \begin{align*}95 \%\end{align*} of the time.”

A state senator cannot decide how to vote on an environmental protection bill. The senator decides to request her own survey and if the proportion of registered voters supporting the bill exceeds \begin{align*}0.60\end{align*}, she will vote for it. A random sample of \begin{align*}750\end{align*} voters is selected and \begin{align*}495\end{align*} are found to support the bill.

- What are the null and alternative hypotheses for this problem?
- What is the observed value of the sample proportion?
- What is the standard error of the proportion?
- What is the test statistic for this scenario?
- What decision would you make about the null hypothesis if you had an alpha level of \begin{align*}.01\end{align*}?
- The state senator decided that she still wants an estimate of the proportion of voters in the population who are likely to vote for the bill. Construct a \begin{align*}99 \%\end{align*} confidence interval around this proportion.
- Please write a statement describing the results of the confidence interval.

## Review Answers

- The magnitude of the difference between the observed sample mean and the hypothesized population mean.
- True
- False

We *cannot* say that the probability is \begin{align*}95\;\mathrm{percent}\end{align*} that the interval contains the mean, since either the interval contains the mean or it does not. Therefore, when we talk about our confidence level, we say that we are “\begin{align*}X \%\end{align*} confident” that the specific interval contains the mean.

- \begin{align*}H_{0}: P = 0.60, H_{a}: P > 0.60\end{align*}
- \begin{align*}p = 495/750 = 0.66\end{align*}
- \begin{align*}0.0179\end{align*}
- \begin{align*}z = 3.35\end{align*}
- Since the test statistic of \begin{align*}3.35\end{align*} exceeds the critical value of \begin{align*}2.33\end{align*} (one-tailed \begin{align*}z\end{align*}-test at \begin{align*}\alpha = 0.01\end{align*}), we reject the null hypothesis and conclude that, if the population proportion were in fact \begin{align*}0.60\end{align*}, the probability that sampling error alone would produce a sample proportion as large as \begin{align*}0.66\end{align*} is less than \begin{align*}0.01\end{align*}.
- \begin{align*}CI = (0.614, 0.706)\end{align*}
- We are \begin{align*}99 \%\end{align*} confident that the interval \begin{align*}0.614 < P < 0.706\end{align*} contains the population proportion. In other words, perhaps as many as \begin{align*}70\;\mathrm{percent}\end{align*} of the voters favor the bill, but it is very unlikely that fewer than \begin{align*}61\;\mathrm{percent}\end{align*} favor it.
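The arithmetic in these answers can be checked with a short Python sketch (standard library only, variable names illustrative). Like the lesson, it uses the standard error computed from the hypothesized \begin{align*}P = 0.60\end{align*} for both the test and the interval.

```python
import math
from statistics import NormalDist

successes, n, P = 495, 750, 0.60
alpha, confidence = 0.01, 0.99
std_normal = NormalDist()

p = successes / n                                      # observed sample proportion
s_p = math.sqrt(P * (1 - P) / n)                       # standard error under H0
z = (p - P) / s_p                                      # test statistic
critical = std_normal.inv_cdf(1 - alpha)               # one-tailed critical value, about 2.33

z_star = std_normal.inv_cdf(1 - (1 - confidence) / 2)  # about 2.576 for a 99% interval
lower, upper = p - z_star * s_p, p + z_star * s_p

print(f"p = {p:.2f}, s_p = {s_p:.4f}, z = {z:.2f}, critical value = {critical:.2f}")
print(f"reject H0: {z > critical}; 99% CI: ({lower:.3f}, {upper:.3f})")
```

It should print values that agree with the answers above: p = 0.66, s_p = 0.0179, z = 3.35, a critical value of 2.33, and the interval (0.614, 0.706).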