Testing Hypotheses: P-Values
Deciding Whether to Reject the Null Hypothesis: One-Tailed and Two-Tailed Hypothesis Tests
When a hypothesis is tested, a statistician must decide how much evidence is necessary to reject the null hypothesis. For example, if the null hypothesis is that the average height of a population is 64 inches, a statistician wouldn't measure one person who is 66 inches and reject the hypothesis based on that one trial. It is too likely that the discrepancy was merely due to chance.
We use statistical tests to determine whether the sample data give good evidence against the claim \(H_0\). The numerical threshold that determines how strong the sample evidence must be before we reject \(H_0\) is called the level of significance, and it is denoted by \(\alpha\). If we choose \(\alpha = 0.01\), for example, we are saying that we will reject \(H_0\) only if data at least as unusual as the data we have collected would occur no more than 1% of the time when \(H_0\) is true.
The most frequently used levels of significance are 0.05 and 0.01. If our data result in a statistic that falls within the region determined by the level of significance, then we reject \(H_0\). The region is therefore called the critical region. When choosing the level of significance, we need to consider the consequences of rejecting or failing to reject the null hypothesis. If there is the potential for health consequences (as in the case of active ingredients in prescription medications) or great cost (as in the case of manufacturing machine parts), we should use a more ‘conservative’ critical region with levels of significance such as 0.005 or 0.001.
When determining the critical regions for a two-tailed hypothesis test, the level of significance represents the extreme areas under the normal density curve. We call this a two-tailed hypothesis test because the critical region is located in both ends of the distribution. For example, with a significance level of 0.05, the critical region would be the most extreme 5 percent under the curve, with 2.5 percent in each tail of the distribution.
Therefore, if the mean from the sample taken from the population falls within one of these critical regions, we would conclude that there was too much of a difference between our sample mean and the hypothesized population mean and we would reject the null hypothesis. However, if the mean from the sample falls in the middle of the distribution (in between the critical regions) we would fail to reject the null hypothesis.
We calculate the critical region for the single-tail hypothesis test a bit differently. We would use a single-tail hypothesis test when the direction of the results is anticipated or we are only interested in one direction of the results. For example, a single-tail hypothesis test may be used when evaluating whether or not to adopt a new textbook. We would only decide to adopt the textbook if it improved student achievement relative to the old textbook. A single-tail hypothesis simply states that the mean is greater or less than the hypothesized value.
When performing a single-tail hypothesis test, our alternative hypothesis looks a bit different. When developing the alternative hypothesis in a single-tail hypothesis test we would use the symbols for greater than or less than. Using our example about SAT scores of graduating seniors, our null and alternative hypotheses could look something like:
\[H_0: \mu = 1100 \qquad H_a: \mu > 1100\]
In this scenario, our null hypothesis states that the mean SAT score is equal to 1100, while the alternative hypothesis states that the mean SAT score is greater than 1100. A single-tail hypothesis test also means that we have only one critical region because we put the entire region of rejection into just one side of the distribution. When the alternative hypothesis is that the population mean is greater than the hypothesized value, the critical region is on the right side of the distribution. When the alternative hypothesis is that the population mean is smaller, the critical region is on the left side of the distribution.
To calculate the critical regions, we must first find the critical values, or the cut-offs where the critical regions start. To find these values, we use the critical values specified by the z-distribution. These values can be found in a table that lists the areas of each of the tails under a normal distribution. Using this table, we find that for a 0.05 significance level in a two-tailed test, our critical values would fall at 1.96 standard errors above and below the mean. For a 0.01 significance level, our critical values would fall at 2.576 standard errors above and below the mean. Using the z-distribution we can find critical values (as specified by standard scores) for any level of significance for either single- or two-tailed hypothesis tests.
Determining Critical Values
Determine the critical value for a single-tailed hypothesis test with a 0.05 significance level.
Using the z-distribution table, we find that a significance level of 0.05 corresponds with a critical value of 1.645. If the alternative hypothesis is that the mean is greater than a specified value, the critical value would be 1.645. Due to the symmetry of the normal distribution, if the alternative hypothesis is that the mean is less than a specified value, the critical value would be -1.645.
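The table lookup can also be reproduced in software. The following is a minimal sketch, assuming Python 3.8 or later and its standard-library `statistics.NormalDist`; the function name `critical_values` and its `tail` argument are illustrative choices, not part of the lesson.

```python
from statistics import NormalDist


def critical_values(alpha, tail="two"):
    """Return standard-normal critical value(s) for significance level alpha.

    tail is a hypothetical argument for this sketch: "two", "right", or "left".
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two":
        # Split alpha evenly between the two tails.
        upper = std_normal.inv_cdf(1 - alpha / 2)
        return (-upper, upper)
    if tail == "right":
        # All of alpha sits in the upper tail.
        return std_normal.inv_cdf(1 - alpha)
    if tail == "left":
        # All of alpha sits in the lower tail.
        return std_normal.inv_cdf(alpha)
    raise ValueError("tail must be 'two', 'right', or 'left'")


print(critical_values(0.05, "two"))    # roughly (-1.96, 1.96)
print(critical_values(0.05, "right"))  # roughly 1.645
print(critical_values(0.05, "left"))   # roughly -1.645
```

The values agree with the table to the number of decimal places the table reports.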
Calculating the Test Statistic
Before evaluating our hypotheses by determining the critical region and calculating the test statistic, we need to confirm that the distribution is normal and determine the hypothesized mean \(\mu_0\) of the distribution.
To evaluate the sample mean against the hypothesized population mean, we use the concept of z-scores to determine how different the two means are from each other. Based on the Central Limit Theorem, the distribution of \(\bar{x}\) is normal with mean \(\mu\) and standard deviation \(\sigma/\sqrt{n}\). As we learned in previous lessons, the z-score is calculated by using the formula:
\[z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}\]
where:
\(z\) = standardized score
\(\bar{x}\) = sample mean
\(\mu_0\) = the population mean under the null hypothesis
\(\sigma\) = population standard deviation. If we do not have the population standard deviation and if \(n \geq 30\), we can use the sample standard deviation, \(s\). If \(n < 30\) and we do not have the population standard deviation, we use a different distribution, which will be discussed in a future lesson.
Once we calculate the z-score, we can make a decision about whether to reject or to fail to reject the null hypothesis based on the critical values.
Following are the steps you must take when doing a hypothesis test:
- Determine the null and alternative hypotheses.
- Verify that necessary conditions are satisfied and summarize the data into a test statistic.
- Determine the \(\alpha\) level.
- Determine the critical region(s).
- Make a decision (reject or fail to reject the null hypothesis).
- Interpret the decision in the context of the problem.
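The steps above can be sketched as a small function. This is an illustrative sketch under stated assumptions, not a definitive implementation: it assumes a one-sample z-test with a known (or large-sample) standard deviation, uses Python's standard-library `statistics.NormalDist`, and the name `z_test` and its parameters are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, mu0, sd, n, alpha=0.05, tail="two"):
    """One-sample z-test; returns (z, reject).

    mu0 is the mean under the null hypothesis; tail is "two", "right", or "left".
    """
    # Summarize the data into a test statistic.
    std_error = sd / sqrt(n)
    z = (sample_mean - mu0) / std_error

    # Determine the critical region(s) for the chosen alpha and make a decision.
    std_normal = NormalDist()
    if tail == "two":
        reject = abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    elif tail == "right":
        reject = z > std_normal.inv_cdf(1 - alpha)
    elif tail == "left":
        reject = z < std_normal.inv_cdf(alpha)
    else:
        raise ValueError("tail must be 'two', 'right', or 'left'")
    return z, reject


# Figures from the College A and pea farmer examples in this lesson.
z, reject = z_test(1450, 1500, 100, 125, alpha=0.05, tail="two")
print(round(z, 2), reject)  # -5.59 True

z, reject = z_test(147, 145, 100, 144, alpha=0.05, tail="right")
print(round(z, 2), reject)  # 0.24 False
```

Interpreting the decision in the context of the problem remains a step for the analyst; the function only reports the statistic and whether it falls in the critical region.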
Determining Hypotheses and Test Statistics
1) College A has an average SAT score of 1500. From a random sample of 125 freshman psychology students we find the average SAT score to be 1450 with a standard deviation of 100. We want to know if these freshman psychology students are representative of the overall population. What are our hypotheses and the test statistic?
a. Let’s first develop our null and alternative hypotheses:
\[H_0: \mu = 1500 \qquad H_a: \mu \neq 1500\]
b. The test statistic is
\[z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{1450 - 1500}{100/\sqrt{125}} \approx -5.59\]
c. This is a two-sided test. If we choose \(\alpha = .05\), the critical values will be -1.96 and 1.96. (Use invNorm(.025, 0, 1) and the symmetry of the normal distribution to determine these critical values.) That is, we will reject the null hypothesis if the value of our test statistic is less than -1.96 or greater than 1.96.
d. The value of the test statistic is -5.59. This is less than -1.96, and so our decision is to reject \(H_0\).
e. Based on this sample we believe that the mean is not equal to 1500.
2) A farmer is trying out a planting technique that he hopes will increase the yield on his pea plants. Over the last 5 years the average number of pods on one of his pea plants was 145 pods with a standard deviation of 100 pods. This year, after trying his new planting technique, he takes a random sample of 144 of his plants and finds the average number of pods to be 147. He wonders whether or not this is a statistically significant increase. What are his hypotheses and the test statistic?
a. First, we develop our null and alternative hypotheses:
\[H_0: \mu = 145 \qquad H_a: \mu > 145\]
This alternative hypothesis uses > since he believes that there might be a gain in the number of pods.
b. Next, we calculate the test statistic for the sample of pea plants.
\[z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{147 - 145}{100/\sqrt{144}} = 0.24\]
c. If we choose \(\alpha = .05\), the critical value will be 1.645. (Use invNorm(.95, 0, 1) to determine this critical value.) We will reject the null hypothesis if the test statistic is greater than 1.645.
d. The value of the test statistic is 0.24. This is less than 1.645, and so our decision is to fail to reject \(H_0\).
e. Based on our sample, we do not have sufficient evidence to conclude that the mean number of pods is greater than 145.
Finding the P-Value of an Event
We can also evaluate a hypothesis by asking “what is the probability of obtaining a test statistic at least as extreme as the one we observed if the null hypothesis is true?” This is called the p-value.
Let’s use the example about the pea farmer. As we mentioned, the farmer is wondering if the number of pea pods per plant has gone up with his new planting technique and finds that in a sample of 144 plants there is an average of 147 pods per plant (compared to a previous average of 145 pods, the value under the null hypothesis). To determine the p-value we ask: what is \(P(z > .24)\)? That is, what is the probability of obtaining a z value greater than .24 if the null hypothesis is true? Using the calculator (normalcdf(.24, 99999999, 0, 1)) we find this probability to be .405. This indicates that, under the null hypothesis, there is a 40.5% chance that a sample of 144 plants will average 147 or more pods per plant.
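The same right-tail probability can be computed without a calculator. Below is a minimal sketch using Python's standard-library `statistics.NormalDist`; the function name `right_tail_p_value` is an illustrative assumption.

```python
from math import sqrt
from statistics import NormalDist


def right_tail_p_value(sample_mean, mu0, sd, n):
    """P(Z > z) for a one-sample z-test with a 'greater than' alternative."""
    z = (sample_mean - mu0) / (sd / sqrt(n))
    # Area under the standard normal curve to the right of z.
    return 1 - NormalDist().cdf(z)


# Pea farmer example: sample mean 147, null mean 145, sd 100, n = 144.
p = right_tail_p_value(147, 145, 100, 144)
print(round(p, 3))  # 0.405
```

Because .405 is far larger than any commonly used level of significance, the p-value leads to the same decision as the critical-value approach: fail to reject the null hypothesis.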
Type I and Type II Errors
When we decide to reject or not reject the null hypothesis, we have four possible scenarios:
- The null hypothesis is true and we reject it.
- The null hypothesis is true and we do not reject it.
- The null hypothesis is false and we do not reject it.
- The null hypothesis is false and we reject it.
Two of these four possible scenarios lead to correct decisions: failing to reject the null hypothesis when it is true and rejecting the null hypothesis when it is false.
Two of these four possible scenarios lead to errors: rejecting the null hypothesis when it is true and failing to reject the null hypothesis when it is false.
Which type of error is more serious depends on the specific research situation, but ideally both types of errors should be minimized during the analysis.
| | \(H_0\) is true | \(H_0\) is false |
|---|---|---|
| Do not reject \(H_0\) | Good Decision | Error (Type II) |
| Reject \(H_0\) | Error (Type I) | Good Decision |
The general approach to hypothesis testing focuses on the Type I error: rejecting the null hypothesis when it is in fact true. The level of significance, also known as the alpha level, is defined as the probability of making a Type I error when the null hypothesis is true. For example, at the 0.05 level, if the null hypothesis is true, we will wrongly reject it about 5 percent of the time.
Calculating the probability of making a Type II error is not as straightforward as calculating the probability of making a Type I error. The probability of making a Type II error can only be determined when values have been specified for the alternative hypothesis. The probability of making a Type II error is denoted by \(\beta\).
Once the value for the alternative hypothesis has been specified, it is possible to determine the probability of making a correct decision, \(1 - \beta\). This quantity, \(1 - \beta\), is called the power of the test.
The goal in hypothesis testing is to minimize the potential of both Type I and Type II errors. However, there is a relationship between these two types of errors. As the level of significance or alpha level increases, the probability of making a Type II error decreases and vice versa.
Often we establish the alpha level based on the severity of the consequences of making a Type I error. If the consequences are not that serious, we could set an alpha level at 0.10 or 0.20. However, in a field like medical research we would set the alpha level very low (at 0.001, for example) if there was potential bodily harm to patients. We can also attempt to minimize Type II errors by setting higher alpha levels in situations that do not have grave or costly consequences.
Calculating the Power of a Test
The power of a test is defined as the probability of rejecting the null hypothesis when it is false (that is, making the correct decision). Obviously, we want to maximize this power if we are concerned about making Type II errors. To determine the power of the test, there must be a specified value for the alternative hypothesis.
Suppose that a doctor is concerned about making a Type II error only if the active ingredient in the new medication is more than 3 milligrams higher than what was specified in the null hypothesis (say, 250 milligrams with a sample of 200 and a standard deviation of 50). Now we have values for both the null and the alternative hypotheses:
\[H_0: \mu = 250 \qquad H_a: \mu = 253\]
By specifying a value for the alternative hypothesis, we have selected one of the many values for \(H_a\). In determining the power of the test, we must assume that \(H_a\) is true and determine whether we would correctly reject the null hypothesis.
Calculating the exact value for the power of the test requires determining the area above the critical value set up to test the null hypothesis when it is re-centered around the alternative hypothesis. If we have an alpha level of .05, our critical value would be 1.645 for the one-tailed test. Therefore,
\[1.645 = \frac{\bar{x} - 250}{50/\sqrt{200}}\]
Solving for \(\bar{x}\) we find:
\[\bar{x} = 250 + 1.645 \cdot \frac{50}{\sqrt{200}} \approx 255.8\]
Now, with a new mean set at the alternative hypothesis, we want to find the z-score of this critical value when the distribution is centered around the population mean of the alternative hypothesis, \(\mu_a = 253\). Therefore, we can figure that:
\[z = \frac{255.8 - 253}{50/\sqrt{200}} \approx 0.79\]
Recall that we reject the null hypothesis if the sample mean falls to the right of the critical value, which corresponds to a z-score of .79 when the distribution is centered on the alternative mean. The question now is: what is the probability of rejecting the null hypothesis when, in fact, the alternative hypothesis is true? We need to find the area to the right of 0.79. You can find this area using a table or using the calculator with the normalcdf command (normalcdf(0.79, 9999999, 0, 1)). The probability is .2148. This means that, since we assumed the alternative hypothesis to be true, there is only a 21.5% chance of rejecting the null hypothesis. Thus, the power of the test is .2148. In other words, this test of the null hypothesis is not very powerful and has only a 0.2148 probability of detecting the real difference between the two hypothesized means.
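The power calculation above can be sketched in code. This is an illustrative sketch, assuming a right-tailed one-sample z-test and Python's standard-library `statistics.NormalDist`; the function name `power_right_tailed` and its parameter names are hypothetical. Because the code does not round the intermediate values, its result (about 0.21) differs slightly from the .2148 obtained by hand.

```python
from math import sqrt
from statistics import NormalDist


def power_right_tailed(mu0, mu_alt, sd, n, alpha=0.05):
    """Power of a right-tailed z-test against a specific alternative mean."""
    std_error = sd / sqrt(n)
    # Critical sample mean: the cut-off set up under the null hypothesis.
    z_crit = NormalDist().inv_cdf(1 - alpha)
    cutoff = mu0 + z_crit * std_error
    # Re-center on the alternative mean and take the area above the cut-off.
    return 1 - NormalDist(mu_alt, std_error).cdf(cutoff)


# Medication example: H0 mean 250, alternative mean 253, sd 50, n = 200.
power = power_right_tailed(250, 253, 50, 200, alpha=0.05)
print(round(power, 2))  # 0.21

# A larger sample raises the power for the same alternative.
print(power_right_tailed(250, 253, 50, 800, alpha=0.05) > power)  # True
```

The second call illustrates one of the factors that affect power: with everything else held fixed, a larger sample shrinks the standard error and makes the same 3 milligram difference easier to detect.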
There are several things that affect the power of a test including:
- Whether the alternative hypothesis is a single-tailed or two-tailed test.
- The level of significance
- The sample size.
Technology Note: Finding critical values on the TI83/84 Calculator
You can also find this critical value using the TI83/84 calculator: [DIST] invNorm(.05,0,1) returns -1.64485. The syntax for this is invNorm (area to the left, mean, standard deviation).
About 10% of the population is left-handed. A researcher believes that journalists are more likely to be left-handed than other people in the general population. The researcher surveys 200 journalists and finds that 25 of them are left-handed.
State the null and alternative hypotheses.
\(H_0: p = .10\), \(H_a: p > .10\)
What proportion of the sample is left-handed?
25/200 = .125
To calculate the p-value for the hypothesis test, what probability should the researcher calculate?
To calculate the p-value the researcher must find the probability of finding a sample proportion this large or larger, given the null hypothesis is true.
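As a rough illustration of that probability, the sketch below uses the normal approximation to the distribution of a sample proportion, with standard error \(\sqrt{p_0(1 - p_0)/n}\). That approximation is an assumption of this sketch rather than something derived in this lesson, and the function name `proportion_right_tail_p_value` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_right_tail_p_value(successes, n, p0):
    """Return (p_hat, z, p_value) for H0: p = p0 versus Ha: p > p0.

    Assumes the sample proportion is approximately normal under H0.
    """
    p_hat = successes / n
    std_error = sqrt(p0 * (1 - p0) / n)  # standard error under H0
    z = (p_hat - p0) / std_error
    # Probability of a sample proportion this large or larger under H0.
    p_value = 1 - NormalDist().cdf(z)
    return p_hat, z, p_value


p_hat, z, p_value = proportion_right_tail_p_value(25, 200, 0.10)
print(p_hat)              # 0.125
print(round(z, 2))        # 1.18
print(round(p_value, 2))  # 0.12
```

Under these assumptions the p-value is roughly 0.12, which is larger than 0.05, so this sample alone would not give strong evidence that journalists are more likely to be left-handed.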
- In a hypothesis test, if the difference between the sample mean and the hypothesized mean divided by the standard error falls in the middle of the distribution and in between the critical values, we ___ the null hypothesis. If this number falls in the critical regions and beyond the critical values, we ___ the null hypothesis.
- Use the z-distribution table to determine the critical value for a single-tailed hypothesis test with a 0.01 significance level.
- Sacramento County high school seniors have an average SAT score of 1020. From a random sample of 144 Sacramento High School students we find the average SAT score to be 1100 with a standard deviation of 144. We want to know if these high school students are representative of the overall population. What are our hypotheses and the test statistic?
- During hypothesis testing, we use the p-value to predict the ___ of an event occurring if the null hypothesis is true.
- A survey shows that California teenagers have an average of $500 in savings (standard error 100). What is the probability that a randomly selected teenager will have savings greater than $520?
- Fill in the types of errors missing from the table below:
| Decision Made | Null Hypothesis is True | Null Hypothesis is False |
|---|---|---|
| Reject Null Hypothesis | (1) ___ | Correct Decision |
| Do not Reject Null Hypothesis | Correct Decision | (2) ___ |
- The ___ is defined as the probability of rejecting the null hypothesis when it is false (making the correct decision). We want to maximize ___ if we are concerned about making Type II errors.
- The Governor’s economic committee is investigating average salaries of recent college graduates in California. They decide to test the null hypothesis that the average salary is $24,500 (standard deviation is $4,800) and are concerned with making a Type II error only if the average salary is less than $25,000. For a given \(\alpha\) level and a sample of 144, determine the power of a one-tailed test.
- Consider the following scenario: In a recent survey 72 out of 100 people reported that they prefer to buy bottled water in glass bottles rather than plastic bottles. If there is no difference in preference in the population, the chance of such extreme results in a sample of this size is about .04. Because .04 is less than .05, we conclude that there is a statistically significant difference on preference. Give a numerical value for each of the following:
- The p-value
- The level of significance, \(\alpha\)
- The sample proportion
- The sample size
- Considering that a result is statistically significant if the p-value is .05 or less, what decision would be made concerning the null and alternative hypotheses in each of the following?
- P-value = .30
- P-value = .001
- P-value = .04
- Two researchers are testing the null hypothesis that the population proportion is .35 and the alternative hypothesis that the population proportion is greater than .35. The first researcher finds a sample proportion of .39 and the second researcher finds a sample proportion of .43. For which researcher will the p-value of the test be smaller? Explain without actually doing any calculations.
- Find the p-value for each of the following situations. Be sure to take into account whether the test is one-sided or two-sided.
- Z-statistic = 2.05,
- Z-statistic = -2.10,
- Z-statistic = -1.08,
- For each of the following calculate the z-statistic.
- For the situations in the previous problem calculate the p-values.
- Suppose a two-sided test for a proportion resulted in a p-value of 0.08.
- Given this information and the usual criterion for hypothesis testing, would you conclude that the population proportion was different from the null hypothesis?
- Suppose the test was a one-sided test instead of a two-sided test and that the sample proportion was in the direction to support the alternative hypothesis. Would you be able to decide in favor of the alternative hypothesis?
- Explain whether each of the following statements is true or false.
- The p-value is the probability that the null hypothesis is true.
- If the null hypothesis is true, then the level of significance is the probability of making a type I error.
- A type II error can only occur when the null hypothesis is true.
- Explain which type of error (I or II) could be made in each of the following situations:
- The null hypothesis is true
- The alternative hypothesis is true
- The null hypothesis is not rejected
- The null hypothesis is rejected.
- Consider medical tests in which the null hypothesis is that the patient does not have the disease and the alternative hypothesis is that the patient does have the disease.
- Give an example of a medical situation in which a type I error would be more serious.
- Give an example of a medical situation in which a type II error would be more serious.