# 8.3: Testing a Mean Hypothesis

## Learning Objectives

- Calculate the sample test statistic to evaluate a hypothesis about a population mean based on large samples.
- Explain how hypothesis testing differs for small samples, and use the Student’s t-distribution accordingly.
- Interpret the results of a hypothesis test, and explain how the terms ‘statistically significant’ and ‘not statistically significant’ apply to them.

## Introduction

In the previous sections, we have covered:

- the reasoning behind hypothesis testing.
- how to conduct single and two-tailed hypothesis tests.
- the potential errors associated with hypothesis testing.
- how to test hypotheses associated with population proportions.

In this section, we will take a closer look at some examples that give us practice in conducting these tests and interpreting what the results mean. We will also look at how the terms **statistically significant** and **not statistically significant** apply to these results.

Also, it is important to look at what happens when we have a small sample size. All of the hypotheses that we have examined thus far have assumed that we have normal distributions. But what happens when we have a small sample size and are unsure if our distribution is normal or not? We use something called the Student’s t-distribution to take small sample size into account.

## Evaluating Hypotheses for Population Means using Large Samples

When testing a hypothesis about the mean of a normal distribution, we follow four basic steps:

- State the null and alternative hypotheses.
- Set the criterion (critical values) for rejecting the null hypothesis.
- Compute the test statistic.
- Decide about the null hypothesis and interpret our results.

In Step 4, we can make one of two decisions regarding the null hypothesis.

- If the test statistic falls in the regions above or below the critical values (meaning that the sample mean is far from the hypothesized mean), we reject the null hypothesis.
- If the test statistic falls in the area between the critical values (meaning that the sample mean is close to the hypothesized mean), we fail to reject the null hypothesis.
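The Step 4 decision rule can be sketched in Python. This is a minimal illustration that assumes a z test statistic has already been computed; the helper names `z_critical_value` and `z_decision` are hypothetical, and for a one-tailed test the sketch assumes an upper-tailed alternative. Critical values come from the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist


def z_critical_value(alpha: float, tails: int = 2) -> float:
    """Positive critical z-value for a one- or two-tailed test at level alpha."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # A two-tailed test splits alpha evenly between the two tails.
    return NormalDist().inv_cdf(1 - alpha / tails)


def z_decision(z: float, alpha: float, tails: int = 2) -> str:
    """Step 4: reject H0 only if z lands in the critical region.

    For tails=1 this assumes an upper-tailed alternative (mu > mu0).
    """
    critical = z_critical_value(alpha, tails)
    in_critical_region = abs(z) > critical if tails == 2 else z > critical
    return "reject H0" if in_critical_region else "fail to reject H0"


print(round(z_critical_value(0.05, tails=2), 2))  # 1.96
print(round(z_critical_value(0.05, tails=1), 3))  # 1.645

# Hypothetical z-scores, not taken from the examples in this section
print(z_decision(2.5, alpha=0.05))  # reject H0
print(z_decision(1.2, alpha=0.05))  # fail to reject H0
```

Splitting α across both tails is what makes the two-tailed cutoff (about 1.96) larger than the one-tailed cutoff (about 1.645) at the same significance level.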

When we reject the null hypothesis, we are saying that the difference between the observed sample mean and the hypothesized population mean is too great to be attributed to chance. In other words, if the null hypothesis were true, the probability of obtaining a sample mean this far from the hypothesized population mean by chance alone would be less than the significance level (α) we have chosen.
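That probability is usually called the p-value, and comparing it with α gives the same decision as comparing the test statistic with the critical values. A minimal sketch, assuming a two-tailed z-test; `two_tailed_p_value` is a hypothetical helper name and the z-scores in the loop are made-up inputs:

```python
from statistics import NormalDist


def two_tailed_p_value(z: float) -> float:
    """Probability under H0 of a z-score at least as far from 0 as the observed one."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


alpha = 0.05
for z in (1.2, 2.5, 3.1):  # hypothetical observed z-scores
    p = two_tailed_p_value(z)
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(f"z = {z:.2f}  p = {p:.4f}  {decision}")
```

Because the normal distribution is continuous, p < α happens exactly when |z| lies beyond the two-tailed critical value, so the two phrasings never disagree.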

When we fail to reject the null hypothesis, we are saying that the difference between the observed sample mean and the hypothesized population mean is probable if the null hypothesis is true. This decision rests on the properties of sampling: small differences like this one occur routinely in random samples, so the difference gives us no reason to reject the null hypothesis. Essentially, we are willing to attribute this difference to sampling error.

Let’s perform a hypothesis test for the scenarios we examined in the first lesson.

**Example:**

College A has an average SAT score of From a random sample of freshman psychology students we find the average SAT score to be with a standard deviation of . Is the sample of freshman psychology students representative of the overall population?

**Solution:**

Let’s first develop our null and alternative hypotheses:

At a significance level, our critical values would be standard deviations above and below the mean.

Next, we calculate the standard score for the sample of freshman psychology students.
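For reference, the standard score of a sample mean is usually computed as below, where $\bar{x}$ is the sample mean, $\mu$ is the hypothesized population mean, $\sigma$ is the standard deviation, $n$ is the sample size, and $\sigma_{\bar{x}}$ is the standard error of the mean. With large samples, the sample standard deviation $s$ is commonly used in place of $\sigma$.

$$
z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}
$$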

Since the calculated z-score of falls in the critical region (as defined by a significance level or anything with a z-score of above ) we reject the null hypothesis. Therefore, we can conclude that the probability of obtaining a sample mean equal to if the mean of the population is is very small and the sample of freshman psychology students is not representative of the overall population. Furthermore, the probability of this difference occurring by chance is less than

**Example:**

The school nurse was wondering if the average height of 7th graders has been increasing. Over the last , the average height of a 7th grader was with a standard deviation of The school nurse takes a random sample of students and finds that the average height this year is Conduct a single-tailed hypothesis test using a significance level to evaluate the null and alternative hypotheses.

**Solution:**

First, we develop our null and alternative hypotheses:

At a significance level, our critical value for a single-tailed test would be standard deviations above the mean.

Next, we calculate the z-score for the sample of 7th graders.

Since the calculated z-score of does not fall in the critical region (as defined by a significance level or anything with a z-score of above ), we fail to reject the null hypothesis. We can conclude that a sample mean equal to if the mean of the population is could reasonably have occurred by chance.
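The one-tailed calculation can be sketched end to end, from summary statistics to the decision. The function name `one_tailed_z_test` and the inputs in the usage line (hypothesized mean 60, σ = 4, n = 36, sample mean 61) are hypothetical, not the nurse's data; the sketch assumes σ is known and that the alternative is upper-tailed (the mean has increased).

```python
from math import sqrt
from statistics import NormalDist


def one_tailed_z_test(sample_mean: float, mu0: float, sigma: float, n: int, alpha: float = 0.05):
    """Upper-tailed z-test of H0: mu = mu0 against Ha: mu > mu0 (sigma assumed known)."""
    standard_error = sigma / sqrt(n)
    z = (sample_mean - mu0) / standard_error
    critical = NormalDist().inv_cdf(1 - alpha)  # all of alpha sits in the upper tail
    return z, critical, z > critical


# Hypothetical inputs, not the nurse's data
z, critical, reject = one_tailed_z_test(sample_mean=61.0, mu0=60.0, sigma=4.0, n=36)
print(f"z = {z:.2f}, critical value = {critical:.3f}, reject H0: {reject}")
```

With these made-up inputs z = 1.5, which falls short of the one-tailed cutoff of about 1.645, so the null hypothesis is not rejected.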

## Hypothesis Testing with Small Populations and Sample Sizes

In the early 1900s, a chemist at a brewery in Ireland discovered that, when he worked with very small samples, the distribution of the sample mean differed noticeably from the normal distribution. He noticed that as his sample sizes changed, the shape of the distribution changed as well. He published his results under the pseudonym ‘Student’, and the distributions for small sample sizes are now known as “Student’s t-distributions.”

The **t-distributions** are a family of distributions that, like the normal distribution, are symmetrical, bell-shaped, and centered on a mean. However, their shape changes as the sample size changes, so there is a specific distribution for every sample size (see figure below; each distribution has a different value of df, the number of degrees of freedom, which is one less than the size of the sample).

We use the Student's t-distribution in hypothesis testing the same way that we use the normal distribution. Each row in the t-distribution table (see excerpt below) represents a different t-distribution, and each distribution is associated with a unique number of degrees of freedom (the number of observations minus one). The column headings in the table represent the portion of the area in the tails of the distribution; we use the numbers in the table just as we used the z-scores. Below is an excerpt from the Student's t-table for one-sided critical values.

DF | Probability of Exceeding the Critical Value | | | | | |
---|---|---|---|---|---|---|
1 | | | | | | |
2 | | | | | | |
3 | | | | | | |
4 | | | | | | |
5 | | | | | | |
6 | | | | | | |
7 | | | | | | |
8 | | | | | | |
9 | | | | | | |
10 | | | | | | |

As the number of observations gets larger, the t-distribution approaches the shape of the normal distribution. In general, once the sample size is large enough (usually about ), we would use the normal distribution and the z-table instead.
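To see where one-sided t-table entries come from, and to check that the t critical values approach the normal value as the degrees of freedom grow, the values can be computed numerically. The sketch below is an illustration under stated assumptions using only the Python standard library: it integrates the t density with Simpson's rule and inverts the tail area by bisection (a swapped-in numerical method, not how published tables were produced). The function names are hypothetical; in practice a statistics library such as SciPy's `scipy.stats.t.ppf` returns these values directly.

```python
from math import exp, lgamma, log, log1p, pi
from statistics import NormalDist


def t_upper_tail(x: float, df: int, steps: int = 1000) -> float:
    """P(T > x) for x >= 0, as 0.5 minus a Simpson's-rule integral of the density over [0, x]."""
    # Log of the normalizing constant of the t density with df degrees of freedom
    log_coef = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)

    def pdf(u: float) -> float:
        return exp(log_coef - (df + 1) / 2 * log1p(u * u / df))

    h = x / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3


def t_critical_value(alpha: float, df: int) -> float:
    """One-sided critical value: the x with P(T > x) = alpha, found by bisection."""
    if not 0 < alpha < 0.5:
        raise ValueError("alpha must be between 0 and 0.5")
    lo, hi = 0.0, 1.0
    while t_upper_tail(hi, df) > alpha:  # widen the bracket until it contains the answer
        hi *= 2
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha:
            lo = mid  # tail area still too big: the critical value lies further right
        else:
            hi = mid
    return (lo + hi) / 2


# Rebuild a small one-sided table (df = 1..10) for two common tail areas
print("df   0.05    0.025")
for df in range(1, 11):
    print(f"{df:>2}  {t_critical_value(0.05, df):6.3f}  {t_critical_value(0.025, df):6.3f}")

# As df grows, the t critical value approaches the normal (z) value
z_05 = NormalDist().inv_cdf(0.95)
for df in (30, 100):
    print(f"df={df}: t = {t_critical_value(0.05, df):.3f} vs z = {z_05:.3f}")
```

The printed values should agree with a standard one-sided t-table to about three decimal places (for example, about 6.314 for df = 1 and about 1.812 for df = 10 at a tail area of 0.05), and the df = 100 value sits only slightly above the z cutoff of about 1.645.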

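In calculating the t-test statistic, we use the formula:

$$t = \frac{\bar{x} - \mu}{s_{\bar{x}}}$$

where:

$t$ = test statistic

$\bar{x}$ = sample mean

$\mu$ = hypothesized population mean

$s_{\bar{x}}$ = estimated standard error

To estimate the standard error $s_{\bar{x}}$, we use the formula $s_{\bar{x}} = \frac{s}{\sqrt{n}}$, where $s$ is the standard deviation of the sample and $n$ is the sample size.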
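These formulas translate directly into code. This sketch assumes raw sample values are available and uses `statistics.mean` and `statistics.stdev` (which divides by n − 1); the helper names and the five GPA values in the usage line are hypothetical. The critical value is passed in rather than computed, mirroring the table lookup described in the text.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(data, mu0):
    """Return (t, df) for H0: mu = mu0, using the estimated standard error s / sqrt(n)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    s = stdev(data)  # sample standard deviation (divides by n - 1)
    standard_error = s / sqrt(n)
    t = (mean(data) - mu0) / standard_error
    return t, n - 1


def t_decision(t, critical, tails=2):
    """Compare t with a critical value looked up in a t-table for df = n - 1.

    For tails=1 this assumes an upper-tailed alternative.
    """
    in_region = abs(t) > critical if tails == 2 else t > critical
    return "reject H0" if in_region else "fail to reject H0"


# Hypothetical GPAs for five players, tested against a hypothesized mean of 3.0
gpas = [3.0, 3.4, 3.2, 2.8, 3.6]
t, df = one_sample_t(gpas, mu0=3.0)
print(f"t = {t:.3f} with df = {df}: {t_decision(t, critical=2.776)}")
```

Here t is about 1.414 with df = 4, which does not exceed 2.776 (the two-tailed critical value for α = .05 and 4 degrees of freedom in standard t-tables), so the null hypothesis is not rejected.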

**Example:**

The high school athletic director is asked if football players are doing as well academically as the other student athletes. We know from a previous study that the average GPA for the student athletes is and that the standard deviation of the sample is . After an initiative to help improve the GPA of student athletes, the athletic director samples football players and finds that their GPA is Is there a significant improvement? Use a significance level.

**Solution:**

First, we establish our null and alternative hypotheses.

Next, we use our alpha level of and the t-distribution table to find our critical values. For a two-tailed test with degrees of freedom and a level of significance, our critical values are equal to standard errors above and below the mean.

In calculating the test statistic, we use the formula:

This means that the observed sample mean of football players is standard errors above the hypothesized value of Because does not exceed (the critical value), we fail to reject the null hypothesis.

Therefore, we can conclude that the difference between the sample mean and the hypothesized value is not large enough to attribute it to anything other than sampling error. Thus, the athletic director can conclude that the mean academic performance of football players does not differ significantly from the mean performance of other student athletes.

## How to Interpret the Results of a Hypothesis Test

Earlier in this lesson, we discussed how to interpret the results of a hypothesis test. As a reminder, when we reject the null hypothesis we are saying that the difference between the observed sample mean and the hypothesized population mean is too great to be attributed to chance. When we fail to reject the null hypothesis, we are saying that the difference between the observed sample mean and the hypothesized population mean is probable if the null hypothesis is true. Essentially, we are willing to attribute this difference to sampling error.

But what is meant by **statistical significance**? Technically, the difference between the hypothesized population mean and the sample mean is said to be *statistically significant* when the probability that the difference occurred by chance is less than the significance level. Therefore, when the calculated test statistic (whether it is the z- or the t-score) falls in the area beyond the critical score, we say that the difference between the sample mean and the hypothesized population mean is **statistically significant.** When the calculated test statistic falls in the area between the critical scores, we say that the difference between the sample mean and the hypothesized population mean is **not statistically significant.**

## Lesson Summary

1. When testing a hypothesis for the mean of a distribution, we follow a series of four basic steps:

- State the null and alternative hypotheses.
- Set the criterion (critical values) for rejecting the null hypothesis.
- Compute the test statistic.
- Decide about the null hypothesis and interpret our results.

2. When we reject the null hypothesis we are saying that the difference between the observed sample mean and the hypothesized population mean is too great to be attributed to chance.

3. When we fail to reject the null hypothesis, we are saying that the difference between the observed sample mean and the hypothesized population mean is probable if the null hypothesis is true.

4. We use the t-distribution in hypothesis testing the same way that we use the normal distribution. However, the t-distribution is used when the sample size is small (typically less than ) and the population standard deviation is unknown.

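5. When calculating the t-statistic, we use the formula:

$$t = \frac{\bar{x} - \mu}{s_{\bar{x}}}$$

where:

$t$ = test statistic

$\bar{x}$ = sample mean

$\mu$ = hypothesized population mean

$s_{\bar{x}}$ = estimated standard error, which is computed by $s_{\bar{x}} = \frac{s}{\sqrt{n}}$, where $s$ is the sample standard deviation and $n$ is the sample size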

6. The difference between the hypothesized population mean and the sample mean is said to be statistically significant when the probability that the difference occurred by chance is less than the significance level.

## Review Questions

- In hypothesis testing, when we work with large samples (typically samples over ), we use the ___ distribution. When working with small samples (typically samples under ), we use the ___ distribution.
- True or False: When we fail to reject the null hypothesis, we are saying that the difference between the observed sample mean and the hypothesized population mean is probable if the null hypothesis is true.

The dean from UCLA is concerned that students’ grade point averages have changed dramatically in recent years. The graduating seniors’ mean GPA over the last five years is . The dean randomly samples seniors from the last graduating class and finds that their mean GPA is , with a sample standard deviation of .

- What would the null and alternative hypotheses be for this scenario?
- What would the standard error be for this particular scenario?
- Describe in your own words how you would set the critical regions and what they would be at an alpha level of .
- Test the null hypothesis and explain your decision.
- Suppose that the dean samples only students. Would a t-distribution now be the appropriate sampling distribution for the mean? Why or why not?
- Using the appropriate t-distribution, test the same null hypothesis with a sample of
- With a sample size of , do you need to have a **larger** or **smaller** difference between the hypothesized population mean and the sample mean to obtain statistical significance? Explain your answer.
- For each of the following scenarios, state which one is more likely to lead to the rejection of the null hypothesis.
  - A one-tailed or two-tailed test
  - or level of significance
  - A sample size of or

## Review Answers

- True
- When setting the critical regions for this hypothesis, it is important to consider the repercussions of the decision. Since there do not appear to be major financial or health repercussions of this decision, a more conservative alpha level need not be chosen. With an alpha level of and a sample size of , we find the corresponding area under the curve in the z-distribution and set the critical regions accordingly. With this alpha level and sample size, the critical regions are set at standard scores above and below the mean.
- With a calculated test statistic of , we reject the null hypothesis since it falls beyond the critical values established with an alpha level of . This means that the probability that the observed sample mean would have occurred by chance if the null hypothesis is true is less than .
- Yes. Because the sample size is below and the population standard deviation is unknown (we have only the sample standard deviation), the t-distribution would in most cases be the appropriate distribution to use.
- The critical values for this scenario using the t-distribution are standard scores above and below the mean. With a calculated t-test statistic of , we do not reject the null hypothesis. Therefore, we can conclude that the probability that the observed sample mean could have occurred by chance if the null hypothesis was true is greater than .
- You would need a larger difference because the standard error of the mean would be greater with a sample size of than with a sample size of .
- (a) one-tailed test (b) level of significance (c)