# 11.1: The F-Distribution and Testing Two Variances

Created by: CK-12

## Learning Objectives

- Understand the differences between the \begin{align*}F\end{align*}-distribution and Student’s \begin{align*}t\end{align*}-distribution.
- Calculate a test statistic as a ratio of values derived from sample variances.
- Use random samples to test hypotheses about the variances of two independent populations.
- Understand the limits of inferences derived from these methods.

## Introduction

In previous lessons, we learned how to conduct hypothesis tests that examined the relationship between two variables. Most of these tests simply compared the means of two samples. However, sometimes we also want to test the variance, or the degree to which observations are spread out within a distribution. In the figure below, we see three samples with identical means (the samples in red, green, and blue) but with very different variances:

So why would we want to conduct a hypothesis test on variance? Let’s consider an example. Suppose a teacher wants to examine the effectiveness of two reading programs. She randomly assigns her students to two groups, uses a different reading program with each group, and gives her students an achievement test. In deciding which reading program is more effective, it would be helpful to look not only at the mean score of each group but also at how spread out the achievement scores are. To test hypotheses about variance, we use a statistical tool called the \begin{align*}F\end{align*}-distribution.

In this lesson, we will examine the differences between the \begin{align*}F\end{align*}-distribution and Student’s \begin{align*}t\end{align*}-distribution, calculate a test statistic with the \begin{align*}F\end{align*}-distribution, and test hypotheses about two population variances. In addition, we will look more closely at the limitations of this test.

### The \begin{align*}F\end{align*}-Distribution

The *\begin{align*}F\end{align*}-distribution* is actually a family of distributions. The specific \begin{align*}F\end{align*}-distribution for testing two population variances, \begin{align*}\sigma^2_1\end{align*} and \begin{align*}\sigma^2_2\end{align*}, is determined by two values for degrees of freedom, one for each sample. Each value equals that sample's size minus 1: the sample whose variance goes in the numerator supplies the numerator degrees of freedom, and the other sample supplies the denominator degrees of freedom. Unlike the normal distribution and the \begin{align*}t\end{align*}-distribution, \begin{align*}F\end{align*}-distributions are not symmetrical (they are skewed to the right) and span only non-negative numbers. (Normal distributions and \begin{align*}t\end{align*}-distributions are symmetric and have both positive and negative values.) In addition, the shapes of \begin{align*}F\end{align*}-distributions vary drastically, especially when the degrees of freedom are small. These characteristics make determining the critical values for \begin{align*}F\end{align*}-distributions more complicated than for normal distributions and Student’s \begin{align*}t\end{align*}-distributions. \begin{align*}F\end{align*}-distributions for various degrees of freedom are shown below:
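To make these shapes concrete, the density of an \begin{align*}F\end{align*}-distribution can also be evaluated numerically. The following is a minimal Python sketch using only the standard library; `f_pdf` is a hypothetical helper name, and it implements the standard \begin{align*}F\end{align*} density formula with \begin{align*}d_1\end{align*} numerator and \begin{align*}d_2\end{align*} denominator degrees of freedom. Note that it returns 0 for every negative input, reflecting the fact that \begin{align*}F\end{align*}-distributions span only non-negative numbers.

```python
import math


def f_pdf(x: float, d1: int, d2: int) -> float:
    """Density of the F-distribution with d1 (numerator) and d2 (denominator) degrees of freedom."""
    if x < 0:
        return 0.0  # no probability on negative values
    if x == 0:
        # at 0 the density is infinite for d1 < 2, exactly 1 for d1 == 2, and 0 for d1 > 2
        if d1 < 2:
            return math.inf
        return 1.0 if d1 == 2 else 0.0
    # log of the beta function B(d1/2, d2/2), the normalizing constant
    log_beta = math.lgamma(d1 / 2) + math.lgamma(d2 / 2) - math.lgamma((d1 + d2) / 2)
    log_density = (
        (d1 / 2) * math.log(d1 / d2)
        + (d1 / 2 - 1) * math.log(x)
        - ((d1 + d2) / 2) * math.log1p(d1 * x / d2)
        - log_beta
    )
    return math.exp(log_density)


# Compare densities at the same points for a few degrees-of-freedom pairs
for d1, d2 in [(1, 1), (5, 2), (10, 30)]:
    row = ", ".join(f"{f_pdf(x, d1, d2):.3f}" for x in (0.25, 0.5, 1.0, 2.0, 4.0))
    print(f"F({d1}, {d2}): {row}")
```

Printing the densities side by side shows how differently the curves behave for small and large degrees of freedom; the log form is used only to avoid overflow in the gamma functions.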

### \begin{align*}F\end{align*}-Max Test: Calculating the Sample Test Statistic

We use the *\begin{align*}F\end{align*}-ratio test statistic* when testing the hypothesis that there is no difference between population variances. To calculate this ratio, we need only the variance of each sample. It is recommended that the larger sample variance be placed in the numerator of the \begin{align*}F\end{align*}-ratio and the smaller sample variance in the denominator. By doing this, the ratio will always be at least 1.00, so it only has to be compared with a critical value in the right-hand tail of the \begin{align*}F\end{align*}-distribution, which simplifies the hypothesis test.
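Written as a formula, the convention above is the following. The labels "larger" and "smaller" are notation introduced here for the samples with the larger and smaller sample variance, and each degrees-of-freedom value is that sample's size minus 1:

\begin{align*}F=\frac{s^2_{\text{larger}}}{s^2_{\text{smaller}}} \geq 1, \qquad df_{\text{num}}=n_{\text{larger}}-1, \qquad df_{\text{den}}=n_{\text{smaller}}-1\end{align*}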

*Example:* Suppose a teacher administered two different reading programs to two groups of students and collected the following achievement score data:

\begin{align*}& \text{Program 1} && \text{Program 2}\\ & n_1=31 && n_2=41\\ & \bar{x}_1=43.6 && \bar{x}_2=43.8\\ & s{_1}^2=105.96 && s{_2}^2=36.42\end{align*}

What is the \begin{align*}F\end{align*}-ratio for these data?

\begin{align*}F=\frac{s{_1}^2}{s{_2}^2}=\frac{105.96}{36.42} \approx 2.909\end{align*}
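The same calculation can be sketched in a few lines of Python. `f_ratio` is a hypothetical helper, not part of any library; it applies the larger-variance-on-top convention automatically and also returns the numerator and denominator degrees of freedom (each sample size minus 1) needed later for the critical value.

```python
def f_ratio(var_a: float, n_a: int, var_b: float, n_b: int) -> tuple[float, int, int]:
    """Return (F, numerator df, denominator df), placing the larger sample variance on top."""
    if var_a <= 0 or var_b <= 0:
        raise ValueError("sample variances must be positive")
    if var_a >= var_b:
        return var_a / var_b, n_a - 1, n_b - 1
    return var_b / var_a, n_b - 1, n_a - 1


# Summary statistics for Program 1 and Program 2 from the example above
F, df_num, df_den = f_ratio(105.96, 31, 36.42, 41)
print(f"F = {F:.3f} with ({df_num}, {df_den}) degrees of freedom")
```

If raw scores rather than summary statistics are available, Python's `statistics.variance` computes the sample variance (dividing by \begin{align*}n-1\end{align*}) that this ratio expects.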

### \begin{align*}F\end{align*}-Max Test: Testing Hypotheses about Two Independent Population Variances

When we use random samples to test the hypothesis that two population variances are equal, \begin{align*}H_0: \sigma^2_1=\sigma^2_2\end{align*} (in other words, that the ratio of the variances \begin{align*}\frac{\sigma^2_1}{\sigma^2_2}=1\end{align*}), we call this test the *\begin{align*}F\end{align*}-Max test*. Since our null hypothesis is \begin{align*}H_0: \sigma^2_1=\sigma^2_2\end{align*}, our alternative hypothesis is \begin{align*}H_a: \sigma^2_1 \neq \sigma^2_2\end{align*}.

Establishing the critical values in an \begin{align*}F\end{align*}-test is a bit more complicated than in other hypothesis tests. Most \begin{align*}F\end{align*}-tables are organized as a separate table for each right-tail area, typically 1 percent, 5 percent, 10 percent, and 25 percent. (Please see the supplemental link for an example of this type of table.) Within the table for the chosen tail area, we also need the degrees of freedom for each sample (the sample size minus 1) to locate the critical value.

*On the Web*

http://www.statsoft.com/textbook/sttable.html#f01 \begin{align*}F\end{align*}-distribution tables.

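*Example:* Suppose we are trying to determine the critical values for the scenario in the preceding section, and we set the level of significance to 0.02. Because we have a two-tailed test, we split \begin{align*}\alpha\end{align*} evenly between the tails and assign 0.01 to the area to the right of the upper critical value. Since the larger sample variance is in the numerator, the \begin{align*}F\end{align*}-ratio is at least 1 and cannot fall in the left-hand tail, so the upper critical value is the only one we need. The numerator has \begin{align*}n_1-1=31-1=30\end{align*} degrees of freedom and the denominator has \begin{align*}n_2-1=41-1=40\end{align*} degrees of freedom, so the \begin{align*}F\end{align*}-table for \begin{align*}\alpha=0.01\end{align*} gives a critical value of 2.203.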
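Instead of reading a printed table, the upper critical value can be computed numerically. The following is a minimal standard-library Python sketch, not the procedure used to build published tables: it evaluates the upper-tail area of the \begin{align*}F\end{align*}-distribution through the regularized incomplete beta function (using a standard continued-fraction expansion) and then searches for the value whose right-tail area equals the chosen \begin{align*}\alpha\end{align*}. The helper names `reg_inc_beta`, `f_upper_tail`, and `f_critical` are hypothetical.

```python
import math


def _beta_cf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 3e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even term of the continued fraction
        aa = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # odd term of the continued fraction
        aa = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # evaluate directly where the continued fraction converges fast, else use symmetry
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_upper_tail(f: float, d1: int, d2: int) -> float:
    """Right-tail area P(F > f) for d1 numerator and d2 denominator degrees of freedom."""
    if f <= 0:
        return 1.0
    return reg_inc_beta(d2 / 2, d1 / 2, d2 / (d2 + d1 * f))


def f_critical(tail_area: float, d1: int, d2: int) -> float:
    """Value c with P(F > c) = tail_area, found by bisection (the tail area falls as c grows)."""
    lo, hi = 0.0, 1.0
    while f_upper_tail(hi, d1, d2) > tail_area:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if f_upper_tail(mid, d1, d2) > tail_area:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Two-tailed test at alpha = 0.02: put alpha / 2 in the right-hand tail, df = (30, 40)
alpha = 0.02
print(round(f_critical(alpha / 2, 30, 40), 3))
```

If SciPy is available, `scipy.stats.f.ppf(1 - alpha / 2, 30, 40)` returns the same kind of upper critical value; the hand-written version above is only meant to show that a table entry is simply the point with the stated area to its right.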

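Once we find our critical values and calculate our test statistic, we perform the hypothesis test the same way we do with the hypothesis tests using the normal distribution and Student’s \begin{align*}t\end{align*}-distribution: if the test statistic exceeds the critical value, we reject the null hypothesis; otherwise, we fail to reject it.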

*Example:* Using our example from the preceding section, suppose a teacher administered two different reading programs to two different groups of students and wanted to know whether one program produced a greater variance in scores. Perform a hypothesis test to answer her question.

For the example, we calculated an \begin{align*}F\end{align*}-ratio of 2.909 and found a critical value of 2.203. Since the observed test statistic exceeds the critical value, we reject the null hypothesis. In other words, if the population variances were equal, a ratio of sample variances this large would occur by chance less than 2% of the time. We can conclude that the variance of the student achievement scores for the second program is less than the variance of the scores for the first program. Because the achievement test means are practically equal, the difference in the variances of the student achievement scores may help the teacher in her selection of a program.
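The decision step of this example can be sketched as a small Python function. `reject_equal_variances` is a hypothetical helper; it assumes the caller has already looked up the upper critical value for \begin{align*}\alpha/2\end{align*} with the correct degrees of freedom, and it places the larger variance on top itself, so the order of the arguments does not matter.

```python
def reject_equal_variances(var_1: float, var_2: float, f_crit: float) -> bool:
    """Two-tailed F-Max decision; True means reject H0: sigma_1^2 = sigma_2^2.

    f_crit is assumed to be the upper critical value for alpha / 2, found with the
    degrees of freedom of the larger-variance sample (numerator) and the other sample.
    """
    f_stat = max(var_1, var_2) / min(var_1, var_2)  # larger variance on top, so F >= 1
    return f_stat > f_crit


# Reading-program example: alpha = 0.02, table critical value 2.203 for (30, 40) df
if reject_equal_variances(105.96, 36.42, 2.203):
    print("Reject H0: the population variances appear to differ.")
else:
    print("Fail to reject H0.")
```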

### The Limits of Using the \begin{align*}F\end{align*}-Distribution to Test Variance

The test of the null hypothesis, \begin{align*}H_0: \sigma^2_1=\sigma^2_2\end{align*}, using the \begin{align*}F\end{align*}-distribution is only appropriate when it can safely be assumed that both populations are normally distributed. If we are testing the equality of the variances (or standard deviations) of two populations, it is important to remember that the \begin{align*}F\end{align*}-test is extremely sensitive to departures from normality. Therefore, if the data display even small departures from the normal distribution, including non-linearity or outliers, the test is unreliable and should not be used. In the next lesson, we will introduce several tests that we can use when the data are not normally distributed.

## Lesson Summary

We use the \begin{align*}F\end{align*}-Max test and the \begin{align*}F\end{align*}-distribution when using independent samples to test whether two population variances are equal.

The \begin{align*}F\end{align*}-distribution differs from the normal distribution and Student’s \begin{align*}t\end{align*}-distribution. Unlike the normal distribution and the \begin{align*}t\end{align*}-distribution, \begin{align*}F\end{align*}-distributions are not symmetrical and go from 0 to \begin{align*}\infty\end{align*}, not from \begin{align*}- \infty\end{align*} to \begin{align*}\infty\end{align*} as the others do.

When testing the variances from independent samples, we calculate the \begin{align*}F\end{align*}-ratio test statistic, which is the ratio of the variances of the independent samples.

When we reject the null hypothesis, \begin{align*}H_0:\sigma^2_1=\sigma^2_2\end{align*}, we conclude that the variances of the two populations are not equal.

The test of the null hypothesis, \begin{align*}H_0: \sigma^2_1=\sigma^2_2\end{align*}, using the \begin{align*}F\end{align*}-distribution is only appropriate when it can be safely assumed that both populations are normally distributed.

## Review Questions

- We use the \begin{align*}F\end{align*}-Max test to examine the differences in the ___ between two independent samples.
- List two differences between the \begin{align*}F\end{align*}-distribution and Student’s \begin{align*}t\end{align*}-distribution.
- When we test the differences between the variances of two independent samples, we calculate the ___.
- When calculating the \begin{align*}F\end{align*}-ratio, it is recommended that the sample with the ___ sample variance be placed in the numerator, and the sample with the ___ sample variance be placed in the denominator.
- Suppose a guidance counselor tested the means of two student achievement samples from different SAT preparatory courses. She found that the two independent samples had similar means, and now she also wants to test the variances of the samples. She collected the following data:

\begin{align*}& \text{SAT Prep Course} \ \# 1 && \text{SAT Prep Course} \ \# 2\\ & n=31 && n=21\\ & s^2=42.30 && s^2=18.80\end{align*}

(a) What are the null and alternative hypotheses for this scenario?

(b) What is the critical value with \begin{align*}\alpha=0.10\end{align*}?

(c) Calculate the \begin{align*}F\end{align*}-ratio.

(d) Would you reject or fail to reject the null hypothesis? Explain your reasoning.

(e) Interpret the results and determine what the guidance counselor can conclude from this hypothesis test.

- True or False: The test of the null hypothesis, \begin{align*}H_0:\sigma^2_1=\sigma^2_2\end{align*}, using the \begin{align*}F\end{align*}-distribution is only appropriate when it can be safely assumed that both populations are normally distributed.