# 11.1: The F-Distribution and Testing Two Variances

Created by: CK-12

## Learning Objectives

- Understand the differences between the \begin{align*}F\end{align*}- and the Student’s \begin{align*}t\end{align*}-distributions.
- Calculate a test statistic as a ratio of values derived from sample variances.
- Use random samples to test hypotheses about multiple independent population variances.
- Understand the limits of inferences derived from these methods.

## Introduction

In previous lessons we learned how to conduct hypothesis tests examining the relationship between two variables. Most of these tests simply evaluated the relationship of the **means** of two variables. However, sometimes we also want to test the **variance**, or the degree to which observations are spread out within a distribution. In the figure below, we see three samples with identical means (the samples in red, green and blue) but with very different variances.

So why would we want to conduct a hypothesis test on variance? Let’s consider an example. Say that a teacher wants to examine the effectiveness of two reading programs. She randomly assigns her students into two groups, uses the different reading programs with each group and gives her students an achievement test. In deciding which reading program is more effective, it would be helpful to not only look at the mean scores of each of the groups, but also the “spreading out” of the achievement scores. To test hypotheses about variance, we use a statistical tool called the \begin{align*}F\end{align*}- **distribution.**

In this lesson we will examine the difference between the \begin{align*}F\end{align*}- and Student’s \begin{align*}t\end{align*}-distributions, calculate the test statistic and test hypotheses about multiple population variances. In addition, we will look a bit more closely at the limitations of this test.

## Differences between the F- and Student’s t-Distributions

As a review, we use the Student’s \begin{align*}t\end{align*}-distribution when we are conducting hypothesis tests about means and the variance of the population is unknown. Usually, the variance of the population is *not* known and it is necessary to estimate it by using the variance of the sample. Substituting the sample variance for the population variance adds uncertainty that the normal distribution does not account for, especially if we have a small sample size. To account for this extra uncertainty we use a statistical tool called the **Student’s** \begin{align*}t\end{align*}- **distribution.**

The Student’s \begin{align*}t\end{align*}-distribution is a family of distributions that, like the normal distribution, are symmetrical, bell-shaped and centered on the mean. The shape of these distributions changes as the sample size changes (see below) and each \begin{align*}t\end{align*}-distribution is associated with a unique number of Degrees of Freedom (number of observations in the sample minus one). As the number of observations (shown by \begin{align*}k\end{align*} in the figure) increases, the difference between the \begin{align*}t\end{align*}-distribution and the normal distribution (in pink) decreases.

The \begin{align*}F\end{align*}-distribution is quite a bit different. When we test the hypothesis that two variances in the populations from which random samples were selected are equal \begin{align*}(H_0: \sigma_1{^2} = \sigma_2{^2})\end{align*} (or in other words that the ratio of the variances \begin{align*}(\sigma_1{^2})/(\sigma_2{^2})\end{align*} equals \begin{align*}1.00\end{align*}), we call this test the \begin{align*}F\end{align*}- **Max test.**

Since we are testing ratios, the \begin{align*}F\end{align*}-distribution looks quite different from the Student’s \begin{align*}t\end{align*}-distribution (see below). Like the Student’s \begin{align*}t\end{align*}-distribution, the \begin{align*}F\end{align*}-distribution is a family of distributions. The specific \begin{align*}F\end{align*}-distribution for testing two population variances \begin{align*}H_0: \sigma_1{^2} = \sigma_2{^2}\end{align*} is based on two Degrees of Freedom (one for each of the samples). Unlike the normal and the \begin{align*}t\end{align*}-distributions, which are symmetric and take both positive and negative values, the \begin{align*}F\end{align*}-distributions are not symmetrical and span only non-negative numbers. In addition, the shapes of the \begin{align*}F\end{align*}-distributions vary drastically, especially when the degrees of freedom values are small. These characteristics make determining the critical values for the \begin{align*}F\end{align*}-distribution more complicated than for the normal and Student’s \begin{align*}t\end{align*}-distributions.

\begin{align*}F\end{align*}- **Max Test: Calculating the Sample Test Statistic**

We use the \begin{align*}F\end{align*}- **ratio** test statistic when testing the hypothesis that there is no difference between population variances. When calculating this ratio, we really just need the variance from each of the samples. It is recommended that the larger sample variance be placed in the numerator of the \begin{align*}F\end{align*}-ratio and the smaller sample variance in the denominator. By doing this, the ratio will always be greater than or equal to \begin{align*}1.00\end{align*}, which simplifies the hypothesis test because only the right-hand tail of the distribution needs to be examined.

**Example:**

Suppose a teacher administered two different reading programs to two groups of students and collected the following achievement score data:

\begin{align*}& \text{Program}\ 1 & & \text{Program}\ 2\\ & n_1 = 31 & & n_2 = 41\\ & \bar{X}_1 = 43.6 & & \bar{X}_2 = 43.8\\ & s_1{^2} = 105.96 & & s_2{^2} = 36.42\end{align*}

What is the \begin{align*}F\end{align*}-ratio for these data?

**Solution:**

\begin{align*}F = \frac{s_1{^2}} {s_2{^2}} = \frac{105.96} {36.42} \approx 2.909\end{align*}
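The calculation above can be reproduced in a few lines of Python. This is a minimal sketch: the function name `f_ratio` and its argument names are illustrative assumptions, not part of the lesson. It places the larger sample variance in the numerator, as recommended, and also reports the matching degrees of freedom (each sample size minus one).

```python
def f_ratio(var_a, n_a, var_b, n_b):
    """Return (F, df_numerator, df_denominator) with the larger variance on top."""
    if var_a <= 0 or var_b <= 0:
        raise ValueError("sample variances must be positive")
    if var_a >= var_b:
        return var_a / var_b, n_a - 1, n_b - 1
    # Swap so that the larger sample variance is the numerator
    return var_b / var_a, n_b - 1, n_a - 1


# Reading program data from the example above
F, df_num, df_den = f_ratio(105.96, 31, 36.42, 41)
print(round(F, 3), df_num, df_den)  # prints 2.909 30 40
```

Returning the degrees of freedom together with the ratio keeps the numerator and denominator values paired correctly when the samples are swapped.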

## F-Max Test: Testing Hypotheses about Multiple Independent Population Variances

As mentioned, in certain situations we are interested in determining if there is a difference in the population variances between two independent samples. We can conduct a hypothesis test of no difference between the population variances with the null hypothesis of \begin{align*}H_0: \sigma_1{^2} = \sigma_2{^2}\end{align*}. Therefore, our alternative hypothesis would be \begin{align*}H_a: \sigma_1 {^2} \ne \sigma_2{^2}\end{align*}.

Establishing the critical values in an \begin{align*}F\end{align*}-test is a bit more complicated than when doing so in other hypothesis tests. Most tables contain multiple \begin{align*}F\end{align*}-distributions, one for each of the following right-hand tail areas: \begin{align*}1 \;\mathrm{percent}\end{align*}, \begin{align*}5 \;\mathrm{percent}\end{align*}, \begin{align*}10 \;\mathrm{percent}\end{align*} and \begin{align*}25 \;\mathrm{percent}\end{align*} (please see the supplemental links for an example of the table). We also need to use the degrees of freedom from **each** of the samples (each sample size minus one) to determine the critical values.

Say, for example, that we are trying to determine the critical values for the scenario above and we set the level of significance at \begin{align*}.02 (\alpha =.02)\end{align*}. Because we have a two-tailed test, we assign .01 to the area to the right of the critical value. Using the \begin{align*}F\end{align*}-table for \begin{align*}\alpha =.01\end{align*} (for example, see http://www.statsoft.com/textbook/sttable.html#f01), we find a critical value of \begin{align*}2.20\end{align*} (\begin{align*}df = 30\end{align*} for the numerator and \begin{align*}df = 40\end{align*} for the denominator, with an area of \begin{align*}\alpha =.01\end{align*} in the right-hand tail).
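The table lookup can also be checked numerically. The sketch below uses only the Python standard library: it integrates the \begin{align*}F\end{align*} density with Simpson's rule and then searches for the critical value by bisection. The function names (`f_pdf`, `f_cdf`, `f_critical`) are illustrative assumptions, and the approach is a substitute for a table or a statistics package, not the method used in this lesson. It assumes the numerator degrees of freedom are greater than 2, so that the density is zero at the origin.

```python
import math


def f_pdf(x, d1, d2):
    """Density of the F-distribution with (d1, d2) degrees of freedom (d1 > 2)."""
    if x <= 0:
        return 0.0
    log_beta = (math.lgamma(d1 / 2) + math.lgamma(d2 / 2)
                - math.lgamma((d1 + d2) / 2))
    log_pdf = (0.5 * d1 * math.log(d1) + 0.5 * d2 * math.log(d2)
               + (0.5 * d1 - 1) * math.log(x)
               - 0.5 * (d1 + d2) * math.log(d1 * x + d2)
               - log_beta)
    return math.exp(log_pdf)


def f_cdf(x, d1, d2, steps=2000):
    """Area under the F density from 0 to x, by Simpson's rule (steps is even)."""
    if x <= 0:
        return 0.0
    h = x / steps
    total = f_pdf(0.0, d1, d2) + f_pdf(x, d1, d2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return total * h / 3


def f_critical(right_tail_area, d1, d2):
    """Value whose right-hand tail area equals right_tail_area, by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - f_cdf(mid, d1, d2) > right_tail_area:
            lo = mid   # tail still too large: move right
        else:
            hi = mid   # tail too small: move left
    return (lo + hi) / 2


# Two-tailed test at alpha = .02 puts .01 in the right-hand tail
critical = f_critical(0.01, 30, 40)
print(round(critical, 2))  # should agree with the table value of 2.20
```

In practice a statistics library would normally supply this value directly; the hand-rolled integration is shown only to make the meaning of "area to the right of the critical value" concrete.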

Once we set our critical values and calculate our test statistic, we perform the hypothesis test the same way we do with the hypothesis tests using the normal and the Student’s \begin{align*}t\end{align*}-distributions.

**Example:**

Using our example above, suppose a teacher administered two different reading programs to two different groups of students and was interested in whether the two programs produced different variances in scores. Perform a hypothesis test to answer her question.

**Solution:**

In the example above, we calculated an \begin{align*}F\end{align*} ratio of \begin{align*}2.909\end{align*} and found a critical value of \begin{align*}2.20\end{align*}.

Since the observed test statistic exceeds the critical value, we reject the null hypothesis. In other words, if the population variances were equal, a ratio of sample variances this extreme would occur by chance less than \begin{align*}2\% (.02)\end{align*} of the time. We can conclude that the population variances differ, and the sample data indicate that the variance of the student achievement scores for the second program is less than the variance for the first program. We can also see that the achievement test means are practically equal, so the variance in student achievement scores may help the teacher in her selection of a program.
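The decision rule used in this solution is simply a comparison of the observed ratio with the critical value. A minimal sketch follows, with the hypothetical helper name `f_test_decision` and the critical value of 2.20 taken from the table lookup in the text rather than computed.

```python
def f_test_decision(larger_var, smaller_var, critical_value):
    """Return (F, reject) where reject is True when F exceeds the critical value."""
    f_stat = larger_var / smaller_var
    return f_stat, f_stat > critical_value


# Sample variances from the reading program example; 2.20 is the table value
f_stat, reject = f_test_decision(105.96, 36.42, 2.20)
print(f"F = {f_stat:.3f}, reject H0: {reject}")  # prints F = 2.909, reject H0: True
```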

## The Limits of Using the F-Distribution to Test Variance

The test of the null hypothesis \begin{align*}H_0: \sigma_1{^2} = \sigma_2{^2}\end{align*} using the \begin{align*}F\end{align*}-distribution is only appropriate when it can be safely assumed that both populations are normally distributed. If we are testing the equality of standard deviations between two samples, it is important to remember that the \begin{align*}F\end{align*}-test is extremely sensitive to departures from normality. Therefore, if the data display even small departures from the normal distribution, such as skewness or outliers, the test is unreliable and should not be used. In the next lesson, we will introduce several tests that we can use when the data are not normally distributed.
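This sensitivity can be illustrated with a small simulation. The sketch below is an illustration under stated assumptions, not a result from this lesson: it draws two samples (sizes 31 and 41, as in the reading program example) from populations with *equal* variances and counts how often the ratio \begin{align*}s_1{^2}/s_2{^2}\end{align*} exceeds the table value of 2.20. Only the upper tail is checked, with the first sample always in the numerator, so the nominal rejection rate is 1%. The Laplace distribution is an assumed stand-in for a heavy-tailed, non-normal population, and all function names are hypothetical.

```python
import random

CRITICAL_VALUE = 2.20  # upper 1% point for df = (30, 40), from the text
N1, N2 = 31, 41        # sample sizes from the reading program example


def normal_draw(rng):
    """One observation from a normal population with variance 1."""
    return rng.gauss(0.0, 1.0)


def laplace_draw(rng):
    """One observation from a Laplace population with variance 1 (heavy tails)."""
    scale = 2 ** -0.5  # a Laplace variance is 2 * scale**2
    magnitude = rng.expovariate(1.0 / scale)
    return magnitude if rng.random() < 0.5 else -magnitude


def sample_variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def rejection_rate(draw, trials=4000, seed=1):
    """Share of trials in which s1^2 / s2^2 exceeds the critical value."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        s1 = sample_variance([draw(rng) for _ in range(N1)])
        s2 = sample_variance([draw(rng) for _ in range(N2)])
        if s1 / s2 > CRITICAL_VALUE:
            rejections += 1
    return rejections / trials


normal_rate = rejection_rate(normal_draw)
laplace_rate = rejection_rate(laplace_draw)
print(f"normal populations:  {normal_rate:.3f}")
print(f"Laplace populations: {laplace_rate:.3f}")
```

With normal populations the rejection rate should stay close to the nominal 1%. With the heavy-tailed populations it is typically noticeably higher, even though the null hypothesis is true in both cases, which is the practical meaning of the test being unreliable for non-normal data.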

## Lesson Summary

- We use the \begin{align*}F\end{align*}-Max test and the \begin{align*}F\end{align*}-distribution when testing if two variances from independent samples are equal.
- The \begin{align*}F\end{align*}-distribution differs from the Student’s \begin{align*}t\end{align*}-distribution. Unlike the normal and the \begin{align*}t\end{align*}-distributions, the \begin{align*}F\end{align*}-distributions are not symmetrical and go from zero to infinity \begin{align*}(\infty)\end{align*} not from \begin{align*}-\infty\end{align*} to \begin{align*}\infty\end{align*} as the others do.
- When testing the variances from independent samples, we calculate the \begin{align*}F\end{align*}-ratio, which is the ratio of the variances of the independent samples.
- When we reject the null hypothesis \begin{align*}H_0: \sigma_1{^2} = \sigma_2{^2}\end{align*} we conclude that the variances of the two populations are not equal.
- The test of the null hypothesis \begin{align*}H_0: \sigma_1{^2} = \sigma_2{^2}\end{align*} using the \begin{align*}F\end{align*}-distribution is only appropriate when it can be safely assumed that the population is normally distributed.

**Supplemental Links**

- Distribution Tables http://www.statsoft.com/textbook/sttable.html

## Review Questions

- We use the \begin{align*}F\end{align*}-Max test to examine the differences in the ___ between two independent samples.
- List two differences between the \begin{align*}F\end{align*}- and the Student’s \begin{align*}t\end{align*}-distributions.
- When we test the differences between the variance of two independent samples, we calculate the ___.
- When calculating the \begin{align*}F\end{align*}-ratio, it is recommended that the sample with the ___ sample variance be placed in the numerator and the sample with the ___ sample variance be placed in the denominator.

Suppose a guidance counselor tested the means of two student achievement samples from different SAT preparatory courses. She found that the two independent samples had similar means, but she also wants to test the variances associated with the samples. She collected the following data:

\begin{align*}& \text{SAT Prep Course}\ \#1 & & \text{SAT Prep Course}\ \#2\\ & n = 31 & & n = 21\\ & s^2 = 42.30 & & s^2 = 18.80\end{align*}

- What are the null and alternative hypotheses for this scenario?
- What is the critical value with \begin{align*}\alpha =.10\end{align*}?
- Calculate the \begin{align*}F\end{align*}-ratio.
- Would you reject or fail to reject the null hypothesis? Explain your reasoning.
- Interpret the results and what the guidance counselor can conclude from this hypothesis test.
- True or False: The test of the null hypothesis \begin{align*}H_0: \sigma_1{^2} = \sigma_2{^2}\end{align*} using the \begin{align*}F\end{align*}-distribution is only appropriate when it can be safely assumed that the population is normally distributed.

## Review Answers

- Variance
- Answers may vary but could include:
  - We use the \begin{align*}t\end{align*}-distribution when testing the difference between the means of two independent samples and the \begin{align*}F\end{align*}-distribution when testing the difference between the variances of two independent samples.
  - The \begin{align*}t\end{align*}-distribution is based on one degrees of freedom value and the \begin{align*}F\end{align*}-distribution is based on two.
  - \begin{align*}F\end{align*}-distributions are not symmetrical; \begin{align*}t\end{align*}-distributions are.
  - \begin{align*}t\end{align*}-values range from \begin{align*}-\infty\end{align*} to \begin{align*}\infty\end{align*}, while \begin{align*}F\end{align*}-ratios range from zero to \begin{align*}\infty\end{align*}.

- \begin{align*}F\end{align*}-ratio
- larger, smaller
- \begin{align*}H_0: \sigma_1{^2} = \sigma_2{^2}\end{align*} or \begin{align*}\sigma_1{^2}/\sigma_2{^2} = 1, H_a: \sigma_1{^2} \ne \sigma_2{^2}\end{align*} or \begin{align*}\sigma_1{^2}/\sigma_2{^2} \ne 1\end{align*}
- \begin{align*}2.04\end{align*}
- \begin{align*}2.25\end{align*}
- We would reject the null hypothesis because the calculated \begin{align*}F\end{align*} ratio \begin{align*}(2.25)\end{align*} exceeds the critical value \begin{align*}(2.04)\end{align*}.
- We can conclude that the variance of the student achievement scores for the second sample is less than the variance for the students in the first sample. Since the achievement test means are practically equal, the variance in student achievement scores may help the guidance counselor in her selection of a preparatory program.
- True