The F-Distribution
The \begin{align*}F\end{align*}-distribution is actually a family of distributions. The specific \begin{align*}F\end{align*}-distribution for testing two population variances, \begin{align*}\sigma^2_1\end{align*} and \begin{align*}\sigma^2_2\end{align*}, is based on two values for degrees of freedom (one for each of the populations). Unlike the normal distribution and the \begin{align*}t\end{align*}-distribution, \begin{align*}F\end{align*}-distributions are not symmetrical and span only non-negative numbers. (Normal distributions and \begin{align*}t\end{align*}-distributions are symmetric and have both positive and negative values.) In addition, the shapes of \begin{align*}F\end{align*}-distributions vary drastically, especially when the value for degrees of freedom is small. These characteristics make determining the critical values for \begin{align*}F\end{align*}-distributions more complicated than for normal distributions and Student’s \begin{align*}t\end{align*}-distributions. \begin{align*}F\end{align*}-distributions for various degrees of freedom are shown below:
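The properties described above can also be checked numerically. The following is a minimal sketch, assuming that a ratio of two sample variances drawn from normal populations with equal variances follows an F-distribution. The function name `simulate_variance_ratios`, the sample sizes (6 and 11), and the number of trials are illustrative choices, not values from this lesson.

```python
import random
import statistics

def simulate_variance_ratios(n1, n2, trials=2000, seed=0):
    """Simulate ratios of two sample variances from standard normal populations."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(trials):
        sample1 = [rng.gauss(0.0, 1.0) for _ in range(n1)]
        sample2 = [rng.gauss(0.0, 1.0) for _ in range(n2)]
        ratios.append(statistics.variance(sample1) / statistics.variance(sample2))
    return ratios

# Illustrative sample sizes: 5 and 10 degrees of freedom.
ratios = simulate_variance_ratios(n1=6, n2=11)

# A variance ratio can never be negative.
print("smallest ratio:", min(ratios))
# For a right-skewed distribution, the mean sits above the median.
print("mean:", statistics.mean(ratios), "median:", statistics.median(ratios))
```

With small degrees of freedom, the simulated ratios should pile up near zero and trail off to the right, which is the asymmetry the text describes.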
\begin{align*}F\end{align*}-Max Test: Calculating the Sample Test Statistic
We use the \begin{align*}F\end{align*}-ratio test statistic when testing the hypothesis that there is no difference between population variances. To calculate this ratio, we need only the variance from each of the samples. It is recommended that the larger sample variance be placed in the numerator of the \begin{align*}F\end{align*}-ratio and the smaller sample variance in the denominator. With this convention, the ratio is always at least 1.00, so only the right-hand tail of the distribution is needed, which simplifies the hypothesis test.
Calculating the F-Ratio
Suppose a teacher administered two different reading programs to two groups of students and collected the following achievement score data:
\begin{align*}& \text{Program 1} && \text{Program 2}\\ & n_1=31 && n_2=41\\ & \bar{x}_1=43.6 && \bar{x}_2=43.8\\ & s{_1}^2=105.96 && s{_2}^2=36.42\end{align*}
What is the \begin{align*}F\end{align*}-ratio for these data?
\begin{align*}F=\frac{s{_1}^2}{s{_2}^2}=\frac{105.96}{36.42} \approx 2.909\end{align*}
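The calculation above can be sketched in a few lines of code. This is a minimal illustration, not a prescribed method; the helper name `f_ratio` is hypothetical, and it simply applies the convention of placing the larger sample variance in the numerator.

```python
def f_ratio(var_a, var_b):
    """Return the F-ratio with the larger sample variance in the numerator."""
    if var_a <= 0 or var_b <= 0:
        raise ValueError("sample variances must be positive")
    larger, smaller = max(var_a, var_b), min(var_a, var_b)
    return larger / smaller

# Sample variances for Program 1 and Program 2 from the example above.
f = f_ratio(105.96, 36.42)
print(round(f, 3))  # 2.909
```

Because the helper orders the variances itself, `f_ratio(36.42, 105.96)` gives the same result.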
\begin{align*}F\end{align*}-Max Test: Testing Hypotheses about Multiple Independent Population Variances
When we test the hypothesis that two variances of populations from which random samples were selected are equal, \begin{align*}H_0: \sigma^2_1=\sigma^2_2\end{align*} (or in other words, that the ratio of the variances \begin{align*}\frac{\sigma^2_1}{\sigma^2_2}=1\end{align*}), we call this test the \begin{align*}F\end{align*}-Max test. Since we have a null hypothesis of \begin{align*}H_0: \sigma^2_1=\sigma^2_2\end{align*}, our alternative hypothesis would be \begin{align*}H_a: \sigma^2_1 \neq \sigma^2_2\end{align*}.
Establishing the critical values in an \begin{align*}F\end{align*}-test is a bit more complicated than when doing so in other hypothesis tests. Most tables contain multiple \begin{align*}F\end{align*}-distributions, one for each of the following: 1 percent, 5 percent, 10 percent, and 25 percent of the area in the right-hand tail. (Please see the supplemental link for an example of this type of table.) We also need to use the degrees of freedom from each of the samples to determine the critical values.
Determining the Critical Value
Suppose we are trying to determine the critical values for the scenario in the preceding section, and we set the level of significance to 0.02. Because we have a two-tailed test, we assign 0.01 to the area to the right of the positive critical value. Using the \begin{align*}F\end{align*}-table for \begin{align*}\alpha=0.01\end{align*}, we find the critical value at 2.203, since the numerator has 30 degrees of freedom and the denominator has 40 degrees of freedom.
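When no table is at hand, the critical value can be approximated numerically. The sketch below is one possible approach using only the Python standard library: it evaluates the \begin{align*}F\end{align*} cumulative distribution through the regularized incomplete beta function (computed with a continued fraction) and then searches for the cutoff by bisection. All function names here are hypothetical, and statistical libraries provide equivalent, better-tested routines (for example, `scipy.stats.f.ppf`).

```python
import math

def _beta_continued_fraction(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction used to evaluate the incomplete beta function."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd steps of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def regularized_incomplete_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b

def f_cdf(f, df_num, df_den):
    """P(F <= f) for an F-distribution with (df_num, df_den) degrees of freedom."""
    if f <= 0.0:
        return 0.0
    x = df_num * f / (df_num * f + df_den)
    return regularized_incomplete_beta(df_num / 2.0, df_den / 2.0, x)

def f_critical(right_tail_area, df_num, df_den):
    """Find the value with the given area to its right, by bisection."""
    lo, hi = 0.0, 1.0
    while 1.0 - f_cdf(hi, df_num, df_den) > right_tail_area:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if 1.0 - f_cdf(mid, df_num, df_den) > right_tail_area:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Two-tailed test at the 0.02 level: 0.01 in the right-hand tail, (30, 40) df.
critical = f_critical(0.01, 30, 40)
print(round(critical, 2))
```

The result should agree with the table value of 2.203 to within rounding.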
Once we find our critical values and calculate our test statistic, we perform the hypothesis test the same way we do with the hypothesis tests using the normal distribution and Student’s \begin{align*}t\end{align*}-distribution.
Performing a Hypothesis Test
Using our example from the preceding section, suppose a teacher administered two different reading programs to two different groups of students and was interested in whether the two programs produced different variances in scores. Perform a hypothesis test to answer her question.
For the example, we calculated an \begin{align*}F\end{align*}-ratio of 2.909 and found a critical value of 2.203. Since the observed test statistic exceeds the critical value, we reject the null hypothesis. In other words, if the population variances were equal, a ratio of sample variances this large would occur by chance less than 2% of the time. We can conclude that the variance of the student achievement scores under the second program is smaller than the variance under the first program. We can also see that the achievement test means are practically equal, so the difference in the variances of the student achievement scores may help the teacher in her selection of a program.
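The decision step can be written out as a short sketch. The function name `f_test_decision` is hypothetical, and the critical value of 2.203 is the table value quoted in this lesson rather than something the code computes.

```python
def f_test_decision(f_stat, critical_value):
    """Compare an F-ratio (larger variance on top) with the right-tail critical value."""
    if f_stat > critical_value:
        return "reject H0"
    return "fail to reject H0"

# Reading-program data from the example.
n_1, n_2 = 31, 41
var_1, var_2 = 105.96, 36.42

f_stat = var_1 / var_2                           # larger variance in the numerator
df_numerator, df_denominator = n_1 - 1, n_2 - 1  # 30 and 40
critical_value = 2.203                           # F-table, alpha = 0.01 in the right tail

decision = f_test_decision(f_stat, critical_value)
print(df_numerator, df_denominator, round(f_stat, 3), decision)
```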
The Limits of Using the \begin{align*}F\end{align*}-Distribution to Test Variance
The test of the null hypothesis, \begin{align*}H_0: \sigma^2_1=\sigma^2_2\end{align*}, using the \begin{align*}F\end{align*}-distribution is only appropriate when it can safely be assumed that the populations are normally distributed. When we test the equality of the variances (or standard deviations) of two populations, it is important to remember that the \begin{align*}F\end{align*}-test is extremely sensitive to departures from normality. Therefore, if the data display even small departures from the normal distribution, including non-linearity or outliers, the test is unreliable and should not be used. In the next lesson, we will introduce several tests that we can use when the data are not normally distributed.
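A small simulation can illustrate this sensitivity. The sketch below is only an illustration under stated assumptions: both populations always have equal variances, so every rejection is a false alarm; each sample has 21 observations; and the cutoff of about 2.46 is assumed to be the upper 2.5% point of the \begin{align*}F\end{align*}-distribution with 20 and 20 degrees of freedom, as read from a standard table. The Laplace population and all function names are illustrative choices, not part of this lesson.

```python
import random

# Assumed table value: upper 2.5% point of F with (20, 20) degrees of freedom.
CRITICAL_F_20_20 = 2.4645

def sample_variance(values):
    """Unbiased sample variance (divides by n - 1)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

def rejection_rate(draw, n=21, trials=4000, seed=1):
    """Fraction of two-tailed F-tests at the 0.05 level that reject a true H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        var_1 = sample_variance([draw(rng) for _ in range(n)])
        var_2 = sample_variance([draw(rng) for _ in range(n)])
        f_stat = max(var_1, var_2) / min(var_1, var_2)
        if f_stat > CRITICAL_F_20_20:
            rejections += 1
    return rejections / trials

def draw_normal(rng):
    return rng.gauss(0.0, 1.0)

def draw_laplace(rng):
    # Difference of two exponentials: symmetric, but heavier-tailed than normal.
    return rng.expovariate(1.0) - rng.expovariate(1.0)

normal_rate = rejection_rate(draw_normal)
laplace_rate = rejection_rate(draw_laplace)
print(f"normal populations:  {normal_rate:.3f}")
print(f"Laplace populations: {laplace_rate:.3f}")
```

With normal populations the false-alarm rate should stay near the nominal 0.05, while the heavier-tailed population should push it noticeably higher even though the variances are equal.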
Examples
Measurements are taken before and after a specific date. You are interested in whether the population variances are the same before and after. Following is the information you have:
Period | N | Mean | Standard Deviation | Variance |
---|---|---|---|---|
Before | 20 | 2.987 | 6.987 | 48.818 |
After | 20 | 2.435 | 4.987 | 24.870 |
Example 1
What is the null hypothesis?
Since we are interested in whether or not the variances are the same, the null hypothesis is:
\begin{align*}H_0: \sigma^2_B=\sigma^2_A\end{align*}
Example 2
What is the value of the test statistic?
\begin{align*}F =\frac{6.987^2}{4.987^2}=\frac{48.818}{24.870} \approx 1.963 \end{align*} with 19 degrees of freedom for both the numerator and denominator.
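When only standard deviations are reported, they must be squared before forming the ratio. The following sketch shows this step along with the degrees of freedom; the function name `f_from_std` is hypothetical.

```python
def f_from_std(sd_a, n_a, sd_b, n_b):
    """Return (F, numerator df, denominator df) from two standard deviations."""
    var_a, var_b = sd_a ** 2, sd_b ** 2
    if var_a >= var_b:
        return var_a / var_b, n_a - 1, n_b - 1
    # Swap so that the larger variance is in the numerator.
    return var_b / var_a, n_b - 1, n_a - 1

# "Before" and "After" measurements from the table above.
f_stat, df_num, df_den = f_from_std(6.987, 20, 4.987, 20)
print(round(f_stat, 3), df_num, df_den)  # 1.963 19 19
```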
Example 3
At the 0.01 level of significance, what is the F critical value?
The F critical value for a one-sided test is 3.101 at the 0.01 level of significance. Since this is a two-tailed test, we would need the critical value for a one-sided test at the 0.005 level. This critical value would be larger than 3.101.
Example 4
Do you reject or fail to reject the null hypothesis? Explain.
The decision is to fail to reject the null hypothesis, since our test statistic of 1.963 is smaller than the critical value. This means that we do not have sufficient evidence to conclude that the variances before and after are different.
Review
- We use the \begin{align*}F\end{align*}-Max test to examine the differences in the ___ between two independent samples.
- List two differences between the \begin{align*}F\end{align*}-distribution and Student’s \begin{align*}t\end{align*}-distribution.
- When we test the differences between the variances of two independent samples, we calculate the ___.
- When calculating the \begin{align*}F\end{align*}-ratio, it is recommended that the sample with the ___ sample variance be placed in the numerator, and the sample with the ___ sample variance be placed in the denominator.
- Suppose a guidance counselor compared the means of two student achievement samples from different SAT preparatory courses. She found that the two independent samples had similar means, but she also wants to test the variances associated with the samples. She collected the following data:
\begin{align*}& \text{SAT Prep Course} \ \# 1 && \text{SAT Prep Course} \ \# 2\\ & n=31 && n=21\\ & s^2=42.30 && s^2=18.80\end{align*}
a. What are the null and alternative hypotheses for this scenario?
b. What is the critical value with \begin{align*}\alpha=0.10\end{align*}?
c. Calculate the \begin{align*}F\end{align*}-ratio.
d. Would you reject or fail to reject the null hypothesis? Explain your reasoning.
e. Interpret the results and determine what the guidance counselor can conclude from this hypothesis test.
- True or False: The test of the null hypothesis, \begin{align*}H_0:\sigma^2_1=\sigma^2_2\end{align*}, using the \begin{align*}F\end{align*}-distribution is only appropriate when it can be safely assumed that the population is normally distributed.
- Consider the following table:
Statistic | Variable 1 | Variable 2 |
---|---|---|
Mean | 25.642857 | 43.8125 |
Variance | 15.22619048 | 96.42410714 |
Observations | 7 | 8 |
Df | 6 | 7 |
F | 0.157908545 | |
P(F<=f) one-tail | 0.01927 | |
F Critical one-tail | 0.23771837 | |
a. What is the null hypothesis?
b. What is the calculated F statistic?
c. What is the decision?
- List the properties of the F distribution.
- Name two differences between the F test and the chi-square test.
- Which of the following statements is correct?
- Just like the chi-square, the F-distribution is skewed.
- The F-distribution is defined by three parameters.
- The two samples used in the F test must be the same size.
- Choose the correct response to complete this sentence: A small value for F will result in
- A non-rejection of the null hypothesis
- A rejection of the null hypothesis
- The test statistic not falling in the acceptance region.
- The F-test is
- Used to carry out a hypothesis test for an analysis of variance.
- A test used to predict the variance when the sample size is unknown.
- Used to carry out a hypothesis test for an analysis of the standard deviation.
- If we are testing whether the variances of two populations are equal, what should be the distribution of the underlying populations?
- F-distribution
- Chi-square distribution
- Normal distribution
- A large value of F will result in:
- Rejection of the null hypothesis
- Acceptance of the null hypothesis
- The test statistic falling in the acceptance region.
- A researcher wants to do a hypothesis test regarding the equality of the variances of two normally distributed populations. Which of the following statements are true in this situation?
- The F-statistic is the appropriate test statistic that should be used in this case.
- The chi-square is the appropriate test statistic that should be used in this case.
- If the calculated statistic is within the critical values, then the conclusion would be not to reject the statement that the variances of the two populations are equal.
- If the calculated statistic is within the critical values, then the conclusion would be to reject the statement that the variances of the two populations are equal.
I. a and b
II. a and c
III. b and c
- Which of the following test statistics should be used when doing a hypothesis test for the equality of the variances of two populations?
- T-distribution
- Chi-square
- F-distribution
- The standard deviation of the scores of 20 students in their biology test is 10 and that of their chemistry test is 5. Find the value of the F-test statistic and decide whether the variances for both tests differ at the 0.10 level of significance.
- The sample variances of two populations are 45 (from a sample of size 25) and 78 (from a sample of size 16). If the degrees of freedom of the numerator is 15 and that of the denominator is 24 for testing the variances of two populations using the F-test, find the value of the test statistic. At the .05 level of significance, can you accept that the variances are the same?
- You are interested in testing the difference between two population variances. Below is data from the two samples from each of the populations. Find the F-test value.
Sample 1
2 3 4 1 8 2 4 5 1 9 4 2 7 9
Sample 2
2 1 1 4 5 1 1 1 5 7
- The variance of a sample of n = 20 is 62 and the variance of a second sample of n = 15 is 23. At the .10 level of significance, find the F-test value and check whether variances differ significantly.
- Which of the following are true?
I. The mean value of F is approximately zero.
II. The F distribution is a family of curves based on the degrees of freedom of the variance of the numerator.
III. The F distribution is a family of curves based on the degrees of freedom of the variance of the denominator.
a. I and II only
b. II and III only
c. I only
d. Neither I nor II
- The basic assumption(s) for estimating the difference between two variances is/are:
I. The samples must be dependent on each other.
II. The samples must be independent of each other.
III. The populations from which the samples were drawn must be normally distributed.
IV. The populations from which the samples were drawn must depart from normality.
a. I and III only
b. II and III only
c. II and IV only
d. I and IV only
- Find the value of the F test statistic and the critical value, at the .05 level of significance, for testing the equality of population variances. The sample data is below.
Sample Size | Sample variance |
---|---|
30 | 12 |
25 | 8 |
Review (Answers)
To view the Review answers, open this PDF file and look for section 11.1.