10.3: Testing One Variance
Learning Objectives
- Test a hypothesis about a single variance using the chi-square distribution.
- Calculate a confidence interval for a population variance based on a sample standard deviation.
Introduction
In the previous lesson, we learned how the chi-square test can help us assess the relationships between two variables. In addition to assessing these relationships, the chi-square test can also help us test hypotheses surrounding variance, which is the measure of the variation, or scattering, of scores in a distribution. There are several different tests that we can use to assess the variance of a sample. The most common tests used to assess variance are the chi-square test for one variance, the \begin{align*}F\end{align*}
Testing a Single Variance Hypothesis Using the Chi-Square Test
Suppose that we want to determine whether a sample comes from a population with a particular variance. A test of this kind is used quite frequently in the manufacturing of food, parts, and medications, since it is necessary for individual products of each of these types to be very similar in size and chemical make-up. This test is called the test for one variance.
To perform the test for one variance using the chi-square distribution, we need several pieces of information. First, we should check to make sure that the population has a normal distribution. Next, we need to determine the number of observations in the sample. The remaining pieces of information that we need are the sample standard deviation and the hypothesized population variance. For the purposes of this exercise, we will assume that we will be provided with both of these values.
Using these key pieces of information, we use the following formula to calculate the chi-square value to test a hypothesis surrounding single variance:
\begin{align*}\chi^2=\frac{df(s^2)}{\sigma^2}\end{align*}
where:
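\begin{align*}\chi^2\end{align*} is the chi-square statistic.
\begin{align*}df=n-1\end{align*}, where \begin{align*}n\end{align*} is the size of the sample.
\begin{align*}s^2\end{align*} is the sample variance.
\begin{align*}\sigma^2\end{align*} is the hypothesized population variance.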
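The formula above can be sketched in a few lines of Python. This is only an illustration: the function name `chi_square_statistic` and the numbers in the usage line are hypothetical and do not come from the lesson.

```python
def chi_square_statistic(n, sample_sd, hypothesized_variance):
    """Return df * s^2 / sigma^2, where df = n - 1."""
    if n < 2:
        raise ValueError("need at least two observations")
    if hypothesized_variance <= 0:
        raise ValueError("hypothesized variance must be positive")
    df = n - 1                      # degrees of freedom
    sample_variance = sample_sd ** 2
    return df * sample_variance / hypothesized_variance


# Hypothetical inputs (not from the lesson): n = 25, s = 3, sigma^2 = 10.
print(chi_square_statistic(25, 3.0, 10.0))
```

Note that the function takes the sample standard deviation and squares it, since the formula is written in terms of the sample variance.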
In the example below, we test whether the sample comes from a population with a variance greater than a hypothesized value. Let’s take a look at the example to help clarify.
Example: Suppose we have a sample of 41 female gymnasts from Mission High School. We want to know if their heights are truly a random sample of the general high school population with respect to variance. We know from a previous study that the standard deviation of the heights of high school women is 2.2.
To test this question, we first need to generate null and alternative hypotheses. Our null hypothesis states that the sample comes from a population that has a variance of less than or equal to 4.84 (\begin{align*}\sigma^2 = 2.2^2 = 4.84\end{align*}), and our alternative hypothesis states that the variance is greater than 4.84.
Null Hypothesis
\begin{align*}H_0:\sigma^2 \le 4.84\end{align*}
Alternative Hypothesis
\begin{align*}H_a:\sigma^2 > 4.84\end{align*}
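Using the sample of the 41 gymnasts, we compute the standard deviation and find it to be \begin{align*}s=1.2\end{align*}. Substituting this value, along with \begin{align*}df=40\end{align*} and the hypothesized variance of 4.84, into the formula gives: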
\begin{align*}\chi^2=\frac{(40)(1.2^2)}{4.84}=11.9\end{align*}
Since 11.9 is less than 55.758 (the critical value from the chi-square table for 40 degrees of freedom at the 0.05 level of significance), we fail to reject the null hypothesis. We therefore cannot conclude that the female gymnasts have a significantly higher variance in height than the general female high school population.
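The gymnast example can be reproduced with a short Python sketch. The function names are hypothetical. The helper `right_tail_area_even_df` is an addition that is not part of the lesson: it uses a closed form for the chi-square right-tail area that holds only when the degrees of freedom are even, and it is included here just to check that the table value 55.758 cuts off an area of about 0.05 for 40 degrees of freedom.

```python
import math


def chi_square_statistic(n, sample_sd, hypothesized_variance):
    """Return df * s^2 / sigma^2, where df = n - 1."""
    df = n - 1
    return df * sample_sd ** 2 / hypothesized_variance


def right_tail_area_even_df(x, df):
    """P(X > x) for a chi-square variable X with an even df.

    Uses exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!,
    which is valid only for even degrees of freedom.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term = 1.0    # the i = 0 term of the sum
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i          # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


# Values from the example: n = 41, s = 1.2, hypothesized variance 4.84.
n, s, sigma0_sq = 41, 1.2, 4.84
critical = 55.758                 # table value quoted in the lesson

stat = chi_square_statistic(n, s, sigma0_sq)
reject = stat > critical          # right-tailed test

print(f"chi-square = {stat:.1f}, reject H0: {reject}")
print(f"area to the right of the table value: "
      f"{right_tail_area_even_df(critical, n - 1):.3f}")
```

Because the test is right-tailed, the null hypothesis is rejected only when the statistic exceeds the table value, which does not happen here.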
Calculating a Confidence Interval for a Population Variance
Once we know how to test a hypothesis about a single variance, calculating a confidence interval for a population variance is relatively easy. Again, it is important to remember that this procedure is dependent on the normality of the population. For non-normal populations, it is best to use the ANOVA test, which we will cover in greater detail in another lesson. To construct a confidence interval for the population variance, we need three pieces of information: the number of observations in the sample, the variance of the sample, and the desired confidence level. With the desired confidence interval, \begin{align*}\alpha\end{align*}
Example: We randomly select 30 containers of Coca-Cola and measure the amount of sugar in each container. Using the formula that we learned earlier, we calculate the variance of the sample to be 5.20. Find a 90% confidence interval for the true variance. In other words, assuming that the sample comes from a normal population, what range of values is likely to contain the population variance?
To construct this 90% confidence interval, we first need to determine our upper and lower limits. The formula to construct this confidence interval for the population variance, \begin{align*}\sigma^2\end{align*}, is:
\begin{align*}\frac{df(s^2)}{\chi^2_{0.05}} \le \sigma^2 \le \frac{df(s^2)}{\chi^2_{0.95}}\end{align*}
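One way to see where this formula comes from is sketched below. It assumes a normal population, so that \begin{align*}df(s^2)/\sigma^2\end{align*} follows a chi-square distribution with \begin{align*}df\end{align*} degrees of freedom. It also assumes that \begin{align*}\chi^2_{p}\end{align*} denotes the table value with area \begin{align*}p\end{align*} to its right, a convention inferred from the table values used in this example.

\begin{align*}P\left(\chi^2_{0.95} \le \frac{df(s^2)}{\sigma^2} \le \chi^2_{0.05}\right) &= 0.90\\
P\left(\frac{df(s^2)}{\chi^2_{0.05}} \le \sigma^2 \le \frac{df(s^2)}{\chi^2_{0.95}}\right) &= 0.90\end{align*}

Solving each inequality for \begin{align*}\sigma^2\end{align*} reverses its direction, which is why the larger table value ends up in the lower limit.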
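Using our standard chi-square distribution table (http://tinyurl.com/3ypvj2h), we can look up the critical \begin{align*}\chi^2\end{align*} values for \begin{align*}df=29\end{align*}, which are 42.557 and 17.708. Since \begin{align*}df(s^2)=(29)(5.20)=150.80\end{align*}, we have: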
\begin{align*}\frac{df(s^2)}{42.557} & \le \sigma^2 \le \frac{df(s^2)}{17.708}\\
\frac{150.80}{42.557} & \le \sigma^2 \le \frac{150.80}{17.708}\\
3.54 & \le \sigma^2 \le 8.52\end{align*}
In other words, we are 90% confident that the variance of the population from which this sample was taken is between 3.54 and 8.52.
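The interval calculation above can be sketched in Python as follows. The function name and parameter names are hypothetical, and the two critical values are passed in as inputs read from the chi-square table rather than computed.

```python
def variance_confidence_interval(n, sample_variance, chi2_upper, chi2_lower):
    """Return (lower, upper) limits for the population variance.

    chi2_upper: table value with area alpha/2 to its right (the larger one).
    chi2_lower: table value with area 1 - alpha/2 to its right (the smaller one).
    """
    if not 0 < chi2_lower < chi2_upper:
        raise ValueError("expected 0 < chi2_lower < chi2_upper")
    df = n - 1
    numerator = df * sample_variance
    # Dividing by the larger table value gives the lower limit.
    return numerator / chi2_upper, numerator / chi2_lower


# Values from the example: n = 30, s^2 = 5.20, table values 42.557 and 17.708.
low, high = variance_confidence_interval(30, 5.20, 42.557, 17.708)
print(f"{low:.2f} <= sigma^2 <= {high:.2f}")
```

Passing the table values explicitly keeps the sketch free of any statistics library. A library routine for chi-square quantiles could replace the table lookup if one is available.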
Lesson Summary
We can also use the chi-square distribution to test hypotheses about population variance. Variance is the measure of the variation, or scattering, of scores in a distribution, and we often use this test to assess the likelihood that a population variance is within a certain range.
To perform the test for one variance using the chi-square statistic, we use the following formula:
\begin{align*}\chi^2 = \frac{df(s^2)}{\sigma^2}\end{align*}
where:
\begin{align*}\chi^2\end{align*} is the chi-square statistic.
\begin{align*}df=n-1\end{align*}, where \begin{align*}n\end{align*} is the size of the sample.
\begin{align*}s^2\end{align*} is the sample variance.
\begin{align*}\sigma^2\end{align*} is the hypothesized population variance.
This formula gives us a chi-square statistic, which we can compare to values taken from the chi-square distribution table to test our hypothesis.
We can also construct a confidence interval, which is a range of values that includes the population variance with a given level of confidence. To find this interval, we use the formula shown below:
\begin{align*}\frac{df(s^2)}{\chi^2_{\frac{\alpha}{2}}} \le \sigma^2 \le \frac{df(s^2)}{\chi^2_{1-\frac{\alpha}{2}}}\end{align*}
Review Questions
1. We use the chi-square distribution for the:
   - goodness-of-fit test
   - test for independence
   - testing of a hypothesis of single variance
   - all of the above
2. True or False: We can test a hypothesis about a single variance using the chi-square distribution for a non-normal population.
3. In testing variance, our null hypothesis states that the two populations that we are testing are:
   - equal with respect to variance
   - not equal
   - none of the above
4. In the formula for calculating the chi-square statistic for single variance, \begin{align*}\sigma^2\end{align*} is:
   - standard deviation
   - number of observations
   - hypothesized population variance
   - chi-square statistic
5. If we knew the number of observations in a sample, the standard deviation of the sample, and the hypothesized variance of the population, what additional information would we need to solve for the chi-square statistic?
   - the chi-square distribution table
   - the population size
   - the standard deviation of the population
   - no additional information is needed
6. We want to test a hypothesis about a single variance using the chi-square distribution. We weighed 30 bars of Dial soap, and this sample had a standard deviation of 1.1. We want to test if this sample comes from the general factory, which we know from a previous study to have an overall variance of 3.22. What is our null hypothesis?
7. Compute \begin{align*}\chi^2\end{align*} for Question 6.
8. Given the information in Questions 6 and 7, would you reject or fail to reject the null hypothesis?
9. Let’s assume that our population variance for this problem is unknown. We want to construct a 90% confidence interval around the population variance, \begin{align*}\sigma^2\end{align*}. If our critical values at a 90% confidence level are 17.71 and 42.56, what is the range for \begin{align*}\sigma^2\end{align*}?
10. What statement would you give surrounding this confidence interval?
Keywords
- ANOVA test
  - For non-normal populations, it is best to use the ANOVA test, which we will cover in greater detail in another lesson.
- Chi-square distribution
  - The chi-square distribution can be used to perform the goodness-of-fit test, which compares the observed values of a categorical variable with the expected values of that same variable.
- Chi-square statistic
  - The value that indicates the comparison between the observed and expected frequency is called the chi-square statistic.
- Contingency table
  - Contingency tables can help us frame our hypotheses and solve problems. Often, we use contingency tables to list the variables and observational patterns that will help us to run a chi-square test.
- Degrees of freedom
  - The goodness-of-fit test is used to determine patterns of distinct categorical variables. The test requires that the data are obtained through a random sample. The number of degrees of freedom associated with a particular chi-square test is equal to the number of categories minus one.
- Goodness-of-fit test
  - The chi-square test is used when estimating how closely a sample matches the expected distribution (also known as the goodness-of-fit test).
- Test for one variance
  - The test for one variance uses the chi-square distribution to test a hypothesis about a single population variance. It is used quite frequently in the manufacturing of food, parts, and medications, since it is necessary for individual products of each of these types to be very similar in size and chemical make-up.
- Test of homogeneity
  - The chi-square goodness-of-fit test and the test of independence are two ways to examine the relationships between categorical variables. To determine whether or not the assignment of categorical variables is random (that is, to examine the randomness of a sample), we perform the test of homogeneity.
- Test of independence
  - The chi-square test is used when estimating if two random variables are independent of one another (also known as the test of independence).