We use the chi-square test to examine patterns between categorical variables, such as genders, political candidates, locations, or preferences.
There are two types of chi-square tests: the goodness-of-fit test and the test for independence. We use the goodness-of-fit test to estimate how closely a sample matches the expected distribution. We use the test for independence to determine whether there is a significant association between two categorical variables in a single population.
To test for significance, it helps to make a table containing the observed and expected frequencies of the data sample. If you have two different categorical variables, this is called a contingency table.
The Chi-Square Statistic
The value that measures how far the observed frequencies are from the expected frequencies is called the chi-square statistic, $\chi^2$. If the observed frequencies are close to the expected frequencies, the chi-square statistic will be small. If there is a substantial difference between the two sets of frequencies, we expect the chi-square statistic to be large.
To calculate the chi-square statistic, $\chi^2$, we use the following formula:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where:
$\chi^2$ is the chi-square test statistic.
$O$ is the observed frequency value for each event.
$E$ is the expected frequency value for each event.
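As a minimal sketch of this calculation, the statistic sums (observed - expected)^2 / expected across all categories. The function name `chi_square_statistic` and the die-roll counts below are hypothetical, chosen only for illustration.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 60 rolls of a die; a fair die gives 10 expected per face.
observed = [8, 12, 9, 11, 14, 6]
expected = [10] * 6

chi2 = chi_square_statistic(observed, expected)
print(round(chi2, 2))  # 4.2
```

When the observed counts equal the expected counts exactly, every term is zero and the statistic is 0, which matches the idea that a small value indicates close agreement.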
The number of degrees of freedom associated with a goodness-of-fit chi-square test is df = c - 1, where c is the number of categories. The number of degrees of freedom associated with a chi-square test of independence is df = (r - 1) * (c - 1), where r is the number of levels for one categorical variable and c is the number of levels for the other categorical variable.
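The sketch below applies the test-of-independence case to a small contingency table. It assumes the standard way of obtaining expected counts under independence (row total times column total, divided by the grand total), which the text above does not spell out. The helper names and the 2-by-3 table are hypothetical.

```python
def expected_counts(table):
    """Expected count per cell under independence:
    row total * column total / grand total (an assumption stated in the lead-in)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def degrees_of_freedom(table):
    """df = (r - 1) * (c - 1) for a table with r rows and c columns."""
    return (len(table) - 1) * (len(table[0]) - 1)


# Hypothetical table: rows are two groups, columns are three preferences.
observed = [
    [20, 30, 50],
    [30, 30, 40],
]

expected = expected_counts(observed)
df = degrees_of_freedom(observed)

# Chi-square statistic summed over every cell of the table.
chi2 = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)

print(expected)        # [[25.0, 30.0, 45.0], [25.0, 30.0, 45.0]]
print(df)              # 2
print(round(chi2, 3))  # 3.111
```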
We use the chi-square test statistic and the degrees of freedom to look up the p-value in a chi-square probability table.
Using the p-value and the level of significance, we are able to determine whether to reject or fail to reject the null hypothesis and write a summary statement based on these results.
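In place of a printed table, the p-value can be computed directly. The sketch below is one possible approach, assuming integer degrees of freedom and using the closed-form series for the upper tail of the chi-square distribution. The helper name `chi2_sf`, the significance level of 0.05, and the example statistic are assumptions for illustration. In practice a statistics library (for example `scipy.stats.chi2.sf`) would normally be used instead.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable
    with integer degrees of freedom df >= 1 (closed-form series)."""
    if not isinstance(df, int) or df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = 1.0
        total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series.
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


# Hypothetical result from a goodness-of-fit test with 6 categories.
chi2, df = 4.2, 5
alpha = 0.05  # assumed level of significance

p_value = chi2_sf(chi2, df)
decision = "reject" if p_value < alpha else "fail to reject"
print(f"p-value = {p_value:.3f}: {decision} the null hypothesis")
```

With these hypothetical numbers the p-value is well above 0.05, so the summary statement would say that the data do not provide sufficient evidence against the null hypothesis.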
Test of Single Variance
We can also use the chi-square distribution to test a claim about the variance of a single population. Given one sample, we test whether it plausibly comes from a population with a specified variance, for example against the alternative that the population variance is greater than the specified value.
Here is the formula for the chi-square statistic:
$$\chi^2 = \frac{df \cdot s^2}{\sigma^2}$$
where:
$\chi^2$ is the chi-square statistical value.
$df = n - 1$, where $n$ is the size of the sample.
$s^2$ is the sample variance.
$\sigma^2$ is the population variance.
Once we have the chi-square statistic, find the p-value and complete the test as usual.
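A short sketch of the statistic for a test of a single variance follows, computed as df times the sample variance divided by the population variance, with df = n - 1. The sample values and the hypothesized population variance are invented for illustration, and `single_variance_statistic` is a hypothetical helper name.

```python
import statistics


def single_variance_statistic(sample, population_variance):
    """Return (chi-square statistic, df) where the statistic is
    df * s^2 / sigma^2 and df = n - 1."""
    df = len(sample) - 1
    s2 = statistics.variance(sample)  # sample variance (n - 1 denominator)
    return df * s2 / population_variance, df


# Hypothetical measurements and hypothesized population variance.
sample = [4, 7, 6, 5, 8]
sigma2 = 2.0

chi2, df = single_variance_statistic(sample, sigma2)
print(chi2, df)  # 5.0 4
```

The p-value is then read from the chi-square distribution with df degrees of freedom, using the tail that matches the alternative hypothesis.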