We use the **chi-square test** to examine patterns between categorical variables, such as genders, political candidates, locations, or preferences.

There are two types of chi-square tests: the **goodness-of-fit test** and the **test for independence**. We use the goodness-of-fit test to estimate how closely a sample matches the expected distribution. We use the test for independence to determine whether there is a significant association between two categorical variables in a single population.

To **test for significance**, it helps to make a table containing the observed and expected frequencies of the data sample. If you have *two* different categorical variables, this is called a **contingency table**.
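As a rough illustration of such a table, the sketch below builds the expected frequencies for a small contingency table. The counts are made up for the example, and the helper name `expected_counts` is hypothetical. It assumes the usual rule for a test of independence, which the text above does not spell out: each expected count is (row total × column total) / grand total.

```python
def expected_counts(observed):
    """Expected frequencies under independence for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Hypothetical observed counts: 2 rows (groups) by 3 columns (preferences).
observed = [
    [10, 20, 30],
    [20, 20, 20],
]
expected = expected_counts(observed)

for obs_row, exp_row in zip(observed, expected):
    print(obs_row, exp_row)
```

Each printed line pairs a row of observed counts with the matching row of expected counts, which is the side-by-side layout the table is meant to give.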

**The Chi-Square Statistic**

The value that summarizes the comparison between the observed and expected frequencies is called the *chi-square statistic*. The idea is that if the observed frequencies are close to the expected frequencies, then the chi-square statistic will be small. On the other hand, if there is a substantial difference between the two sets of frequencies, then we would expect the chi-square statistic to be large.

To calculate the chi-square statistic, \begin{align*}\chi^2\end{align*}, we use the following formula:

\begin{align*}\chi^2=\sum \frac{(O-E)^2}{E}\end{align*}

where:

\begin{align*}\chi^2\end{align*} is the chi-square test statistic.

\begin{align*}O\end{align*} is the observed frequency value for each event.

\begin{align*}E\end{align*} is the expected frequency value for each event.
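The formula above can be sketched in a few lines of Python. The data are hypothetical (60 rolls of a die that is assumed fair, so each face is expected 10 times), and the function name `chi_square_statistic` is chosen only for illustration.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for faces 1 through 6 after 60 rolls.
observed = [8, 12, 10, 9, 11, 10]
expected = [10] * 6  # a fair die: 60 rolls / 6 faces

stat = chi_square_statistic(observed, expected)
print(stat)
```

Here the observed counts sit close to the expected ones, so the statistic comes out small, as the discussion above suggests.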

The number of **degrees of freedom** associated with a goodness-of-fit chi-square test is df = c - 1, where c is the number of categories. The number of degrees of freedom associated with a chi-square test of independence is df = (r - 1)(c - 1), where r is the number of levels of one categorical variable and c is the number of levels of the other categorical variable.
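As a quick check of the two rules, here is a minimal sketch. The function names are hypothetical, and the example sizes (six categories, a 2-by-3 table) are chosen only for illustration.

```python
def goodness_of_fit_df(num_categories):
    """df = c - 1 for a goodness-of-fit test."""
    return num_categories - 1

def independence_df(num_rows, num_cols):
    """df = (r - 1)(c - 1) for a test of independence."""
    return (num_rows - 1) * (num_cols - 1)

print(goodness_of_fit_df(6))   # six categories, e.g. the faces of a die
print(independence_df(2, 3))   # a table with 2 rows and 3 columns
```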

We use the chi-square test statistic and the degrees of freedom to determine the **p-value** on a chi-square probability table.

Using the **p-value** and the **level of significance**, we are able to determine whether to reject or fail to reject the null hypothesis and write a summary statement based on these results.
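In place of a printed table, the p-value can also be computed directly. The sketch below is one possible way to do that using only the Python standard library. It assumes a whole-number df and uses the standard recurrence for the upper-tail chi-square probability. The names `chi2_sf` and `decide` are hypothetical, and the default significance level of 0.05 is an assumption. If SciPy is installed, `scipy.stats.chi2.sf` gives the same tail probability.

```python
import math

def chi2_sf(stat, df):
    """Upper-tail probability P(X >= stat) for a chi-square variable with integer df >= 1."""
    x = stat / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-x)              # exact tail for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(x))   # exact tail for df = 1
    # Step up two degrees of freedom at a time until a reaches df / 2.
    while a < df / 2.0:
        q += x ** a * math.exp(-x) / math.gamma(a + 1.0)
        a += 1.0
    return q

def decide(stat, df, alpha=0.05):
    """Return the p-value and the decision about the null hypothesis."""
    p_value = chi2_sf(stat, df)
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    return p_value, decision

print(decide(15.0, 5))  # large statistic: p-value falls below 0.05
print(decide(1.0, 5))   # small statistic: p-value is well above 0.05
```

The decision rule is the one described above: reject the null hypothesis when the p-value is smaller than the level of significance, and fail to reject it otherwise.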

**Test of Single Variance**

We can also use the chi-square test to examine a claim about the variance of a single population. Given one sample, we test the hypothesis that the sample comes from a population with a specified variance, for example against the alternative that the population variance is greater than the specified value.

Here is the formula for the chi-square statistic:

\begin{align*}\chi^2=\frac{df(s^2)}{\sigma^2}\end{align*}

where:

\begin{align*}\chi^2\end{align*} is the chi-square statistical value.

\begin{align*}df=n-1\end{align*}, where \begin{align*}n\end{align*} is the size of the sample.

\begin{align*}s^2\end{align*} is the sample variance.

\begin{align*}\sigma^2\end{align*} is the population variance stated in the null hypothesis.

Once we have the chi-square statistic, find the p-value and complete the test as usual.
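The variance statistic can be sketched as follows. The sample values and the hypothesized variance of 4.0 are made up for illustration, and the function name `variance_chi_square` is hypothetical. The sample variance comes from `statistics.variance`, which divides by n - 1.

```python
import statistics

def variance_chi_square(sample, hypothesized_variance):
    """Return (statistic, df) where statistic = df * s^2 / sigma^2 and df = n - 1."""
    df = len(sample) - 1
    s_squared = statistics.variance(sample)  # sample variance
    return df * s_squared / hypothesized_variance, df

# Hypothetical measurements and a hypothesized population variance of 4.0.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
stat, df = variance_chi_square(sample, 4.0)
print(stat, df)
```

The resulting statistic and df are then used to find the p-value in the same way as for the other chi-square tests.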