
# Chi-Square Test

## Closeness of observed data to expected data of the model


We use the chi-square test to examine patterns between categorical variables, such as genders, political candidates, locations, or preferences.

There are two types of chi-square tests: the goodness-of-fit test and the test for independence. We use the goodness-of-fit test to estimate how closely a sample matches the expected distribution.  We use the test for independence to determine whether there is a significant association between two categorical variables in a single population.

To test for significance, it helps to make a table containing the observed and expected frequencies of the data sample. If you have two different categorical variables, this is called a contingency table.

The Chi-Square Statistic

The value that summarizes how far the observed frequencies are from the expected frequencies is called the chi-square statistic. The idea is that if the observed frequencies are close to the expected frequencies, then the chi-square statistic will be small. On the other hand, if there is a substantial difference between the two, then we would expect the chi-square statistic to be large.

To calculate the chi-square statistic, $\chi^2$, we use the following formula:

$$\chi^2=\sum \frac{(O-E)^2}{E}$$

where:

$\chi^2$ is the chi-square test statistic.

$O$ is the observed frequency value for each event.

$E$ is the expected frequency value for each event.
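As a rough illustration of the formula, the following Python sketch sums $(O-E)^2/E$ over the categories. The function name `chi_square_statistic` and the die-roll counts are made up for this example and are not part of the lesson.

```python
def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: a die rolled 60 times, so a fair die
# would be expected to show each face 10 times.
observed = [8, 9, 12, 11, 6, 14]
expected = [10, 10, 10, 10, 10, 10]

statistic = chi_square_statistic(observed, expected)
print(statistic)  # about 4.2
```

When the observed counts equal the expected counts exactly, every term is zero and the statistic is 0, which matches the idea that a small statistic indicates a close fit.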

The number of degrees of freedom associated with a goodness-of-fit chi-square test is df = c - 1, where c is the number of categories. The number of degrees of freedom associated with a chi-square test of independence is df = (r - 1) × (c - 1), where r is the number of levels for one categorical variable and c is the number of levels for the other categorical variable.
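The degrees-of-freedom rules can be sketched in Python for a small contingency table. The 2 × 2 table below is invented for illustration. The expected counts use the standard rule (row total × column total ÷ grand total), which this lesson does not state explicitly, so treat that step as an assumption.

```python
def goodness_of_fit_df(num_categories):
    """df = c - 1 for a goodness-of-fit test."""
    return num_categories - 1


def independence_df(table):
    """df = (r - 1) * (c - 1) for a test of independence."""
    return (len(table) - 1) * (len(table[0]) - 1)


def expected_counts(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical contingency table: rows are one categorical variable,
# columns are the other.
observed_table = [[20, 30],
                  [30, 20]]

expected_table = expected_counts(observed_table)

# Chi-square statistic summed over every cell of the table.
table_statistic = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed_table, expected_table)
    for o, e in zip(o_row, e_row)
)

print(independence_df(observed_table))  # 1
print(table_statistic)                  # 4.0
```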

We use the chi-square test statistic and the degrees of freedom to look up the p-value in a chi-square probability table.

Using the p-value and the level of significance, we are able to determine whether to reject or fail to reject the null hypothesis and write a summary statement based on these results.
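In place of a printed table, the p-value can be computed numerically. The sketch below is one possible approach, not the method used in this lesson: it evaluates the upper-tail probability of the chi-square distribution with a series for the regularized incomplete gamma function, using only Python's standard library. The names `chi_square_p_value` and `decide`, the statistic 4.2 with 5 degrees of freedom, and the 0.05 significance level are illustrative assumptions.

```python
import math


def chi_square_p_value(statistic, df):
    """Upper-tail probability P(X >= statistic) for a chi-square
    distribution with df degrees of freedom (series approximation)."""
    if statistic <= 0:
        return 1.0
    a = df / 2.0
    z = statistic / 2.0
    # Series for the regularized lower incomplete gamma function P(a, z).
    term = 1.0
    total = 1.0
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    log_prefactor = a * math.log(z) - z - math.lgamma(a + 1.0)
    lower_tail = math.exp(log_prefactor) * total
    return max(0.0, 1.0 - lower_tail)


def decide(p_value, alpha):
    """Reject the null hypothesis when the p-value is below alpha."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


# Hypothetical inputs: a statistic of 4.2 with 5 degrees of freedom,
# tested at the 0.05 level of significance.
p_value = chi_square_p_value(4.2, 5)
decision = decide(p_value, alpha=0.05)
print(decision)  # fail to reject the null hypothesis
```

The series converges for any positive statistic but loses precision for very large values, so a statistics library is a better choice outside of small classroom examples.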

Test of Single Variance

We can also use the chi-square test to evaluate a claim about the variance of a single population. Using one sample, we test the null hypothesis that the population variance equals a hypothesized value against an alternative, for example that the population variance is greater than that value.

Here is the formula for the chi-square statistic:

$$\chi^2=\frac{df(s^2)}{\sigma^2}$$

where:

$\chi^2$ is the chi-square statistical value.

$df=n-1$, where $n$ is the size of the sample.

$s^2$ is the sample variance.

$\sigma^2$ is the population variance.

Once we have the chi-square statistic, find the p-value and complete the test as usual.
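A minimal sketch of the single-variance statistic follows. The sample values and the hypothesized population variance of 5.0 are invented for this example, and `variance_chi_square` is a hypothetical helper name. The sample variance comes from `statistics.variance`, which divides by n - 1.

```python
import statistics


def variance_chi_square(sample, population_variance):
    """Return (chi-square statistic, df) for a test of a single variance."""
    df = len(sample) - 1                       # df = n - 1
    sample_variance = statistics.variance(sample)  # s^2
    return df * sample_variance / population_variance, df


# Hypothetical sample; its sample variance is 10.
sample = [4.0, 6.0, 8.0, 10.0, 12.0]

variance_statistic, variance_df = variance_chi_square(sample, 5.0)
print(variance_statistic, variance_df)  # 8.0 4
```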
