10.2: Test of Independence
Learning Objectives
 Understand how to draw from contingency tables the data needed to perform calculations when running the chi-square test.
 Run the test of independence to determine whether two variables are independent or not.
 Use the test of homogeneity to examine the proportions of a variable attributed to different populations.
Introduction
As mentioned in the previous lesson, the chi-square test can be used to both estimate how closely an observed distribution matches an expected distribution (the goodness-of-fit test) and to estimate whether two random variables are independent of one another (the test of independence). In this lesson, we will examine the test of independence in greater detail.
The chi-square test of independence is used to assess if two factors are related. This test is often used in social science research to determine if factors are independent of each other. For example, we would use this test to determine relationships between voting patterns and race, income and gender, and behavior and education.
In general, when running the test of independence, we ask, “Is Variable \begin{align*}X\end{align*} independent of Variable \begin{align*}Y\end{align*}?”
Drawing Data from Contingency Tables Needed to Perform Calculations when Running a Chi-Square Test
Contingency tables can help us frame our hypotheses and solve problems. Often, we use contingency tables to list the variables and observational patterns that will help us to run a chi-square test. For example, we could use a contingency table to record the answers to phone surveys or observed behavioral patterns.
Example: We would use a contingency table to record the data when analyzing whether women are more likely to vote for a Republican or Democratic candidate when compared to men. In this example, we want to know if voting patterns are independent of gender. Hypothetical data for 76 females and 62 males from the state of California are in the contingency table below.
        Democratic  Republican  Total
Female  48          28          76
Male    36          26          62
Total   84          54          138
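The row and column totals in the table can be derived directly from the four observed counts. The following is a minimal sketch in Python; the language, the nested-dictionary layout, and the variable names are illustrative choices, not part of the lesson.

```python
# Observed counts from the voting example.
# Outer keys are the rows (gender); inner keys are the columns (party).
observed = {
    "Female": {"Democratic": 48, "Republican": 28},
    "Male": {"Democratic": 36, "Republican": 26},
}

# Row totals: add up the counts across each row.
row_totals = {row: sum(cells.values()) for row, cells in observed.items()}

# Column totals: add up each column over all rows.
columns = list(next(iter(observed.values())))
column_totals = {
    col: sum(cells[col] for cells in observed.values()) for col in columns
}

# Grand total: the total number of observations.
grand_total = sum(row_totals.values())

print(row_totals)     # {'Female': 76, 'Male': 62}
print(column_totals)  # {'Democratic': 84, 'Republican': 54}
print(grand_total)    # 138
```

These totals are the only inputs needed to compute expected frequencies.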
Similar to the chi-square goodness-of-fit test, the test of independence is a comparison of the differences between observed and expected values. However, in this test, we need to calculate the expected value using the row and column totals from the table. The expected value for each of the potential outcomes in the table can be calculated using the following formula:
\begin{align*}\text{Expected Frequency}=\frac{(\text{Row Total})(\text{Column Total})}{\text{Total Number of Observations}}\end{align*}
In the table above, we calculated the row totals to be 76 females and 62 males, while the column totals are 84 Democrats and 54 Republicans. Using the formula, we find the following expected frequencies for the potential outcomes:
The expected frequency for the female Democratic outcome is \begin{align*}76 \cdot \frac{84}{138} \approx 46.26\end{align*}
The expected frequency for the female Republican outcome is \begin{align*}76 \cdot \frac{54}{138} \approx 29.74\end{align*}
The expected frequency for the male Democratic outcome is \begin{align*}62 \cdot \frac{84}{138} \approx 37.74\end{align*}
The expected frequency for the male Republican outcome is \begin{align*}62 \cdot \frac{54}{138} \approx 24.26\end{align*}
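The four calculations above apply the same formula to each cell. As a hedged sketch in Python, with a helper name (`expected_frequency`) of our own choosing, the calculation could look like this:

```python
def expected_frequency(row_total, column_total, grand_total):
    """Expected count for one cell: (row total)(column total) / total observations."""
    return row_total * column_total / grand_total


# Totals taken from the voting contingency table.
row_totals = {"Female": 76, "Male": 62}
column_totals = {"Democratic": 84, "Republican": 54}
grand_total = 138

# One expected frequency per (row, column) combination.
expected = {
    (row, col): expected_frequency(r, c, grand_total)
    for row, r in row_totals.items()
    for col, c in column_totals.items()
}

for (row, col), value in expected.items():
    print(f"{row:<6} {col:<10} {value:.2f}")
```

Rounded to two decimal places, the values agree with the hand calculations: 46.26, 29.74, 37.74, and 24.26.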
Using these calculated expected frequencies, we can modify the table above to look something like this:
        Democratic          Republican          Total
        Observed  Expected  Observed  Expected
Female  48        46.26     28        29.74     76
Male    36        37.74     26        24.26     62
Total   84                  54                  138
With the figures above, we are able to calculate the chi-square statistic with relative ease.
The Chi-Square Test of Independence
When running the test of independence, we follow steps similar to those of the goodness-of-fit test described earlier. First, we need to establish a hypothesis based on our research question. Using our scenario of gender and voting patterns, our null hypothesis is that there is not a significant difference in the frequencies with which females vote for a Republican or Democratic candidate when compared to males. Therefore, our hypotheses can be stated as follows:
Null Hypothesis
\begin{align*}H_0:O=E\end{align*}
Alternative Hypothesis
\begin{align*}H_a:O \neq E\end{align*}
Using the table above, we can calculate the degrees of freedom and the chi-square statistic. The formula for calculating the chi-square statistic is the same as before:
\begin{align*}\chi^2=\sum \frac{(O-E)^2}{E}\end{align*}
where:
\begin{align*}\chi^2\end{align*} is the chi-square test statistic,
\begin{align*}O\end{align*} is the observed frequency for each outcome, and
\begin{align*}E\end{align*} is the expected frequency for each outcome.
Using this formula and the example above, we get the following expected frequencies and chi-square statistic:
        Democratic                            Republican
        Obs. Freq.  Exp. Freq.  (O-E)^2/E     Obs. Freq.  Exp. Freq.  (O-E)^2/E
Female  48          46.26       0.07          28          29.74       0.10
Male    36          37.74       0.08          26          24.26       0.12
Totals  84                                    54
\begin{align*}\chi^2=0.07+0.08+0.10+0.12=0.37\end{align*}
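The sum above can be reproduced from the observed counts alone. The following Python sketch is illustrative (the variable names are assumptions). It computes the uncorrected statistic exactly as in the formula, with no continuity correction.

```python
# Observed counts: rows are Female, Male; columns are Democratic, Republican.
observed = [
    [48, 28],
    [36, 26],
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Add up (O - E)^2 / E over every cell of the table.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * column_totals[j] / grand_total
        chi_square += (o - e) ** 2 / e

print(round(chi_square, 2))  # 0.37
```

Because the code keeps the expected frequencies unrounded, its intermediate values differ slightly from the table, but the total still rounds to 0.37.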
Also, the degrees of freedom can be calculated from the number of Columns ("C") and the number of Rows ("R") as follows:
\begin{align*}df &= (C-1)(R-1)\\
&= (2-1)(2-1)=1\end{align*}
With an alpha level of 0.05, we look under the column for 0.05 and the row for degrees of freedom, which, again, is 1, in the standard chi-square distribution table (http://tinyurl.com/3ypvj2h). According to the table, we see that the critical value for chi-square is 3.841. Therefore, we would reject the null hypothesis if the chi-square statistic is greater than 3.841.
Since our calculated chi-square value of 0.37 is less than 3.841, we fail to reject the null hypothesis. Therefore, we can conclude that females are not significantly more likely to vote for a Republican or Democratic candidate than males. In other words, these two factors appear to be independent of one another.
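The decision rule can be written out in a few lines. In the Python sketch below, the function names are hypothetical and the critical value is copied from the table lookup described above. The p-value helper is an addition not covered in the lesson: it relies on the fact that, for exactly 1 degree of freedom, the upper-tail probability of the chi-square distribution equals erfc(sqrt(x/2)). It does not apply to other degrees of freedom.

```python
import math


def degrees_of_freedom(n_rows, n_cols):
    """df = (C - 1)(R - 1) for a table with R rows and C columns."""
    return (n_cols - 1) * (n_rows - 1)


def p_value_df1(chi_square):
    """Upper-tail probability for a chi-square statistic with df = 1 only."""
    return math.erfc(math.sqrt(chi_square / 2))


chi_square = 0.37       # statistic from the worked example
critical_value = 3.841  # table value for alpha = 0.05 and df = 1
df = degrees_of_freedom(2, 2)

# Reject the null hypothesis only if the statistic exceeds the critical value.
reject_null = chi_square > critical_value

print(f"df = {df}, reject H0: {reject_null}")
print(f"p-value = {p_value_df1(chi_square):.2f}")
```

As a sanity check on the helper, `p_value_df1(3.841)` comes out very close to 0.05, which is consistent with 3.841 being the critical value at that alpha level.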
On the Web
http://tinyurl.com/39lhc3y A chi-square applet demonstrating the test of independence.
Test of Homogeneity
The chi-square goodness-of-fit test and the test of independence are two ways to examine the relationships between categorical variables. To determine whether or not the assignment of categorical variables is random (that is, to examine the randomness of a sample), we perform the test of homogeneity. In other words, the test of homogeneity tests whether samples from populations have the same proportion of observations with a common characteristic. For example, we found in our last test of independence that the factors of gender and voting patterns were independent of one another. However, our original question was if females were more likely to vote for a Republican or Democratic candidate when compared to males. We would use the test of homogeneity to examine the probability that choosing a Republican or Democratic candidate was the same for females and males.
Another commonly used example of the test of homogeneity is comparing dice to see if they all work the same way.
Example: The manager of a casino has two potentially loaded dice that he wants to examine. (Loaded dice are ones that are weighted on one side so that certain numbers have greater probabilities of showing up.) The manager rolls each of the dice exactly 20 times and comes up with the following results:
        1   2   3   4   5   6   Totals
Die 1   6   1   2   2   3   6   20
Die 2   4   1   3   3   1   8   20
Totals  10  2   5   5   4   14  40
Like the other chi-square tests, we first need to establish a null hypothesis based on a research question. In this case, our research question would be something like, “Is the probability of rolling a specific number the same for Die 1 and Die 2?” This would give us the following hypotheses:
Null Hypothesis
\begin{align*}H_0:O=E\end{align*}
Alternative Hypothesis
\begin{align*}H_a:O \neq E\end{align*}
Similar to the test of independence, we need to calculate the expected frequency for each potential outcome and the total number of degrees of freedom. To get the expected frequency for each potential outcome, we use the same formula as we used for the test of independence, which is as follows:
\begin{align*}\text{Expected Frequency}=\frac{(\text{Row Total})(\text{Column Total})}{\text{Total Number of Observations}}\end{align*}
The following table includes the expected frequency (in parentheses) for each outcome, along with each outcome's contribution to the chi-square statistic, \begin{align*}\frac{(O-E)^2}{E}\end{align*}:
Number Rolled on the Potentially Loaded Dice
        1     \chi^2  2     \chi^2  3       \chi^2  4       \chi^2  5     \chi^2  6     \chi^2  Total \chi^2
Die 1   6(5)  0.2     1(1)  0       2(2.5)  0.1     2(2.5)  0.1     3(2)  0.5     6(7)  0.14    1.04
Die 2   4(5)  0.2     1(1)  0       3(2.5)  0.1     3(2.5)  0.1     1(2)  0.5     8(7)  0.14    1.04
Totals  10            2             5               5               4             14            2.08
\begin{align*}df &= (C-1)(R-1)\\
&= (6-1)(2-1)=5\end{align*}
From the table above, we can see that the value of the test statistic is 2.08.
Using an alpha level of 0.05, we look under the column for 0.05 and the row for degrees of freedom, which, again, is 5, in the standard chi-square distribution table. According to the table, we see that the critical value for chi-square is 11.070. Therefore, we would reject the null hypothesis if the chi-square statistic is greater than 11.070.
Since our calculated chi-square value of 2.08 is less than 11.070, we fail to reject the null hypothesis. Therefore, we can conclude that each number is just as likely to be rolled on one die as on the other. This means that if the dice are loaded, they are probably loaded in the same way or were made by the same manufacturer.
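The whole procedure for the dice example (expected frequencies, statistic, degrees of freedom, and decision) can be sketched as one small Python function. The name `chi_square_test` and its signature are hypothetical, and the critical value is passed in from the table lookup above rather than computed. Note that the unrounded statistic is about 2.09. The table's 2.08 arises because each cell's contribution for the number 6 was rounded to 0.14 before summing. The conclusion is unchanged.

```python
def chi_square_test(observed, critical_value):
    """Return (statistic, df, reject_null) for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # Sum (O - E)^2 / E over every cell, with E from the row and column totals.
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * column_totals[j] / grand_total
            statistic += (o - e) ** 2 / e

    df = (len(column_totals) - 1) * (len(row_totals) - 1)
    return statistic, df, statistic > critical_value


# Counts of each face (1 through 6) in 20 rolls of each die.
dice = [
    [6, 1, 2, 2, 3, 6],  # Die 1
    [4, 1, 3, 3, 1, 8],  # Die 2
]

statistic, df, reject_null = chi_square_test(dice, critical_value=11.070)
print(f"chi-square = {statistic:.2f}, df = {df}, reject H0: {reject_null}")
```

The same function applies unchanged to the voting table, since the test of homogeneity and the test of independence share the same calculation.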
Lesson Summary
The chi-square test of independence is used to assess if two factors are related. It is commonly used in social science research to examine behaviors, preferences, measurements, etc.
As with the chi-square goodness-of-fit test, contingency tables help capture and display relevant information. For each of the possible outcomes in the table constructed to run a chi-square test, we need to calculate the expected frequency. The formula used for this calculation is as follows:
\begin{align*}\text{Expected Frequency}=\frac{(\text{Row Total})(\text{Column Total})}{\text{Total Number of Observations}}\end{align*}
To calculate the chi-square statistic for the test of independence, we use the same formula as for the goodness-of-fit test. If the calculated chi-square value is greater than the critical value, we reject the null hypothesis.
We perform the test of homogeneity to examine the randomness of a sample. The test of homogeneity tests whether various populations are homogeneous or equal with respect to certain characteristics.
Multimedia Links
For a discussion of the four different scenarios for use of the chi-square test (19.0), see American Public University, Test Requiring the Chi-Square Distribution (4:13).
For an example of a chi-square test for homogeneity (19.0), see APUS07, Example of a Chi-Square Test of Homogenity (7:57).
For an example of a chi-square test for independence with the TI-83/84 Calculator (19.0), see APUS07, Example of a Chi-Square Test of Independence Using a Calculator (3:29).
Review Questions
 What is the chi-square test of independence used for?
 True or False: In the test of independence, you can test if two variables are related, but you cannot test the nature of the relationship itself.
 When calculating the expected frequency for a possible outcome in a contingency table, you use the formula:

\begin{align*}\text{Expected Frequency} = \frac{(\text{Row Total})(\text{Column Total})}{\text{Total Number of Observations}}\end{align*}
\begin{align*}\text{Expected Frequency} = \frac{(\text{Total Observations})(\text{Column Total})}{\text{Row Total}}\end{align*}
\begin{align*}\text{Expected Frequency} = \frac{(\text{Total Observations})(\text{Row Total})}{\text{Column Total}}\end{align*}
 Use the table below to answer the following review questions.
         Studied Abroad  Did Not Study Abroad
Females  322             460
Males    128             152
(a) What is the total number of females in the sample?
450
280
612
782
(b) What is the total number of observations in the sample?
782
533
1,062
612
(c) What is the expected frequency for the number of males who did not study abroad?
161
208
111
129
(d) How many degrees of freedom are in this example?
1
2
3
4
(e) True or False: Our null hypothesis would be that females are as likely as males to study abroad.
(f) What is the chi-square statistic for this example?
1.60
2.45
3.32
3.98
 If the chi-square critical value at 0.05 and 1 degree of freedom is 3.841, and we have a calculated chi-square statistic of 2.22, we would:
 reject the null hypothesis
 fail to reject the null hypothesis
 True or False: We use the test of homogeneity to evaluate the equality of several samples of certain variables.
 The test of homogeneity is carried out the exact same way as:
 the goodness-of-fit test
 the test of independence