10.2: Test of Independence
Learning Objectives
 Understand how to draw and calculate appropriate data from tables needed to run a Chi-Square test.
 Run a Test of Independence to determine whether two variables are independent or not.
 Use a Test of Homogeneity to examine the proportions of a variable attributed to different populations.
Introduction
As mentioned in the previous lesson, the Chi-Square test can be used to (1) estimate how closely an observed distribution matches an expected distribution (the Goodness-of-Fit test) or (2) estimate whether two random variables are independent of one another (the Test of Independence). In this lesson, we will examine the Test of Independence in greater detail.
The Chi-Square Test of Independence is used to assess whether two factors are related. This test is often used in social science research to determine if factors are independent of each other. For example, we would use this test to determine relationships between voting patterns and race, income and gender, and behavior and education.
In general, when running the Test of Independence, we ask “Is Variable
Drawing and Calculating Data from Tables
As mentioned in the previous lesson, tables help us frame our hypotheses and solve problems. Often, we use tables to list the variables and observation patterns that will help us to run the Chi-Square test. For example, we could use a table to record the answers to phone surveys or observed behavior patterns.
Example: We would use a contingency table to record the data when analyzing whether women are more likely to vote for a Republican or Democratic candidate when compared to men. Specifically, we want to know if voting patterns are independent of gender. Hypothetical data for
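Democratic  Republican  Total  

Female  \begin{align*}48\end{align*}  \begin{align*}28\end{align*}  \begin{align*}76\end{align*} 
Male  \begin{align*}36\end{align*}  \begin{align*}26\end{align*}  \begin{align*}62\end{align*} 
Total  \begin{align*}84\end{align*}  \begin{align*}54\end{align*}  \begin{align*}138\end{align*} 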
Similar to the Chi-Square Goodness-of-Fit test, the Chi-Square Test of Independence is a comparison of the difference between the observed and expected values. However, in this test we need to calculate the expected value using the row and column totals from the table. The expected value for each cell of the table can be calculated using the formula:
\begin{align*}\text{Expected\ Frequency} = \frac{(\text{Row\ Total}) (\text{Column\ Total})} {\text{Total\ Number\ of\ Observations}}\end{align*}
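In the table above, we calculated that the Row Totals are \begin{align*}76\end{align*} (Female) and \begin{align*}62\end{align*} (Male), the Column Totals are \begin{align*}84\end{align*} (Democratic) and \begin{align*}54\end{align*} (Republican), and the Total Number of Observations is \begin{align*}138\end{align*}. Therefore:
Expected Frequency for Female Democratic cell is \begin{align*}\frac{76 \times 84}{138} = 46.26\end{align*}
Expected Frequency for Female Republican cell is \begin{align*}\frac{76 \times 54}{138} = 29.74\end{align*}
Expected Frequency for Male Democratic cell is \begin{align*}\frac{62 \times 84}{138} = 37.74\end{align*}
Expected Frequency for Male Republican cell is \begin{align*}\frac{62 \times 54}{138} = 24.26\end{align*}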
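The expected-frequency calculation can be sketched in a few lines of Python. This is only an illustration and not part of the original lesson: the function name `expected_frequencies` is hypothetical, and the observed counts (48, 28, 36 and 26) are assumed to be the ones implied by the column totals (84 and 54) and the expected frequencies (37.74 and 24.26) reported in the worked table of this lesson.

```python
# Observed counts (assumed, see lead-in): rows are Female, Male;
# columns are Democratic, Republican.
observed = [
    [48, 28],  # Female
    [36, 26],  # Male
]


def expected_frequencies(table):
    """Return (row total * column total) / grand total for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


expected = expected_frequencies(observed)
for label, row in zip(["Female", "Male"], expected):
    print(label, [round(value, 2) for value in row])
```

Because the function works from the row and column totals alone, the same sketch applies to a table with any number of rows and columns.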
Using these calculated expected frequencies, we can modify the table above to look something like this:
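Democratic  Democratic  Republican  Republican  Total  

Observed  Expected  Observed  Expected  
Female  \begin{align*}48\end{align*}  \begin{align*}46.26\end{align*}  \begin{align*}28\end{align*}  \begin{align*}29.74\end{align*}  \begin{align*}76\end{align*} 
Male  \begin{align*}36\end{align*}  \begin{align*}37.74\end{align*}  \begin{align*}26\end{align*}  \begin{align*}24.26\end{align*}  \begin{align*}62\end{align*} 
Total  \begin{align*}84\end{align*}  \begin{align*}54\end{align*}  \begin{align*}138\end{align*} 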
Using the figures above, we are able to calculate the Chi-Square statistic with relative ease.
The Chi-Square Test of Independence
As with the Goodness-of-Fit test described earlier, we use similar steps when running a Test of Independence. First, we need to establish a hypothesis based on our research question. Using our scenario of gender and voting patterns, our null hypothesis is that there is not a significant difference in the frequencies with which females vote for a Republican or Democratic candidate when compared with males. Therefore:
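Null Hypothesis \begin{align*}H_0: O = E\end{align*} (Voting patterns are independent of gender)
Alternative Hypothesis \begin{align*}H_a: O \neq E\end{align*} (Voting patterns are not independent of gender)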
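Using the table above, we can calculate the Degrees of Freedom and the Chi-Square statistic. The formula for calculating the Chi-Square statistic is the same as before:
\begin{align*}X^2 = \sum \frac{(O - E)^2}{E}\end{align*}
where:
\begin{align*}X^2\end{align*} is the Chi-Square test statistic
\begin{align*}O\end{align*} is the observed frequency value
\begin{align*}E\end{align*} is the expected frequency value
Using this formula and the example above, we get the following expected frequencies and Chi-Square calculations.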
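Democratic candidate  Democratic candidate  Democratic candidate  Republican candidate  Republican candidate  Republican candidate  

Obs. Freq.  Exp. Freq.  \begin{align*}\frac{(O - E)^2}{E}\end{align*}  Obs. Freq.  Exp. Freq.  \begin{align*}\frac{(O - E)^2}{E}\end{align*} 
Female  \begin{align*}48\end{align*}  \begin{align*}46.26\end{align*}  \begin{align*}.07\end{align*}  \begin{align*}28\end{align*}  \begin{align*}29.74\end{align*}  \begin{align*}.10\end{align*} 
Male  \begin{align*}36\end{align*}  \begin{align*}37.74\end{align*}  \begin{align*}.08\end{align*}  \begin{align*}26\end{align*}  \begin{align*}24.26\end{align*}  \begin{align*}.12\end{align*} 
Totals  \begin{align*}84\end{align*}  \begin{align*}54\end{align*} 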
\begin{align*}\text{and the Degrees of Freedom} & = (C - 1) (R - 1) \\ \text{df} & = (2 - 1) (2 - 1) = 1\end{align*}
Using the table and formula above, we see that the Chi-Square statistic is equal to the sum of all of these values for \begin{align*}(O - E)^2 / E\end{align*}. Therefore,
\begin{align*}X^2 = .07 + .08 + .10 + .12 = 0.37\end{align*}
Using an alpha level of .05, we look under the column for \begin{align*}.05\end{align*} and the row for Degrees of Freedom \begin{align*}(df = 1)\end{align*}. Using the standard Chi-Square distribution table, we see that the critical value for Chi-Square is \begin{align*}3.84\end{align*}. Therefore, we would reject the null hypothesis if the Chi-Square statistic is greater than \begin{align*}3.84\end{align*}.
Reject \begin{align*}H_0\end{align*} if \begin{align*}X^2 > 3.84\end{align*}
Since our calculated Chi-Square value of \begin{align*}0.37\end{align*} is not greater than \begin{align*}3.84\end{align*}, we fail to reject the null hypothesis. Therefore, we can conclude that females are not significantly more likely to vote for Democratic candidates than males. In other words, these two factors appear to be independent of one another.
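The decision rule for the voting example can be checked with a short Python sketch. It is an illustration under stated assumptions rather than a definitive implementation: the observed counts (48, 28, 36 and 26) are assumed to be the ones implied by the totals and expected frequencies in the table, the variable names are hypothetical, and the critical value 3.84 is taken from the text rather than computed.

```python
# Observed and expected frequencies for the four cells, in the order
# Female/Democratic, Female/Republican, Male/Democratic, Male/Republican.
# The observed counts are an assumption (see the lead-in).
observed = [48, 28, 36, 26]
expected = [46.26, 29.74, 37.74, 24.26]

# Each cell contributes (O - E)^2 / E to the Chi-Square statistic.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)

# Degrees of Freedom = (C - 1)(R - 1) for a table with R rows and C columns.
rows, columns = 2, 2
degrees_of_freedom = (columns - 1) * (rows - 1)

CRITICAL_VALUE = 3.84  # alpha = .05, df = 1, read from a Chi-Square table

reject = chi_square > CRITICAL_VALUE
print(round(chi_square, 2), degrees_of_freedom, reject)  # 0.37 1 False
```

Keeping the per-cell contributions in a list makes it easy to see which cells drive the statistic; here all four are small, which is why the null hypothesis is not rejected.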
Test of Homogeneity
The Chi-Square Goodness-of-Fit test and the Test of Independence are two ways to examine the relationships between categorical variables. But what test do we use if we are interested in testing whether or not the assignments of these categorical variables are random? We perform the Test of Homogeneity, which is computed the same way as the Test of Independence, to examine the randomness of a sample. In other words, the Test of Homogeneity tests whether samples from populations have the same proportion of observations with a common characteristic.
The Test of Homogeneity is used when we examine the probability that the assignment of one variable is equal to another. For example, we found in our last Test of Independence that the factors of gender and voting patterns were independent of one another. However, remember that our original question was if females were more likely to vote for Democratic candidates when compared to males. We would use the Test of Homogeneity to examine the probability that choosing a Democratic candidate was the same for females and males.
Another commonly used example of a Test of Homogeneity is comparing dice to see if they all work the same way. Let’s use that example to conduct a sample Test of Homogeneity.
Example: A manager of a casino has two potentially ‘loaded’ dice (‘loaded dice’ are ones that are weighted on one side so that certain numbers have greater probabilities of showing up) that they want to examine. The manager rolls each of the dice exactly \begin{align*}20 \;\mathrm{times}\end{align*} and comes up with the following results.
1  2  3  4  5  6  Totals  

Dice 1  \begin{align*}6\end{align*}  \begin{align*}1\end{align*}  \begin{align*}2\end{align*}  \begin{align*}2\end{align*}  \begin{align*}3\end{align*}  \begin{align*}6\end{align*}  \begin{align*}20\end{align*} 
Dice 2  \begin{align*}4\end{align*}  \begin{align*}1\end{align*}  \begin{align*}3\end{align*}  \begin{align*}3\end{align*}  \begin{align*}1\end{align*}  \begin{align*}8\end{align*}  \begin{align*}20\end{align*} 
Totals  \begin{align*}10\end{align*}  \begin{align*}2\end{align*}  \begin{align*}5\end{align*}  \begin{align*}5\end{align*}  \begin{align*}4\end{align*}  \begin{align*}14\end{align*}  \begin{align*}40\end{align*} 
Like the other Chi-Square tests, we first need to establish a hypothesis based on a research question. In this case, our research question would look something like: “Is the probability of rolling a specific number the same for Dice \begin{align*}1\end{align*} and Dice \begin{align*}2\end{align*}?” This would give us the following hypotheses:
Null Hypothesis \begin{align*}H_0: O = E\end{align*} (The probabilities are the same for both dice)
Alternative Hypothesis \begin{align*}H_a: O \neq E\end{align*} (The probabilities differ between the two dice)
Similar to the other test, we need to calculate the expected values for each cell and the total number of Degrees of Freedom. To get the expected frequency for each cell, we use the same formula as we used for the Test of Independence:
\begin{align*}\text{Expected\ Frequency} = \frac{(\text{Row\ Total}) (\text{Column\ Total})} {\text{Total\ Number\ of\ Observations}}\end{align*}
The following table includes the Expected Frequency (in parentheses) for each cell, along with that cell's contribution to the Chi-Square statistic, \begin{align*}(O - E)^2 / E\end{align*}, in a separate column.
Number Rolled on the Potentially Loaded Dice
\begin{align*}1\end{align*}  \begin{align*}X^2\end{align*}  \begin{align*}2\end{align*}  \begin{align*}X^2\end{align*}  \begin{align*}3\end{align*}  \begin{align*}X^2\end{align*}  \begin{align*}4\end{align*}  \begin{align*}X^2\end{align*}  \begin{align*}5\end{align*}  \begin{align*}X^2\end{align*}  \begin{align*}6\end{align*}  \begin{align*}X^2\end{align*}  \begin{align*}X^2\end{align*} Total  

Dice \begin{align*}1\end{align*}  \begin{align*}6 (5)\end{align*}  \begin{align*}.20\end{align*}  \begin{align*}1 (1)\end{align*}  \begin{align*}0\end{align*}  \begin{align*}2 (2.5)\end{align*}  \begin{align*}.10\end{align*}  \begin{align*}2 (2.5)\end{align*}  \begin{align*}.10\end{align*}  \begin{align*}3 (2)\end{align*}  \begin{align*}.50\end{align*}  \begin{align*}6 (7)\end{align*}  \begin{align*}.14\end{align*}  \begin{align*}1.04\end{align*} 
Dice \begin{align*}2\end{align*}  \begin{align*}4 (5)\end{align*}  \begin{align*}.20\end{align*}  \begin{align*}1 (1)\end{align*}  \begin{align*}0\end{align*}  \begin{align*}3 (2.5)\end{align*}  \begin{align*}.10\end{align*}  \begin{align*}3 (2.5)\end{align*}  \begin{align*}.10\end{align*}  \begin{align*}1 (2)\end{align*}  \begin{align*}.50\end{align*}  \begin{align*}8 (7)\end{align*}  \begin{align*}.14\end{align*}  \begin{align*}1.04\end{align*} 
Totals  \begin{align*}10\end{align*}  \begin{align*}2\end{align*}  \begin{align*}5\end{align*}  \begin{align*}5\end{align*}  \begin{align*}4\end{align*}  \begin{align*}14\end{align*} 
\begin{align*}\text{and the Degrees of Freedom} & = (C - 1) (R - 1) \\ \text{df} & = (6 - 1) (2 - 1) = 5\end{align*}
Using the same Chi-Square formula and the information from the table above, we find that:
\begin{align*}X^2 = 1.04 + 1.04 = 2.08\end{align*}
Using an alpha level of .05, we look under the column for \begin{align*}.05\end{align*} and the row for Degrees of Freedom \begin{align*}(df = 5)\end{align*}. Using the standard Chi-Square distribution table, we see that the critical value for Chi-Square is \begin{align*}11.07\end{align*}. Therefore, we would reject the null hypothesis if the Chi-Square statistic is greater than \begin{align*}11.07\end{align*}.
Reject \begin{align*}H_0\end{align*} if \begin{align*}X^2 > 11.07\end{align*}
Since our calculated Chi-Square value of \begin{align*}2.08\end{align*} is not greater than \begin{align*}11.07\end{align*}, we fail to reject the null hypothesis. Therefore, we have no evidence that the probability of rolling any given number differs between the two dice. This means that if the dice are loaded, they are probably loaded in the same way or were made by the same manufacturer.
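The whole Test of Homogeneity for the dice can also be sketched as a single Python function. This is a hedged illustration, not the lesson's own method: the function name `chi_square_test` is hypothetical, the critical value 11.07 is taken from the text, and the statistic is computed from unrounded expected counts, so it need not match a hand calculation that rounds each cell.

```python
def chi_square_test(table, critical_value):
    """Return (statistic, degrees of freedom, reject?) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency = (row total)(column total) / total observations
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    # Degrees of Freedom = (C - 1)(R - 1)
    df = (len(col_totals) - 1) * (len(row_totals) - 1)
    return statistic, df, statistic > critical_value


# Observed rolls from the casino example: one row per die, one column per face.
dice = [
    [6, 1, 2, 2, 3, 6],  # Dice 1
    [4, 1, 3, 3, 1, 8],  # Dice 2
]

statistic, df, reject = chi_square_test(dice, critical_value=11.07)
print(f"X^2 = {statistic:.2f}, df = {df}, reject H0: {reject}")
```

The same function handles the Test of Independence, since the two tests are computed in the same way and differ only in how the data are collected and how the result is interpreted.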
Lesson Summary
1. The Chi-Square Test of Independence is used to assess if \begin{align*}2\end{align*} factors are related. It is commonly used in social science research to examine behaviors, preferences, measurements, etc.
2. As with the Chi-Square Goodness-of-Fit test, tables help capture and display relevant information.
3. For each cell in the table constructed to run a Chi-Square test, we need to calculate the expected frequency. The formula used for this calculation is:
\begin{align*}\text{Expected Frequency} = \frac{(\text{Row\ Total}) (\text{Column\ Total})} {\text{Total\ Number\ of\ Observations}}\end{align*}
4. To calculate the Chi-Square statistic for the Test of Independence, we use the same formula as the Goodness-of-Fit test. If the calculated Chi-Square value is greater than the critical value, we reject the null hypothesis.
5. We perform the Test of Homogeneity to examine the randomness of a sample. The Test of Homogeneity tests whether various populations are homogeneous or equal with respect to certain characteristics.
Review Questions
 What is the Chi-Square Test of Independence used for?
 True or False: In the Test of Independence, you can test if two variables are related but you cannot test the nature of the relationship itself.
 When calculating the expected frequency for a cell in a contingency table, you use the formula:
 \begin{align*}\text{Expected Frequency } = \frac{(\text{Row Total}) (\text {Column Total})} {\text{Total Number of Observations}}\end{align*}
 \begin{align*}\text{Expected Frequency } = \frac{(\text{Total Observations}) (\text{Column Total})} {\text{Row Total}} \end{align*}
 \begin{align*}\text{Expected Frequency } = \frac{(\text{Total Observations}) (\text{Row Total})} {\text{Column Total}}\end{align*}
Please use the table below to answer the following review questions.
Studied Abroad  Did Not Study Abroad  

Females  \begin{align*}322\end{align*}  \begin{align*}460\end{align*} 
Males  \begin{align*}128\end{align*}  \begin{align*}152\end{align*} 
 What is the total number of females in the sample?
 \begin{align*}450\end{align*}
 \begin{align*}280\end{align*}
 \begin{align*}612\end{align*}
 \begin{align*}782\end{align*}
 What is the total number of observations in this sample?
 \begin{align*}782\end{align*}
 \begin{align*}533\end{align*}
 \begin{align*}1,062\end{align*}
 \begin{align*}612\end{align*}
 What is the expected frequency for the number of males that did not study abroad?
 \begin{align*}161\end{align*}
 \begin{align*}208\end{align*}
 \begin{align*}111\end{align*}
 \begin{align*}129\end{align*}
 How many Degrees of Freedom are in this example?
 \begin{align*}1\end{align*}
 \begin{align*}2\end{align*}
 \begin{align*}3\end{align*}
 \begin{align*}4\end{align*}
 True or False: Our null hypothesis would be that females are as likely as males to study abroad.
 What is the Chi-Square statistic for this example?
 \begin{align*}1.60\end{align*}
 \begin{align*}2.45\end{align*}
 \begin{align*}3.32\end{align*}
 \begin{align*}3.98\end{align*}
 If the Chi-Square critical value at \begin{align*}.05\end{align*} and \begin{align*}1 \;\mathrm{degree}\end{align*} of freedom is \begin{align*}3.84\end{align*} and we have a calculated Chi-Square value of \begin{align*}2.22\end{align*}, we would:
 reject the null hypothesis
 fail to reject the null hypothesis
 True or False: We use the Test of Homogeneity to evaluate the equality of several samples of certain variables.
 The Test of Homogeneity is carried out the exact same way as:
 The Goodness-of-Fit test
 The Test of Independence
Review Answers
 To examine if two variables are related.
 True
 A
 D
 C
 A
 A
 True
 A
 B
 True
 B