10.1: The Goodness-of-Fit Test
Learning Objectives
 Understand the difference between the Chi-Square distribution and the Student’s t-distribution.
 Identify the conditions that must be satisfied when using the Chi-Square test.
 Understand the features of experiments that allow Goodness-of-Fit tests to be used.
 Evaluate a hypothesis using the Goodness-of-Fit test.
Introduction
In previous lessons, we learned that there are several different tests that we can use to analyze data and test hypotheses. The type of test that we choose depends on the data available and what question we are trying to answer. For example:
 We analyze simple descriptive statistics such as the mean, median, mode and standard deviation to give us an idea of the distribution and to remove outliers, if necessary;
 We calculate probabilities to determine the likelihood of something happening; and
 We use regression analysis to examine the relationship between two or more continuous variables.
But what test do we run if we are trying to examine patterns between distinct categories such as gender, political candidates, locations or preferences? To analyze patterns like these we use the Chi-Square test.
The Chi-Square test is a statistical test used to examine patterns in distinct or categorical variables, which we learned about in the earlier chapter entitled Planning and Conducting an Experiment or Study. This test is used for:
 1. Estimating how closely a sample matches the expected distribution (also known as the Goodness-of-Fit test) and
 2. Determining whether two random variables are independent of one another (also known as the Test of Independence; see Chapter 9).
In this lesson we will learn more about the Goodness-of-Fit test and how to create and evaluate hypotheses using this test.
The Chi-Square Distribution
The Chi-Square Goodness-of-Fit test is used to compare the observed values of a categorical variable with the expected values of that same variable. For example, we would use this test to analyze surveys that contained categorical variables (for example, gender, city of origin, or locations that people preferred to visit on vacation) to determine whether the responses are spread across the categories the way we would expect.
Example: We would use the Chi-Square Goodness-of-Fit test to evaluate whether there was a preference in the types of lunch that
Research Question: Do
Using a sample of
Type of Lunch  Observed Frequency  Expected Frequency 

Salad 


Sub Sandwich 


Daily Special 


Brought Own Lunch 


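If there is no difference in which type of lunch is preferred, we would expect the students to prefer each type of lunch equally. To calculate the expected frequency of each category as if school lunch preferences were distributed equally, we divide the number of observations by the number of categories. Since there are four categories (salad, sub sandwich, daily special and brought own lunch), the expected frequency for each category is the total number of students sampled divided by 4.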
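As a minimal sketch of this step, the Python function below computes expected frequencies under the assumption that every category is equally likely. The name `expected_uniform` is a hypothetical helper of our own, not part of any library. The usage line reuses the numbers from the review question with 250 observations and 2 categories.

```python
def expected_uniform(total_observations, num_categories):
    """Expected count per category if every category is equally likely."""
    if num_categories <= 0:
        raise ValueError("need at least one category")
    # Divide the number of observations evenly among the categories.
    share = total_observations / num_categories
    return [share] * num_categories


# 250 observations spread over 2 equally likely categories.
print(expected_uniform(250, 2))  # [125.0, 125.0]
```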
The value that summarizes the comparison between the observed and expected frequencies is called the Chi-Square statistic. If the observed frequencies are close to the expected frequencies, the Chi-Square statistic will be small. If the differences between them are large, the Chi-Square statistic will be large.
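To calculate the Chi-Square statistic, we take each category's squared difference between the observed and expected frequencies, divide it by the expected frequency, and add the results over all categories:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where O is the observed frequency and E is the expected frequency for each category.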
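The formula translates directly into a loop over categories. The following Python sketch assumes the observed and expected frequencies are supplied as equal-length lists in the same category order. `chi_square_statistic` is a hypothetical helper name, and the counts in the usage line are made up for illustration.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    total = 0.0
    for o, e in zip(observed, expected):
        if e <= 0:
            raise ValueError("expected frequencies must be positive")
        # Squared difference, scaled by the expected frequency.
        total += (o - e) ** 2 / e
    return total


# Hypothetical counts: two categories, 25 expected in each.
print(chi_square_statistic([30, 20], [25, 25]))  # 2.0
```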
Once it is calculated, we take this Chi-Square value, along with the degrees of freedom (discussed later in this lesson), and compare it with the values in a standard Chi-Square distribution table. The Chi-Square distribution tells us how likely it is to obtain a Chi-Square statistic at least this large if the sample really does follow the expected pattern. In contrast, the t-distribution is used to test whether the means of two samples differ. The table below compares the two.
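A printed Chi-Square table lists critical values for a few significance levels. To illustrate what such a table encodes, the Python sketch below computes the upper-tail probability P(χ² ≥ x) for an integer number of degrees of freedom. It uses the standard closed-form expressions: a finite sum when df is even, and the complementary error function plus a finite sum when df is odd. It then finds a critical value by bisection. The helper names `chi2_sf` and `chi2_critical` are hypothetical. In practice, the distribution table or a statistics library serves the same purpose.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a Chi-Square variable with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term = 1.0
        total = 1.0
        for r in range(1, df // 2):
            term *= half / r
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum of x^((2r-1)/2) / (1*3*...*(2r-1))
    root = math.sqrt(x)
    tail = math.erfc(root / math.sqrt(2.0))
    term = root
    total = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return tail + math.sqrt(2.0 / math.pi) * math.exp(-half) * total


def chi2_critical(alpha, df):
    """The x at which P(X >= x) equals alpha, found by bisection (chi2_sf decreases in x)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be between 0 and 1")
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:  # widen the bracket until it contains the answer
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(chi2_critical(0.05, 2), 3))  # 5.991
print(round(chi2_critical(0.05, 3), 3))  # 7.815
```

For df = 2 at α = 0.05 this gives about 5.991, which agrees with the 5.99 used in the review questions.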
Type of Distribution  Tells Us  Everyday Example  Data Needed to Determine Value 

Chi-Square  The relationship between two or more categorical variables.  Analyzing survey data with categorical variables.  Observed and expected frequencies for categorical variables, degrees of freedom. 
Student’s t-Test  The differences between the means of two groups with respect to a continuous variable.  Determining if there is a difference in the mean of the SAT scores between schools.  The mean values for samples from two populations, degrees of freedom. 
Features of the Goodness-of-Fit Test
As mentioned, the Goodness-of-Fit test is used to determine patterns of distinct or categorical variables. As we learned in Lesson 6, a categorical variable is one that is not continuous and has observations in separate categories. Examples of categorical variables include:
gender (male or female)
preferences (agreed, neutral or disagreed)
behaviors (got sent to the office or didn’t get sent to the office)
physical traits (straight, wavy or curly hair)
Categorical variables are not the same as measurement or continuous variables. The following are normally not categorical variables:
It is important to note that most of these continuous variables could in fact be converted into categorical variables. For example, you could create a categorical variable with two values, such as “Less than
In addition to categorical variables, a GoodnessofFit test also requires:
data obtained through a random sample
a calculation of the Chi-Square statistic using the formula explained in the last section
the calculation of the Degrees of Freedom. For a Chi-Square Goodness-of-Fit test, the Degrees of Freedom are equal to the number of categories minus one, or df = (number of categories) - 1
Using our example about the preferences of types of school lunches, we calculate the degrees of freedom as df = 4 - 1 = 3, since there are four lunch categories.
There are many situations that use the Goodness-of-Fit test, including surveys, taste tests and analysis of behaviors. Interestingly, Goodness-of-Fit tests are also used in casinos to determine if there is cheating in games of chance such as cards and dice. For example, if a certain card or number on a die shows up more than expected (a high observed frequency compared to the expected frequency), officials use the Goodness-of-Fit test to determine the likelihood that the player may be cheating or the game may not be fair.
Evaluating a Hypothesis Using the Goodness-of-Fit Test
Let’s use our original example to create and test a hypothesis using the Goodness-of-Fit Chi-Square test. First, we will need to state the null and alternative hypotheses for our research question. Since our research question states “Do
Null Hypothesis
Alternative Hypothesis
Using an alpha level of
Reject
Using the table from above, we can calculate the Chi-Square statistic with relative ease.
Type of Lunch  Observed Frequency  Expected Frequency 


Salad 



Sub Sandwich 



Daily Special 



Brought Own Lunch 



Total (Chi-Square) 

Since our Chi-Square statistic of
To review, we use the following steps to formulate and evaluate a hypothesis:
 State the null and alternative hypothesis for the research question.
 Select the significance level and use the Chi-Square distribution table to write a rule for rejecting the null hypothesis.
 Calculate the value of the ChiSquare statistic.
 Determine whether to reject or fail to reject the null hypothesis and write a summary statement based on the results.
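The four steps above can be sketched in Python as follows. The sketch assumes the critical value has already been read from a Chi-Square distribution table, so steps 1 and 2 (stating the hypotheses and choosing α) happen before the function is called. The function name `goodness_of_fit` and the counts in the usage line are hypothetical. When no expected frequencies are passed, the sketch assumes every category is equally likely, as in the school lunch example.

```python
def goodness_of_fit(observed, critical_value, expected=None):
    """Chi-Square Goodness-of-Fit decision using a critical value taken from a table."""
    k = len(observed)
    if k < 2:
        raise ValueError("need at least two categories")
    n = sum(observed)
    if expected is None:
        # Null hypothesis of no preference: each category expects n / k.
        expected = [n / k] * k
    if len(expected) != k or any(e <= 0 for e in expected):
        raise ValueError("need one positive expected frequency per category")
    # Step 3: the Chi-Square statistic, summed over categories.
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = k - 1
    # Step 4: reject the null hypothesis when the statistic exceeds the critical value.
    decision = "reject" if statistic > critical_value else "fail to reject"
    return statistic, df, decision


# Hypothetical counts for two equally likely categories; 3.841 is the
# tabled critical value for df = 1 at alpha = 0.05.
print(goodness_of_fit([30, 20], critical_value=3.841))  # (2.0, 1, 'fail to reject')
```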
Lesson Summary
1. We use the Chi-Square test to examine patterns between categorical variables such as gender, political candidates, locations or preferences.
2. There are two types of Chi-Square tests: the Goodness-of-Fit test and the Test for Independence. We use the Goodness-of-Fit test to estimate how closely a sample matches the expected distribution.
3. To test for significance, it helps to make a table detailing the observed and expected frequencies of the data sample. Using the standard Chi-Square distribution table, we can set criteria for rejecting or failing to reject the null hypothesis for our research questions.
4. To test the null hypothesis, it is necessary to calculate the Chi-Square statistic. To calculate the Chi-Square statistic, we use the formula
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where:
O is the observed frequency and E is the expected frequency for each category.
5. Using the Chi-Square statistic and the level of significance, we are able to determine whether to reject or fail to reject the null hypothesis and write a summary statement based on these results.
Supplemental Links
Distribution Tables (including the Student’s t-distribution and Chi-Square distribution)
http://www.statsoft.com/textbook/stathome.html?sttable.html&1
Review Questions
 What is the name of the statistical test used to analyze the patterns between two categorical variables?
 the Student’s t-test
 the ANOVA test
 the Chi-Square test
 the z-score
 There are two types of Chi-Square tests. Which type of Chi-Square test estimates how closely a sample matches an expected distribution?
 the Goodness-of-Fit test
 the Test for Independence
 Which of the following is considered a categorical variable?
 income
 gender
 height
 weight
 If there were 250 observations in a data set and 2 uniformly distributed categories that were being measured, the expected frequency for each category would be:
125 
500 
250 
5

 What is the formula for calculating the Chi-Square statistic?
The principal is planning a field trip. She samples a group of 100 students to see if they prefer a sporting event, a play at the local college or a science museum. She records the following results:
Type of Field Trip  Number Preferring 

Sporting Event 

Play 

Science Museum 

 What is the observed frequency value for the Science Museum category?
 What is the expected frequency value for the Sporting Event category?
 What would be the null hypothesis for the situation above?
 There is no preference between the types of field trips that students prefer
 There is a preference between the types of field trips that students prefer
 What would be the Chi-Square statistic for the research question above?
 If the critical Chi-Square value at the chosen level of significance was 5.99, would you reject or fail to reject the null hypothesis?
Review Answers
 C
 A
 B
 A

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
29 
33.33 
A

20.0 (see table below)
Type of Field Trip  Observed Frequency  Expected Frequency  Chi-Square 

Sporting Event 



Play 



Science Museum 



Chi-Square Total 

 Reject the Null Hypothesis