Objective
Here you will learn how to use a chi-square statistic to evaluate the fit of a hypothesized distribution. This is known as a goodness-of-fit test.
Concept
Suppose you wanted to evaluate a recent statistic stating that iOS represents 32% and Android 51% of active smart phones. You would like to know if the statistic actually reflects the distribution of phones among your friends. How could you evaluate the data you collect to see if it supports this hypothesis?
Look to the end of the lesson for the answer.
Watch This
http://youtu.be/b3o_hjWKgQw statslectures – Chi-Square Test for Goodness of Fit
Guidance
The Greek letter “chi”, written as $\chi$, is the symbol used to identify a chi-square statistic, $\chi^2$, which we will use here to evaluate how well a set of observed data fits a corresponding expected set.
Conducting a chi-square test is much like conducting a Z test or T test as we did in Chapter 10. We will follow the same basic series of steps and compare a calculated value to a table to evaluate the probability of getting our results if the null hypothesis is true, just as we did with the Z and T tests. Additionally, as was the case with T tests, we will determine the number of degrees of freedom and choose values from the table based on that number.
The primary difference between a chi-square test and the tests we have worked with before is that the previous tests were primarily dedicated to comparing single parameters, whereas chi-square tests compare whole sets of counts, either to check how well observed data fit an expected distribution (as in this lesson) or to determine if two random variables are independent or related, and so deal with multiple values for each variable. Additionally, the chi-square statistic is useful for looking at categorical data rather than quantitative data.
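The chi-square statistic is straightforward to calculate. For each category, take the difference between the observed count $O$ and the expected count $E$, square it, divide by $E$, and add up the results:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$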
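The calculation can be sketched in a few lines of Python. This is only a minimal illustration: the function name chi_square_statistic and the counts passed to it are made up for the example and do not come from the lesson.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Illustrative counts (not from the lesson): two categories, 30 observations,
# with 15 expected in each category.
statistic = chi_square_statistic([10, 20], [15, 15])
print(round(statistic, 4))
```

Each category contributes its own term, so a single category that is far from its expected count can dominate the total.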
Example A
The American Pet Products Association conducted a survey in 2011 and determined that 60% of dog owners have only one dog, 28% have two dogs, and 12% have three or more. Supposing that you have decided to conduct your own survey and have collected the data below, determine whether your data supports the results of the APPA study. Use a significance level of 0.05.
Data: Out of 129 dog owners, 73 had one dog and 38 had two dogs.
Solution:
 Step 1: Clearly state the null and alternative hypotheses
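$H_0$: The observed data fit the claimed distribution (the survey agrees with the sample).
$H_a$: The observed data do not fit the claimed distribution (the survey does not agree with the sample).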
 Step 2: Identify an appropriate test and significance level
Since we are comparing a whole set of observed counts to a set of expected counts, and not just a single value, a chi-square test is appropriate. The problem specifies a significance level of 0.05.
 Step 3: Analyze sample data
Create a table to organize data and compare the observed data to the expected data:
            One Dog    Two Dogs    3+ Dogs    TOTAL
Observed    73         38          18         129
Expected
To identify the expected values, multiply the expected % by the total number observed:
            One Dog               Two Dogs               3+ Dogs                TOTAL
Observed    73                    38                     18                     129
Expected    0.60 × 129 = 77.4     0.28 × 129 = 36.12     0.12 × 129 = 15.48     129
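To calculate our chi-square statistic, we need to sum the squared differences between each observed and expected value, each divided by the expected value. Using the expected values rounded to one decimal place (77.4, 36.1, and 15.5):
$$\chi^2 = \frac{(73 - 77.4)^2}{77.4} + \frac{(38 - 36.1)^2}{36.1} + \frac{(18 - 15.5)^2}{15.5} \approx 0.2501 + 0.1000 + 0.4032 = 0.7533$$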
Now that we have our chi-square statistic, we need to compare it to the chi-square critical value for the significance level 0.05. We can use a chi-square reference table or a chi-square value calculator. Just as with the T tests in Chapter 10, we will need to know the degrees of freedom, which equal the number of categories minus one. In this case, there are three categories: one dog, two dogs, and three or more dogs. The degrees of freedom, therefore, are $df = 3 - 1 = 2$.
Using the calculator or the table, we find that the critical value for a 0.05 significance level with $df = 2$ is 5.9915. That means that 95 times out of 100, a survey that agrees with the sample will have a chi-square statistic of 5.9915 or less. If our chi-square value is greater than 5.9915, then either we have observed a result that occurs 5 or fewer times out of 100, or the null hypothesis is incorrect. Our chi-square statistic is only 0.7533, so we will not reject the null hypothesis.
 Step 4: Interpret the results
Since our chi-square statistic was less than the critical value, we do not reject the null hypothesis, and we can say that our survey data are consistent with the results from the APPA.
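The steps of Example A can be checked with a short Python sketch. It assumes nothing beyond the standard library, and the variable names are chosen only for illustration. Because the script keeps the expected counts unrounded, it gives a statistic of about 0.758 rather than the 0.7533 obtained with expected counts rounded to one decimal place; the conclusion is the same. The closed form used for the critical value is a property of the chi-square distribution with exactly 2 degrees of freedom and does not carry over to other cases.

```python
import math

observed = [73, 38, 18]        # one dog, two dogs, three or more dogs
claimed = [0.60, 0.28, 0.12]   # proportions reported by the APPA
total = sum(observed)          # 129 dog owners

# Expected count for each category = claimed proportion * total observed.
expected = [p * total for p in claimed]
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

alpha = 0.05
df = len(observed) - 1         # 3 categories - 1 = 2

# For df = 2 only, the chi-square CDF is 1 - exp(-x / 2),
# so the critical value has the closed form -2 * ln(alpha).
critical = -2 * math.log(alpha)

print([round(e, 2) for e in expected])
print(round(statistic, 4), round(critical, 4))
print("reject H0" if statistic > critical else "fail to reject H0")
```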
Example B
Rachel told Eric that the reason her car insurance is less expensive is that female drivers get in fewer accidents than male drivers. Specifically, she says that male drivers are held responsible in 65% of accidents involving drivers under 23.
If Eric does some research of his own and discovers that 46 out of the 85 accidents he investigates involve male drivers, does his data support Rachel’s hypothesis?
Solution:
 Step 1: Clearly state the null and alternative hypotheses
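$H_0$: The observed data fit the claimed distribution (the survey agrees with the sample).
$H_a$: The observed data do not fit the claimed distribution (the survey does not agree with the sample).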
 Step 2: Identify an appropriate test and significance level
Since we are comparing two sets of data, and not just a single value, a chi-square test is appropriate. In the absence of a stated significance level in the problem, we assume the default 0.05.
 Step 3: Analyze sample data
Create a table to organize data and compare the observed data to the expected data:
            Male Drivers    Female Drivers    TOTAL
Observed    46              39                85
Expected
To identify the expected values, multiply the expected % by the total number observed:
            Male Drivers          Female Drivers        TOTAL
Observed    46                    39                    85
Expected    0.65 × 85 = 55.25     0.35 × 85 = 29.75     85
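To calculate our chi-square statistic, we need to sum the squared differences between each observed and expected value, each divided by the expected value:
$$\chi^2 = \frac{(46 - 55.25)^2}{55.25} + \frac{(39 - 29.75)^2}{29.75} \approx 1.5486 + 2.8761 = 4.4247$$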
Now that we have our chi-square statistic, we need to compare it to the chi-square critical value for 0.05 with one degree of freedom, since we have two categories ($df = 2 - 1 = 1$). Using the chi-square value calculator, we find the critical value to be 3.8414. The critical value indicates that, if the null hypothesis is true, only 0.05, or 5%, of chi-square values would be as high as 3.8414 or higher. If the $\chi^2$ of our data is greater than 3.8414, then we would expect to get that result fewer than 5 times out of 100 if the null hypothesis is true.
 Step 4: Interpret your results
Our calculated value of $\chi^2 \approx 4.42$ is greater than the critical value of 3.8414 for the 0.05 significance level, so we reject the null hypothesis. The data that Eric observed do not support the distribution that Rachel claimed.
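Instead of comparing the statistic to a critical value, we can also ask directly how likely a value this large would be if the null hypothesis were true. The sketch below does this for Example B using only the Python standard library. The erfc shortcut is an assumption that holds only for one degree of freedom, where the chi-square variable is the square of a standard normal variable; the variable names are illustrative.

```python
import math

observed = [46, 39]            # accidents involving male, female drivers
claimed = [0.65, 0.35]         # Rachel's claimed split
total = sum(observed)          # 85 accidents

expected = [p * total for p in claimed]   # 55.25 and 29.75
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# For df = 1 only: P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(statistic / 2))

alpha = 0.05
print(round(statistic, 4), round(p_value, 4))
print("reject H0" if p_value < alpha else "fail to reject H0")
```

A p-value below 0.05 leads to the same decision as a statistic above the 0.05 critical value; the two comparisons are equivalent ways of stating the result.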
Example C
The online car magazine “Camaro5.com” claims that 51% of Ford Mustang or Chevy Camaro owners own Camaros. Ellen is a Mustang lover and decides to do some research. If Ellen collects the data below, does her data support the magazine’s claim?
Data: Mustang owners: 28, Camaro owners: 34
Solution:
 Step 1: Clearly state the null and alternative hypotheses
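$H_0$: The observed data fit the claimed distribution (the survey agrees with the sample).
$H_a$: The observed data do not fit the claimed distribution (the survey does not agree with the sample).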
 Step 2: Identify an appropriate test and significance level
Since we are comparing two sets of data, and not just a single value, a chi-square test is appropriate. In the absence of a stated significance level in the problem, we assume the default 0.05.
 Step 3: Analyze sample data
We will start by creating a table to organize our data:
            Mustang               Camaro                TOTAL
Observed    28                    34                    62
Expected    0.49 × 62 = 30.38     0.51 × 62 = 31.62     62
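With expected counts of 0.49 × 62 = 30.38 Mustang owners and 0.51 × 62 = 31.62 Camaro owners, we can calculate our chi-square statistic:
$$\chi^2 = \frac{(28 - 30.38)^2}{30.38} + \frac{(34 - 31.62)^2}{31.62} \approx 0.1865 + 0.1791 = 0.3656$$
The chi-square critical value for $df = 1$ and a significance level of 0.05 is 3.8414 (the same as in Example B).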
 Step 4: Interpret your results
Our calculated value of $\chi^2 \approx 0.37$ is far less than the critical value of 3.8414 for the 0.05 significance level, so we fail to reject the null hypothesis. This means that, unfortunately for Ellen, her research did not allow her to reject the claim that Camaros are more popular.
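Since every example follows the same four steps, the whole procedure can be wrapped in one helper. The sketch below is one possible arrangement, not a standard routine: the function name goodness_of_fit and the CRITICAL_VALUES table are invented for illustration, and the table holds only the two critical values quoted in this lesson for a 0.05 significance level.

```python
# Critical values quoted in this lesson, keyed by (degrees of freedom, alpha).
CRITICAL_VALUES = {
    (1, 0.05): 3.8414,
    (2, 0.05): 5.9915,
}


def goodness_of_fit(observed, proportions, alpha=0.05):
    """Return (statistic, critical value, reject H0?) for a goodness-of-fit test."""
    total = sum(observed)
    expected = [p * total for p in proportions]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    critical = CRITICAL_VALUES[(df, alpha)]   # KeyError if not in the table
    return statistic, critical, statistic > critical


# Example C: 28 Mustang owners and 34 Camaro owners; claim is 49% / 51%.
statistic, critical, reject = goodness_of_fit([28, 34], [0.49, 0.51])
print(round(statistic, 4), critical, reject)
```

Looking the critical value up in a small table mirrors how the lesson uses a printed chart; any combination of degrees of freedom and significance level that is not in the table would have to be added by hand.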
Concept Problem Revisited
Suppose you wanted to evaluate a recent statistic stating that iOS represents 32% and Android 51% of active smart phones. You would like to know if the statistic actually reflects the distribution of phones among your friends. How could you evaluate the data you collect to see if it supports this hypothesis?
You could evaluate the hypothesis by collecting data from a simple random sample (SRS) of cell phone owners and using a chi-square goodness-of-fit test to see if your data supports the hypothesis.
Vocabulary
A chi-square statistic is a derived value used in a chi-square test to evaluate how well a given distribution fits observed data.
The degrees of freedom of a variable are the number of values in the final calculation of a statistic that are free to vary. The degrees of freedom are calculated as $df = n - 1$, where $n$ is the number of samples or categories in the variable.
Guided Practice
Questions 1-5 refer to the following data:
Tuscany claims that 70% of dog or cat owners own a dog, and 30% own a cat. Sayber decides to test her claim and learns that 23 of the 40 people he asks own dogs, and 17 own cats.
1. What kind of test could you use to see if Sayber’s data supports Tuscany’s claim?
2. What would be the null and alternative hypotheses?
3. What would be the expected values of dog and cat owners?
4. What is the chi-square statistic of the observed data?
5. Assuming a 0.1 significance level, does Sayber’s data support Tuscany’s claim?
Solutions:
1. A chi-square goodness-of-fit test would be appropriate.
2. The null hypothesis, $H_0$, would be that Sayber’s data agree with the distribution Tuscany claimed; the alternative hypothesis, $H_a$, would be that they do not.
3. The expected number of dog owners, according to Tuscany’s claim, would be 70% of the 40 people that Sayber polled, or 28 dog owners. The expected number of cat owners would be 30% of the 40 people polled, or 12.
4. The $\chi^2$ statistic is the sum of the squared differences between the observed and expected values, each divided by the expected value:
$$\chi^2 = \frac{(23 - 28)^2}{28} + \frac{(17 - 12)^2}{12} \approx 0.8929 + 2.0833 = 2.9762$$
5. The critical value of chi-square for 1 degree of freedom at a significance level of 0.1 is 2.706. Since the chi-square statistic we calculated is 2.9762, which is more extreme than the critical value, we may reject the null hypothesis $H_0$ and say that Sayber’s data does not support Tuscany’s claim.
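If no chart or calculator is at hand, the critical value used in solution 5 can be reproduced numerically. The sketch below is one way to do it with the Python standard library: it searches by bisection for the point where the upper-tail probability equals the significance level. The helper names are invented for illustration, and the erfc formula is valid only for one degree of freedom.

```python
import math


def upper_tail_df1(x):
    """P(chi-square with 1 degree of freedom > x)."""
    return math.erfc(math.sqrt(x / 2))


def critical_value_df1(alpha, lo=0.0, hi=50.0, tol=1e-9):
    """Bisection: find x where the upper-tail probability equals alpha."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if upper_tail_df1(mid) > alpha:
            lo = mid   # tail area still larger than alpha, so move right
        else:
            hi = mid   # tail area at or below alpha, so move left
    return (lo + hi) / 2


observed, expected = [23, 17], [28, 12]   # dog owners, cat owners
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
critical = critical_value_df1(0.10)

print(round(statistic, 4), round(critical, 4))
print("reject H0" if statistic > critical else "fail to reject H0")
```

Bisection works here because the upper-tail probability only decreases as the statistic grows, so there is exactly one crossing point to find.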
Practice
Questions 1-7 refer to the following:
Evan claims that 15% of computer gamers have played “Team Fortress 2”, and 35% have played “World of Warcraft”. Evan’s brother is skeptical of those figures and decides to do some research. He discovers that 60 of the 200 computer gamers he polls have played “Team Fortress 2”, and 90 have played “World of Warcraft”.
1. Create a table to organize the data and prepare for hypothesis testing.
2. What sort of test would be appropriate to determine if the observed data supports Evan’s claim?
3. What would be $H_0$ and $H_a$?
4. What would be the $\chi^2$ statistic for the observed data?
5. How many degrees of freedom are there in the variable “played game”?
6. Assuming a significance level of 0.05, what is the critical value?
7. Does the observed data support Evan’s claim? Explain your findings.
Questions 8-15 refer to the following:
Mack claims that 84% of street racers drive import cars, and 16% drive domestic muscle cars. Abbi likes domestic cars and thinks Mack is overstating the percentage of imports, so she does some research of her own and finds that 57 of the street racers she interviewed drive imports, and 31 drive American muscle.
8. Create a table to organize the data and prepare for hypothesis testing.
9. What sort of test would be appropriate to determine if the observed data supports Mack’s claim?
10. What would be $H_0$ and $H_a$?
11. What would be the $\chi^2$ statistic for the observed data?
12. How many degrees of freedom are there in the variable “type of car”?
13. Assuming a significance level of 0.10, what is the critical value?
14. Does the data indicate that Abbi should reject, or fail to reject, $H_0$?
15. Interpret your results.