Suppose you wanted to evaluate a recent statistic stating that iOS represents 32% and Android 51% of active smart phones. You would like to know if the statistic actually reflects the distribution of phones among your friends. How could you evaluate the data you collect to see if it supports this hypothesis?
Look to the end of the lesson for the answer.
Chi-Squared Statistic
The Greek letter “chi” is written as $\chi$, so the chi-square statistic is written $\chi^2$.
Conducting a chi-square test is much like conducting a Z-test or T-test, as we did in Chapter 10. We follow the same basic series of steps and compare a calculated value to a table to evaluate the probability of getting our results if the null hypothesis is true, just as we did with the Z-tests and F-tests. Also, as with F-tests, we need to determine the number of degrees of freedom and choose values from the table based on that number.
The primary difference between a chi-square test and the tests we have worked with before is that the previous tests were primarily dedicated to comparing single parameters, whereas chi-square tests deal with several values for each variable at once: they compare an observed distribution of counts with an expected one, or determine whether two random variables are independent or related. Also, the chi-square statistic is used with categorical data rather than quantitative data.
The chi-square statistic is straightforward to calculate. For each category, square the difference between the observed count $O$ and the expected count $E$, divide by $E$, and add up the results over all categories:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
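A minimal Python sketch of this formula, assuming the observed and expected counts are given as equal-length lists in the same category order. The function name `chi_square_statistic` is our own, not a library API; the sample call uses Sayber's dog and cat counts from the Examples section.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of categories")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Observed: 23 dog owners and 17 cat owners; expected under the claim: 28 and 12
print(round(chi_square_statistic([23, 17], [28, 12]), 4))  # 2.9762
```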
Determining the Validity of a Study
The American Pet Products Association conducted a survey in 2011 and determined that 60% of dog owners have only one dog, 28% have two dogs, and 12% have three or more. Supposing that you have decided to conduct your own survey and have collected the data below, determine whether your data supports the results of the APPA study. Use a significance level of 0.05.
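Data: Out of 129 dog owners, 73 had one dog, 38 had two dogs, and the remaining 18 had three or more.
 Step 1: Clearly state the null and alternative hypotheses
$H_0$: The distribution of dogs per owner matches the APPA study: 60% of owners have one dog, 28% have two dogs, and 12% have three or more.
$H_a$: The distribution of dogs per owner differs from the APPA percentages.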
 Step 2: Identify an appropriate test and significance level
Since we are comparing two sets of data (observed and expected counts), and not just a single value, a chi-square test is appropriate. The problem specifies a significance level of 0.05.
 Step 3: Analyze sample data
Create a table to organize data and compare the observed data to the expected data:
|  | One Dog | Two Dogs | 3+ Dogs | TOTAL |
|---|---|---|---|---|
| Observed | 73 | 38 | 18 | 129 |
| Expected |  |  |  |  |
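To identify the expected values, multiply each expected percentage by the total number of owners observed (129):

|  | One Dog | Two Dogs | 3+ Dogs | TOTAL |
|---|---|---|---|---|
| Observed | 73 | 38 | 18 | 129 |
| Expected | 0.60 × 129 = 77.4 | 0.28 × 129 = 36.12 | 0.12 × 129 = 15.48 | 129 |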
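To calculate our chi-square statistic, we sum, over all categories, the squared difference between each observed and expected value divided by the expected value:
$$\chi^2 = \frac{(73 - 77.4)^2}{77.4} + \frac{(38 - 36.12)^2}{36.12} + \frac{(18 - 15.48)^2}{15.48} \approx 0.2501 + 0.0979 + 0.4102 \approx 0.7582$$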
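The same arithmetic, sketched in plain Python as a check on the hand calculation. The variable names are illustrative only, and the expected counts are recomputed from the APPA percentages:

```python
observed = [73, 38, 18]                            # one dog, two dogs, 3+ dogs
expected = [p * 129 for p in (0.60, 0.28, 0.12)]   # APPA percentages x 129 owners

# Each category's contribution (O - E)^2 / E to the statistic
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_sq = sum(contributions)

print([round(e, 2) for e in expected])        # [77.4, 36.12, 15.48]
print([round(c, 4) for c in contributions])   # [0.2501, 0.0979, 0.4102]
print(round(chi_sq, 4))                       # 0.7582
```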
Now that we have our chi-square statistic, we need to compare it to the chi-square critical value for the 0.05 significance level. We can use a chi-square reference table or a chi-square calculator. Just as with the T-tests in Chapter 10, we need to know the degrees of freedom, which equal the number of categories minus one. In this case, there are three categories: one dog, two dogs, and three or more dogs. The degrees of freedom, therefore, are $df = 3 - 1 = 2$.
Using the calculator or a table, we find that the critical value for a 0.05 significance level with $df = 2$ is 5.991. Our statistic, $\chi^2 \approx 0.7582$, is well below this value.
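For exactly two degrees of freedom, the chi-square distribution has a closed-form upper tail, $P(\chi^2 > x) = e^{-x/2}$, so the critical value can be checked without a table by solving $e^{-x/2} = \alpha$. The sketch below relies on that special case only (the helper name `chi2_critical_df2` is hypothetical); for other degrees of freedom, use a table or a library routine such as SciPy's `scipy.stats.chi2.ppf(1 - alpha, df)`.

```python
import math


def chi2_critical_df2(alpha):
    """Critical value for df = 2: solve exp(-x / 2) = alpha for x."""
    return -2 * math.log(alpha)


critical = chi2_critical_df2(0.05)
print(round(critical, 3))  # 5.991
```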
 Step 4: Interpret the results
Since our chi-square statistic (0.7582) is less than the critical value (5.991), we do not reject the null hypothesis, and we can say that our survey data support the results of the APPA study.
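The decision can also be phrased with a p-value, the probability of a statistic at least this large if $H_0$ is true; rejecting when the p-value is below 0.05 is equivalent to rejecting when $\chi^2$ exceeds the critical value. This sketch again assumes the df = 2 closed form $P(\chi^2 > x) = e^{-x/2}$ and would not be valid for other degrees of freedom:

```python
import math

alpha = 0.05
observed = [73, 38, 18]
expected = [p * 129 for p in (0.60, 0.28, 0.12)]
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

df = len(observed) - 1              # 3 categories -> df = 2
critical = -2 * math.log(alpha)     # valid only because df == 2
p_value = math.exp(-chi_sq / 2)     # upper-tail probability, df == 2 only

reject = chi_sq > critical          # same decision as p_value < alpha
print(df, round(p_value, 2), "reject H0" if reject else "fail to reject H0")
# 2 0.68 fail to reject H0
```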
Real-World Application: Car Insurance
Rachel told Eric that the reason her car insurance is less expensive is that female drivers get in fewer accidents than male drivers. Specifically, she says that male drivers are held responsible in 65% of accidents involving drivers under 23.
If Eric does some research of his own and discovers that 46 out of the 85 accidents he investigates involve male drivers, does his data support Rachel’s hypothesis?
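 Step 1: Clearly state the null and alternative hypotheses
$H_0$: Male drivers are held responsible in 65% of accidents involving drivers under 23, and female drivers in 35%.
$H_a$: The percentage of these accidents involving male drivers differs from 65%.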
 Step 2: Identify an appropriate test and significance level
Since we are comparing two sets of data, and not just a single value, a chi-square test is appropriate. In the absence of a stated significance level in the problem, we assume the default 0.05.
 Step 3: Analyze sample data
Create a table to organize data and compare the observed data to the expected data:
|  | Male Drivers | Female Drivers | TOTAL |
|---|---|---|---|
| Observed | 46 | 39 | 85 |
| Expected |  |  |  |
To identify the expected values, multiply the expected % by the total number observed:
|  | Male Drivers | Female Drivers | TOTAL |
|---|---|---|---|
| Observed | 46 | 39 | 85 |
| Expected | 0.65 × 85 = 55.25 | 0.35 × 85 = 29.75 | 85 |
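To calculate our chi-square statistic, we sum the squared differences between each observed and expected value divided by the expected value:
$$\chi^2 = \frac{(46 - 55.25)^2}{55.25} + \frac{(39 - 29.75)^2}{29.75} \approx 1.5486 + 2.8761 \approx 4.4247$$
Now that we have our chi-square statistic, we need to compare it to the chi-square critical value for 0.05 with one degree of freedom, since we have two categories ($df = 2 - 1 = 1$). Using a chi-square calculator, we find the critical value to be 3.8415. The critical value indicates that, if the null hypothesis is true, only 0.05, or 5%, of chi-square statistics would be as high as 3.8415 or higher. If the calculated statistic is greater than the critical value, we reject the null hypothesis.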
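A quick plain-Python check of Eric's expected counts and chi-square statistic (the variable names are illustrative only):

```python
observed = [46, 39]                          # male, female
expected = [p * 85 for p in (0.65, 0.35)]    # Rachel's claim x 85 accidents

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print([round(e, 2) for e in expected])  # [55.25, 29.75]
print(round(chi_sq, 4))                 # 4.4247
```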
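A chi-square variable with one degree of freedom is the square of a standard normal variable $Z$, so its critical value is $z^2$, where $z$ cuts off $\alpha/2$ in each tail of the normal curve. That lets Python's standard-library `statistics.NormalDist` reproduce the df = 1 critical values without a table. The helper name `chi2_critical_df1` is hypothetical, and the trick applies to df = 1 only:

```python
from statistics import NormalDist


def chi2_critical_df1(alpha):
    """Critical value for df = 1, using chi^2(1) = Z^2 for standard normal Z."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * z


print(round(chi2_critical_df1(0.05), 4))  # 3.8415
print(round(chi2_critical_df1(0.10), 4))  # 2.7055
```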
 Step 4: Interpret your results
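Our calculated chi-square statistic of 4.4247 is greater than the critical value of 3.8415, so we reject the null hypothesis. Eric's data do not support Rachel's claim that male drivers are held responsible in 65% of accidents involving drivers under 23.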
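Using the same $\chi^2_1 = Z^2$ relationship, the upper-tail probability for one degree of freedom is $P(\chi^2 > x) = P(|Z| > \sqrt{x}) = \operatorname{erfc}(\sqrt{x/2})$, which gives a p-value for Eric's data. A minimal sketch, valid for df = 1 only:

```python
import math

observed = [46, 39]
expected = [0.65 * 85, 0.35 * 85]
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# For df = 1: P(chi^2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(chi_sq / 2))

print(round(p_value, 3))  # 0.035
print("reject H0" if p_value < 0.05 else "fail to reject H0")  # reject H0
```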
Real-World Application: Car Magazine
The online car magazine “Camaro5.com” claims that 51% of Ford Mustang or Chevy Camaro owners own Camaros. Ellen is a Mustang lover and decides to do some research. If Ellen collects the data below, does her data support the magazine’s claim?
Data: Mustang owners: 28, Camaro owners: 34
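 Step 1: Clearly state the null and alternative hypotheses
$H_0$: 51% of Ford Mustang or Chevy Camaro owners own Camaros, and 49% own Mustangs.
$H_a$: The percentage of Camaro owners differs from 51%.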
 Step 2: Identify an appropriate test and significance level
Since we are comparing two sets of data, and not just a single value, a chi-square test is appropriate. In the absence of a stated significance level in the problem, we assume the default 0.05.
 Step 3: Analyze sample data
We will start by creating a table to organize our data:
|  | Mustang | Camaro | TOTAL |
|---|---|---|---|
| Observed | 28 | 34 | 62 |
| Expected | 0.49 × 62 = 30.38 | 0.51 × 62 = 31.62 | 62 |
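Now we can calculate our chi-square statistic:
$$\chi^2 = \frac{(28 - 30.38)^2}{30.38} + \frac{(34 - 31.62)^2}{31.62} \approx 0.1865 + 0.1791 \approx 0.3656$$
The chi-square critical value for $df = 2 - 1 = 1$ at the 0.05 significance level is 3.8415.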
 Step 4: Interpret your results
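Our calculated chi-square statistic of 0.3656 is less than the critical value of 3.8415, so we fail to reject the null hypothesis. Ellen's data support the magazine's claim.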
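Because the two-category problems in this lesson all follow the same pattern, the steps can be wrapped into one function. This is a sketch under stated assumptions: exactly two categories (df = 1), proportions listed in the same order as the counts, and a hypothetical function name `two_category_test`. The p-value uses the df = 1 identity $P(\chi^2 > x) = \operatorname{erfc}(\sqrt{x/2})$.

```python
import math


def two_category_test(observed, proportions, alpha=0.05):
    """Chi-square goodness-of-fit test for exactly two categories (df = 1).

    Returns (chi_sq, p_value, reject_null).
    """
    if len(observed) != 2 or len(proportions) != 2:
        raise ValueError("this sketch handles exactly two categories")
    total = sum(observed)
    expected = [p * total for p in proportions]
    chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(chi_sq / 2))  # upper-tail area for df = 1
    return chi_sq, p_value, p_value < alpha


# Ellen's data: 28 Mustang owners, 34 Camaro owners; claim: 49% / 51%
chi_sq, p_value, reject = two_category_test([28, 34], [0.49, 0.51])
print(round(chi_sq, 4), reject)  # 0.3656 False
```

Calling `two_category_test([46, 39], [0.65, 0.35])` on Eric's accident data returns `True` for the reject flag, matching the car insurance conclusion.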
Earlier Problem Revisited
Suppose you wanted to evaluate a recent statistic stating that iOS represents 32% and Android 51% of active smart phones. You would like to know if the statistic actually reflects the distribution of phones among your friends. How could you evaluate the data you collect to see if it supports this hypothesis?
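You could evaluate the hypothesis by collecting data from a simple random sample (SRS) of cell phone owners and using a chi-square test to see whether your data support the hypothesis. Because iOS and Android together account for only 32% + 51% = 83% of phones, you would include a third category, “other,” with an expected 17%, giving three categories and $3 - 1 = 2$ degrees of freedom.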
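As an illustration of setting up that test, the sketch below fills in the “other” category and builds expected counts. The sample size `n = 50` is a purely hypothetical number of friends surveyed, not data; substitute your actual sample size and observed counts before computing the statistic.

```python
claimed = {"iOS": 0.32, "Android": 0.51}
claimed["Other"] = round(1 - sum(claimed.values()), 2)   # remaining 17%

n = 50  # hypothetical number of friends surveyed (illustrative only)
expected = {name: p * n for name, p in claimed.items()}
df = len(claimed) - 1

print(claimed["Other"], df)  # 0.17 2
print({k: round(v, 2) for k, v in expected.items()})
# {'iOS': 16.0, 'Android': 25.5, 'Other': 8.5}
```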
Examples
Examples 1-5 refer to the following data:
Tuscany claims that 70% of dog or cat owners own a dog, and 30% own a cat. Sayber decides to test her claim and learns that 23 of the 40 people he asks own dogs, and 17 own cats.
Example 1
What kind of test could you use to see if Sayber’s data supports Tuscany’s claim?
A chi-square test would be appropriate.
Example 2
What would be the null and alternative hypotheses?
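The null hypothesis, $H_0$, is that 70% of dog or cat owners own a dog and 30% own a cat, as Tuscany claims. The alternative hypothesis, $H_a$, is that the distribution differs from Tuscany's percentages.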
Example 3
What would be the expected values of dog and cat owners?
The expected number of dog owners, according to Tuscany's claim, would be 70% of the 40 people that Sayber polled, or 28 dog owners. The expected number of cat owners would be 30% of the 40 people polled, or 12.
Example 4
What is the chisquare statistic of the observed data?
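The chi-square statistic is $\chi^2 = \frac{(23 - 28)^2}{28} + \frac{(17 - 12)^2}{12} \approx 0.8929 + 2.0833 \approx 2.9762$.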
Example 5
Assuming a 0.1 significance level, does Sayber’s data support Tuscany’s claim?
The critical value of chi-square for 1 degree of freedom at a significance level of 0.1 is 2.706. Since the chi-square statistic we calculated, 2.9762, is greater than the critical value, we reject the null hypothesis and conclude that Sayber’s data do not support Tuscany’s claim.
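To double-check Example 5, this sketch computes both the statistic and the df = 1 critical value at the 0.1 level, again using the fact that a one-degree-of-freedom chi-square variable is the square of a standard normal variable (via the standard-library `statistics.NormalDist`):

```python
from statistics import NormalDist

observed = [23, 17]                     # dog owners, cat owners
expected = [0.70 * 40, 0.30 * 40]       # Tuscany's claim x 40 people
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

alpha = 0.10
critical = NormalDist().inv_cdf(1 - alpha / 2) ** 2   # df = 1: chi^2 = Z^2

print(round(chi_sq, 4), round(critical, 3))  # 2.9762 2.706
print("reject H0" if chi_sq > critical else "fail to reject H0")  # reject H0
```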
Review
Questions 1-7 refer to the following:
Evan claims that 15% of computer gamers have played “Team Fortress 2”, and 35% have played “World of Warcraft”. Evan’s brother is skeptical of those figures and decides to do some research. He discovers that 60 of the 200 computer gamers he polls have played “Team Fortress 2”, and 90 have played “World of Warcraft”.
1. Create a table to organize the data and prepare for hypothesis testing.
2. What sort of test would be appropriate to determine if the observed data supports Evan’s claim?
3. What would be $H_0$ and $H_a$?
4. What would be the $\chi^2$ statistic of the observed data?
5. How many degrees of freedom are there in the variable “played game”?
6. Assuming a significance level of 0.05, what is the critical value of $\chi^2$?
7. Does the observed data support Evan’s claim? Explain your findings.
Questions 8-15 refer to the following:
Mack claims that 84% of street racers drive import cars, and 16% drive domestic muscle cars. Abbi likes domestic cars and thinks Mack is overstating the percentage of imports, so she does some research of her own and finds that 57 of the street racers she interviewed drive imports, and 31 drive American muscle.
8. Create a table to organize the data and prepare for hypothesis testing.
9. What sort of test would be appropriate to determine if the observed data supports Mack’s claim?
10. What would be $H_0$ and $H_a$?
11. What would be the $\chi^2$ statistic of the observed data?
12. How many degrees of freedom are there in the variable “type of car”?
13. Assuming a significance level of 0.10, what is the critical value of $\chi^2$?
14. Does the data indicate that Abbi should reject, or fail to reject, $H_0$?
15. Interpret your results.
Review (Answers)
To view the Review answers, open this PDF file and look for section 11.5.