12.3: The Kruskal-Wallis Test and the Runs Test
Learning Objectives
- Evaluate a hypothesis for several populations that are not normally distributed using multiple randomly selected independent samples with the Kruskal-Wallis Test.
- Determine the randomness of a sample using the runs test to assess the number of data sequences and compute a test statistic using the appropriate formula.
Introduction
In the previous sections, we learned how to conduct nonparametric tests, including the sign test, the sign rank test, the rank sum test, and the rank correlation test. These tests allowed us to test hypotheses using data that did not meet the assumptions of being normally distributed or having homogeneity with respect to variance. In addition, each of these non-parametric tests had parametric counterparts.
In this last section, we will examine another nonparametric test, the Kruskal-Wallis test, as well as the runs test, which is used to assess the randomness of a sample.
Evaluating Hypotheses Using the Kruskal-Wallis Test
The Kruskal-Wallis test is the analog of the one-way ANOVA and is used when our data set does not meet the assumptions of normality or homogeneity of variance. However, this test has its own requirements: it is essential that the data set has identically shaped and scaled distributions for each group.
As we learned in Chapter 11, when performing the one-way ANOVA test, we establish the null hypothesis that there is no difference between the means of the populations from which our samples were selected. However, we express the null hypothesis in more general terms when using the Kruskal-Wallis test. In this test, we state that there is no difference in the distributions of scores of the populations. Another way of stating this null hypothesis is that the average of the ranks of the random samples is expected to be the same.
The test statistic for this test is the non-parametric alternative to the \begin{align*}F\end{align*}-statistic. This test statistic is defined by the following formula:
\begin{align*}H=\frac{12}{N(N+1)} \sum^m_{k=1} \frac{R^2_k}{n_k}-3(N+1)\end{align*}
where:
\begin{align*}N=\sum n_k\end{align*}.
\begin{align*}n_k\end{align*} is the number of observations in the \begin{align*}k^{\text{th}}\end{align*} sample.
\begin{align*}R_k\end{align*} is the sum of the ranks in the \begin{align*}k^{\text{th}}\end{align*} sample.
\begin{align*}m\end{align*} is the number of samples.
Like most nonparametric tests, the Kruskal-Wallis test relies on the use of ranked data to calculate a test statistic. In this test, the measurement observations from all the samples are converted to their ranks in the overall data set. The smallest observation is assigned a rank of 1, the next smallest is assigned a rank of 2, and so on. As in the rank sum test, if two observations have the same value, we assign both of them the same rank, namely the average of the ranks they would otherwise occupy.
Once the observations in all of the samples are converted to ranks, we calculate the test statistic, \begin{align*}H\end{align*}, using the ranks and not the observations themselves. As with the other parametric and non-parametric tests, we use the test statistic to evaluate our hypothesis. For this test, the sampling distribution for \begin{align*}H\end{align*} is approximately the chi-square distribution with \begin{align*}m-1\end{align*} degrees of freedom, where \begin{align*}m\end{align*} is the number of samples.
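The ranking step and the formula for \begin{align*}H\end{align*} can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `pooled_ranks` and `kruskal_wallis_h` and the small data set are made up for this example, tied values are assumed to share the average of the ranks they occupy, and no correction for ties is applied to \begin{align*}H\end{align*}.

```python
def pooled_ranks(samples):
    """Rank every observation within the pooled data set.

    Assumption: tied values share the average of the ranks they occupy.
    """
    pooled = sorted(value for sample in samples for value in sample)
    # Collect the 1-based positions that each distinct value occupies.
    positions = {}
    for position, value in enumerate(pooled, start=1):
        positions.setdefault(value, []).append(position)
    rank_of = {value: sum(p) / len(p) for value, p in positions.items()}
    return [[rank_of[value] for value in sample] for sample in samples]


def kruskal_wallis_h(samples):
    """Compute H = 12 / (N(N+1)) * sum(R_k^2 / n_k) - 3(N+1)."""
    ranks = pooled_ranks(samples)
    n_total = sum(len(sample) for sample in samples)
    weighted = sum(sum(r) ** 2 / len(r) for r in ranks)
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Hypothetical data with one tie (the value 2 appears in both samples).
example = [[1, 2], [2, 4]]
print(pooled_ranks(example))                 # [[1.0, 2.5], [2.5, 4.0]]
print(round(kruskal_wallis_h(example), 2))   # 1.35
```

The two tied observations occupy positions 2 and 3 in the pooled data, so each receives a rank of 2.5.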
It is easy to use Microsoft Excel or a statistical programming package, such as SAS or SPSS, to calculate this test statistic and evaluate our hypothesis. However, for the purposes of this example, we will perform this test by hand.
Example: Suppose that a principal is interested in the differences among final exam scores from Mr. Red, Ms. White, and Mrs. Blue’s algebra classes. The principal takes random samples of students from each of these classes and records their final exam scores as shown:
Mr. Red | Ms. White | Mrs. Blue |
---|---|---|
52 | 66 | 63 |
46 | 49 | 65 |
62 | 64 | 58 |
48 | 53 | 70 |
57 | 68 | 71 |
54 | | 73 |
Determine if there is a difference among the final exam scores of the three teachers' classes.
Our null hypothesis for the Kruskal-Wallis test is that there is no difference in the distributions of the scores of these three populations. Our alternative hypothesis is that at least two of the three populations differ. For this example, we will set our level of significance at \begin{align*}\alpha=0.05\end{align*}.
To test this hypothesis, we need to calculate our test statistic. To calculate this statistic, it is necessary to assign and sum the ranks for each of the scores in the table above as follows:
Mr. Red | Overall Rank | Ms. White | Overall Rank | Mrs. Blue | Overall Rank |
---|---|---|---|---|---|
52 | 4 | 66 | 13 | 63 | 10 |
46 | 1 | 49 | 3 | 65 | 12 |
62 | 9 | 64 | 11 | 58 | 8 |
48 | 2 | 53 | 5 | 70 | 15 |
57 | 7 | 68 | 14 | 71 | 16 |
54 | 6 | | | 73 | 17 |
Rank Sum | 29 | Rank Sum | 46 | Rank Sum | 78 |
Using this information, we can calculate our test statistic as shown:
\begin{align*}H=\frac{12}{N(N+1)} \sum^m_{k=1} \frac{R^2_k}{n_k}-3(N+1)=\frac{12}{(17)(18)} \left(\frac{29^2}{6}+\frac{46^2}{5}+\frac{78^2}{6}\right)-(3)(17+1)=7.86\end{align*}
Using the chi-square distribution, we determine that with \begin{align*}3-1=2\end{align*} degrees of freedom, our critical value at \begin{align*}\alpha=0.05\end{align*} is 5.991. Since our test statistic of 7.86 exceeds the critical value, we can reject the null hypothesis that stated there is no difference in the final exam scores among students from the three different classes.
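The hand calculation can be checked with a short Python sketch that uses only the standard library. The variable names are illustrative. The shortcut for the critical value and the p-value relies on the fact that, with exactly 2 degrees of freedom, the upper-tail probability of the chi-square distribution is \begin{align*}e^{-x/2}\end{align*}; it does not apply to other degrees of freedom.

```python
import math

scores = {
    "Mr. Red": [52, 46, 62, 48, 57, 54],
    "Ms. White": [66, 49, 64, 53, 68],
    "Mrs. Blue": [63, 65, 58, 70, 71, 73],
}

# There are no tied scores, so a score's overall rank is simply its
# 1-based position in the sorted pooled data.
pooled = sorted(s for group in scores.values() for s in group)
rank = {s: i for i, s in enumerate(pooled, start=1)}

rank_sums = {name: sum(rank[s] for s in group) for name, group in scores.items()}
n_total = len(pooled)

h = 12 / (n_total * (n_total + 1)) * sum(
    rank_sums[name] ** 2 / len(group) for name, group in scores.items()
) - 3 * (n_total + 1)

alpha = 0.05
critical = -2 * math.log(alpha)   # valid for 2 degrees of freedom only
p_value = math.exp(-h / 2)        # valid for 2 degrees of freedom only

print(rank_sums)
print(round(h, 2), round(critical, 3), h > critical)
```

The rank sums come out to 29, 46, and 78, and \begin{align*}H\end{align*} rounds to 7.86, which agrees with the table and the calculation above. The corresponding p-value is roughly 0.02, which is below \begin{align*}\alpha=0.05\end{align*}.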
Determining the Randomness of a Sample Using the Runs Test
The runs test (also known as the Wald-Wolfowitz test) is another nonparametric test that is used to test the hypothesis that the samples taken from a population are independent of one another. We also say that the runs test checks the randomness of data when each observation falls into one of two categories. A run is an unbroken sequence of identical observations. For example, the sequence \begin{align*}++--++--++--\end{align*} has six runs. Three of these runs consist of two positive signs, and three of the runs consist of two negative signs.
We often use the runs test in studies where measurements are made according to a ranking in either time or space. In these types of scenarios, one of the questions we are trying to answer is whether or not the average value of the measurement is different at different points in the sequence. For example, suppose that we are conducting a longitudinal study on the number of referrals that different teachers give throughout the year. After several months, we notice that the number of referrals appears to increase around the time that standardized tests are given. We could formally test this observation using the runs test.
Using the laws of probability, it is possible to estimate the number of runs that one would expect by chance, given the proportion of the population in each of the categories and the sample size. Since we are dealing with proportions and probabilities between discrete variables, we consider the binomial distribution as the foundation of this test. When conducting a runs test, we establish the null hypothesis that the data samples are independent of one another and are random. On the contrary, our alternative hypothesis states that the data samples are not random and/or not independent of one another.
The runs test can be used with either numerical or categorical data. When working with numerical data, the first step in conducting the test is to compute the mean of the data and then designate each observation as being either above the mean (i.e., \begin{align*}+\end{align*}) or below the mean (i.e., \begin{align*}-\end{align*}). Next, regardless of which type of data we are working with, we count the number of runs within the data set. Recall that a run is an unbroken sequence of identical observations. For example, the following sequence has 5 runs. Equivalently, the sequence switches sign four times, since the number of runs is always one more than the number of switches.
\begin{align*}++ - - - - + + + - +\end{align*}
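Both preparatory steps, labeling numerical observations relative to the mean and counting runs, can be sketched in Python. The helper names `to_signs` and `count_runs` and the list of measurements are hypothetical. Dropping an observation that is exactly equal to the mean is an assumption made here, since the text does not say how to treat that case.

```python
from itertools import groupby
from statistics import mean


def to_signs(values):
    """Label each value '+' if above the mean and '-' if below it.

    Assumption: values exactly equal to the mean are dropped.
    """
    center = mean(values)
    return "".join("+" if v > center else "-" for v in values if v != center)


def count_runs(sequence):
    """Count the unbroken blocks of identical symbols in a sequence."""
    return sum(1 for _ in groupby(sequence))


sequence = "++----+++-+"
print(count_runs(sequence))                        # 5
print(sequence.count("+"), sequence.count("-"))    # 6 5

# Hypothetical measurements, used only to illustrate the labeling step.
measurements = [12, 15, 9, 8, 14, 16, 7]
signs = to_signs(measurements)
print(signs, count_runs(signs))                    # ++--++- 4
```

`itertools.groupby` groups consecutive equal items, so the number of groups it yields is exactly the number of runs.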
After determining the number of runs, we also need to record each time a certain variable occurs and the total number of observations. In the example above, we have 11 observations in total, with 6 positives \begin{align*}(n_1=6)\end{align*} and 5 negatives \begin{align*}(n_2=5)\end{align*}. With this information, we are able to calculate our test statistic using the following formulas:
\begin{align*}z &= \frac{\text{number of observed runs}-\mu}{\sigma}\\ \mu &= \text{expected number of runs}=1+\frac{2n_1n_2}{n_1+n_2}\\ \sigma^2 &= \text{variance of the number of runs}=\frac{2n_1n_2(2n_1n_2-n_1-n_2)}{(n_1+n_2)^2(n_1+n_2-1)}\end{align*}
When conducting the runs test, we calculate the standard \begin{align*}z\end{align*}-score and evaluate our hypotheses, just like we do with other parametric and non-parametric tests.
Example: A teacher is interested in assessing if the seating arrangement of males and females in his classroom is random. He observes the seating pattern of his students and records the following sequence:
MFMMFFFFMMMFMFMMMMFFMFFMFFFF
Is the seating arrangement random? Use \begin{align*}\alpha=0.05\end{align*}.
To answer this question, we first state the null hypothesis that the seating arrangement is random and independent. Our alternative hypothesis states that the seating arrangement is not random or not independent. With \begin{align*}\alpha=0.05\end{align*}, we set our critical values at 1.96 standard scores above and below the mean.
To calculate the test statistic, we first record the number of runs and the number of each type of observation as shown:
\begin{align*}R=14 \quad M:n_1=13 \quad F:n_2=15\end{align*}
With these data, we can easily compute the test statistic as follows:
\begin{align*}\mu &= \text{expected number of runs}=1+\frac{(2)(13)(15)}{13+15}=1+\frac{390}{28}=14.9\\ \sigma^2 &= \text{variance of the number of runs}=\frac{(2)(13)(15)[(2)(13)(15)-13-15]}{(13+15)^2(13+15-1)}=\frac{(390)(362)}{(784)(27)}=6.67\\ \sigma &= 2.58\\ z &= \frac{\text{number of observed runs}-\mu}{\sigma}=\frac{14-14.9}{2.58}=-0.35\end{align*}
Since the calculated test statistic lies between our critical values of \begin{align*}z=-1.96\end{align*} and \begin{align*}z=1.96\end{align*}, we fail to reject the null hypothesis. The data provide no evidence that the seating arrangement of males and females is anything other than random.
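The seating example can be reproduced with a short Python sketch that uses only the standard library. The variable names are illustrative, and the two-sided p-value from the standard normal distribution is an addition that goes beyond the hand calculation. Because the code carries full precision rather than rounded intermediate values, it gives \begin{align*}z \approx -0.36\end{align*} instead of \begin{align*}-0.35\end{align*}; the conclusion is the same.

```python
import math

seating = "MFMMFFFFMMMFMFMMMMFFMFFMFFFF"

# A run starts at the first seat and again wherever the symbol changes.
runs = 1 + sum(1 for a, b in zip(seating, seating[1:]) if a != b)
n1 = seating.count("M")
n2 = seating.count("F")

# Expected number of runs and its variance under the null hypothesis.
mu = 1 + 2 * n1 * n2 / (n1 + n2)
variance = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)) / (
    (n1 + n2) ** 2 * (n1 + n2 - 1)
)
z = (runs - mu) / math.sqrt(variance)

# Two-sided p-value from the standard normal distribution.
p_value = math.erfc(abs(z) / math.sqrt(2))
reject = abs(z) > 1.96

print(runs, n1, n2)
print(round(mu, 2), round(variance, 2), round(z, 2), reject)
```

The counts match the values recorded above (14 runs, 13 males, 15 females), and `reject` is `False`, so the null hypothesis of a random arrangement is not rejected at \begin{align*}\alpha=0.05\end{align*}.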
Lesson Summary
The Kruskal-Wallis test is the nonparametric analog of the one-way ANOVA. We use it to compare several independent samples when the populations are not normally distributed.
The test statistic for the Kruskal-Wallis test is the non-parametric alternative to the \begin{align*}F\end{align*}-statistic. This test statistic is defined by the following formula:
\begin{align*}H=\frac{12}{N(N+1)} \sum^m_{k=1} \frac{R^2_k}{n_k}-3(N+1)\end{align*}
The runs test (also known as the Wald-Wolfowitz test) is another non-parametric test that is used to test the hypothesis that the samples taken from a population are independent of one another. We use the \begin{align*}z\end{align*}-statistic to evaluate this hypothesis.
On the Web
http://tinyurl.com/334e5to Good explanations of and examples of different nonparametric tests.
http://tinyurl.com/33s4h3o Allows you to enter data and then performs the Wilcoxon sign rank test.
http://tinyurl.com/33s4h3o Allows you to enter data and performs the Mann Whitney Test.
Keywords
- Kruskal-Wallis test
- The Kruskal-Wallis test is the analog of the one-way ANOVA and is used when our data set does not meet the assumptions of normality or homogeneity of variance.
- Mann-Whitney \begin{align*}U\end{align*}-test
- This test is sensitive to both the median and the distribution of the sample and population.
- Non-parametric tests
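- Tests that can be used when the data do not meet the assumptions of normality or homogeneity of variance. As with parametric tests, we use the test statistic to evaluate our hypothesis.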
- Rank correlation coefficient
- A statistic used to measure the strength, magnitude, and direction of the relationship between two variables.
- Rank correlation test
- A nonparametric test used to measure the strength, magnitude, and direction of the relationship between two variables.
- Rank sum test
- We use the rank sum test (also known as the Mann-Whitney \begin{align*}U\end{align*}-test) to assess whether two samples come from the same distribution. This test is sensitive to both the median and the distribution of the samples.
- Run
- A run is an unbroken sequence of identical observations.
- Runs test
- The runs test (also known as the Wald-Wolfowitz test) is another nonparametric test that is used to test the hypothesis that the samples taken from a population are independent of one another.
- Sign rank test
- A more useful test that assesses the difference in size between the observations in a matched pair is the sign rank test.
- Sign test
- The sign test examines the difference in the medians of matched data sets.
- Spearman rank correlation coefficient
- The Spearman rank correlation coefficient (also known simply as the rank correlation coefficient, \begin{align*}\rho\end{align*}, or ‘rho’) is used to measure the strength, magnitude, and direction of the relationship between two variables.
- \begin{align*}U\end{align*}-distribution
- The \begin{align*}U\end{align*}-distribution approaches the normal distribution as the sizes of both samples grow.
- \begin{align*}U\end{align*}-statistic
- We use the \begin{align*}U\end{align*}-statistic to calculate the standard \begin{align*}z\end{align*}-score.
- Wald-Wolfowitz test
- The runs test (also known as the Wald-Wolfowitz test) is another nonparametric test that is used to test the hypothesis that the samples taken from a population are independent of one another.
- Wilcoxon sign rank test
- The sign rank test (also known as the Wilcoxon sign rank test) resembles the sign test, but it is much more sensitive. Similar to the sign test, the sign rank test is also a nonparametric alternative to the paired Student’s \begin{align*}t\end{align*}-test.