12.3: The Kruskal-Wallis Test and the Runs Test
Learning Objectives
- Evaluate a hypothesis about several populations that are not normally distributed, based on multiple randomly selected independent samples, using the Kruskal-Wallis test.
- Determine the randomness of a sample using the runs test to assess the number of data sequences and compute a test statistic using the appropriate formula.
Introduction
In the previous sections we learned how to conduct nonparametric tests including the sign test, the sign rank test, the rank sum test and the rank correlation test. These tests allowed us to test hypotheses using data that did not meet the assumptions of being normally distributed or homogeneous with respect to variance. In addition, each of these non-parametric tests had parametric counterparts.
In this last section we will examine another nonparametric test – the Kruskal-Wallis one-way analysis of variance (also known simply as the Kruskal-Wallis test). This test is similar to the ANOVA test and the calculation of the test statistic is similar to that of the rank sum test. In addition, we will also explore something known as the runs test which can be used to help decide if sequences observed within a data set are random.
Evaluating Hypotheses Using the Kruskal-Wallis Test
The Kruskal-Wallis test is the analog of the one-way ANOVA and is used when our data does not meet the assumptions of normality or homogeneity of variance. However, this test has its own requirements: it is essential that the data has identically shaped and scaled distributions for each group.
As we learned in Chapter 11, when performing the one-way ANOVA test we establish the null hypothesis that there is no difference between the means of the populations from which our samples were selected. However, we express the null hypothesis in more general terms when using the Kruskal-Wallis test. In this test, we state that there is no difference in the distribution of scores of the populations. Another way of stating this null hypothesis is that the average of the ranks of the random samples is expected to be the same.
The test statistic for this test \begin{align*}(H)\end{align*} is the non-parametric alternative to the \begin{align*}F\end{align*}-statistic. This test statistic is defined by the formula:
\begin{align*}H = \frac{12} {N(N + 1)} \sum_{k = 1}^K \frac{R^2_k} {n_k} - 3(N + 1)\end{align*}
where
\begin{align*}N = \sum n_k\end{align*}
\begin{align*}n_k =\end{align*} number of observations in the \begin{align*}k^{th}\end{align*} sample
\begin{align*}R_k =\end{align*} sum of the ranks in the \begin{align*}k^{th}\end{align*} sample
Like most nonparametric tests, the Kruskal-Wallis test relies on ranked data to calculate a test statistic. The observations from all the samples are pooled and converted to their ranks in the overall data set: the smallest observation is assigned a rank of \begin{align*}1\end{align*}, the next smallest a rank of \begin{align*}2\end{align*}, and so on. As with the other rank-based tests, if two observations have the same value we assign both of them the same rank, conventionally the average of the ranks they would otherwise occupy.
Once the observations in all of the samples are converted to ranks, we calculate the test statistic \begin{align*}(H)\end{align*} using the ranks and not the observations themselves. As with the other parametric and non-parametric tests, we use the test statistic to evaluate our hypothesis. For this test, the sampling distribution of \begin{align*}H\end{align*} is approximately the Chi-Square distribution with \begin{align*}K - 1\end{align*} degrees of freedom, where \begin{align*}K\end{align*} is the number of samples.
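The ranking step can be sketched in a few lines of Python. This is only an illustration: the function name `midranks` and the sample scores are hypothetical, and it assumes the common convention that tied observations share the average of the ranks they would otherwise occupy.

```python
def midranks(values):
    """Return 1-based ranks; tied values share the average of their ranks.

    Assumption: ties are resolved with average ranks (a common convention).
    """
    # Indices of the observations, ordered from smallest to largest value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the block of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j hold ranks i+1..j+1; give each the average.
        average_rank = (i + 1 + j + 1) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


# Hypothetical scores with one tie (the two 46s share rank 1.5).
scores = [52, 46, 62, 46, 57]
print(midranks(scores))  # [3.0, 1.5, 5.0, 1.5, 4.0]
```

When there are no ties, the result is simply the position of each observation in the sorted pooled data.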
Microsoft Excel or a statistical package such as SAS or SPSS can calculate this test statistic and evaluate our hypothesis. However, in the example below we will perform the test by hand.
Example:
Suppose that the principal is interested in the differences among final exam scores from Mr. Red, Ms. White and Mrs. Blue’s algebra classes. The principal takes random samples of students from each of these classes and records their final exam scores:
Mr. Red | Ms. White | Mrs. Blue |
---|---|---|
\begin{align*}52\end{align*} | \begin{align*}66\end{align*} | \begin{align*}63\end{align*} |
\begin{align*}46\end{align*} | \begin{align*}49\end{align*} | \begin{align*}65\end{align*} |
\begin{align*}62\end{align*} | \begin{align*}64\end{align*} | \begin{align*}58\end{align*} |
\begin{align*}48\end{align*} | \begin{align*}53\end{align*} | \begin{align*}70\end{align*} |
\begin{align*}57\end{align*} | \begin{align*}68\end{align*} | \begin{align*}71\end{align*} |
\begin{align*}54\end{align*} | | \begin{align*}73\end{align*} |
Determine if there is a difference among the final exam scores of the three teachers' classes.
Solution:
Our null hypothesis for the Kruskal-Wallis test is that there is no difference in the distribution of the scores of these three populations. Our alternative hypothesis is that at least two of the three populations differ. For this example, we will set our level of significance at \begin{align*}\alpha=.05\end{align*}.
To test this hypothesis, we need to calculate our test statistic \begin{align*}(H)\end{align*}. To calculate this statistic, it is necessary to assign and sum the ranks for each of the scores in the table above:
Mr. Red | Overall Rank | Ms. White | Overall Rank | Mrs. Blue | Overall Rank |
---|---|---|---|---|---|
\begin{align*}52\end{align*} | \begin{align*}4\end{align*} | \begin{align*}66\end{align*} | \begin{align*}13\end{align*} | \begin{align*}63\end{align*} | \begin{align*}10\end{align*} |
\begin{align*}46\end{align*} | \begin{align*}1\end{align*} | \begin{align*}49\end{align*} | \begin{align*}3\end{align*} | \begin{align*}65\end{align*} | \begin{align*}12\end{align*} |
\begin{align*}62\end{align*} | \begin{align*}9\end{align*} | \begin{align*}64\end{align*} | \begin{align*}11\end{align*} | \begin{align*}58\end{align*} | \begin{align*}8\end{align*} |
\begin{align*}48\end{align*} | \begin{align*}2\end{align*} | \begin{align*}53\end{align*} | \begin{align*}5\end{align*} | \begin{align*}70\end{align*} | \begin{align*}15\end{align*} |
\begin{align*}57\end{align*} | \begin{align*}7\end{align*} | \begin{align*}68\end{align*} | \begin{align*}14\end{align*} | \begin{align*}71\end{align*} | \begin{align*}16\end{align*} |
\begin{align*}54\end{align*} | \begin{align*}6\end{align*} | | | \begin{align*}73\end{align*} | \begin{align*}17\end{align*} |
Rank Sum | \begin{align*}29\end{align*} | | \begin{align*}46\end{align*} | | \begin{align*}78\end{align*} |
Using this information, we can calculate our test statistic:
\begin{align*}H = \frac{12} {N(N + 1)} \sum_{k = 1}^K \frac{R^2_k} {n_k} - 3(N + 1) = \frac{12} {17 \times 18} \left (\frac{29^2} {6} + \frac{46^2} {5} + \frac{78^2} {6} \right) - 3(17 + 1) = 7.86\end{align*}
Using the Chi-Square distribution with \begin{align*}2\end{align*} degrees of freedom (\begin{align*}3\end{align*} samples \begin{align*}- 1\end{align*}), our critical value at \begin{align*}\alpha=.05\end{align*} is \begin{align*}5.991\end{align*}. Since our test statistic \begin{align*}(H=7.86)\end{align*} exceeds the critical value, we reject the null hypothesis that there is no difference in the final exam scores among students from the three classes.
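The hand calculation above can be checked with a short script. This is a sketch rather than a general implementation: the variable names are illustrative, it assumes the pooled data contain no ties (true for these scores), and it uses the fact that for exactly 2 degrees of freedom the Chi-Square upper-tail critical value has the closed form \begin{align*}-2\ln(\alpha)\end{align*}.

```python
import math

# Final exam scores from the example, one list per class.
samples = {
    "Mr. Red": [52, 46, 62, 48, 57, 54],
    "Ms. White": [66, 49, 64, 53, 68],
    "Mrs. Blue": [63, 65, 58, 70, 71, 73],
}

# Rank all scores together. Assumption: no ties, so a score's rank is
# its 1-based position in the sorted pooled list.
pooled = sorted(score for group in samples.values() for score in group)
rank_of = {score: position for position, score in enumerate(pooled, start=1)}

n_total = len(pooled)
rank_sums = {
    name: sum(rank_of[score] for score in group)
    for name, group in samples.items()
}

# H = 12 / (N (N + 1)) * sum(R_k^2 / n_k) - 3 (N + 1)
h = 12 / (n_total * (n_total + 1)) * sum(
    rank_sums[name] ** 2 / len(group) for name, group in samples.items()
) - 3 * (n_total + 1)

# Critical value for 2 degrees of freedom only: -2 * ln(alpha).
alpha = 0.05
critical = -2 * math.log(alpha)

print(rank_sums)  # {'Mr. Red': 29, 'Ms. White': 46, 'Mrs. Blue': 78}
print(round(h, 2), round(critical, 3))  # 7.86 5.991
print("reject H0" if h > critical else "fail to reject H0")
```

For a different number of samples the closed-form critical value no longer applies, and a Chi-Square table or statistical package would be needed instead.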
Determining the Randomness of a Sample Using the Runs Test
The runs test (also known as the Wald-Wolfowitz test) is another nonparametric test that is used to test the hypothesis that the samples taken from a population are independent of one another. We also say that the runs test ‘checks the randomness’ of data when each observation falls into one of two categories. A run is an unbroken sequence of consecutive identical observations. For example, the sequence \begin{align*}+ + + + - - - + + + - - + + + + + + - - -\end{align*} has six ‘runs.’ Three of these runs are designated by the positive sign and three of the runs are designated by the negative sign.
We often use the runs test in studies where measurements are made according to a ranking in either time or space. In these types of scenarios, one of the questions we are trying to answer is whether or not the average value of the measurement is different at different points in the sequence. For example, suppose that we are conducting a longitudinal study on the number of referrals that different teachers give throughout the year. After several months, we notice that the number of referrals appears to increase around the time that standardized tests are given. We could formally test this observation using the runs test.
Using the laws of probability, it is possible to estimate the number of ‘runs’ that one would expect by chance given the proportion of the population in each of the categories and the sample size. Since we are dealing with proportions and probabilities between discrete variables, we consider the binomial distribution as the foundation of this test. When conducting a runs test, we establish the null hypothesis that the data samples are independent of one another and are random. In contrast, our alternative hypothesis states that the data samples are not random and/or not independent of one another.
The runs test can be used with either numerical or categorical data. When working with numerical data, the first step in conducting a runs test is to compute the mean of the data and then designate each observation as being either above the mean (‘+’) or below the mean (‘-’). Next, regardless of whether we are working with numerical or categorical data, we count the number of ‘runs’ within the data set. As mentioned, a run is an unbroken sequence of identical symbols. For example, in the following sequence we would have 5 runs \begin{align*}(R=5)\end{align*}. Equivalently, the sequence ‘switched’ four times, and the number of runs is always one more than the number of switches.
\begin{align*}+ + - - - - + + + - +\end{align*}
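The two preparatory steps, converting numerical data to signs and counting runs, can be sketched as follows. The function names and the numerical data are hypothetical, and dropping values exactly equal to the mean is an assumption made here for illustration; the text does not specify how such values are handled.

```python
from itertools import groupby


def signs_about_mean(data):
    """Label each value '+' (above the mean) or '-' (below the mean).

    Assumption: values exactly equal to the mean are dropped.
    """
    mean = sum(data) / len(data)
    return ["+" if x > mean else "-" for x in data if x != mean]


def count_runs(sequence):
    """Count runs: maximal blocks of consecutive identical symbols."""
    # groupby yields one group per block of equal neighbouring items.
    return sum(1 for _ in groupby(sequence))


# The sequence shown in the text has five runs.
print(count_runs("++----+++-+"))  # 5

# Hypothetical numerical data (mean is about 10.33).
measurements = [12, 15, 7, 6, 8, 14]
signs = signs_about_mean(measurements)
print(signs, count_runs(signs))  # ['+', '+', '-', '-', '-', '+'] 3
```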
After determining the number of runs, we also need to count how many times each category occurs and the total number of observations. In the example above, we have \begin{align*}11\end{align*} observations in total: \begin{align*}6\end{align*} ‘positives’ \begin{align*}(n_1 = 6)\end{align*} and \begin{align*}5\end{align*} ‘negatives’ \begin{align*}(n_2 = 5)\end{align*}. With this information, we are able to calculate our test statistic using the following formulas:
\begin{align*}z = \frac{\#\ \text{of observed runs} - \mu} {\sigma}\end{align*}
\begin{align*}\mu & = \text{expected\ number\ of\ runs} = 1+ \frac{2n_1n_2} {n_1 + n_2} \\ \sigma^2 & =\text{variance\ number\ of\ runs} = \frac{2n_1n_2(2n_1n_2 - n_1 - n_2)} {(n_1 + n_2)^2(n_1 + n_2 - 1)}\end{align*}
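As a minimal sketch, the formulas above translate directly into a small Python function. The name `runs_test_z` and its parameter names are illustrative only; the inputs used are the counts from the 11-observation sequence discussed above.

```python
import math


def runs_test_z(n1, n2, runs):
    """Return (mu, variance, z) for the runs test.

    n1, n2 -- number of observations in each of the two categories
    runs   -- observed number of runs
    """
    n = n1 + n2
    # Expected number of runs under the null hypothesis of randomness.
    mu = 1 + 2 * n1 * n2 / n
    # Variance of the number of runs under the null hypothesis.
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / (n ** 2 * (n - 1))
    z = (runs - mu) / math.sqrt(variance)
    return mu, variance, z


# Six positives, five negatives, five observed runs.
mu, variance, z = runs_test_z(n1=6, n2=5, runs=5)
print(round(mu, 2), round(variance, 2), round(z, 2))  # 6.45 2.43 -0.93
```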
When conducting the runs test, we calculate the standard \begin{align*}z\end{align*}-score and evaluate our hypotheses just like we do with other parametric and non-parametric tests.
Example:
A teacher is interested in assessing whether the seating arrangement of males and females in his classroom is random. He observes the seating pattern of his students and records the following sequence:
\begin{align*}\text{MFMMFFFFMMMFMFMMMMFFMFFMFFFF}\end{align*}
Is the seating arrangement random? Use \begin{align*}\alpha=.05\end{align*}.
Solution:
To answer this question, we first state the null hypothesis that the seating arrangement is random and independent. Our alternative hypothesis states that the seating arrangement is not random or independent. With \begin{align*}\alpha=.05\end{align*}, we set our critical values at \begin{align*}1.96\end{align*} standard scores above and below the mean.
To calculate the test statistic, we first record the number of runs and the number of each type of observation:
\begin{align*}R = 14\end{align*}
- \begin{align*}M\ (n_1) = 13\end{align*}
- \begin{align*}F\ (n_2) = 15\end{align*}
With these data, we can easily compute the test statistic:
\begin{align*}\mu & = \text{expected number of runs} = 1 + \frac{2(13)(15)} {13 + 15} = 1 + \frac{390} {28} = 14.93 \\ \sigma^2 & = \text{variance of the number of runs} = \frac{2(13)(15)(2 \cdot 13 \cdot 15 - 13 - 15)} {(13 + 15)^2(13 + 15 - 1)} = \frac{390(362)} {(784) (27)} = 6.67 \\ \sigma & = 2.58 \\ z & = \frac{\#\ \text{of observed runs} - \mu} {\sigma} = \frac{14 - 14.93} {2.58} = -0.36\end{align*}
Since the absolute value of the calculated test statistic \begin{align*}(|z| = 0.36)\end{align*} does not exceed our critical value of \begin{align*}1.96\end{align*}, we fail to reject the null hypothesis. The data do not provide evidence that the seating arrangement of males and females is non-random.
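The counts and the test statistic for the seating example can be recomputed from the recorded sequence with a short script. This is a sketch with illustrative variable names; it applies the expected-value and variance formulas given earlier in this section and compares the result with the two-sided critical value of 1.96.

```python
import math
from itertools import groupby

# Seating sequence recorded by the teacher.
seating = "MFMMFFFFMMMFMFMMMMFFMFFMFFFF"

# One run per block of consecutive identical letters.
runs = sum(1 for _ in groupby(seating))
n1 = seating.count("M")
n2 = seating.count("F")
n = n1 + n2

# Expected number of runs and its standard deviation under randomness.
mu = 1 + 2 * n1 * n2 / n
sigma = math.sqrt(2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / (n ** 2 * (n - 1)))
z = (runs - mu) / sigma

print("runs:", runs, "males:", n1, "females:", n2)
print("mu:", round(mu, 2), "sigma:", round(sigma, 2), "z:", round(z, 2))
print("reject H0" if abs(z) > 1.96 else "fail to reject H0")
```

Running it prints the run count, the two group sizes, and the z-score, which can then be compared against the critical values of 1.96 standard scores above and below the mean.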
Lesson Summary
1. The Kruskal-Wallis test is the non-parametric analog of the one-way ANOVA. We use it to compare several independent samples when the data are not normally distributed.
2. The test statistic for the Kruskal-Wallis test \begin{align*}(H)\end{align*} is the non-parametric alternative to the \begin{align*}F\end{align*}-statistic. This test statistic is defined by the formula
\begin{align*}H = \frac{12} {N(N + 1)} \sum_{k = 1}^K \frac{R^2_k} {n_k} - 3(N + 1)\end{align*}
3. The runs test (also known as the Wald-Wolfowitz test) is another non-parametric test that is used to test the hypothesis that the samples taken from a population are independent of one another. We use the \begin{align*}z\end{align*}-statistic to evaluate this hypothesis.