### Rank Sum Test and Rank Correlation

We have explored the concept of nonparametric tests through two tests: the sign test and the sign rank test. We use these tests when analyzing matched data pairs or categorical data samples. In both of these tests, our null hypothesis states that there is no difference between the medians of these variables. As mentioned, the sign rank test is a more precise test of this question, but the test statistic can be more difficult to calculate.

But what happens if we want to test if two samples come from the same non-normal distribution? For this type of question, we use the rank sum test (also known as the **Mann-Whitney \begin{align*}U\end{align*}-test**). This test is sensitive to both the median and the distribution of the sample and population.

In this Concept, we will learn how to conduct hypothesis tests using the Mann-Whitney \begin{align*}U\end{align*}-test and how to measure the correlation between two variables using the rank correlation test.

#### Conditions for Use of the Rank Sum Test to Evaluate Hypotheses about Non-Paired Data

The **rank sum test** tests the hypothesis that two independent samples are drawn from the same population. Recall that we use this test when we are not sure if the assumptions of normality or homogeneity of variance are met. Essentially, this test compares the medians and the distributions of the two independent samples. This test is considered stronger than other nonparametric tests that simply assess median values.

#### Performing the Rank Sum Test

In the image below, we see that the two samples have the same median, but very different distributions. If we were assessing just the median value, we would not realize that these samples actually have distributions that are very distinct.

When performing the rank sum test, there are several different conditions that need to be met. These include the following:

- Although the populations need not be normally distributed or have homogeneity of variance, the observations must be continuously distributed.
- The samples drawn from the population must be independent of one another.
- The samples must have 5 or more observations. The samples do not need to have the same number of observations.
- The observations must be on a numeric or ordinal scale. They cannot be categorical variables.

Since the rank sum test evaluates both the medians and the distributions of two independent samples, we establish two null hypotheses. Our null hypotheses state that the two medians and the two standard deviations of the independent samples are equal. Symbolically, we could say \begin{align*}H_0 : m_1=m_2\end{align*} and \begin{align*}\sigma_1=\sigma_2\end{align*}.

**Calculating the Mean and the Standard Deviation of Rank to Calculate a \begin{align*}z\end{align*}-Score**

When performing the rank sum test, we need to calculate a figure known as the *\begin{align*}U\end{align*}-statistic*. This statistic takes both the median and the total distribution of the two samples into account. The \begin{align*}U\end{align*}-statistic has its own distribution, which we use when working with small samples.

The *\begin{align*}U\end{align*}-distribution* approaches the normal distribution as the sizes of both samples grow. When we have samples of 20 or more, we do not use the \begin{align*}U\end{align*}-distribution. Instead, we use the \begin{align*}U\end{align*}-statistic to calculate a standard \begin{align*}z\end{align*}-score.

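To calculate the \begin{align*}U\end{align*}-statistic, we first pool the observations from both samples and rank them from lowest to highest, giving tied observations the average of the ranks they share. We then sum the ranks within each sample and calculate the test statistics as follows: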

\begin{align*}U_1 & = n_1n_2 + \frac{n_1(n_1+1)}{2} - R_1\\
U_2 & = n_1n_2 + \frac{n_2(n_2+1)}{2} - R_2\end{align*}

where:

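- \begin{align*}n_1\end{align*} is the number of observations in sample 1.
- \begin{align*}n_2\end{align*} is the number of observations in sample 2.
- \begin{align*}R_1\end{align*} is the sum of the ranks assigned to sample 1.
- \begin{align*}R_2\end{align*} is the sum of the ranks assigned to sample 2.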

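We use the smaller of the two calculated test statistics (i.e., the lesser of \begin{align*}U_1\end{align*} and \begin{align*}U_2\end{align*}) to evaluate our hypotheses in smaller samples or to calculate the \begin{align*}z\end{align*}-score when working with larger samples.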
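The ranking and the two formulas above can be sketched in Python. This is a minimal illustration rather than a reference implementation: the function names `average_ranks` and `u_statistics` are made up for this sketch, and the two five-observation samples are hypothetical values chosen only to show how a three-way tie is handled.

```python
def average_ranks(values):
    """Rank values from lowest (rank 1) to highest; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of observations tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + 1 + j + 1) / 2  # average of ranks i+1 through j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def u_statistics(sample1, sample2):
    """Return (R1, R2, U1, U2) using the rank sum formulas from the text."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))  # rank the pooled data
    r1 = sum(ranks[:n1])
    r2 = sum(ranks[n1:])
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return r1, r2, u1, u2


# Hypothetical data: the value 5 appears three times and shares rank 4.
r1, r2, u1, u2 = u_statistics([3, 5, 5, 9, 12], [1, 5, 8, 10, 11])
print(r1, r2, u1, u2)  # 27.0 28.0 13.0 12.0
print(min(u1, u2))     # 12.0
```

A useful sanity check is that the two statistics always add up to the product of the sample sizes, here 13 + 12 = 25.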

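When working with larger samples, we need to calculate two additional pieces of information: the mean of the sampling distribution, \begin{align*}\mu_U\end{align*}, and the standard deviation of the sampling distribution, \begin{align*}\sigma_U\end{align*}. These are calculated as follows: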

\begin{align*}\mu_U = \frac{n_1n_2}{2} \ \text{and} \ \sigma_U = \sqrt{\frac{n_1(n_2)(n_1+n_2+1)}{12}}\end{align*}

Finally, we use the general formula for the test statistic to test our null hypothesis:

\begin{align*}z=\frac{U-\mu_U}{\sigma_U}\end{align*}

#### Using a Z-Score to Evaluate Hypotheses

Suppose we are interested in determining the attitudes on the current status of the economy from women who work outside the home and from women who do not work outside the home. We take a sample of 20 women who work outside the home (sample 1) and a sample of 20 women who do not work outside the home (sample 2) and administer a questionnaire that measures their attitudes about the economy. These data are found in the tables below:

**Women Working Outside the Home**

Score | Rank |
---|---|
9 | 1 |
12 | 3 |
13 | 4 |
19 | 8 |
21 | 9 |
27 | 13 |
31 | 16 |
33 | 17 |
34 | 18 |
35 | 19 |
39 | 21 |
40 | 22 |
44 | 25 |
46 | 26 |
49 | 29 |
58 | 33 |
61 | 34 |
63 | 35 |
64 | 36 |
70 | 39 |
Rank sum | \begin{align*}R_1=408\end{align*} |

**Women Not Working Outside the Home**

Score | Rank |
---|---|
10 | 2 |
15 | 5 |
17 | 6 |
18 | 7 |
23 | 10 |
24 | 11 |
25 | 12 |
28 | 14 |
30 | 15 |
37 | 20 |
41 | 23 |
42 | 24 |
47 | 27 |
48 | 28 |
52 | 30 |
55 | 31 |
56 | 32 |
65 | 37 |
69 | 38 |
71 | 40 |
Rank sum | \begin{align*}R_2=412\end{align*} |

Do these two groups of women have significantly different views on the issue?

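Since each of our samples has 20 observations, we need to calculate the standard \begin{align*}z\end{align*}-score to test the hypothesis that these independent samples came from the same population. To do so, we first calculate \begin{align*}U_1\end{align*} and \begin{align*}U_2\end{align*}: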

\begin{align*}U_1 &= n_1n_2+ \frac{n_1(n_1+1)}{2}-R_1=(20)(20)+\frac{(20)(20+1)}{2}-408=202\\
U_2 &= n_1n_2+ \frac{n_2(n_2+1)}{2}-R_2=(20)(20)+\frac{(20)(20+1)}{2}-412=198\end{align*}

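Since we use the smaller of the two \begin{align*}U\end{align*}-statistics, we set \begin{align*}U=198\end{align*}. For the mean and the standard deviation of the sampling distribution, we find: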

\begin{align*}\mu_U=\frac{n_1n_2}{2}=\frac{(20)(20)}{2}=200\end{align*}

and

\begin{align*}\sigma_U=\sqrt{\frac{n_1(n_2)(n_1+n_2+1)}{12}}=\sqrt{\frac{(20)(20)(20+20+1)}{12}}=\sqrt{\frac{(400)(41)}{12}}=36.97\end{align*}

Thus, we calculate the \begin{align*}z\end{align*}-statistic as shown below:

\begin{align*}z=\frac{U-\mu_U}{\sigma_U}=\frac{198-200}{36.97}=-0.05\end{align*}

If we set \begin{align*}\alpha=0.05\end{align*}, we would find that the calculated test statistic of \begin{align*}-0.05\end{align*} does not fall beyond the critical values of \begin{align*}\pm 1.96\end{align*}. Therefore, we fail to reject the null hypothesis: we have no evidence that these two samples come from different populations.

We can use this \begin{align*}z\end{align*}-score to evaluate our hypotheses just like we would with any other hypothesis test. When interpreting the results from the rank sum test, it is important to remember that we are really asking whether or not the populations have the same median and variance. In addition, we are assessing the chance that random sampling would result in medians and variances as far apart (or as close together) as observed in the test. If the \begin{align*}z\end{align*}-score is large (meaning that we would have a small \begin{align*}P\end{align*}-value), we can reject the idea that the difference is a coincidence. If the \begin{align*}z\end{align*}-score is small, like in the example above (meaning that we would have a large \begin{align*}P\end{align*}-value), we do not have any reason to conclude that the medians of the populations differ and, therefore, conclude that the samples likely came from the same population.
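The arithmetic in this example can be reproduced with a few lines of Python. This is a minimal sketch that starts from the rank sums \begin{align*}R_1=408\end{align*} and \begin{align*}R_2=412\end{align*} reported in the tables. The variable names are illustrative, and the two-sided \begin{align*}P\end{align*}-value is obtained from the standard normal approximation through `math.erfc`, a step the text describes only qualitatively.

```python
import math

n1, n2 = 20, 20
r1, r2 = 408, 412  # rank sums taken from the two tables

u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
u = min(u1, u2)

mu_u = n1 * n2 / 2
sigma_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (u - mu_u) / sigma_u

# Two-sided P-value under the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
p_value = math.erfc(abs(z) / math.sqrt(2))

print(u1, u2)                                          # 202.0 198.0
print(round(mu_u, 2), round(sigma_u, 2), round(z, 2))  # 200.0 36.97 -0.05
print(p_value > 0.05)                                  # True
```

Because the \begin{align*}P\end{align*}-value is far above 0.05, the sketch agrees with the decision not to reject the null hypothesis.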

**Determining the Correlation between Two Variables Using the Rank Correlation Test**

It is possible to determine the correlation between two variables by calculating the Pearson product-moment correlation coefficient (more commonly known as the linear correlation coefficient, or \begin{align*}r\end{align*}). The correlation coefficient helps us determine the strength, magnitude, and direction of the relationship between two variables with normal distributions.

We also use the **Spearman rank correlation coefficient** (also known simply as the **rank correlation coefficient**, \begin{align*}\rho\end{align*}, or ‘rho’) to measure the strength, magnitude, and direction of the relationship between two variables. This test statistic is the nonparametric alternative to the correlation coefficient, and we use it when the data do not meet the assumptions of normality. The Spearman rank correlation coefficient, used as part of the *rank correlation test*, can also be used when one or both of the variables consist of ranks. The Spearman rank correlation coefficient is defined by the following formula:

\begin{align*}\rho=1-\frac{6 \sum d^2}{n(n^2-1)}\end{align*}

where \begin{align*}d\end{align*} is the difference in statistical rank of corresponding observations.
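As a quick check on this formula, consider its two extreme cases for \begin{align*}n\end{align*} observations with no ties. This is a standard derivation added for illustration, not part of the original text. If the two sets of ranks agree exactly, every \begin{align*}d\end{align*} is 0. If they are exactly reversed, the observation ranked \begin{align*}i\end{align*} on one variable is ranked \begin{align*}n+1-i\end{align*} on the other, so \begin{align*}d_i=n+1-2i\end{align*}.

\begin{align*}\text{identical ranks:} \quad & \sum d^2 = 0 \ \Rightarrow \ \rho = 1-0 = 1\\
\text{reversed ranks:} \quad & \sum d^2 = \sum_{i=1}^{n}(n+1-2i)^2 = \frac{n(n^2-1)}{3} \ \Rightarrow \ \rho = 1-\frac{6 \cdot \frac{n(n^2-1)}{3}}{n(n^2-1)} = 1-2 = -1\end{align*}

The coefficient therefore ranges from \begin{align*}-1\end{align*} to 1, just like the linear correlation coefficient.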

The test works by converting each of the observations to ranks, just like we learned about with the rank sum test. Therefore, if we were doing a rank correlation of scores on a final exam versus SAT scores, the lowest final exam score would get a rank of 1, the second lowest a rank of 2, and so on. Likewise, the lowest SAT score would get a rank of 1, the second lowest a rank of 2, and so on. Similar to the rank sum test, if two observations are equal, the average rank is used for both of the observations. Once the observations are converted to ranks, a correlation analysis is performed on the ranks. (Note: This analysis is not performed on the observations themselves.) The Spearman correlation coefficient is then calculated from the columns of ranks. However, because the distributions are non-normal, a regression line is rarely used, and we do not calculate a non-parametric equivalent of the regression line. It is easy to use a statistical programming package, such as SAS or SPSS, to calculate the Spearman rank correlation coefficient. However, for the purposes of this example, we will perform this test by hand as shown in the example below.

#### Calculating a Correlation Coefficient using the Spearman Rank Correlation Test

The head of a math department is interested in the correlation between scores on a final math exam and math SAT scores. She took a random sample of 15 students and recorded each student's final exam score and math SAT score. Although SAT scores are designed to be normally distributed, final exam scores need not be, so the Spearman rank correlation test is a reasonable tool for this comparison. Use the Spearman rank correlation test to determine the correlation coefficient. The data for this example are recorded below:

Math SAT Score | Final Exam Score |
---|---|
595 | 68 |
520 | 55 |
715 | 65 |
405 | 42 |
680 | 64 |
490 | 45 |
565 | 56 |
580 | 59 |
615 | 56 |
435 | 42 |
440 | 38 |
515 | 50 |
380 | 37 |
510 | 42 |
565 | 53 |

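To calculate the Spearman rank correlation coefficient, we determine the ranks of each of the variables in the data set, calculate the difference for each of these ranks, and then calculate the squared difference. In the table below, a rank of 1 is assigned to the highest score in each column. Ranking both variables in the same direction, whether from the lowest or from the highest score, leaves the squared differences unchanged.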

Math SAT Score (\begin{align*}X\end{align*}) | Final Exam Score (\begin{align*}Y\end{align*}) | \begin{align*}X\end{align*} Rank | \begin{align*}Y\end{align*} Rank | \begin{align*}d\end{align*} | \begin{align*}d^2\end{align*} |
---|---|---|---|---|---|
595 | 68 | 4 | 1 | 3 | 9 |
520 | 55 | 8 | 7 | 1 | 1 |
715 | 65 | 1 | 2 | \begin{align*}-1\end{align*} | 1 |
405 | 42 | 14 | 12 | 2 | 4 |
680 | 64 | 2 | 3 | \begin{align*}-1\end{align*} | 1 |
490 | 45 | 11 | 10 | 1 | 1 |
565 | 56 | 6.5 | 5.5 | 1 | 1 |
580 | 59 | 5 | 4 | 1 | 1 |
615 | 56 | 3 | 5.5 | \begin{align*}-2.5\end{align*} | 6.25 |
435 | 42 | 13 | 12 | 1 | 1 |
440 | 38 | 12 | 14 | \begin{align*}-2\end{align*} | 4 |
515 | 50 | 9 | 9 | 0 | 0 |
380 | 37 | 15 | 15 | 0 | 0 |
510 | 42 | 10 | 12 | \begin{align*}-2\end{align*} | 4 |
565 | 53 | 6.5 | 8 | \begin{align*}-1.5\end{align*} | 2.25 |
Sum | | | | 0 | 36.50 |

Using the formula for the Spearman correlation coefficient, we find the following:

\begin{align*}\rho=1-\frac{6 \sum d^2}{n(n^2-1)}=1-\frac{(6)(36.50)}{(15)(225-1)}=0.9348\end{align*}

We interpret this rank correlation coefficient in the same way as we interpret the linear correlation coefficient. This coefficient states that there is a strong, positive correlation between the two variables.
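The hand calculation can be checked with a short Python sketch. The helper names `rank_descending` and `spearman_rho` are illustrative, not from any library, and the ranking gives rank 1 to the highest score to match the table. Note that the simplified formula is exact only when there are no ties. With tied ranks, as in this data set, it is an approximation, and statistical packages that compute the Pearson correlation of the ranks may report a slightly different value.

```python
def rank_descending(values):
    """Highest value gets rank 1; tied values share the average of their ranks."""
    ranks = []
    for v in values:
        greater = sum(1 for w in values if w > v)
        equal = sum(1 for w in values if w == v)  # includes v itself
        ranks.append(greater + (equal + 1) / 2)
    return ranks


def spearman_rho(x, y):
    """Return (rho, sum of d^2) using rho = 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    rx, ry = rank_descending(x), rank_descending(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    n = len(x)
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1)), sum_d2


sat = [595, 520, 715, 405, 680, 490, 565, 580, 615, 435, 440, 515, 380, 510, 565]
exam = [68, 55, 65, 42, 64, 45, 56, 59, 56, 42, 38, 50, 37, 42, 53]

rho, sum_d2 = spearman_rho(sat, exam)
print(sum_d2)         # 36.5
print(round(rho, 4))  # 0.9348
```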

### Example

#### Example 1

A sample of 13 children was obtained, 5 girls and 8 boys, and asked to place a set of blocks in a specific pattern. The time, in seconds, required by each child to arrange the blocks was recorded. Use the rank sum test to determine if there is a difference in dexterity between the boys and the girls.

| Girls | Boys |
|---|---|
| 25 | 39 |
| 20 | 58 |
| 31 | 41 |
| 44 | 36 |
| 23 | 28 |
|  | 106 |
|  | 50 |
|  | 27 |

First we make a table of the ranks:

| Girls Data | Girls Rank | Boys Data | Boys Rank |
|---|---|---|---|
| 25 | 3 | 39 | 8 |
| 20 | 1 | 58 | 12 |
| 31 | 6 | 41 | 9 |
| 44 | 10 | 36 | 7 |
| 23 | 2 | 28 | 5 |
|  |  | 106 | 13 |
|  |  | 50 | 11 |
|  |  | 27 | 4 |

Now we do the calculations:

\begin{align*}n_1=5, \Sigma R_1=22, n_2=8, \Sigma R_2 = 69\end{align*}

\begin{align*} U_1=n_1n_2+\frac{n_1(n_1+1)}{2}-R_1=33\end{align*}

\begin{align*} U_2=n_1n_2+\frac{n_2(n_2+1)}{2}-R_2=7\end{align*}

\begin{align*}U=min(U_1, U_2)=7\end{align*}

\begin{align*}\mu_{U}=\frac{n_1 n_2}{2}=20, \sigma_{U}=\sqrt{\frac{n_1 n_2(n_1+n_2+1)}{12}}=\sqrt{\frac{(5)(8)(14)}{12}}=6.83\end{align*}

\begin{align*}z=\frac{U-\mu_{U}}{\sigma_{U}}=\frac{7-20}{6.83}=-1.90\end{align*}

Since this is a two-sided test, the p-value is \begin{align*}2P(z<-1.90)=2(0.0287)=0.057\end{align*}.

This is greater than .05, so we fail to reject the null hypothesis: these data do not provide sufficient evidence of a difference in dexterity between girls and boys.
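The calculations in this example can be cross-checked in Python. The sketch below relies on a standard identity that the text does not state: \begin{align*}U_1\end{align*} equals the number of (girl, boy) pairs in which the girl's time is lower than the boy's, and \begin{align*}U_2\end{align*} counts the reverse, so no ranking is needed. It assigns to the girls the five times whose ranks sum to 22, matching \begin{align*}n_1=5\end{align*}, and the variable names are illustrative.

```python
import math

girls = [25, 20, 31, 44, 23]
boys = [39, 58, 41, 36, 28, 106, 50, 27]
n1, n2 = len(girls), len(boys)

# U1 counts pairs where the girl's time is below the boy's; U2 counts the reverse.
# A tie would contribute 0.5 to each count, although this data set has none.
u1 = sum(1.0 if g < b else 0.5 if g == b else 0.0 for g in girls for b in boys)
u2 = n1 * n2 - u1
u = min(u1, u2)

mu_u = n1 * n2 / 2
sigma_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # sqrt(560 / 12)
z = (u - mu_u) / sigma_u
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided, normal approximation

print(u1, u2)
print(round(mu_u, 2), round(sigma_u, 2), round(z, 2), round(p_value, 3))
```

With samples this small, the normal approximation is only rough, so a table of the \begin{align*}U\end{align*}-distribution or an exact test from a statistics package would be the more careful choice.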

### Review

- When do you use the rank sum test?
- Suppose the grades on an exam for the male and female students in a class were as indicated below. Use the Wilcoxon rank sum test at the 5% level of significance to test whether males and females did equally well.

Males | Females |
---|---|
99 | 99 |
96 | 94 |
88 | 91 |
88 | 90 |
85 | 88 |
83 | 87 |
79 | 79 |
78 | 78 |
78 | 72 |
71 | 58 |
65 | 51 |
58 | 43 |
57 | 41 |
53 | 31 |
49 | 15 |
39 | |
34 | |
23 | |
22 | |
5 | |

- Two students compared two brands of chips, Doritos and Frito Lays, to see which company gives you more for your money. According to the label on each of the bags, each bag contained 35.4 grams. The students looked at 5 bags of each brand. For Doritos, they found the bags contained 37.3 grams, 37.4 grams, 37.8 grams, 37.9 grams, and 35.9 grams. For Frito Lays, they found the bags contained 35.3 grams, 37.8 grams, 38.8 grams, 35.9 grams, and 35.9 grams. Use the Wilcoxon rank sum test to see if there is a significant difference between the amount each brand puts in their bags.

For 4-6, a researcher is interested in knowing if there is a difference between staff, trainees, and students in their ability to interpret a particular test that was designed to identify a certain form of mental illness. The test was given to 100 people, half of whom had the mental illness. Fifteen judges (5 staff, 5 trainees, and 5 students) interpreted the test. The table gives the number of tests correctly interpreted by each of the 15 judges.

Staff | Trainees | Students |
---|---|---|
78 | 80 | 65 |
76 | 69 | 74 |
80 | 75 | 80 |
81 | 83 | 82 |
88 | 74 | 77 |

- What are the ranks for the observations in the first row?
- What is the highest rank given to any observation?
- If the p-value of the test is small, what is the conclusion?

- What test statistic is used by the Wilcoxon rank sum test?
- How is that test statistic obtained?
- What null hypothesis does the Wilcoxon rank sum test evaluate? What are the possible alternative hypotheses?
- What assumptions are made by the Wilcoxon rank sum test?
- Qualitatively, what should happen to the rank sum for sample A if distribution A shifted to the right of distribution B? If it is shifted to the left?

### Review (Answers)

To view the Review answers, open this PDF file and look for section 12.2.