12.2: The Rank Sum Test and Rank Correlation
Learning Objectives
 Understand the conditions for use of the rank sum test to evaluate a hypothesis about nonpaired data.
 Calculate the mean and the standard deviation of rank from two nonpaired samples and use these values to calculate a \begin{align*}z\end{align*}-score.
 Determine the correlation between two variables using the rank correlation test for situations that meet the appropriate criteria using the appropriate test statistic formula.
Introduction
In the previous lesson, we explored the concept of nonparametric tests. As review, we use nonparametric tests when analyzing data that are not normally distributed or homogeneous with respect to variance. While parametric tests are preferred since they have more ‘power,’ they are not always applicable in statistical research.
In the last section we explored two tests: the sign test and the sign rank test. We use these tests when analyzing matched data pairs or categorical data samples. In both of these tests, our null hypothesis states that there is no difference between the distributions of these variables. As mentioned, the sign rank test is a more precise test of this question, but the test statistic can be more difficult to calculate.
But what happens if we want to test if two samples come from the same nonnormal distribution? For this type of question, we use the rank sum test (also known as the Mann-Whitney \begin{align*}U\end{align*} test).
In this section we will learn how to conduct hypothesis tests using the Mann-Whitney \begin{align*}U\end{align*} test and how to measure the correlation between two variables using the rank correlation test.
Conditions for Use of the Rank Sum Test to Evaluate Hypotheses about Nonpaired Data
As mentioned, the rank sum test tests the hypothesis that two independent samples are drawn from the same population. As a reminder, we use this test when we are not sure if the assumptions of normality or homogeneity of variance are met. Essentially, this test compares the medians and the distributions of the two independent samples. This test is considered stronger than other nonparametric tests that simply assess median values. For example, in the image below we see that the two samples have the same median, but very different distributions. If we were assessing just the median value, we would not realize that these samples actually have very different distributions.
When performing the rank sum test, there are several different conditions that need to be met. These include:
 Although the population need not be normally distributed or have homogeneity of variance, the observations must be continuously distributed.
 That the samples drawn from the population are independent of one another.
 That the samples have 5 or more observations. The samples do not need to have the same number of observations.
 The observations must be on a numeric or ordinal scale. They cannot be categorical variables.
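Since the rank sum test evaluates both the median and the distribution of two independent samples, we establish two null hypotheses. Our null hypotheses state that the two medians and the distributions of the independent samples are equal. Symbolically, writing \begin{align*}m_1\end{align*} and \begin{align*}m_2\end{align*} for the two population medians, we could say that \begin{align*}H_0: m_1 = m_2\end{align*}, with the analogous statement for the two distributions.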
Calculating the Mean and the Standard Deviation of Rank to Calculate a Z-Score
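When performing the rank sum test, we need to calculate a figure known as the \begin{align*}U\end{align*} statistic. This statistic takes both the median and the total distribution of both samples into account.
To calculate the \begin{align*}U\end{align*} statistic, we combine the two samples, rank all of the observations together (tied observations share the average of their ranks), and sum the ranks within each sample. We then calculate a \begin{align*}U\end{align*} value for each sample:
\begin{align*}U_1 = n_1n_2 + \frac{n_1(n_1 + 1)} {2} - R_1\end{align*}
\begin{align*}U_2 = n_1n_2 + \frac{n_2(n_2 + 1)} {2} - R_2\end{align*}
where:
\begin{align*}n_1\end{align*} and \begin{align*}n_2\end{align*} are the numbers of observations in the first and second samples
\begin{align*}R_1\end{align*} and \begin{align*}R_2\end{align*} are the sums of the ranks in the first and second samples
We use the smaller of the two calculated test statistics (i.e. the lesser of \begin{align*}U_1\end{align*} and \begin{align*}U_2\end{align*}) as \begin{align*}U\end{align*}.
When working with larger samples, we need to calculate two additional pieces of information: the mean of the sampling distribution of \begin{align*}U\end{align*},
\begin{align*}\mu_U = \frac{n_1n_2} {2}\end{align*}
and the standard deviation of the sampling distribution,
\begin{align*}\sigma_U = \sqrt{\frac{(n_1)(n_2)(n_1 + n_2 + 1)} {12}}\end{align*}
Finally, we use the general formula for the test statistic to test our null hypothesis:
\begin{align*}z = \frac{U - \mu_U} {\sigma_U}\end{align*}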
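The calculation described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `average_ranks` and `rank_sum_z` are made up for this sketch, the \begin{align*}U\end{align*} values use the common textbook form \begin{align*}U_i = n_1n_2 + n_i(n_i + 1)/2 - R_i\end{align*}, and the two short samples at the bottom are invented toy data, not data from this lesson.

```python
import math

def average_ranks(values):
    # Rank from smallest (rank 1) to largest; tied values share their average rank.
    return [sum(x < v for x in values) + (values.count(v) + 1) / 2 for v in values]

def rank_sum_z(sample1, sample2):
    n1, n2 = len(sample1), len(sample2)
    # Rank all observations from both samples together.
    ranks = average_ranks(list(sample1) + list(sample2))
    r1 = sum(ranks[:n1])   # sum of ranks in the first sample
    r2 = sum(ranks[n1:])   # sum of ranks in the second sample
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    u = min(u1, u2)        # the smaller U is the test statistic
    mu_u = n1 * n2 / 2
    sigma_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu_u) / sigma_u
    return u1, u2, z

# Toy data (hypothetical): every value in the first sample is below the second.
u1, u2, z = rank_sum_z([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
print(u1, u2, round(z, 3))
```

A useful sanity check on any hand calculation is that the two \begin{align*}U\end{align*} values always add up to \begin{align*}n_1n_2\end{align*}.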
Example:
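Say that we are interested in determining the attitudes on the current status of the economy from women that work outside the home and from women that do not work outside the home. We take a sample of \begin{align*}20\end{align*} women from each group and record a score for each woman. The scores and their ranks are recorded in the tables below: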
Women Working Outside the Home
Score  Rank 
[score and rank values not recoverable]
Women Not Working Outside the Home
Score  Rank 
[score and rank values not recoverable]
Do these two groups of women have significantly different views on the issue?
Solution:
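Since each of our samples has \begin{align*}20\end{align*} observations, we calculate a \begin{align*}z\end{align*}-score to test the hypothesis that these independent samples come from the same population. To do so, we first compute the \begin{align*}U\end{align*} statistic for each sample from its sum of ranks.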
Since we use the smaller of the two \begin{align*}U\end{align*} statistics, we set \begin{align*}U = 198\end{align*}. When calculating the other two figures, we find:
\begin{align*}\mu_U = \frac{n_1n_2} {2} = \frac{20 * 20} {2} = 200\end{align*}
and
\begin{align*}\sigma_U=\sqrt{\frac{(n_1)(n_2)(n_1 + n_2 + 1)} {12}} = \sqrt{\frac{(20)(20)(20 + 20 + 1)} {12}} = \sqrt{\frac{(400)(41)} {12}} = 36.97\end{align*}
When calculating the \begin{align*}z\end{align*}-statistic we find,
\begin{align*}z = \frac{U - \mu_U} {\sigma_U} = \frac{198 - 200} {36.97} = -0.05\end{align*}
If we set \begin{align*}\alpha=.05\end{align*}, we find that the absolute value of the calculated test statistic does not exceed the critical value of \begin{align*}1.96\end{align*}. Therefore, we fail to reject the null hypothesis: the data give us no evidence that these two samples come from different populations.
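The arithmetic in this solution can be checked with a short Python sketch. It takes \begin{align*}U = 198\end{align*} and the sample sizes as given in the example (the raw scores are not used here), and the variable names are chosen only for illustration.

```python
import math

n1 = n2 = 20      # sample sizes from the example
u = 198           # the smaller of the two U statistics, as given

mu_u = n1 * n2 / 2
sigma_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (u - mu_u) / sigma_u

print(mu_u, round(sigma_u, 2), round(z, 2))   # 200.0 36.97 -0.05

# Two-tailed test at alpha = .05: reject only if |z| exceeds 1.96.
reject_null = abs(z) > 1.96
print(reject_null)                            # False
```

The \begin{align*}z\end{align*}-score is negative because the smaller \begin{align*}U\end{align*} lies below the mean of its sampling distribution; only its absolute value matters for the two-tailed comparison.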
We can use this \begin{align*}z\end{align*}-score to evaluate our hypotheses just like we would with any other hypothesis test. When interpreting the results from the rank sum test it is important to remember that we are really asking whether or not the populations have the same median and variance. In addition, we are assessing the chance that random sampling would result in medians and variances as far apart (or as close together) as observed in the test. If the \begin{align*}z\end{align*}-score is large in absolute value (meaning that we would have a small \begin{align*}P\end{align*}-value), we can reject the idea that the difference is a coincidence. If the \begin{align*}z\end{align*}-score is small, as in the example above (meaning that we would have a large \begin{align*}P\end{align*}-value), we do not have any reason to conclude that the medians of the populations differ, and the samples may well have come from the same population.
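To make the link between the size of the \begin{align*}z\end{align*}-score and the \begin{align*}P\end{align*}-value concrete, here is a minimal sketch that converts a \begin{align*}z\end{align*}-score into a two-tailed \begin{align*}P\end{align*}-value under the standard normal distribution. The function name `two_sided_p` is hypothetical, and the calculation assumes the normal approximation is appropriate for the sample sizes involved.

```python
import math

def two_sided_p(z):
    # P(|Z| >= |z|) for a standard normal Z, via the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))

p_example = two_sided_p((198 - 200) / 36.97)   # z from the example above
p_critical = two_sided_p(1.96)                 # z at the usual critical value

print(round(p_example, 2))
print(round(p_critical, 3))
```

A \begin{align*}z\end{align*}-score near zero gives a \begin{align*}P\end{align*}-value near 1, while a \begin{align*}z\end{align*}-score at the critical value of 1.96 gives a \begin{align*}P\end{align*}-value of about .05.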
Determining the Correlation between Two Variables Using the Rank Correlation Test
As we learned in Chapter 9, it is possible to determine the correlation between two variables by calculating the Pearson product-moment correlation coefficient (more commonly known as the linear correlation coefficient or \begin{align*}r\end{align*}). The correlation coefficient helps us determine the strength, magnitude and direction of the relationship between two variables with normal distributions.
We also use the Spearman rank correlation coefficient (also known as simply the ‘rank correlation’ coefficient, \begin{align*}\rho\end{align*} or ‘rho’) to measure the strength, magnitude and direction of the relationship between two variables. This coefficient is the nonparametric alternative to the linear correlation coefficient, and we use it when the data do not meet the assumptions about normality. We also use the Spearman rank correlation test when one or both of the variables consist of ranks. The Spearman rank correlation coefficient is defined by the formula:
\begin{align*}\rho = 1 - \frac{6 \textstyle\sum d^2} {n(n^2 - 1)}\end{align*}
where \begin{align*}d\end{align*} is the difference in statistical rank of corresponding observations and \begin{align*}n\end{align*} is the number of pairs of observations.
The test works by converting each of the observations to ranks, just like we learned about with the rank sum test. Therefore, if we were doing a rank correlation of scores on a final exam versus SAT scores, the lowest final exam score would get a rank of \begin{align*}1\end{align*}, the second lowest a rank of \begin{align*}2\end{align*}, etc. The lowest SAT score would get a rank of \begin{align*}1\end{align*}, the second lowest a rank of \begin{align*}2\end{align*}, etc. Similar to the rank sum test, if two observations are equal the average rank is used for both of the observations.
Once the observations are converted to ranks, a correlation analysis is performed on the ranks (note: this analysis is not performed on the observations themselves). The Spearman correlation coefficient is calculated from the columns of ranks. However, because the distributions are nonnormal, a regression line is rarely used and we do not calculate a nonparametric equivalent of the regression line. It is easy to use a statistical programming package such as SAS or SPSS to calculate the Spearman rank correlation coefficient. However, for the purposes of this example we will perform this test by hand as shown in the example below.
Example:
The head of the math department is interested in the correlation between scores on a final math exam and the math SAT score. She took a random sample of \begin{align*}15\end{align*} students and recorded each student’s final exam and math SAT scores. Since SAT scores are designed to be normally distributed, the Spearman rank correlation may be an especially effective tool for this comparison. Use the Spearman rank correlation test to determine the correlation coefficient. The data for this example are recorded below:
Math SAT Score  Final Exam Score 

\begin{align*}595\end{align*}  \begin{align*}68\end{align*} 
\begin{align*}520\end{align*}  \begin{align*}55\end{align*} 
\begin{align*}715\end{align*}  \begin{align*}65\end{align*} 
\begin{align*}405\end{align*}  \begin{align*}42\end{align*} 
\begin{align*}680\end{align*}  \begin{align*}64\end{align*} 
\begin{align*}490\end{align*}  \begin{align*}45\end{align*} 
\begin{align*}565\end{align*}  \begin{align*}56\end{align*} 
\begin{align*}580\end{align*}  \begin{align*}59\end{align*} 
\begin{align*}615\end{align*}  \begin{align*}56\end{align*} 
\begin{align*}435\end{align*}  \begin{align*}42\end{align*} 
\begin{align*}440\end{align*}  \begin{align*}38\end{align*} 
\begin{align*}515\end{align*}  \begin{align*}50\end{align*} 
\begin{align*}380\end{align*}  \begin{align*}37\end{align*} 
\begin{align*}510\end{align*}  \begin{align*}42\end{align*} 
\begin{align*}565\end{align*}  \begin{align*}53\end{align*} 
Solution:
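To calculate the Spearman rank correlation coefficient, we determine the ranks of each of the variables in the data set (above), calculate the difference \begin{align*}d\end{align*} between the two ranks in each row, and then calculate the squared difference \begin{align*}d^2\end{align*}. In the table below, rank \begin{align*}1\end{align*} is given to the highest score in each column; the direction of ranking does not affect the result as long as both variables are ranked the same way.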
Math SAT Score \begin{align*}(X)\end{align*}  Final Exam Score \begin{align*}(Y)\end{align*}  X Rank  Y Rank  \begin{align*}d\end{align*}  \begin{align*}d^2\end{align*} 

\begin{align*}595\end{align*}  \begin{align*}68\end{align*}  \begin{align*}4\end{align*}  \begin{align*}1\end{align*}  \begin{align*}3\end{align*}  \begin{align*}9\end{align*} 
\begin{align*}520\end{align*}  \begin{align*}55\end{align*}  \begin{align*}8\end{align*}  \begin{align*}7\end{align*}  \begin{align*}1\end{align*}  \begin{align*}1\end{align*} 
\begin{align*}715\end{align*}  \begin{align*}65\end{align*}  \begin{align*}1\end{align*}  \begin{align*}2\end{align*}  \begin{align*}-1\end{align*}  \begin{align*}1\end{align*} 
\begin{align*}405\end{align*}  \begin{align*}42\end{align*}  \begin{align*}14\end{align*}  \begin{align*}12\end{align*}  \begin{align*}2\end{align*}  \begin{align*}4\end{align*} 
\begin{align*}680\end{align*}  \begin{align*}64\end{align*}  \begin{align*}2\end{align*}  \begin{align*}3\end{align*}  \begin{align*}-1\end{align*}  \begin{align*}1\end{align*} 
\begin{align*}490\end{align*}  \begin{align*}45\end{align*}  \begin{align*}11\end{align*}  \begin{align*}10\end{align*}  \begin{align*}1\end{align*}  \begin{align*}1\end{align*} 
\begin{align*}565\end{align*}  \begin{align*}56\end{align*}  \begin{align*}6.5\end{align*}  \begin{align*}5.5\end{align*}  \begin{align*}1\end{align*}  \begin{align*}1\end{align*} 
\begin{align*}580\end{align*}  \begin{align*}59\end{align*}  \begin{align*}5\end{align*}  \begin{align*}4\end{align*}  \begin{align*}1\end{align*}  \begin{align*}1\end{align*} 
\begin{align*}615\end{align*}  \begin{align*}56\end{align*}  \begin{align*}3\end{align*}  \begin{align*}5.5\end{align*}  \begin{align*}-2.5\end{align*}  \begin{align*}6.25\end{align*} 
\begin{align*}435\end{align*}  \begin{align*}42\end{align*}  \begin{align*}13\end{align*}  \begin{align*}12\end{align*}  \begin{align*}1\end{align*}  \begin{align*}1\end{align*} 
\begin{align*}440\end{align*}  \begin{align*}38\end{align*}  \begin{align*}12\end{align*}  \begin{align*}14\end{align*}  \begin{align*}-2\end{align*}  \begin{align*}4\end{align*} 
\begin{align*}515\end{align*}  \begin{align*}50\end{align*}  \begin{align*}9\end{align*}  \begin{align*}9\end{align*}  \begin{align*}0\end{align*}  \begin{align*}0\end{align*} 
\begin{align*}380\end{align*}  \begin{align*}37\end{align*}  \begin{align*}15\end{align*}  \begin{align*}15\end{align*}  \begin{align*}0\end{align*}  \begin{align*}0\end{align*} 
\begin{align*}510\end{align*}  \begin{align*}42\end{align*}  \begin{align*}10\end{align*}  \begin{align*}12\end{align*}  \begin{align*}-2\end{align*}  \begin{align*}4\end{align*} 
\begin{align*}565\end{align*}  \begin{align*}53\end{align*}  \begin{align*}6.5\end{align*}  \begin{align*}8\end{align*}  \begin{align*}-1.5\end{align*}  \begin{align*}2.25\end{align*} 
Sum  \begin{align*}0\end{align*}  \begin{align*}36.50\end{align*} 
Using the formula for the Spearman correlation coefficient, we find that:
\begin{align*}\rho = 1 - \frac{6 \textstyle\sum d^2} {n(n^2 - 1)} = 1 - \frac{6(36.50)} {15(225 - 1)} = 1 - 0.07 = 0.93\end{align*}
We interpret this rank correlation coefficient in the same way as we interpret the linear correlation coefficient. This coefficient states that there is a strong, positive correlation between the two variables.
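The hand calculation above can be reproduced with a short Python sketch. It uses the \begin{align*}15\end{align*} pairs of scores from the example, ranks the highest score as \begin{align*}1\end{align*} with tied scores sharing their average rank (as in the table), and applies the \begin{align*}\sum d^2\end{align*} formula directly. The function names `ranks_desc` and `spearman_rho` are chosen only for this illustration.

```python
sat = [595, 520, 715, 405, 680, 490, 565, 580, 615, 435, 440, 515, 380, 510, 565]
exam = [68, 55, 65, 42, 64, 45, 56, 59, 56, 42, 38, 50, 37, 42, 53]

def ranks_desc(values):
    # Highest value gets rank 1; tied values share the average of their ranks.
    return [sum(x > v for x in values) + (values.count(v) + 1) / 2 for v in values]

def spearman_rho(xs, ys):
    rx, ry = ranks_desc(xs), ranks_desc(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))  # sum of squared rank differences
    n = len(xs)
    rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
    return sum_d2, rho

sum_d2, rho = spearman_rho(sat, exam)
print(sum_d2, round(rho, 2))   # 36.5 0.93
```

When the data contain ties, as they do here, this shortcut formula is an approximation to the Pearson correlation computed on the ranks, so statistical software that uses the latter may report a slightly different value.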
Lesson Summary
1. We use the rank sum test (also known as the Mann-Whitney \begin{align*}U\end{align*} test) to assess whether two samples come from the same distribution. This test is sensitive to both the median and the distribution of the samples.
2. When performing the rank sum test there are several different conditions that need to be met: the observations must be continuously distributed (although the population need not be normally distributed), the samples must be independent, each sample must have \begin{align*}5\end{align*} or more observations, and the observations must be on a numeric or ordinal scale.
3. When performing the rank sum test, we need to calculate a figure known as the \begin{align*}U\end{align*} statistic. This statistic takes both the median and the total distribution of both samples into account.
4. To calculate the test statistic for the rank sum test, we first must calculate something known as the \begin{align*}U\end{align*} statistic which is derived from the ranks of the observations in both samples. When performing our hypothesis tests, we calculate the standard score which is defined as
\begin{align*}z = \frac{U - \mu_U} {\sigma_U}\end{align*}
5. We use the Spearman rank correlation coefficient (also known as simply the ‘rank correlation’ coefficient) to measure the strength, magnitude and direction of the relationship between two variables from nonnormal distributions.
\begin{align*}\rho = 1 - \frac{6 \textstyle\sum d^2} {n(n^2 - 1)}\end{align*}