# 12.1: Introduction to Non-Parametric Statistics

Created by: CK-12

## Learning Objectives

- Understand situations in which non-parametric analytical methods should be used and the advantages and disadvantages of each of these methods.
- Understand situations in which the sign test can be used and calculate z-scores for evaluating a hypothesis using matched pair data sets.
- Use the sign test to evaluate a hypothesis about a median of a population.
- Examine a categorical data set to evaluate a hypothesis using the sign test.
- Understand the signed-ranks test as a more precise alternative to the sign test when evaluating a hypothesis.

## Introduction

In previous lessons, we discussed the use of the normal distribution, the Student’s t-distribution and the F-distribution in testing various hypotheses. With each of these distributions, we made certain assumptions about the populations from which our samples were drawn. Specifically, we made assumptions that the populations were normally distributed and that there was homogeneity of variance within the population. But what do we do when we have data that are not normally distributed or not homogeneous with respect to variance? In these situations we use something called **non-parametric tests**.

As mentioned, non-parametric tests are used when the assumptions of normality and homogeneity of variance are not met. They include the sign test, the signed-ranks test, the rank-sum test, the Kruskal-Wallis test and the runs test. While parametric tests are preferred since they have more ‘power,’ they are not always applicable in statistical research. The following sections examine situations in which we would use non-parametric methods and the advantages and disadvantages of using these methods.

## Situations Where We Use Non-Parametric Tests

If non-parametric tests have fewer assumptions and can be used with a broader range of data types, why don’t we use them all the time? There are several *advantages* of using parametric tests (e.g., the *t*-test for independent samples, the correlation coefficient and the one way analysis of variance), including the fact that they have greater **power** when their assumptions are met. Having more **power** means that, for a given sample size, they have a greater chance of rejecting the null hypothesis when it is false.

However, one *disadvantage* of parametric tests is that they demand that the data meet stringent requirements such as normality and homogeneity. For example, a one-sample \begin{align*}t\end{align*}-test assumes that the sample is drawn from a normally distributed population.

As mentioned, an *advantage* of non-parametric tests is that they do not require the data to be normally distributed. In addition, although they test the same concepts, non-parametric tests sometimes have fewer calculations than their parametric counterparts. Non-parametric tests are often used to test different types of questions and allow us to perform analysis with categorical and rank data. The table below lists the parametric test, its non-parametric counterpart and the purpose of the test.

Commonly Used Parametric and Non-parametric Tests

Parametric Test (Normal Distributions) | Non-parametric Test (Non-normal Distributions) | Purpose of Test |
---|---|---|
\begin{align*}t\end{align*}-test for independent samples | Rank sum test | Compares means of two independent samples |
Paired \begin{align*}t\end{align*}-test | Sign test | Examines a set of differences of means |
Pearson correlation coefficient | Rank correlation test | Assesses the linear association between two variables |
One way analysis of variance (\begin{align*}F\end{align*}-test) | Kruskal-Wallis test | Compares three or more groups |
Two way analysis of variance | Runs test | Compares groups classified by two different factors |

## The Sign Test

One of the simplest non-parametric tests is the **sign test.** Technically, the sign test examines the difference in the medians of matched data sets. It is important to note that we use the sign test *only* when testing if there is a difference between the matched pairs of observations. This does not measure the magnitude of the relationship; it simply tests whether the differences between the observations in the matched pairs are equally likely to be positive or negative. Many times, this test is used in place of a paired \begin{align*}t\end{align*}-test.

For example, we would use the sign test when assessing if a certain drug or treatment had an impact on a population or if a certain program made a difference in behavior. In this example, we would match the two sets of data (pre-test and post-test), measure and record each of the observations and examine the differences between the two. Depending on the size of the sample, we would calculate either the \begin{align*}z\end{align*}- or the \begin{align*}t\end{align*}-score.

With the sign test, we first must determine whether there is a positive or negative difference between each of the matched pairs. To determine this, we arrange the data in such a way that it is easy to identify what type of difference that we have. Let’s take a look at an example to help clarify this concept. Say that we have a school psychologist who is interested in whether or not a behavior intervention program is working. He examines \begin{align*}8 \;\mathrm{middle}\end{align*}

Observation Number | Referrals Before Program | Referrals After Program |
---|---|---|
1 | \begin{align*}8\end{align*} | \begin{align*}5\end{align*} |
2 | \begin{align*}10\end{align*} | \begin{align*}8\end{align*} |
3 | \begin{align*}2\end{align*} | \begin{align*}3\end{align*} |
4 | \begin{align*}4\end{align*} | \begin{align*}1\end{align*} |
5 | \begin{align*}6\end{align*} | \begin{align*}4\end{align*} |
6 | \begin{align*}4\end{align*} | \begin{align*}1\end{align*} |
7 | \begin{align*}5\end{align*} | \begin{align*}7\end{align*} |
8 | \begin{align*}9\end{align*} | \begin{align*}6\end{align*} |

Since we need to determine the number of observations where there is a positive difference and the number of observations where there is a negative difference, it is helpful to add an additional column to the table to classify each observation as such (see below). We ignore all zero or equal observations.

Observation Number | Referrals Before Program | Referrals After Program | Change |
---|---|---|---|
1 | \begin{align*}8\end{align*} | \begin{align*}5\end{align*} | \begin{align*}-\end{align*} |
2 | \begin{align*}10\end{align*} | \begin{align*}8\end{align*} | \begin{align*}-\end{align*} |
3 | \begin{align*}2\end{align*} | \begin{align*}3\end{align*} | \begin{align*}+\end{align*} |
4 | \begin{align*}4\end{align*} | \begin{align*}1\end{align*} | \begin{align*}-\end{align*} |
5 | \begin{align*}6\end{align*} | \begin{align*}4\end{align*} | \begin{align*}-\end{align*} |
6 | \begin{align*}4\end{align*} | \begin{align*}1\end{align*} | \begin{align*}-\end{align*} |
7 | \begin{align*}5\end{align*} | \begin{align*}7\end{align*} | \begin{align*}+\end{align*} |
8 | \begin{align*}9\end{align*} | \begin{align*}6\end{align*} | \begin{align*}-\end{align*} |
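
The classification step can be sketched in a few lines of Python. This is only an illustration, not part of the original lesson: the helper name `count_signs` is hypothetical, and the pairs are the before/after referral counts from the table.

```python
# Referral counts from the table: (before, after) for each observation.
pairs = [(8, 5), (10, 8), (2, 3), (4, 1), (6, 4), (4, 1), (5, 7), (9, 6)]

def count_signs(pairs):
    """Return (positives, negatives); pairs with no change are ignored."""
    positives = sum(1 for before, after in pairs if after > before)
    negatives = sum(1 for before, after in pairs if after < before)
    return positives, negatives

positives, negatives = count_signs(pairs)
print(positives, negatives)  # 2 6
```

A pair such as `(4, 4)` would add to neither count, which matches the rule of ignoring zero or equal observations.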

When performing the sign test, we use the \begin{align*}t\end{align*}-distribution if the sample has fewer than \begin{align*}30\end{align*} observations and we use the normal distribution if the sample has \begin{align*}30\end{align*} or more observations. Regardless of the distribution that we use, the formula for calculating the test statistic (either the \begin{align*}t\end{align*}- or \begin{align*}z\end{align*}-score) is the same.

\begin{align*}t = \frac{|\#\ \text{Positive\ Observations} - \#\ \text{Negative\ Observations}| - 1}{\sqrt{n}}\end{align*}

This formula states that the standard score (the \begin{align*}z\end{align*} or the \begin{align*}t\end{align*}) is equal to the absolute value of the difference between the number of positive differences and the number of negative differences within the matched pairs, minus one, divided by the square root of the number of observations. For our example above, we would have a calculated \begin{align*}t\end{align*}-score of:

\begin{align*}t = \frac{|2 - 6| - 1} {\sqrt{8}} \approx 1.06\end{align*}
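
As a quick check of this arithmetic, the formula can be written as a small Python function. The name `sign_test_statistic` is an assumption made for illustration, and `n` is taken to be the number of non-tied pairs, which is 8 in this example.

```python
import math

def sign_test_statistic(positives, negatives):
    """(|positives - negatives| - 1) / sqrt(n), where n counts the non-tied pairs."""
    n = positives + negatives
    return (abs(positives - negatives) - 1) / math.sqrt(n)

score = sign_test_statistic(2, 6)
print(round(score, 2))  # 1.06
```

Because the formula uses an absolute value, swapping the two counts gives the same score.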

Similar to other hypothesis tests using standard scores, we establish null and alternative hypotheses about the population and use the test statistic to assess these hypotheses. As mentioned, this test is used with paired data and examines whether the medians of the two data sets are equal. When we conduct a pre-test and a post-test using matched data, our null hypothesis is that the difference between the data sets will be zero. In other words, under our null hypothesis we would expect there to be some fluctuations between the pre- and post-tests, but nothing of significance.

\begin{align*}H_0 : m & = 0 \\ H_a : m & \neq 0\end{align*}

With the sign test, we set the criterion for rejecting the null hypothesis in the same way as we did when we were testing hypotheses using parametric tests. For the example above, if we set \begin{align*}\alpha=.05\end{align*} we would have critical values set at \begin{align*}2.37\end{align*} standard scores above and below the mean. Since our standard score of \begin{align*}1.06\end{align*} does not exceed the critical value of \begin{align*}2.37\end{align*}, we fail to reject the null hypothesis and cannot conclude that there is a significant difference between the pre- and the post-test scores.

**Using the Sign Test to Evaluate a Hypothesis about a Median of a Population**

In addition to using the sign test to calculate standard scores and evaluate a hypothesis, we can also use it as a quick and dirty way to estimate the probability of obtaining a certain number of successes or positives if there was no difference between the observations in the matched data set. When we use the sign test to evaluate a hypothesis about a median of a population, we are estimating the likelihood or the *probability* that the number of successes would occur by chance if there was no difference between pre- and post-test data. Therefore, we can test these types of hypotheses using the sign test by either (1) conducting an exact test using the binomial distribution when working with small samples or (2) calculating a test statistic when working with larger samples as demonstrated in the section above.

When working with small samples, the sign test is actually the binomial test with the null hypothesis that the proportion of successes will equal \begin{align*}0.5\end{align*}. So how do these tests differ? While we use the same formula to calculate probabilities, the sign test is a specific type of test that has its own tables and formulas. These tools apply only to the case where the null hypothesis is that the proportion of successes will equal \begin{align*}0.5\end{align*}, and not to the more general binomial test.

As a reminder, the formula for the binomial distribution is:

\begin{align*}P(r) = \frac{N!} {r!(N - r)!} p^r (1 - p)^{N - r}\end{align*}

where:

\begin{align*}P(r) =\end{align*} the probability of exactly r successes

\begin{align*}N =\end{align*} the number of observations

\begin{align*}p =\end{align*} the probability of success on one trial

Say that a physical education teacher is interested in the effect of a certain weight training program on students’ strength. She measures the number of times students are able to lift a dumbbell of a certain weight before the program and then again after the program. Below are her results:

Before Program | After Program | Change |
---|---|---|
\begin{align*}12\end{align*} | \begin{align*}21\end{align*} | \begin{align*}+\end{align*} |
\begin{align*}9\end{align*} | \begin{align*}16\end{align*} | \begin{align*}+\end{align*} |
\begin{align*}11\end{align*} | \begin{align*}14\end{align*} | \begin{align*}+\end{align*} |
\begin{align*}21\end{align*} | \begin{align*}36\end{align*} | \begin{align*}+\end{align*} |
\begin{align*}17\end{align*} | \begin{align*}28\end{align*} | \begin{align*}+\end{align*} |
\begin{align*}22\end{align*} | \begin{align*}20\end{align*} | \begin{align*}-\end{align*} |
\begin{align*}18\end{align*} | \begin{align*}29\end{align*} | \begin{align*}+\end{align*} |
\begin{align*}11\end{align*} | \begin{align*}22\end{align*} | \begin{align*}+\end{align*} |

If the program had no effect, then the proportion of students with increased strength would equal \begin{align*}0.5\end{align*}. Looking at the data above, we see that \begin{align*}7\end{align*} of the \begin{align*}8\end{align*} students had increased strength after the program. But is this statistically significant? To answer this question we use the binomial formula:

\begin{align*}P(r) = \frac{N!} {r!(N - r)!} p^r (1 - p)^{N - r}\end{align*}

Using this formula, we need to determine the probability of having either \begin{align*}7\end{align*} or \begin{align*}8\end{align*} successes.

\begin{align*}P(7) & = \frac{8!} {7!(8 - 7)!} {0.5}^7 (1 - 0.5)^{8 - 7} = (8) (0.00391) = 0.03125 \\ P(8) & = \frac{8!} {8!(8 - 8)!} {0.5}^8 (1 - 0.5)^{8 - 8} = 0.00391\end{align*}

To determine the probability of having either \begin{align*}7\end{align*} or \begin{align*}8\end{align*} successes, we add the two probabilities together and get: \begin{align*}P(7) + P(8) = 0.03125 + 0.00391 = 0.0352\end{align*}. This states that if the program had no effect, there is a \begin{align*}0.0352\end{align*} probability of obtaining at least as many successes as we did (\begin{align*}7\end{align*} out of \begin{align*}8\end{align*}) by chance. Since this probability is less than \begin{align*}.05\end{align*}, the result is statistically significant for a one-tailed test at the \begin{align*}\alpha=.05\end{align*} level.
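
The exact calculation can be reproduced with Python's standard library. This is a minimal sketch: the function names `binomial_pmf` and `upper_tail` are hypothetical, and `math.comb` assumes Python 3.8 or later.

```python
from math import comb

def binomial_pmf(r, n, p=0.5):
    """Probability of exactly r successes in n trials with success probability p."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

def upper_tail(r_min, n, p=0.5):
    """Probability of r_min or more successes in n trials."""
    return sum(binomial_pmf(r, n, p) for r in range(r_min, n + 1))

print(binomial_pmf(7, 8))  # 0.03125
print(binomial_pmf(8, 8))  # 0.00390625
print(upper_tail(7, 8))    # 0.03515625
```

The unrounded sum, 0.03515625, agrees with the 0.0352 obtained by hand.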

**Using the Sign Test to Examine Categorical Data**

We can also use the sign test to examine differences and evaluate hypotheses with categorical data sets. As a reminder, we typically use the Chi-Square distribution to assess categorical data. However, because we use the sign test to assess the occurrence of a certain change (i.e., a success, a ‘positive,’ etc.) we are not confined to using only numerical data when performing this test.

So when would using the sign test with categorical data be appropriate? We could use the sign test when determining if one categorical variable is really ‘more’ than another. For example, we could use this test if we were interested in determining if there were equal numbers of students with brown eyes and blue eyes. In addition, we could use this test to determine if equal numbers of males and females get accepted to a four-year college.

When using the sign test to examine a categorical data set and evaluate a hypothesis, we use the same formulas and methods as if we were using numerical data. The only major difference is that instead of labeling the observations as ‘positives’ or ‘negatives,’ we would label the observations as whatever dichotomy we would want to use (male/female, brown/blue, etc.) and calculate the test statistic or probability accordingly. Again, we would not count zero or equal observations.

**Example:**

The UC admissions committee is interested in determining if the number of males and females that are accepted into four-year colleges differs significantly. They take a random sample of \begin{align*}200\end{align*} graduating high school seniors who have been accepted to four-year colleges. Out of these \begin{align*}200\end{align*} students they find that there are \begin{align*}134\end{align*} females and \begin{align*}66\end{align*} males. Do the numbers of males and females accepted into colleges differ significantly? Since we have a large sample, please calculate the \begin{align*}z\end{align*}-score and use \begin{align*}\alpha=.05\end{align*}.

**Solution:**

To solve this question using the sign test, we would first establish our null and alternative hypotheses:

\begin{align*}H_0 : m & = 0 \\ H_a : m & \neq 0\end{align*}

This null hypothesis states that the median number of males and females accepted into UC schools is equal.

Next, we use \begin{align*}\alpha=.05\end{align*} to establish our critical values. Using the normal distribution chart, we find that our critical values are equal to \begin{align*}1.96\end{align*} standard scores above and below the mean.

To calculate our test statistic, we use the formula:

\begin{align*}z = \frac{|\#\ \text{of positive obs}. - \#\ \text{of negative obs}.| - 1}{\sqrt{n}}\end{align*}

However, instead of the number of positive and negative observations, we substitute the number of females and the number of males. Because we are calculating the absolute value of the difference, the order of the variables does not matter. Therefore:

\begin{align*}z = \frac{|\#\ \text{of positive obs}. - \#\ \text{of negative obs}.| - 1}{\sqrt{n}} = \frac{|134 - 66| - 1} {\sqrt{200}} \approx 4.74\end{align*}

With a calculated test statistic of \begin{align*}4.74\end{align*}, which exceeds the critical value of \begin{align*}1.96\end{align*}, we can reject the null hypothesis and conclude that there *is* a difference between the number of graduating males and the number of graduating females accepted into the UC schools.
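
The same computation and decision can be sketched in Python. The function name `sign_test_z` is hypothetical; the counts (134 and 66) and the critical value (1.96) come from the example.

```python
import math

def sign_test_z(count_a, count_b):
    """Sign test z-score for two category counts: (|a - b| - 1) / sqrt(a + b)."""
    n = count_a + count_b
    return (abs(count_a - count_b) - 1) / math.sqrt(n)

z = sign_test_z(134, 66)   # females, males
critical_value = 1.96      # two-tailed critical value for alpha = .05

print(round(z, 2))         # 4.74
print(z > critical_value)  # True, so we reject the null hypothesis
```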

## The Benefit of Using the Sign Rank Test

As previously mentioned, the sign test is a quick and dirty way to test if there is a difference between pre- and post-test matched data. When we use the sign test we simply analyze the number of observations in which there is a difference. However, the sign test does not assess the magnitude of these differences.

A more useful test that assesses the difference in size between the observations in a matched pair is the **sign rank** test. The sign rank test (also known as the Wilcoxon signed-rank test) resembles the sign test, but is much more sensitive. Similar to the sign test, the sign rank test is also a nonparametric alternative to the paired Student’s \begin{align*}t\end{align*}-test. When we perform this test with large samples, it is almost as sensitive as the Student’s \begin{align*}t\end{align*}-test. When we perform this test with small samples, the test is actually more sensitive than the Student’s \begin{align*}t\end{align*}-test.

The main difference with the sign rank test is that under this test the null hypothesis states that the difference between observations in each data pair (pre- and post-test) is equal to zero. Essentially the null hypothesis states that the two variables have identical distributions. The sign rank test is much more sensitive than the sign test since it takes the size of the differences between matched observations into account. Therefore, it is important to note that the results from the sign and the sign rank test could be different for the same data set.

To conduct the sign rank test, we first rank the differences between the observations in each matched pair without regard to the sign of the difference. After this initial ranking, we affix the original sign to the rank numbers. Tied differences all get the same rank: the mean of the rank numbers that would have been assigned if they had varied. After this ranking, we sum the positive ranks and the negative ranks separately and then determine the total number of observations. Finally, the one-sample \begin{align*}z\end{align*}-statistic is calculated from the signed ranks. For large samples, the \begin{align*}z\end{align*}-statistic is compared to percentiles of the standard normal distribution.
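
The ranking procedure can be sketched in Python using the referral data from the sign test example. This is a sketch under stated assumptions rather than a full implementation: the function name `signed_ranks` is hypothetical, pairs with a zero difference are dropped (an assumption, since the lesson does not say how to treat them here), and the code stops at the rank sums without computing the \begin{align*}z\end{align*}-statistic.

```python
def signed_ranks(pairs):
    """Rank nonzero differences by absolute size, average tied ranks, restore signs."""
    # Assumption: pairs with no change are dropped before ranking.
    diffs = [after - before for before, after in pairs if after != before]
    # Indices of the differences, ordered from smallest to largest absolute value.
    ordered = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    start = 0
    while start < len(ordered):
        end = start
        # Extend the run while the next absolute difference ties with this one.
        while (end + 1 < len(ordered)
               and abs(diffs[ordered[end + 1]]) == abs(diffs[ordered[start]])):
            end += 1
        mean_rank = (start + 1 + end + 1) / 2  # mean of ranks start+1 .. end+1
        for position in range(start, end + 1):
            i = ordered[position]
            ranks[i] = mean_rank if diffs[i] > 0 else -mean_rank
        start = end + 1
    return ranks

# (before, after) referral counts from the sign test example.
pairs = [(8, 5), (10, 8), (2, 3), (4, 1), (6, 4), (4, 1), (5, 7), (9, 6)]
ranks = signed_ranks(pairs)
positive_sum = sum(r for r in ranks if r > 0)
negative_sum = -sum(r for r in ranks if r < 0)

print(ranks)                       # [-6.5, -3.0, 1.0, -6.5, -3.0, -6.5, 3.0, -6.5]
print(positive_sum, negative_sum)  # 4.0 32.0
```

The two sums add up to 36, which is the total of the ranks 1 through 8. The large gap between them shows how the ranks carry information about the size of each change that the plain sign counts do not.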

It is important to remember that the sign rank test is more precise and sensitive than the sign test. However, since we are ranking the numerical differences between variables, we are not able to use the sign rank test to examine the differences between categorical variables. In addition, this test can be a bit more time consuming to conduct since the figures cannot be calculated directly in Excel or with a calculator.

## Lesson Summary

- We use non-parametric tests when the assumptions of normality and homogeneity of variance are not met.
- There are several different non-parametric tests that we can use in lieu of their parametric counterparts. These tests include the sign test, the signed-ranks test, the rank-sum test, the Kruskal-Wallis test and the runs test.
- The sign test examines the difference in the medians of matched data sets. When testing hypotheses using the sign test, we can either calculate the standard \begin{align*}z\end{align*}-score when working with large samples or use the binomial formula when working with small samples.
- We can also use the sign test to examine differences and evaluate hypotheses with categorical data sets.
- A more precise test that assesses the difference in size between the observations in a matched pair is the sign rank test.
