
# 8.4: Testing a Hypothesis for Dependent and Independent Samples

Created by: CK-12

## Learning Objectives

• Identify situations that contain dependent or independent samples.
• Calculate the pooled standard deviation for two independent samples.
• Calculate the test statistic to test hypotheses about dependent data pairs.
• Calculate the test statistic to test hypotheses about the means of two independent samples for both large and small samples.
• Calculate the test statistic to test hypotheses about the difference of proportions between two independent samples.

## Introduction

In the previous lessons we learned about hypothesis testing for proportions, large samples and small samples. However, the examples in those lessons involved only one sample. In this lesson we will apply the principles of hypothesis testing to situations involving two samples.

There are many situations in everyday life where we would perform statistical analysis involving two samples. For example, suppose that we wanted to test a hypothesis about the effects of two medications on curing an illness. Or we may want to test the difference between the mean SAT scores of males and females. In both of these cases, we would analyze both samples, and the hypothesis would address the difference between the two means.

In this lesson, we will identify situations with different types of samples, calculate the pooled estimate of population variance for two samples, calculate the test statistic for independent and dependent samples, and calculate the test statistic to test hypotheses about the difference of proportions between samples.

## Dependent and Independent Samples

When we are working with one sample, we know that we need to select a random sample from the population, measure the sample statistic, and then make inferences about the population based on that sample. When we work with two independent samples, we assume that if the samples are selected at random (or, in the case of medical research, the subjects are randomly assigned to a group), the two samples will vary only by chance, and the difference will not be statistically significant. In short, when we have independent samples, we assume that the scores of one sample do not affect the other.

Independent samples can occur in two scenarios:

• Testing the difference between two fixed populations by testing the differences between samples from each population. When both samples are randomly selected, we can make inferences about the populations.
• When working with subjects (people, pets, etc.), selecting a random sample and then randomly assigning half of the subjects to one group and half to the other.

Dependent samples are a bit different. Two samples of data are dependent when each score in one sample is paired with a specific score in the other sample. In short, these types of samples are related to each other. Dependent samples can occur in two scenarios:

• A group may be measured twice such as in a pretest-posttest situation (scores on a test before and after the lesson).
• In a matched sample where each observation is matched with an observation in the other sample.

To distinguish between tests of hypotheses for independent and dependent samples, we use a different symbol for hypotheses with dependent samples. For dependent sample hypotheses, we use the delta symbol $(\delta)$ to represent the population mean difference between the two sets of scores. Therefore, in our null hypothesis we state that the difference of scores across the two measurements is equal to $0$ $(\delta = 0)$, or:

$H_0: \delta=\mu_1-\mu_2=0$

## Calculating the Pooled Estimate of Population Variance

When testing a hypothesis about two independent samples, we follow a similar process as when testing one random sample. However, when computing the test statistic, we need to calculate the estimated standard error of the difference between sample means $(s_{\bar{X}_1 - \bar{X}_2})$. With one sample, this calculation is fairly easy, since the standard error is based on either the sample standard deviation or the population variance. With two samples, it is a bit more involved. To calculate this statistic, we use the formula:

$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)},$

where $n_1$ and $n_2$ are the sizes of the two samples, and

$s^2 =$ the pooled sample variance, which is computed as shown below.

The pooled estimate of variance is found by adding the sums of the squared deviations around each sample mean and then dividing the total by the sum of the degrees of freedom in the two samples, $(n_1 - 1) + (n_2 - 1) = n_1 + n_2 - 2$.

Therefore, we can find this estimate by using the formula:

$s^2 = \frac{\sum (X_1 - \bar{X}_1)^2 + \sum (X_2 - \bar{X}_2)^2} {n_1 + n_2 - 2}$

Often, the numerator of this formula is simplified by substituting the symbol $SS$ for the sum of the squared deviations, so the formula is often expressed as:

$s^2 = \frac{SS_1 + SS_2} {n_1 + n_2 - 2}$

Let’s calculate this estimate using a sample set of data.

Example:

Say that we have two independent samples of student reading scores. The data are as follows:

| Sample 1 | Sample 2 |
|---|---|
| $7$ | $12$ |
| $8$ | $14$ |
| $10$ | $18$ |
| $4$ | $13$ |
| $6$ | $11$ |
| | $10$ |

From this sample, we can calculate a number of descriptive statistics that will help us solve for the pooled estimate of variance:

| Descriptive Statistic | Sample 1 | Sample 2 |
|---|---|---|
| Number $(n)$ | $5$ | $6$ |
| Sum of Observations $(\sum X)$ | $35$ | $78$ |
| Mean of Observations $(\bar{X})$ | $7$ | $13$ |
| Sum of Squared Deviations $(\sum_{i = 1}^n (X_i - \bar{X})^2)$ | $20.0$ | $40.0$ |

Using the formula for the pooled estimate of variance, we find that

$s^2 = \frac{SS_1 + SS_2} {n_1 + n_2 - 2} = \frac{20.0 + 40.0} {5 + 6 - 2} \approx 6.67$

We will use this information to calculate the test statistic needed to evaluate the hypotheses.
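The pooled-variance arithmetic above can be checked with a short Python sketch. It uses only the standard library; the helper names `sum_of_squares` and `pooled_variance` are illustrative choices, not part of the lesson.

```python
# Pooled estimate of population variance for two independent samples:
# s^2 = (SS_1 + SS_2) / (n_1 + n_2 - 2)


def sum_of_squares(values):
    """Sum of squared deviations of the values around their own mean (SS)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)


def pooled_variance(sample1, sample2):
    """Add the two SS values and divide by the combined degrees of freedom."""
    ss1 = sum_of_squares(sample1)
    ss2 = sum_of_squares(sample2)
    df = len(sample1) + len(sample2) - 2
    return (ss1 + ss2) / df


# Reading scores from the example.
sample1 = [7, 8, 10, 4, 6]
sample2 = [12, 14, 18, 13, 11, 10]

print(sum_of_squares(sample1))                      # 20.0
print(sum_of_squares(sample2))                      # 40.0
print(round(pooled_variance(sample1, sample2), 2))  # 6.67
```

Dividing by $n_1 + n_2 - 2$ rather than by the total number of observations reflects the one degree of freedom lost in each sample when its mean is estimated.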

## Testing Hypotheses with Independent Samples

When testing hypotheses with two independent samples, we follow similar steps as when testing one random sample:

1. State the null and alternative hypotheses.
2. Set the criterion (critical values) for rejecting the null hypothesis.
3. Compute the test statistic.
4. Decide about the null hypothesis and interpret our results.

When stating the null hypothesis, we assume that there is no difference between the means of the two populations from which the independent samples were drawn. Therefore, our null hypothesis in this case would be:

$H_0:\mu_1=\mu_2 \quad \text{or} \quad H_0:\mu_1-\mu_2=0$

As in the one-sample test, the critical values that we set to evaluate these hypotheses depend on our alpha level, and our decision regarding the null hypothesis is made in the same manner. However, since we have two samples, we calculate the test statistic a bit differently, using the formula:

$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_{\bar{X}_1 - \bar{X}_2}}$

where:

$\bar{X}_1 - \bar{X}_2 =$ the difference between the sample means

$\mu_1 - \mu_2 =$ the difference between the hypothesized population means

$s_{\bar{X}_1 - \bar{X}_2} =$ the standard error of the difference between sample means
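As a minimal sketch of these formulas, the Python functions below compute the standard error of the difference and the $t$ statistic from summary statistics. The function and parameter names (for example `hypothesized_diff` for $\mu_1 - \mu_2$) are hypothetical. The usage lines plug in the reading-score summary statistics from the pooled-variance example, which gives $t \approx -3.84$ with $5 + 6 - 2 = 9$ degrees of freedom.

```python
import math


def standard_error_of_difference(pooled_var, n1, n2):
    """s_{X1bar - X2bar} = sqrt(s^2 * (1/n1 + 1/n2))"""
    return math.sqrt(pooled_var * (1 / n1 + 1 / n2))


def two_sample_t(mean1, mean2, pooled_var, n1, n2, hypothesized_diff=0.0):
    """t = ((X1bar - X2bar) - (mu1 - mu2)) / s_{X1bar - X2bar}"""
    se = standard_error_of_difference(pooled_var, n1, n2)
    return ((mean1 - mean2) - hypothesized_diff) / se


# Reading-score summary statistics: means 7 and 13, n = 5 and 6,
# pooled variance (20 + 40) / 9.
t = two_sample_t(mean1=7, mean2=13, pooled_var=60 / 9, n1=5, n2=6)
df = 5 + 6 - 2
print(f"t = {t:.2f} with {df} degrees of freedom")
```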

Let’s take a look at an example using these formulas.

Example:

The head of the English department is interested in the difference in writing scores between remedial freshman English students who are taught by different teachers. The incoming freshmen needing remedial services are randomly assigned to one of two English teachers and are given a standardized writing test after the first semester. We take a sample of eight students from one class and nine from the other. Is there a difference in achievement on the writing test between the two classes? Use a $.05$ significance level.

Solution:

First, we would generate our hypotheses based on the two samples.

$H_0:\mu_1=\mu_2$

$H_a:\mu_1\neq\mu_2$

For this example, we have two independent samples and a total of $17$ students. Since our samples are so small, we use the $t$-distribution. If our samples were larger than $120$, we would generally use the $z$-distribution.

In this example, we have $15$ degrees of freedom (the total number of observations minus $2$, or $9 + 8 - 2$), and with a $.05$ significance level and the $t$-distribution, we find that our critical values are $2.131$ standard scores above and below the mean.

To calculate the test statistic, we first need to find the pooled estimate of variance from our sample. The data from the two groups are as follows:

| Sample 1 | Sample 2 |
|---|---|
| $35$ | $52$ |
| $51$ | $87$ |
| $66$ | $76$ |
| $42$ | $62$ |
| $37$ | $81$ |
| $46$ | $71$ |
| $60$ | $55$ |
| $55$ | $67$ |
| $53$ | |

From this sample, we can calculate several descriptive statistics that will help us solve for the pooled estimate of variance:

| Descriptive Statistic | Sample 1 | Sample 2 |
|---|---|---|
| Number $(n)$ | $9$ | $8$ |
| Sum of Observations $(\sum X)$ | $445$ | $551$ |
| Mean of Observations $(\bar{X})$ | $49.44$ | $68.875$ |
| Sum of Squared Deviations $(\sum (X - \bar{X})^2)$ | $862.22$ | $1058.88$ |

Therefore:

$s^2 = \frac{SS_1 + SS_2}{n_1 + n_2 - 2} = \frac{862.22 + 1058.88}{9 + 8 - 2} \approx 128.07$

and the standard error of the difference of the sample means is:

$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{128.07 \left(\frac{1}{9} + \frac{1}{8}\right)} \approx 5.50$

Using this information, we can finally solve for the test statistic:

$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_{\bar{X}_1 - \bar{X}_2}} = \frac{(49.44 - 68.88) - (0)}{5.50} \approx -3.53$

Since the difference of about $-19.4$ is $3.53$ standard errors below the hypothesized difference between the population means (zero) and falls beyond the critical value of $2.131$ standard errors below the mean, we reject the null hypothesis and conclude that there is a significant difference in the achievement of the students assigned to the different teachers.
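To double-check the hand calculation, here is a Python sketch that starts from the raw writing scores. It assumes Python 3 with the standard `statistics` module, recovers each $SS$ as $(n - 1)s^2$ from `statistics.variance`, and hard-codes the critical value $2.131$ from the text rather than computing it. It should reproduce $SS_1 \approx 862.22$, $s^2 \approx 128.07$, a standard error of about $5.50$, and $t \approx -3.53$.

```python
import math
import statistics

# Writing scores from the example (Sample 1: n = 9, Sample 2: n = 8).
sample1 = [35, 51, 66, 42, 37, 46, 60, 55, 53]
sample2 = [52, 87, 76, 62, 81, 71, 55, 67]

n1, n2 = len(sample1), len(sample2)
mean1, mean2 = statistics.mean(sample1), statistics.mean(sample2)

# statistics.variance is the sample variance SS / (n - 1), so SS = (n - 1) * variance.
ss1 = (n1 - 1) * statistics.variance(sample1)
ss2 = (n2 - 1) * statistics.variance(sample2)

df = n1 + n2 - 2                                # 15 degrees of freedom
pooled_var = (ss1 + ss2) / df                   # pooled estimate of variance
se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # standard error of the difference
t = ((mean1 - mean2) - 0) / se                  # H0: mu1 - mu2 = 0

critical = 2.131  # two-tailed t critical value for df = 15, alpha = .05 (taken from the text)

print(f"SS1 = {ss1:.2f}, SS2 = {ss2:.2f}, s^2 = {pooled_var:.2f}")
print(f"SE = {se:.2f}, t = {t:.2f}, df = {df}")
print("reject H0" if abs(t) > critical else "fail to reject H0")
```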

## Testing Hypotheses about the Difference in Proportions between Two Independent Samples

Suppose we want to test whether there is a difference between the proportions of two independent samples. As discussed in the previous lesson, proportions are used extensively in polling and surveys, especially by people trying to predict election results. It is possible to test a hypothesis about the proportions of two independent samples by using a method similar to the one described above. We might perform these hypothesis tests in the following scenarios:

• When examining the proportion of children living in poverty in two different towns.
• When investigating the proportions of freshman and sophomore students who report test anxiety.
• When testing if the proportion of high school boys and girls who smoke cigarettes is equal.

In testing hypotheses about the difference in proportions of two independent samples, we state the hypotheses and set the criterion for rejecting the null hypothesis in ways similar to the other hypothesis tests. In these tests, the null hypothesis sets the population proportions equal to each other $(H_{0}: P_1 = P_2)$, and we use the appropriate standard table to determine the critical values (remember, for small samples we generally use the $t$-distribution and for samples over $120$ we generally use the $z$-distribution).

When solving for the test statistic in large samples, we use the formula:

$z = \frac{(p_1 - p_2) - (P_1 - P_2)} {s_{p_1 - p_2}}$

where:

$p_1$ and $p_2 =$ the observed sample proportions

$P_1$ and $P_2 =$ the hypothesized population proportions

$s_{p_1 - p_2} =$ the standard error of the difference between independent proportions

Similar to the standard error of the difference between independent sample means, we need to do a bit of work to calculate the standard error of the difference between independent proportions $(s_{p_1 - p_2})$. To calculate this statistic, we use the formula:

$s_{p_1 - p_2} = \sqrt{pq \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$

where:

$p = \frac{f_1 + f_2}{n_1 + n_2}$

$q = 1 - p$

$f_1 =$ frequency of success in the first sample

$f_2 =$ frequency of success in the second sample
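A minimal Python sketch of this standard error follows. The function name `pooled_proportion_se` and the counts in the usage lines are hypothetical, chosen only to show the call; $f_1$ and $f_2$ are the numbers of successes and $n_1$, $n_2$ the sample sizes, as defined above.

```python
import math


def pooled_proportion_se(f1, n1, f2, n2):
    """Standard error of p1 - p2, pooling both samples as the formula above does."""
    p = (f1 + f2) / (n1 + n2)  # pooled proportion of successes
    q = 1 - p                  # pooled proportion of failures
    return math.sqrt(p * q * (1 / n1 + 1 / n2))


# Hypothetical counts: 30 successes out of 100, and 20 successes out of 100.
# Here the pooled p = 50 / 200 = 0.25 and q = 0.75.
se = pooled_proportion_se(30, 100, 20, 100)
print(round(se, 4))
```

Pooling makes sense here because the null hypothesis assumes $P_1 = P_2$, so both samples are treated as estimates of the same proportion.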

Example:

Suppose that we are interested in finding out which of two cities has residents who are more satisfied with the services provided by the city government. We take a survey and find the following results:

| Number Satisfied | City 1 | City 2 |
|---|---|---|
| Yes | $122$ | $84$ |
| No | $78$ | $66$ |
| Sample Size | $n_1 = 200$ | $n_2 = 150$ |
| Proportion who said Yes | $0.61$ | $0.56$ |

Is there a statistically significant difference in the proportions of citizens who are satisfied with the services provided by the city government? Use a $.05$ level of significance.

Solution:

First, we establish the null and alternative hypotheses:

$H_0: P_1 = P_2$

$H_a: P_1 \neq P_2$

Since we have a large sample size $(n > 120)$ it is probably best to use the $z$-distribution. At a $.05$ level of significance, our critical values are $1.96$ standard scores above and below the mean. To solve for the test statistic, we must first solve for the standard error of the difference between proportions.

$p = \frac{f_1 + f_2}{n_1 + n_2} = \frac{122 + 84}{200 + 150} = \frac{206}{350} \approx 0.59$

$s_{p_1 - p_2} = \sqrt{pq \left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{(0.59)(0.41)\left(\frac{1}{200} + \frac{1}{150}\right)} \approx 0.053$

Therefore, the test statistic is:

$z = \frac{(p_1 - p_2) - (P_1 - P_2)} {s_{p_1 - p_2}} = \frac{(0.61 - 0.56) - (0)} {0.053} = 0.94$

Since the test statistic $(z = 0.94)$ does not exceed the critical value $(1.96)$, we fail to reject the null hypothesis. Therefore, we conclude that the difference in the sample proportions ($0.61$ and $0.56$) could have occurred by chance, and we do not have evidence of a difference in the level of satisfaction between citizens of the two cities.
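The city-survey calculation can be reproduced with the Python sketch below. It uses only the standard library; `statistics.NormalDist().inv_cdf` (Python 3.8 or later) supplies the two-tailed $z$ critical value instead of a table. With unrounded intermediate values it still gives $z \approx 0.94$, so the decision matches the text.

```python
import math
from statistics import NormalDist

# Survey counts from the city example.
f1, n1 = 122, 200  # City 1: number who said Yes, sample size
f2, n2 = 84, 150   # City 2: number who said Yes, sample size

p1, p2 = f1 / n1, f2 / n2           # observed sample proportions (0.61, 0.56)
p = (f1 + f2) / (n1 + n2)           # pooled proportion, 206/350
q = 1 - p
se = math.sqrt(p * q * (1 / n1 + 1 / n2))
z = ((p1 - p2) - 0) / se            # H0: P1 - P2 = 0

alpha = 0.05
critical = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value, about 1.96

print(f"p = {p:.2f}, SE = {se:.3f}, z = {z:.2f}, critical = {critical:.2f}")
print("reject H0" if abs(z) > critical else "fail to reject H0")
```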

## Testing Hypotheses with Dependent Samples

When testing a hypothesis about two dependent samples, we follow the same process as when testing one random sample or two independent samples:

1. State the null and alternative hypotheses.
2. Set the criterion (critical values) for rejecting the null hypothesis.
3. Compute the test statistic.
4. Decide about the null hypothesis and interpret our results.

As mentioned in the section above, our hypothesis for two dependent samples states that there is no difference between the scores across the two samples $(H_{0}:\delta=\mu_1-\mu_2=0)$. We set the criterion for evaluating the hypothesis in the same way that we do with our other examples – by first establishing an alpha level and then finding the critical values by using the $t$-distribution table.

Calculating the test statistic for dependent samples is a bit different, since we are dealing with paired data. The first quantity we need to calculate is $\bar{d}$, the mean of the differences between paired scores, which equals the difference between the means of the two samples. Therefore, $\bar{d} = \bar{X}_1 - \bar{X}_2$, where $\bar{X}_1$ and $\bar{X}_2$ are the sample means.

We also need the standard error of the mean difference. Since the population variance is unknown, we estimate it by first calculating the standard deviation of the differences:

$s_d^2 = \frac{\sum (d - \bar{d})^2} {n - 1}$

(or, in an equivalent computational form)

$s_d = \sqrt{\frac{\sum (d^2) - \frac{(\sum d)^2} {n}} {n - 1}}$
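To see that the two expressions for $s_d$ agree, the Python sketch below evaluates both on a small, hypothetical list of paired differences (not data from this lesson). The function names are illustrative.

```python
import math


def sd_definitional(d):
    """s_d = sqrt( sum((d - d_bar)^2) / (n - 1) )"""
    n = len(d)
    d_bar = sum(d) / n
    return math.sqrt(sum((x - d_bar) ** 2 for x in d) / (n - 1))


def sd_computational(d):
    """s_d = sqrt( (sum(d^2) - (sum d)^2 / n) / (n - 1) )"""
    n = len(d)
    return math.sqrt((sum(x * x for x in d) - sum(d) ** 2 / n) / (n - 1))


# Hypothetical paired differences, used only to compare the two formulas.
diffs = [1, 3, 2, 4, 0]
print(sd_definitional(diffs))   # sqrt(2.5), about 1.58
print(sd_computational(diffs))  # same value
```

The computational form is convenient by hand, but in floating-point code the definitional form is usually preferred, because subtracting two large, nearly equal sums can lose precision.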

where:

$s_d^2 =$ sample variance of the differences

$d =$ difference between corresponding pairs of scores

$\bar{d} =$ the mean of the differences (equal to the difference between the means of the two samples)

$n =$ the number of pairs in the sample

$s_d =$ standard deviation of the differences

With the standard deviation, we can calculate the standard error using the following formula:

$s_{\bar{d}} = \frac{s_d} {\sqrt{n}}$

After we calculate the standard error, we can use the general formula for the test statistic:

$t = \frac{\bar{d} - \delta} {s_{\bar{d}}}$

This may seem a bit confusing, but let’s take a look at an example to help clarify.

Example:

The math teacher wants to determine the effectiveness of her statistics lesson, so she gives a pre-test and a post-test to $9$ students in her class. Our null hypothesis is that there is no difference between the means of the two samples, and our alternative hypothesis is that the two means are not equal. In other words, we are testing whether the mean of the paired differences is zero:

$H_0: \delta=\mu_1-\mu_2 = 0$

$H_a: \delta=\mu_1-\mu_2 \neq 0$

The results for the pre- and post-tests are below:

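| Subject | Pre-test Score | Post-test Score | $d$ (difference) | $d^2$ |
|---|---|---|---|---|
| 1 | $78$ | $80$ | $2$ | $4$ |
| 2 | $67$ | $69$ | $2$ | $4$ |
| 3 | $56$ | $70$ | $14$ | $196$ |
| 4 | $78$ | $79$ | $1$ | $1$ |
| 5 | $96$ | $96$ | $0$ | $0$ |
| 6 | $82$ | $84$ | $2$ | $4$ |
| 7 | $84$ | $88$ | $4$ | $16$ |
| 8 | $90$ | $92$ | $2$ | $4$ |
| 9 | $87$ | $92$ | $5$ | $25$ |
| Sum | $718$ | $750$ | $32$ | $254$ |
| Mean | $79.8$ | $83.3$ | $3.6$ | |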

Using the information from the table above, we can first solve for the standard deviation of the differences, then the standard error of the mean difference, and finally the test statistic.

Standard Deviation:

$s_d = \sqrt{\frac{\sum (d^2) - \frac{(\sum d)^2} {n}} {n - 1}} = \sqrt{\frac{254 -\frac{(32)^2} {9}} {8}} \approx 4.19$

Standard Error of the Difference:

$s_{\bar{d}} = \frac{s_d} {\sqrt{n}} = \frac{4.19} {\sqrt{9}} = 1.40$

Test Statistic ($t$-Test)

$t = \frac{\bar{d} - \delta} {s_{\bar{d}}} = \frac{3.6 - 0} {1.40} \approx 2.57$

With $8$ degrees of freedom (number of pairs minus $1$) and a significance level of $.05$, we find our critical values to be $2.306$ standard scores above and below the mean. Since our test statistic of $2.57$ exceeds this critical value, we reject the null hypothesis that the mean difference between the two sets of scores is zero and conclude that the lesson had an effect on student achievement.
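A Python sketch of the same paired test, starting from the raw pre- and post-test scores, is below (standard library only; the critical value $2.306$ is taken from the text). Because it does not round $\bar{d}$ to $3.6$ before dividing, it gives $\bar{d} \approx 3.56$ and $t \approx 2.55$ rather than $2.57$; the decision to reject the null hypothesis is unchanged.

```python
import math

# Pre- and post-test scores from the example (same student order).
pre = [78, 67, 56, 78, 96, 82, 84, 90, 87]
post = [80, 69, 70, 79, 96, 84, 88, 92, 92]

d = [b - a for a, b in zip(pre, post)]   # d = post-test minus pre-test
n = len(d)
d_bar = sum(d) / n                       # mean of the differences
s_d = math.sqrt((sum(x * x for x in d) - sum(d) ** 2 / n) / (n - 1))
se = s_d / math.sqrt(n)                  # standard error of the mean difference
t = (d_bar - 0) / se                     # H0: delta = 0
df = n - 1

critical = 2.306  # two-tailed t critical value for df = 8, alpha = .05 (taken from the text)

print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.2f}, SE = {se:.2f}, t = {t:.2f}, df = {df}")
print("reject H0" if abs(t) > critical else "fail to reject H0")
```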

## Lesson Summary

1. In addition to testing single samples associated with a mean, we can also perform hypothesis tests with two samples. We can test two independent samples (samples that do not affect one another) or two dependent samples (samples that are related to each other, such as paired or matched scores).

2. When testing a hypothesis about two independent samples, we follow a similar process as when testing one random sample. However, when computing the test statistic, we need to calculate the estimated standard error of the difference between sample means which is found by using the formula:

$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$, where $s^2=\frac{SS_1+SS_2}{n_1+n_2-2}$

3. We carry out the test of two independent samples in a similar way as the testing of one random sample. However, we use the following formula to calculate the test statistic:

$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)} {s_{\bar{X}{_1} - \bar{X}{_2}}}$, where $s_{\bar{X}{_1} - \bar{X}{_2}}=\sqrt{s^2(\frac{1}{n_1}+\frac{1}{n_2})}$

4. We can also test the proportions associated with two independent samples. To calculate the test statistic for the difference between two independent proportions, we use the formula:

$z = \frac{(p_1 - p_2) - (P_1 - P_2)} {s_{p_1 - p_2}}$

5. We can also test whether two dependent samples differ, that is, whether the mean of the paired differences is zero. To calculate the test statistic for two dependent samples, we use the formula:

$t = \frac{\bar{d} - \delta} {s_{\bar{d}}}$

## Review Questions

1. In hypothesis testing, we have scenarios that have both dependent and independent samples. Give an example of an experiment with (1) dependent samples and (2) independent samples.
2. True or False: When we test the difference between the means of males and females on the SAT, we are using independent samples.

A study is conducted on the effectiveness of a drug on the hyperactivity of laboratory rats. Two random samples of rats are used for the study: one group is given Drug $A$, the other group is given Drug $B$, and the number of times that each rat pushes a lever is recorded. The following results were calculated:

| | Drug A | Drug B |
|---|---|---|
| $\bar{X}$ | $75.6$ | $72.8$ |
| $n$ | $18$ | $24$ |
| $s^2$ | $12.25$ | $10.24$ |
| $s$ | $3.5$ | $3.2$ |

3. Does this scenario involve dependent or independent samples? Explain.
4. What would the hypotheses be for this scenario?
5. Compute the pooled estimate for population variance.
6. Calculate the estimated standard error for this scenario.
7. What is the test statistic and at an alpha level of $.05$ what conclusions would you make about the null hypothesis?

A survey is conducted on attitudes towards drinking. A random sample of eight married couples is selected, and the husbands and wives respond to an attitude-toward-drinking scale. The scores are as follows:

| Husbands | Wives |
|---|---|
| $16$ | $15$ |
| $20$ | $18$ |
| $10$ | $13$ |
| $15$ | $10$ |
| $8$ | $12$ |
| $19$ | $16$ |
| $14$ | $11$ |
| $15$ | $12$ |

8. What would be the hypotheses for this scenario?
9. Calculate the estimated standard deviation for this scenario.
10. Compute the standard error of the difference for these samples.
11. What is the test statistic and at an alpha level of $.05$ what conclusions would you make about the null hypothesis?

## Review Answers

1. Answers are at the reviewer's discretion.
2. True
3. This scenario involves independent samples since we assume that the scores of one sample do not affect the other.
4. $H_0: \mu_1 = \mu_2, H_a: \mu_1\neq \mu_2$
5. $s^2 = 11.09$
6. $s_{\bar{X}_1 - \bar{X}_2} = 1.04$
7. The calculated test statistic is $2.69$, which exceeds the critical value of $t = 2.021$ standard scores above or below the mean. Therefore, we would reject the null hypothesis and conclude that it is highly unlikely that the difference between the means of the two samples occurred by chance.
8. $H_0: \delta =\mu_1 - \mu_2 = 0, H_a: \delta=\mu_1 - \mu_2\neq 0$
9. $s_d= 3.15$
10. $s_{\bar{d}} = 1.11$
11. The calculated test statistic is $1.13$, and with critical values set at $t = 2.365$ standard scores above or below the mean, we fail to reject the null hypothesis. Therefore, we do not have evidence of a difference between husbands' and wives' attitudes towards drinking.
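The numeric answers above can be checked with the Python sketch below (standard library only). For the drug study, it assumes the pooled variance is built from the given sample variances using $SS = (n - 1)s^2$; the critical values $2.021$ and $2.365$ are taken from the answer key. Carrying unrounded standard errors gives $t \approx 2.70$ and $t \approx 1.12$, which differ only by rounding from the key's $2.69$ and $1.13$; the conclusions are the same.

```python
import math

# Drug study: summary statistics from the table (questions 3-7).
mean_a, n_a, var_a = 75.6, 18, 12.25
mean_b, n_b, var_b = 72.8, 24, 10.24

# Each SS equals (n - 1) * s^2, so the pooled variance follows from the sample variances.
pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
t_drug = (mean_a - mean_b) / se_diff            # H0: mu1 - mu2 = 0, df = 40

# Married couples: paired scores (questions 8-11).
husbands = [16, 20, 10, 15, 8, 19, 14, 15]
wives = [15, 18, 13, 10, 12, 16, 11, 12]
d = [h - w for h, w in zip(husbands, wives)]    # d = husband minus wife
n = len(d)
d_bar = sum(d) / n
s_d = math.sqrt(sum((x - d_bar) ** 2 for x in d) / (n - 1))
se_d = s_d / math.sqrt(n)
t_couples = d_bar / se_d                        # H0: delta = 0, df = 7

print(f"drug:    s^2 = {pooled_var:.2f}, SE = {se_diff:.2f}, t = {t_drug:.2f}, reject H0: {abs(t_drug) > 2.021}")
print(f"couples: s_d = {s_d:.2f}, SE = {se_d:.2f}, t = {t_couples:.2f}, reject H0: {abs(t_couples) > 2.365}")
```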

Feb 23, 2012

Mar 17, 2014