7.6: Student’s t-Distribution
Learning Objectives
- Use Student’s t-distribution to estimate a population mean interval for smaller samples.
- Understand how the shape of Student’s t-distribution corresponds to the sample size (which corresponds to a measure called the “degrees of freedom”).
Introduction
In a previous lesson you learned about the Central Limit Theorem. One of the attributes of this theorem was that the sampling distribution of the sample mean will be approximately normal as long as the sample size is large. As the value of $n$ increases, the sampling distribution more closely follows a normal distribution. You’ve also learned that when the standard deviation of a population is known, a z-score can be calculated and used with the normal distribution to evaluate probabilities with the sample mean. In real-life situations, the standard deviation of the entire population, $\sigma$, is rarely known. Also, the sample size is not always large enough to emulate a normal distribution. In fact, sample sizes are often quite small. What do you do when either one or both of these events occur?
t-Statistic
People often make decisions from data by comparing the results from a sample to some hypothesized or predetermined parameter. These decisions are referred to as tests of significance or hypothesis tests, since they are used to determine whether the predetermined parameter is acceptable or should be rejected. We know that if we flip a fair coin, the probability of getting heads is 0.5. In other words, heads and tails are equally likely. Therefore, when a coin is spun, it should land heads 50% of the time. Suppose a coin of questionable fairness was spun $n$ times and landed heads $x$ times, so that the sample proportion of heads is $\hat{p} = x/n$. If technology is used to determine a confidence interval of reasonably likely sample proportions under the standard that heads should land 50% of the time, the observed proportion $\hat{p}$ is not captured within this interval. Therefore, the fairness of this coin should be questioned; in other words, 0.5 should be rejected as a plausible value for the proportion of times this particular coin lands heads when it is spun. These data provide evidence against the standard.
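The interval of reasonably likely sample proportions can be sketched with the usual normal approximation, $p_0 \pm z^* \sqrt{p_0(1-p_0)/n}$. This is a minimal sketch that assumes a 95% level ($z^* = 1.96$) and uses hypothetical spin counts, since the counts from the coin example above are not given here; the function names are illustrative only.

```python
import math

def likely_proportion_interval(p0: float, n: int, z_star: float = 1.96) -> tuple[float, float]:
    """Reasonably likely sample proportions when the true proportion is p0.

    Normal approximation: p0 +/- z_star * sqrt(p0 * (1 - p0) / n).
    z_star = 1.96 corresponds to an assumed 95% level.
    """
    margin = z_star * math.sqrt(p0 * (1 - p0) / n)
    return p0 - margin, p0 + margin

def is_plausible(p_hat: float, p0: float, n: int, z_star: float = 1.96) -> bool:
    """True if the observed proportion falls inside the likely interval."""
    low, high = likely_proportion_interval(p0, n, z_star)
    return low <= p_hat <= high

# Hypothetical data: 100 spins, 35 heads (not the counts from the text).
spins, heads = 100, 35
p_hat = heads / spins
low, high = likely_proportion_interval(0.5, spins)
print(f"p_hat = {p_hat:.2f}, likely interval = ({low:.3f}, {high:.3f})")
print("0.5 plausible for this coin?", is_plausible(p_hat, 0.5, spins))
```

With these made-up counts the interval is about (0.402, 0.598), so an observed proportion of 0.35 would lead us to question the coin's fairness.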
The objective is to test the significance of the difference between the sample statistic and the parameter. If the difference is small (as defined by some predetermined amount), then the parameter is acceptable. The statement that the proposed parameter is true is called the null hypothesis, $H_0$. If the difference is large and can’t reasonably be attributed to chance, then the parameter can be rejected.
When the sample size is large, reliable estimates of the mean and variance of the population from which the sample was drawn can be made. Up to this point, we have used the z-score to determine the number of standard deviations a given value lies above or below the mean:
$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$
where $\bar{x}$ is the sample mean, $\mu_0$ is the hypothesized mean stated in the null hypothesis $H_0: \mu = \mu_0$, $\sigma$ is the population standard deviation, and $n$ is the sample size.
However, this formula cannot be used to determine how far a sample mean is from the hypothesized mean when the standard deviation of the population is not known. If the value of $\sigma$ is unknown, $s$ is substituted for $\sigma$ and $t$ for $z$. The resulting test statistic is given by the formula:
$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$
where $\bar{x}$ is the sample mean, $\mu_0$ is the hypothesized value of the population mean, $s$ is the standard deviation of the sample, and $n$ is the sample size. The population mean $\mu$ is unknown, so the value stated in the null hypothesis is used in its place. The t-test will be used to determine whether the difference between the sample mean and the hypothesized mean is significant. The null hypothesis that is being tested is $H_0: \mu = \mu_0$.
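The two statistics differ only in which standard deviation goes in the denominator. The sketch below makes that explicit; the summary statistics are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import math

def z_statistic(x_bar: float, mu0: float, sigma: float, n: int) -> float:
    """z = (x_bar - mu0) / (sigma / sqrt(n)); requires the population SD sigma."""
    return (x_bar - mu0) / (sigma / math.sqrt(n))

def t_statistic(x_bar: float, mu0: float, s: float, n: int) -> float:
    """t = (x_bar - mu0) / (s / sqrt(n)); uses the sample SD s in place of sigma."""
    return (x_bar - mu0) / (s / math.sqrt(n))

# Hypothetical summary statistics for illustration only.
x_bar, mu0, s, n = 10.5, 10.0, 1.0, 16
t = t_statistic(x_bar, mu0, s, n)
print(f"t = {t:.2f} with {n - 1} degrees of freedom")  # t = 2.00 with 15 degrees of freedom
```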
Suppose you want to see whether a hypothesized mean is plausible at a given level of confidence. The corresponding confidence interval can be determined by using the graphing calculator:
1. Press ENTER.
2. Enter the number of successes in the sample and the sample size.
3. Press ENTER again. The confidence interval will appear on the next screen.
The value of $\mu_0$ can now be compared with this interval to tell whether the hypothesized mean can be accepted or rejected at this level of confidence.
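For a mean with unknown $\sigma$, the interval has the form $\bar{x} \pm t^* \, s/\sqrt{n}$, where $t^*$ is the critical value for the chosen confidence level with $n - 1$ degrees of freedom. In this sketch the caller supplies $t^*$ (for example, read from a t-table); the sample values and the hypothesized means checked are hypothetical.

```python
import math

def t_interval(x_bar: float, s: float, n: int, t_star: float) -> tuple[float, float]:
    """Confidence interval x_bar +/- t_star * s / sqrt(n).

    t_star must be the critical value for the desired confidence level
    with n - 1 degrees of freedom; it is passed in, not computed here.
    """
    margin = t_star * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

def mean_is_plausible(mu0: float, interval: tuple[float, float]) -> bool:
    """True if the hypothesized mean lies inside the interval."""
    low, high = interval
    return low <= mu0 <= high

# Hypothetical sample: n = 16, x_bar = 10.0, s = 2.0.
# 2.131 is the 95% two-sided critical value for 15 degrees of freedom.
ci = t_interval(10.0, 2.0, 16, 2.131)
print(f"95% CI: ({ci[0]:.4f}, {ci[1]:.4f})")
print("mu0 = 11.5 plausible?", mean_is_plausible(11.5, ci))
```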
Example:
The masses of newly produced bus tokens are estimated to have a mean of $\mu_0$. A random sample of $n$ tokens was removed from the production line, and the mean mass of the tokens was calculated as $\bar{x}$ with a standard deviation of $s$. What is the value of the test statistic for a test to determine how the mean differs from the estimated mean?
Solution:
Substitute the sample values into $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$. If the value of t from the sample falls near the middle of the t-distribution constructed by assuming the null hypothesis is true, the sample gives no evidence against the null hypothesis. On the other hand, if the value of t from the sample is far out in a tail of the t-distribution, then there is evidence to reject the null hypothesis. Since the distribution of t is known when the null hypothesis is true, the next step is to locate the sample’s value of t on this distribution. The most common way to do this is to find a p-value (observed significance level). The p-value is a probability that is computed with the assumption that the null hypothesis is true.
The p-value for a two-sided test is the area under the t-distribution with $df = n - 1$ that lies above $|t|$ and below $-|t|$. This p-value can be calculated by using technology.
Press 2ND [DIST]. Use the arrow keys to select 5:tcdf(lower bound, upper bound, degrees of freedom).
This will be the total area under both tails. To calculate the area under one tail, divide by 2.
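Without a calculator, the same tail areas can be approximated numerically. This sketch assumes Simpson's rule on the t density as a stand-in for the calculator's `tcdf`, which is not what the calculator does internally. The density uses `math.lgamma` for the gamma-function constant. It is adequate for moderate values of $|t|$; the inputs shown are hypothetical.

```python
import math

def t_pdf(x: float, df: float) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t: float, df: float, steps: int = 2000) -> float:
    """P(T <= t): Simpson's rule on [0, |t|], then symmetry about 0."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area under the density from 0 to |t|
    return 0.5 + area if t >= 0 else 0.5 - area

def two_sided_p_value(t: float, df: float) -> float:
    """Area above |t| plus area below -|t|."""
    return 2 * (1 - t_cdf(abs(t), df))

# Hypothetical test statistic and degrees of freedom.
t_obs, df = 2.228, 10
p_two = two_sided_p_value(t_obs, df)
print(f"two-sided p = {p_two:.4f}, one tail = {p_two / 2:.4f}")
```

For these inputs the two-sided p-value is roughly 0.05, and halving it gives the single-tail area, as described above.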
The p-value is the chance of getting an absolute value of t as large as or even larger than the one from this sample if the null hypothesis is true. Here that chance is small, and the small p-value tells us that the sample is inconsistent with the null hypothesis: the population mean differs from the estimated mean $\mu_0$.
When the p-value is close to zero, there is strong evidence against the null hypothesis. When the p-value is large, the result from the sample is consistent with the estimated or hypothesized mean, and there is no evidence against the null hypothesis.
A visual picture of the p-value can be obtained by using the graphing calculator.
The t formula is similar to the one used to compute the z-statistic, with the unknown population standard deviation $\sigma$ replaced by the sample standard deviation $s$.
There are numerous t-distributions, and each is determined by a property of a set of data called the number of degrees of freedom. The degrees of freedom refer to the number of independent observations in a set of data. When estimating a mean score from a single sample, the number of independent observations is equal to the sample size minus one. In a single sample, there are $n$ observations but only one parameter that needs to be estimated (the mean). This means that there are $n - 1$ degrees of freedom for estimating variability. In other words, $df = n - 1$, where $n$ is the sample size. The distribution of the t-statistic from samples of size $n$ is described by a t-distribution having $n - 1$ degrees of freedom. Likewise, a t-distribution with $k$ degrees of freedom would be used with a sample size of $k + 1$.
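One way to see why only $n - 1$ observations are free: deviations from the sample mean always sum to zero, so once any $n - 1$ of them are known, the last is forced. The sample below is hypothetical.

```python
def deviations(sample: list[float]) -> list[float]:
    """Deviations of each observation from the sample mean."""
    mean = sum(sample) / len(sample)
    return [x - mean for x in sample]

sample = [4.0, 7.0, 5.0, 8.0]  # hypothetical data, mean = 6.0
devs = deviations(sample)

# The last deviation is determined by the other n - 1 deviations,
# because all deviations from the mean must sum to zero.
forced_last = -sum(devs[:-1])
print("deviations:", devs)
print("last deviation from the others:", forced_last)
print("degrees of freedom:", len(sample) - 1)
```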
The t-score produced by this formula is associated with a unique cumulative probability, which represents the chance of finding a sample mean less than or equal to $\bar{x}$ using a random sample of size $n$. The symbol $t_p$ is used to represent the t-score that has a cumulative probability of $p$. This value depends on the number of degrees of freedom and can be determined by using the table of values:
[Table: values of $t_p$ by degrees of freedom (rows df = 1 to 30 and ∞) and cumulative probability $p$ (columns)]
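Values of $t_p$ like those in such a table can also be approximated numerically. This sketch is an illustration under assumptions, not the method used to build the table: it integrates the t density with Simpson's rule and then inverts the CDF by bisection. The search bracket of ±50 is an arbitrary choice that covers typical table entries.

```python
import math

def t_cdf(t: float, df: float, steps: int = 2000) -> float:
    """P(T <= t) for Student's t, via Simpson's rule on the density over [0, |t|]."""
    coef = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x: float) -> float:
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    total += sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    area = total * h / 3  # area from 0 to |t|
    return 0.5 + area if t >= 0 else 0.5 - area

def t_critical(p: float, df: float, lo: float = -50.0, hi: float = 50.0, tol: float = 1e-6) -> float:
    """Return t_p, the t-value with cumulative probability p (bisection search)."""
    if not t_cdf(lo, df) < p < t_cdf(hi, df):
        raise ValueError("p lies outside the search bracket; widen lo and hi")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for df in (1, 5, 10, 30):
    print(df, round(t_critical(0.95, df), 3), round(t_critical(0.975, df), 3))
```

Because the distribution is symmetric, `t_critical(1 - p, df)` comes out as the negative of `t_critical(p, df)`.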
Reading across the row for a given number of degrees of freedom gives the value of $t_p$ for each cumulative probability $p$.
Since the t-distribution is symmetric about a mean of zero, the following statement is true:
$$t_{1-p} = -t_p$$
Therefore, once $t_p$ has been read from the table, $t_{1-p}$ is simply its negative.
A t-distribution is mound shaped, with mean 0 and a spread that depends on the degrees of freedom. The greater the degrees of freedom, the smaller the spread. As the number of degrees of freedom increases, the t-distribution approaches a normal distribution. The spread of any t-distribution is greater than that of a standard normal distribution. This is because $\sigma$ in the denominator of the formula has been replaced with $s$. Since $s$ is a random quantity that changes from sample to sample, the variability in $t$ is greater, resulting in a larger spread.
Notice that in the first distribution graph, the spread of the inner curve is smaller, but in the second graph the two distributions basically overlap, so both are roughly normal. This is due to the increase in the degrees of freedom.
Here are two t-distributions with different degrees of freedom, as graphed on the graphing calculator.
[Graph]
Repeat the steps to plot more than one t-distribution on the same screen.
Notice the difference in the two distributions.
The one with more degrees of freedom approximates a normal curve.
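The comparison in the graph can also be tabulated. This sketch prints the t density for two hypothetical choices of degrees of freedom next to the standard normal density. The t curves are lower at the center and higher in the tails, and the one with more degrees of freedom sits closer to the normal curve.

```python
import math

def t_pdf(x: float, df: float) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def normal_pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# df = 2 and df = 30 are hypothetical choices for illustration.
print(f"{'x':>3} {'t, df=2':>9} {'t, df=30':>9} {'normal':>9}")
for x in (0, 1, 2, 3):
    print(f"{x:>3} {t_pdf(x, 2):>9.4f} {t_pdf(x, 30):>9.4f} {normal_pdf(x):>9.4f}")
```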
The t-distribution can be used with any statistic having a bell-shaped distribution. The Central Limit Theorem states that the sampling distribution of a statistic will be close to normal with a large enough sample size. As a rough estimate, the Central Limit Theorem predicts a roughly normal distribution under the following conditions:
- The population distribution is normal.
- The population distribution is symmetric, even if the sample size is small.
- The population distribution is moderately skewed and the sample size is moderately large.
- The sample size is large, without outliers.
The t-distribution also has some unique properties. These properties are:
1. The mean of the distribution equals zero.
2. It is used when the population standard deviation is unknown.
3. The variance is equal to the degrees of freedom divided by the degrees of freedom minus 2, that is, $\frac{df}{df - 2}$. This means that the degrees of freedom must be greater than two for the formula to give a finite variance.
4. The variance is always greater than one, although it approaches 1 as the degrees of freedom increase. This is because as the degrees of freedom increase, the distribution becomes more like a normal distribution.
5. Although the Student’s t-distribution is bell-shaped, smaller sample sizes produce a flatter curve. The distribution is not as mounded as a normal distribution, and the tails are thicker. As the sample size increases and approaches infinity, the distribution approaches a normal distribution.
6. The distribution is unimodal and symmetric.
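Properties 3 and 4 can be checked directly from the formula $\frac{df}{df - 2}$. This sketch raises an error when $df \le 2$, where the formula does not give a finite variance; the list of degrees of freedom is arbitrary.

```python
def t_variance(df: float) -> float:
    """Variance of Student's t-distribution: df / (df - 2), finite only for df > 2."""
    if df <= 2:
        raise ValueError("variance is not finite for df <= 2")
    return df / (df - 2)

# The variance stays above 1 and shrinks toward 1 as df grows.
for df in (3, 5, 10, 30, 100):
    print(df, round(t_variance(df), 4))
```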
Example:
Duracell manufactures batteries that the CEO claims will last $\mu_0$ days under normal use. A researcher randomly selected $n$ batteries from the production line and tested these batteries. The tested batteries had a mean life span of $\bar{x}$ days with a standard deviation of $s$ days. If the CEO’s claim were true, what is the probability that $n$ randomly selected batteries would have a mean life span of no more than $\bar{x}$ days?
Solution:
Compute $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$ with $n - 1$ degrees of freedom, and find its cumulative probability using the graphing calculator or a table of values. This cumulative probability is the chance, if the true mean life span of a battery were $\mu_0$ days, that the mean life span of the tested batteries would be less than or equal to $\bar{x}$ days. In this example, the probability is not small enough to reject the null hypothesis and count the discrepancy as significant.
Example:
You have just taken ownership of a pizza shop. The previous owner told you that you would save money if you bought the mozzarella cheese in a slab. Each time you purchase a slab of cheese, you weigh it to ensure that you are receiving the advertised weight, $\mu_0$, of cheese. You record the results of $n$ random measurements. Are these differences due to chance, or is the distributor giving you less cheese than you deserve?
Solution:
Begin the problem by determining the mean of the sample and the sample standard deviation.
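This can be done using the graphing calculator, which gives $\bar{x}$ and $s$. Then compute the test statistic $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$ with $n - 1$ degrees of freedom.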
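The same calculation can be sketched with Python's `statistics` module, where `statistics.stdev` returns the sample standard deviation (dividing by $n - 1$). The weights and the target weight below are hypothetical, in unspecified units. They are not the measurements from the example.

```python
import math
import statistics

# Hypothetical slab weights and advertised weight (units unspecified).
weights = [2, 4, 4, 4, 5, 5, 7, 9]
mu0 = 4

x_bar = statistics.mean(weights)
s = statistics.stdev(weights)  # sample SD, divides by n - 1
n = len(weights)
t = (x_bar - mu0) / (s / math.sqrt(n))

print(f"mean = {x_bar}, s = {s:.4f}, t = {t:.4f}, df = {n - 1}")
```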
Example:
In the cheese example above, the test statistic for testing that the mean weight of the cheese wasn’t $\mu_0$ was computed. Find and interpret the p-value.
Solution:
Using technology, find the two-sided p-value for the test statistic $t$ computed in the cheese example, with $n - 1$ degrees of freedom. If the mean weight of cheese is $\mu_0$, the p-value is the probability that $n$ random measurements would give a value of t greater than $|t|$ or less than $-|t|$.
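The decision at a given significance level $\alpha$ comes down to comparing the p-value with $\alpha$. This minimal sketch uses the convention "reject $H_0$ when $p < \alpha$" (some texts use $p \le \alpha$). The p-value and the three levels are hypothetical examples, not values from the text.

```python
def decisions(p_value: float, alphas: list[float]) -> dict[float, bool]:
    """Map each significance level to True if H0 is rejected (p_value < alpha)."""
    return {alpha: p_value < alpha for alpha in alphas}

# Hypothetical p-value and commonly used significance levels.
for alpha, reject in decisions(0.03, [0.10, 0.05, 0.01]).items():
    print(f"alpha = {alpha:.2f}: {'reject H0' if reject else 'fail to reject H0'}")
```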
Example:
In the previous example, the p-value for testing that the mean weight of cheese wasn’t $\mu_0$ was determined.
a) State the hypotheses.
b) Would the null hypothesis be rejected at each of the three given significance levels?
Solution:
a) $H_0$: The mean weight of cheese, $\mu$, is $\mu_0$. $H_a$: $\mu \neq \mu_0$.
b) Because the p-value is less than both