
# Summary Statistics, Summarizing Univariate Distributions

## Describing single variable data with additional measures of the center and common percentiles


In the previous Concept, you learned how to summarize data by calculating some measures of center. In this Concept, you will learn some other measures of center, as well as other ways to summarize data, such as ranges and quartiles.

### Watch This

For a discussion of four measures of central tendency (5.0), see American Public University, Data Distributions - Measures of a Center (6:24).

### Guidance

The mean, median and mode are only a few possible measures of center. While they are the most commonly used measures of center, it is important to be familiar with some other measures of center that are sometimes used as well.

Midrange

The midrange (sometimes called the midextreme) is found by taking the mean of the maximum and minimum values of the data set.

#### Example A

Consider the following quiz grades: 75, 80, 90, 94, and 96. The midrange would be:

$\frac{75+96}{2}= \frac{171}{2} = 85.5$

Since it is based on only the two most extreme values, the midrange is not commonly used as a measure of central tendency.
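The midrange calculation from Example A can be sketched in a few lines of Python. This is a minimal illustration; the function name `midrange` is a hypothetical choice, not part of any library.

```python
def midrange(data):
    """Return the midrange: the mean of the minimum and maximum values."""
    if not data:
        raise ValueError("midrange requires at least one value")
    return (min(data) + max(data)) / 2


quiz_grades = [75, 80, 90, 94, 96]
print(midrange(quiz_grades))  # 85.5
```

Because only `min(data)` and `max(data)` enter the formula, changing any of the middle grades leaves the result unchanged, which is why the midrange is rarely used on its own.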

Trimmed Mean

Recall that the mean is not resistant to the effects of outliers. Many students ask their teacher to “drop the lowest grade.” The argument is that everyone has a bad day, and one extreme grade that is not typical of the rest of their work should not have such a strong influence on their mean grade. The problem is that this can work both ways: a student who performs poorly most of the time could have a really good day (or even get lucky) and earn one extremely high grade. We wouldn’t blame this student for not asking the teacher to drop the highest grade! Attempting to describe a data set more accurately by removing the extreme values is referred to as trimming the data. To be fair, though, a valid trimmed statistic must remove both the extreme maximum and the extreme minimum values. So, while some students might disapprove, to calculate a trimmed mean, you remove the maximum and minimum values, add the values that remain, and divide by the number of values that remain.

#### Example B

Consider the following quiz grades: 75, 80, 90, 94, 96.

A trimmed mean would remove the smallest and largest values, 75 and 96, add the three remaining values, and divide by 3.

$\begin{aligned}&\xcancel{75},\ 80,\ 90,\ 94,\ \xcancel{96}\\&\frac{80+90+94}{3}=\frac{264}{3}=88\end{aligned}$
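The same "drop the lowest and highest value" calculation can be sketched in Python. The function name `trimmed_mean_drop_extremes` is hypothetical; the sketch sorts the data first, so the grades do not need to be entered in order.

```python
def trimmed_mean_drop_extremes(data):
    """Drop one smallest and one largest value, then average the rest."""
    if len(data) < 3:
        raise ValueError("need at least three values to trim both ends")
    ordered = sorted(data)
    remaining = ordered[1:-1]  # everything except the minimum and maximum
    return sum(remaining) / len(remaining)


print(trimmed_mean_drop_extremes([75, 80, 90, 94, 96]))  # 88.0
```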

$n\%$ Trimmed Mean

Instead of removing just the minimum and maximum in a larger data set, a statistician may choose to remove a certain percentage of the extreme values. This is called an $n\%$ trimmed mean. To perform this calculation, order the data and remove the specified percentage of the values from each end. For example, in a data set that contains 100 numbers, to calculate a 10% trimmed mean, remove 10% of the data from each end. In this simplified example, the ten smallest and the ten largest values would be discarded, and the sum of the remaining 80 numbers would be divided by 80.
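The procedure can be summarized as a formula. The symbols here are introduced only for illustration: $N$ is the number of values, $x_{(1)}\le x_{(2)}\le\cdots\le x_{(N)}$ are the values in increasing order, and $k=\frac{n}{100}N$ is the number of values removed from each end, assuming this is a whole number.

$\overline{x}_{\text{trim}}=\frac{1}{N-2k}\sum_{i=k+1}^{N-k}x_{(i)}$

With $N=100$ and $n=10$, this gives $k=10$, so the values $x_{(11)}$ through $x_{(90)}$ are added and the sum is divided by $100-2(10)=80$.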

#### Example C

In real data, it is not always so straightforward. To illustrate this, let’s return to our data on the number of children in each household and calculate a 5% trimmed mean. Here is the data set:

1, 3, 4, 3, 1, 2, 2, 2, 1, 2, 2, 3, 4, 5, 1, 2, 3, 2, 1, 2, 3, 6

Placing the data in order yields the following:

1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 5, 6

Five percent of 22 values is 1.1, so we could remove one value from each end (2 total), which trims approximately 4.5% from each end, or we could remove two values from each end (4 total), which trims approximately 9% from each end. Some statisticians would calculate both of these and then use proportions to find an approximation for 5%. Others might argue that 4.5% is closer, so we should use that value. For our purposes, and to stay consistent with the way we handle similar situations in later chapters, we will always opt to remove more numbers than necessary. The logic behind this is simple. You are claiming to remove 5% of the numbers. If you cannot remove exactly 5%, then you must remove either more or fewer, and we prefer to err on the side of caution and remove at least the percentage reported. Here, removing two values from each end (1 and 1 at the bottom, 5 and 6 at the top) leaves 18 values whose sum is 42, so the 5% trimmed mean is $\frac{42}{18}\approx 2.33$.

This round-up convention is not a hard and fast rule, and it is a good illustration of how many concepts in statistics are open to individual interpretation. Some statisticians even say that the only correct answer to every question asked in statistics is, “It depends!”
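The round-up convention described above can be sketched in Python. Assumptions: the function name `percent_trimmed_mean` is hypothetical, `percent` is a whole number, and the percentage is removed from each end. The count is computed with `fractions.Fraction` so that floating-point error cannot push a whole-number count (such as 10% of 100) up by one before rounding.

```python
import math
from fractions import Fraction


def percent_trimmed_mean(data, percent):
    """n% trimmed mean: remove `percent`% of the values from EACH end.

    When percent/100 * len(data) is not a whole number, the count is
    rounded up, so at least the stated percentage is removed.
    """
    ordered = sorted(data)
    # Number of values to drop from each end, using exact fraction arithmetic.
    k = math.ceil(Fraction(percent, 100) * len(ordered))
    if 2 * k >= len(ordered):
        raise ValueError("trimming would remove every value")
    remaining = ordered[k:len(ordered) - k]
    return sum(remaining) / len(remaining)


children = [1, 3, 4, 3, 1, 2, 2, 2, 1, 2, 2, 3, 4, 5, 1, 2, 3, 2, 1, 2, 3, 6]
print(round(percent_trimmed_mean(children, 5), 2))  # 2.33
```

For the children data, 5% of 22 is 1.1, which rounds up to 2 values per end, matching the hand calculation of $\frac{42}{18}$.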

Weighted Mean

The weighted mean is a method of calculating the mean where instead of each data point contributing equally to the mean, some data points contribute more than others. This could be because they appear more often or because a decision was made to increase their importance (give them more weight). The most common type of weight to use is the frequency, which is the number of times each number is observed in the data. When we calculated the mean for the children living at home, we could have used a weighted mean calculation. The calculation would look like this:

$\frac{(5)(1)+(8)(2)+(5)(3)+(2)(4)+(1)(5)+(1)(6)}{22}=\frac{55}{22}=2.5$

The symbolic representation of this is as follows:

$\overline{x}=\frac{\sum_{i=1}^n f_ix_i}{\sum_{i=1}^n f_i}$

where:

$x_i$ is the $i^{\text{th}}$ distinct data value.

$f_i$ is the frequency (weight) of $x_i$, that is, the number of times that value occurs.

$n$ is the number of distinct data values.
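The weighted mean formula translates directly into Python. This is a minimal sketch; the function name `weighted_mean` is hypothetical. It uses the distinct children-per-household values and their frequencies from the calculation above.

```python
def weighted_mean(values, weights):
    """Sum of f_i * x_i divided by the sum of the weights f_i."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(f * x for x, f in zip(values, weights)) / total_weight


children_values = [1, 2, 3, 4, 5, 6]  # distinct values x_i
frequencies = [5, 8, 5, 2, 1, 1]      # f_i, summing to 22
print(weighted_mean(children_values, frequencies))  # 2.5
```

When the weights are frequencies, the result equals the ordinary mean of the 22 raw values, since both compute $\frac{55}{22}$.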

We may be interested in other sections of the data besides the center or middle. We could be interested in some lower percentage of the data or some higher portion of the data. The following topics will explain how to look at certain portions or percentages of a data set.

Percentiles and Quartiles

A percentile is a statistic that identifies the percentage of the data that is less than the given value. The most commonly used percentile is the median. Because the median is in the numeric middle of the data, half of the data is below it, so we could also call the median the $50^{\text{th}}$ percentile. A $40^{\text{th}}$ percentile would be a value below which 40% of the numbers fall.

To check a child’s physical development, pediatricians use height and weight charts that help them to know how the child compares to children of the same age. A child whose height is in the $70^{\text{th}}$ percentile is taller than 70% of children of the same age.
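One simple way to turn the definition above into a calculation is to count the observations that fall strictly below a value. In this sketch, the function name `percent_below` is hypothetical, and this count-based rule is only one of several conventions in use.

```python
def percent_below(data, value):
    """Percentage of the observations that are strictly less than `value`."""
    if not data:
        raise ValueError("data must not be empty")
    below = sum(1 for x in data if x < value)
    return 100 * below / len(data)


quiz_grades = [75, 80, 90, 94, 96]
print(percent_below(quiz_grades, 94))  # 60.0
```

With small data sets or repeated values, this rule need not land on round percentiles: `percent_below(quiz_grades, 90)` returns 40.0 even though 90 is the median, because only two of the five grades lie strictly below it.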

Two very commonly used percentiles are the $25^{\text{th}}$ and $75^{\text{th}}$ percentiles. Together with the median, they divide the data into four parts, each containing about one quarter of the data. Because of this, the $25^{\text{th}}$ percentile is notated as $Q_1$ and is called the lower quartile, and the $75^{\text{th}}$ percentile is notated as $Q_3$ and is called the upper quartile. The median is the middle quartile and is sometimes referred to as $Q_2$.

#### Example D

1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 5, 6

Find the median, lower quartile and upper quartile.

Solution:

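Recall that the median ($50^{\text{th}}$ percentile) is 2: with 22 values, the median is the mean of the 11th and 12th values, which are both 2. The quartiles can be thought of as the medians of the lower and upper halves of the data.

The lower half consists of the first 11 values, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, and its median (the 6th value) is 2, so $Q_1=2$. The upper half consists of the last 11 values, 2, 2, 3, 3, 3, 3, 3, 4, 4, 5, 6, and its median is 3, so $Q_3=3$.

In this case, there is an odd number of values (11) in each half. If there were an even number of values in each half, then we would follow the procedure for medians and average the middle two values of each half.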
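The median-of-halves procedure can be sketched in Python. Assumptions: the helper name `quartiles` is hypothetical; it relies on `statistics.median` from the standard library; and when the number of values is odd, it leaves the overall median out of both halves, the convention discussed in Example E.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    ordered = sorted(data)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = ordered[:half]
    # For an odd count, skip the middle value so it belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)


children = [1, 3, 4, 3, 1, 2, 2, 2, 1, 2, 2, 3, 4, 5, 1, 2, 3, 2, 1, 2, 3, 6]
practice = [2, 3, 6, 8, 11, 14, 15, 17, 18, 19, 20, 20,
            24, 26, 27, 28, 28, 28, 32, 34, 38, 39, 43]
print(quartiles(children))  # (2, 2.0, 3)
print(quartiles(practice))  # (14, 20, 28)
```

The second call uses the 23-value data set from the Guided Practice below, where the odd count means the 12th value is excluded from both halves.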

#### Example E

Find the median, 1st quartile and 3rd quartile for the data set below.

Solution:

The median in this set is 90. Because it is the middle number, it is not technically part of either the lower or the upper half of the data, so we do not include it when calculating the quartiles. However, not all statisticians agree that this is the proper way to calculate the quartiles in this case. As we mentioned earlier, some things in statistics are not as universally agreed upon as they are in other branches of mathematics, and the exact method for calculating quartiles is one of them. Several alternate methods for calculating quartiles are in use.
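The disagreement over quartile methods shows up directly in Python's standard library: `statistics.quantiles` (Python 3.8 and later) accepts `method="exclusive"` (the default) or `method="inclusive"`, and both interpolate between data values rather than taking medians of halves. The sketch below applies each to the children-per-household data from Example D, for which the median-of-halves method gives $Q_1=2$ and $Q_3=3$.

```python
from statistics import quantiles

children = [1, 3, 4, 3, 1, 2, 2, 2, 1, 2, 2, 3, 4, 5, 1, 2, 3, 2, 1, 2, 3, 6]

# With n=4, quantiles() returns the three cut points [Q1, Q2, Q3].
print(quantiles(children, n=4, method="exclusive"))  # [1.75, 2.0, 3.0]
print(quantiles(children, n=4, method="inclusive"))  # [2.0, 2.0, 3.0]
```

The exclusive method places $Q_1$ between the 5th value (1) and the 6th value (2), giving 1.75, while the other two approaches give 2. None of these is the single "correct" answer.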


### Vocabulary

Another measure of center is the midrange, which is the mean of the maximum and minimum values. In an $n\%$ trimmed mean, you remove $n\%$ of the data from each end before calculating the mean. A weighted mean involves multiplying individual data values by their frequencies or percentages before adding them and then dividing by the total of the frequencies (weights).

A percentile is a data value for which the specified percentage of the data is below that value. The median is the $50^{\text{th}}$ percentile. Two well-known percentiles are the $25^{\text{th}}$ percentile, which is called the lower quartile, $Q_1$, and the $75^{\text{th}}$ percentile, which is called the upper quartile, $Q_3$.

### Guided Practice

For the following data set

2, 3, 6, 8, 11, 14, 15, 17, 18, 19, 20, 20, 24, 26, 27, 28, 28, 28, 32, 34, 38, 39, 43

find the following values:

a) the minimum value

b) the maximum value

c) the median

d) the upper quartile

e) the lower quartile

Solutions:

a) The minimum value is 2.

b) The maximum value is 43.

c) Since there are 23 data points and they are already listed in order, the 12th data point is the median. This is 20.

d) The upper quartile is the median of the upper half of the data. Since there are 11 data points in the upper half (the 13th through the 23rd values), the upper quartile is the 6th data point of that half, which is 28.

e) The lower quartile is the median of the lower half of the data (the 1st through the 11th values), so it is the 6th data point of that half, which is 14.

### Practice

For 1-4, use the following data set

2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 7, 7, 8, 8, 8, 9

find the following:

1. minimum and maximum
2. midrange
3. median
4. upper and lower quartiles

For 5-11, use the table below, which shows data from the Galapagos tortoise preservation program: the number of individual tortoises that were bred in captivity and reintroduced into their native habitat on each island or volcano.

| Island or Volcano | Number of Individuals Repatriated |
|---|---|
| Wolf | 40 |
| Darwin | 0 |
| Alcedo | 0 |
| Sierra Negra | 286 |
| Cerro Azul | 357 |
| Santa Cruz | 210 |
| Española | 1293 |
| San Cristóbal | 55 |
| Santiago | 498 |
| Pinzón | 552 |
| Pinta | 0 |

Figure: Approximate Distribution of Giant Galapagos Tortoises in 2004 (“Estado Actual De Las Poblaciones de Tortugas Terrestres Gigantes en las Islas Galápagos,” Marquez, Wiedenfeld, Snell, Fritts, MacFarland, Tapia, and Naranjo, Ecología Aplicada, Vol. 3, Num. 1,2, pp. 98-11).

For this data, calculate each of the following:

5. mode
6. median
7. mean
8. a 10% trimmed mean
9. midrange
10. upper and lower quartiles
11. the percentile for the number of Santiago tortoises reintroduced
12. Why is the answer to (8) significantly higher than the answer to (7)?
13. How would you describe the difference between the midrange and the median?
14. How can we represent data visually using the various measures of center?

Technology Notes:

Calculating Medians and Quartiles on the TI-83/84 Graphing Calculator

The median and quartiles can also be calculated using a graphing calculator. You may have noticed earlier that the median is available in the MATH submenu of the LIST menu.

While there is a way to access each quartile individually, we will usually want them both, so we will access them through the one-variable statistics in the STAT menu.

You should still have the data in L1 and the frequencies, or weights, in L2. Press [STAT], arrow over to CALC, and press [ENTER] or [1] to choose '1-Var Stats', which returns you to the home screen. Press [2ND][L1][,][2ND][L2] to supply the data and frequency lists, then press [ENTER]. Look at the bottom left-hand corner of the results screen: an arrow pointing downward indicates that there is more information. Scroll down to reveal the quartiles and the median.

Remember that $Q_1$ corresponds to the $25^{\text{th}}$ percentile, and $Q_3$ corresponds to the $75^{\text{th}}$ percentile.

### Vocabulary

Lower quartile

The lower quartile, also known as $Q_1$, is the median of the lower half of the data.

Maximum

The maximum is the largest value in a data set.

Midrange

The midrange is the mean of the maximum and minimum values.

Minimum

The minimum is the smallest value in a data set.

Percentile

A percentile is a data value for which the specified percentage of the data is below that value.

Trimmed mean

In an $n\%$ trimmed mean, you remove $n\%$ of the data from each end before calculating the mean.

Upper quartile

The upper quartile, also known as $Q_3$, is the median of the upper half of the data.

Weighted mean

A weighted mean involves multiplying individual data values by their frequencies or percentages before adding them and then dividing by the total of the frequencies (weights).