5.1: The Mean
Learning Objectives
- Understand the mean of a set of numerical data.
- Compute the mean of a given set of data.
- Understand the effect of an outlier on the mean of a set of data.
- Understand the mean of a set of data as it applies to real-world situations.
Now that you have had some fun discovering what the mean of a set of data represents, it is time to calculate the actual mean of your handfuls of blocks.
The term central tendency refers to the middle, or typical, value of a set of data, which is most commonly measured by using the 3 m’s: mean, median, and mode. The mean, median, and mode are known as the measures of central tendency. In this lesson, we will explore the mean, and then we will move on to the median and the mode in the following lessons.
The mean, often called the average of a numerical set of data, is simply the sum of the data values divided by the number of values. This is also referred to as the arithmetic mean. The mean is the balance point of a distribution.
To calculate the actual mean of your handfuls of blocks, you can use the numbers that were posted on your grid paper. These posted numbers represent the number of blocks that were picked by each student in your class. Therefore, you are calculating the mean of a population, which is a collection of all elements whose characteristics are being studied. You are not calculating the mean number of blocks for only some of the students; you are calculating it for every student in the class. We will use the example below for our calculations:
From the grid paper, you can see that there were 30 students who posted their numbers of blocks. The total number of blocks picked by all the students can be calculated as follows:
\begin{align*}& 1 \times 2 + 2 \times 3 + 3 \times 4 + 5 \times 5 + 3 \times 6 +4 \times 7 + 3 \times 8 + 2 \times 9 + 3 \times 10 + 3 \times 11 + 1 \times 12\\ & 2+6+12+25+18+28+24+18+30+33+12=208\end{align*}
The sum of all the blocks is 208, and the mean is the number you get when you divide this sum by the number of students who placed a Post-it note on the grid paper. The mean number of blocks is, therefore, \begin{align*}\frac{208}{30} \approx 6.93\end{align*}. This means that, on average, each student picked about 7 blocks from the pail.
When calculations are done in mathematics, formulas are often used to represent the steps that are being applied. The symbol \begin{align*}\sum\end{align*} means “the sum of” and is used to represent the addition of numbers. Since the numbers differ from question to question, the variable \begin{align*}x\end{align*} is used to represent them. To make sure that all the numbers are included, a subscript is used to name each one, so the first number in a data set is written as \begin{align*}x_1\end{align*}, the second as \begin{align*}x_2\end{align*}, and so on. The number of data values for a population is written as \begin{align*}N\end{align*}. The mean of the population is denoted by the symbol \begin{align*}\mu\end{align*}, which is pronounced "mu." The following formula represents the steps that are involved in calculating the mean of a set of data:
\begin{align*}\text{Mean} = \frac{\text{sum of the values}}{\text{the number of values}}\end{align*}
This formula can also be written using symbols:
\begin{align*}\mu=\frac{\sum x}{N}=\frac{x_1+x_2+x_3+ \ldots + x_N}{N}\end{align*}
You can now use the formula to calculate the mean number of blocks per student:
\begin{align*}\mu &= \frac{\sum x}{N}\\ \mu &= \frac{2+6+12+25+18+28+24+18+30+33+12}{30}\\ \mu &= \frac{208}{30}\\ \mu &\approx 6.93\end{align*}
This means that, on average, each student picked about 7 blocks from the pail.
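If you want to check this arithmetic by computer, the following minimal Python sketch applies the population mean formula to the blocks data. It assumes each term of the sum above is read as (number of students) × (number of blocks), since the grid-paper figure is not reproduced here; this is the only reading that gives 30 students. The name `freq` is illustrative.

```python
# Number of blocks picked -> number of students who picked that many.
# Assumption: each product in the sum above is (students) x (blocks).
freq = {2: 1, 3: 2, 4: 3, 5: 5, 6: 3, 7: 4, 8: 3, 9: 2, 10: 3, 11: 3, 12: 1}

# Sum of all values: each block count is added once per student who picked it
total_blocks = sum(blocks * students for blocks, students in freq.items())

# N is the population size: every student in the class
N = sum(freq.values())

mu = total_blocks / N  # population mean: sum of the values divided by N
print(total_blocks, N, round(mu, 2))  # 208 30 6.93
```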
Example 1
Stephen has been working at Wendy’s for 15 months. The following numbers are the numbers of hours that Stephen worked at Wendy’s in each of the past 7 months:
\begin{align*}24, 25, 33, 50, 53, 66, 78\end{align*}
What is the mean number of hours that Stephen worked each month?
Solution:
Step 1: Add the numbers to determine the total number of hours he worked.
\begin{align*}24+25+33+50+53+66+78=329\end{align*}
Step 2: Divide the total by the number of months.
\begin{align*}\frac{329}{7}=47\end{align*}
The mean number of hours that Stephen worked each month was 47.
Stephen has worked at Wendy’s for 15 months, but the numbers given above are for 7 months. Therefore, this set of data represents a sample, which is a portion of the population. The formula that was used to calculate the mean of the blocks must be changed slightly to represent a sample. The mean of a sample is denoted by \begin{align*}\bar{x}\end{align*}, which is called “\begin{align*}x\end{align*} bar.”
The number of data values for a sample is written as \begin{align*}n\end{align*}. The following formula represents the steps that are involved in calculating the mean of a sample:
\begin{align*}\text{Mean} = \frac{\text{sum of the values}}{\text{the number of values}}\end{align*}
This formula can now be written using symbols:
\begin{align*}\overline{x}=\frac{\sum x}{n}=\frac{x_1+x_2+x_3+ \ldots + x_n}{n}\end{align*}
You can now use the formula to calculate the mean number of hours that Stephen worked each month:
\begin{align*}\overline{x} &= \frac{\sum x}{n}\\ \overline{x} &= \frac{24+25+33+50+53+66+78}{7}\\ \overline{x} &= \frac{329}{7}\\ \overline{x} &= 47\end{align*}
The mean number of hours that Stephen worked each month was 47.
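The same sample-mean calculation can be sketched in Python. This is only an illustration: `hours`, `n`, and `x_bar` are names chosen here, and the standard library's `statistics.mean` is used as a cross-check on the hand formula.

```python
from statistics import mean

# Hours worked in each of the past 7 months: a sample of Stephen's 15 months
hours = [24, 25, 33, 50, 53, 66, 78]

n = len(hours)          # sample size, written with lowercase n
x_bar = sum(hours) / n  # sample mean: sum of the values divided by n
print(sum(hours), n, x_bar)  # 329 7 47.0

# statistics.mean computes the same arithmetic mean
assert mean(hours) == x_bar
```

The code is identical for a population; only the symbol you would write for the result (x̄ or μ) changes.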
The formulas differ only in the symbol used for the mean and in the case of the variable used for the number of data values (\begin{align*}N\end{align*} or \begin{align*}n\end{align*}). The calculations are done the same way for both a population and a sample. However, the mean of a population is a fixed value, while the mean of a sample can change from one sample to another.
Example 2
Mark operates a shuttle service that employs 8 people. Find the mean age of these workers if the ages of the 8 employees are as follows:
\begin{align*}55 \quad 63 \quad 34 \quad 59 \quad 29 \quad 46 \quad 51 \quad 41\end{align*}
Solution:
Since the data set includes the ages of all 8 employees, it represents a population. The mean age of the employees can be calculated as shown below:
\begin{align*}\mu &= \frac{\sum x}{N}\\ \mu &= \frac{55+63+34+59+29+46+51+41}{8}\\ \mu &= \frac{378}{8}\\ \mu &= 47.25\end{align*}
The mean age of all 8 employees is 47.25 years, or 47 years and 3 months.
If you were to take a sample of 3 employees from the group of 8 and calculate the mean age for these 3 workers, would the result change? Let’s use the ages 55, 29, and 46 for one sample of 3, and the ages 34, 41, and 59 for another sample of 3:
\begin{align*}\overline{x} &= \frac{\sum x}{n} && \overline{x}=\frac{\sum x}{n}\\ \overline{x} &= \frac{55+29+46}{3} && \overline{x}=\frac{34+41+59}{3}\\ \overline{x} &= \frac{130}{3} && \overline{x}=\frac{134}{3}\\ \overline{x} &\approx 43.33 && \overline{x}\approx 44.67\end{align*}
The mean age of the first group of 3 employees is approximately 43.33 years.
The mean age of the second group of 3 employees is approximately 44.67 years.
The mean age for a sample of a population depends on which values of the population are included in the sample. From this example, you can see that the mean of a population and the mean of a sample from that population are not necessarily the same.
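A short Python sketch makes the comparison concrete: it computes the population mean of all 8 ages and the means of the two samples of 3 used above. The variable names are illustrative only.

```python
ages = [55, 63, 34, 59, 29, 46, 51, 41]  # all 8 employees: a population

mu = sum(ages) / len(ages)  # population mean, dividing by N = 8

# The two samples of 3 employees used above
sample_1 = [55, 29, 46]
sample_2 = [34, 41, 59]
x_bar_1 = sum(sample_1) / len(sample_1)
x_bar_2 = sum(sample_2) / len(sample_2)

print(mu)                 # 47.25
print(round(x_bar_1, 2))  # 43.33
print(round(x_bar_2, 2))  # 44.67
```

Neither sample mean equals μ, and the two sample means differ from each other, because each sample contains different employees.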
Example 3
The selling prices of the last 10 houses sold in a small town are listed below:
\begin{align*}&\$125,000 \quad \$142,000 \quad \$129,500 \quad \$89,500 \quad \ \ \$105,000\\ &\$144,000 \quad \$168,300 \quad \$96,000 \quad \ \ \$182,300 \quad \$212,000\end{align*}
Calculate the mean selling price of the last 10 homes that were sold.
Solution:
Since these are only the 10 most recent of the houses sold in the town, the prices form a sample, so the mean of the prices can be calculated as follows:
\begin{align*}\overline{x} &= \frac{\sum x}{n}\\ \overline{x} &= \frac{\$125{,}000+\$142{,}000+\$129{,}500+\$89{,}500+\$105{,}000+\$144{,}000+\$168{,}300+\$96{,}000+\$182{,}300+\$212{,}000}{10}\\ \overline{x} &= \frac{\$1{,}393{,}600}{10}\\ \overline{x} &= \$139{,}360\end{align*}
The mean selling price of the last 10 homes that were sold was $139,360.
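As a quick check of the sum and the division, here is a minimal Python sketch. Prices are stored as whole dollars, and underscores are used only as digit separators in the integer literals.

```python
# Selling prices of the last 10 houses (a sample), in dollars
prices = [125_000, 142_000, 129_500, 89_500, 105_000,
          144_000, 168_300, 96_000, 182_300, 212_000]

total = sum(prices)             # sum of the values
x_bar = total / len(prices)     # divide by n = 10
print(f"${total:,}  ${x_bar:,.0f}")  # $1,393,600  $139,360
```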
The mean value is one of the 3 m’s and is a measure of central tendency. It is a summary statistic that gives you a description of the entire data set and is especially useful with large data sets, where you might not have the time to examine every single value. You can also use the mean to calculate further descriptive statistics, such as the variance and standard deviation. These topics will be explored in a future lesson. The mean assists you in understanding and making sense of your data, since it uses all of the values in the data set in its calculation.
When a data set is large, a frequency distribution table is often used to display the data in an organized way. A frequency distribution table lists the data values, as well as the number of times each value appears in the data set. A frequency distribution table is easy to both read and interpret.
The numbers in a frequency distribution table do not have to be put in order. To make it easier to enter the values in the table, a tally column is often inserted. Inserting a tally column allows you to account for every value in the data set without having to continually scan the list to find each one. A tally mark (|) is recorded each time a value appears in the list, with every fifth mark drawn as a slash across the previous four, and the total number of marks is the frequency. If a tally column is inserted, the table will consist of 3 columns, and if no tally column is inserted, the table will consist of 2 columns. Let’s examine this concept with an actual problem and data.
Example 4
Sixty students were asked how many books they had read over the past 12 months. The results are listed in the frequency distribution table below. Calculate the mean number of books read per student.
Number of Books | Number of Students (Frequency) |
---|---|
0 | 1 |
1 | 6 |
2 | 8 |
3 | 10 |
4 | 13 |
5 | 8 |
6 | 5 |
7 | 6 |
8 | 3 |
Solution:
To determine the total number of books that were read by the students, each number of books must be multiplied by the number of students who read that particular number of books. Then all the products must be added to determine the total number of books read. This total divided by 60 will tell you the mean number of books read per student. The formula that was written to determine the mean, \begin{align*}\overline{x}=\frac{\sum x}{n}\end{align*}, does not show any multiplication of the values by their frequencies. However, this multiplication can easily be included in the formula, as shown below:
\begin{align*}\overline{x}=\frac{x_1f_1+x_2f_2+x_3f_3+ \ldots + x_nf_n}{f_1+f_2+f_3+ \ldots + f_n}=\frac{\sum xf}{\sum f}\end{align*}
This formula will now be used to calculate the mean number of books read by each student.
\begin{align*}\overline{x} &= \frac{x_1f_1+x_2f_2+x_3f_3+ \ldots + x_nf_n}{f_1+f_2+f_3+ \ldots+ f_n}\\ \overline{x} &= \frac{(0)(1)+(1)(6)+(2)(8)+(3)(10)+(4)(13)+(5)(8)+(6)(5)+(7)(6)+(8)(3)}{1+6+8+10+13+8+5+6+3}\\ \overline{x} &= \frac{0+6+16+30+52+40+30+42+24}{60}\\ \overline{x} &= \frac{240}{60}\\ \overline{x} &= 4\end{align*}
The mean number of books read by each student was 4 books.
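The frequency-weighted formula translates directly into code. In this minimal Python sketch, a dictionary (an illustrative choice, not part of the lesson) maps each number of books x to its frequency f, and the mean is the sum of the products xf divided by the sum of the frequencies.

```python
# Frequency table from Example 4: number of books (x) -> number of students (f)
freq = {0: 1, 1: 6, 2: 8, 3: 10, 4: 13, 5: 8, 6: 5, 7: 6, 8: 3}

sum_xf = sum(x * f for x, f in freq.items())  # numerator: sum of x*f
sum_f = sum(freq.values())                     # denominator: total frequency, n
x_bar = sum_xf / sum_f
print(sum_xf, sum_f, x_bar)  # 240 60 4.0
```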
Suppose the numbers of books read by the students were listed in no particular order, and it was your job to determine the mean of the numbers.
\begin{align*}& 0 \quad 5 \quad 1 \quad 4 \quad 4 \quad 6 \quad 7 \quad 2 \quad 4 \quad 3 \quad 7 \quad 2 \quad 6 \quad 4 \quad 2\\ & 8 \quad 5 \quad 8 \quad 3 \quad 4 \quad 3 \quad 6 \quad 4 \quad 5 \quad 6 \quad 1 \quad 1 \quad 3 \quad 5 \quad 4\\ & 1 \quad 5 \quad 4 \quad 1 \quad 7 \quad 3 \quad 5 \quad 4 \quad 3 \quad 8 \quad 7 \quad 2 \quad 4 \quad 7 \quad 2\\ & 1 \quad 4 \quad 6 \quad 3 \quad 2 \quad 3 \quad 5 \quad 3 \quad 2 \quad 4 \quad 7 \quad 2 \quad 5 \quad 4 \quad 3\end{align*}
An alternative to entering all the numbers into a calculator would be to create a frequency distribution table like the one shown below:
Number of Books | Tally | Number of Students (Frequency) |
---|---|---|
0 | \begin{align*}|\end{align*} | 1 |
1 | \begin{align*}\cancel{||||} \ |\end{align*} | 6 |
2 | \begin{align*}\cancel{||||} \ |||\end{align*} | 8 |
3 | \begin{align*}\cancel{||||} \ \cancel{||||}\end{align*} | 10 |
4 | \begin{align*}\cancel{||||} \ \cancel{||||} \ |||\end{align*} | 13 |
5 | \begin{align*}\cancel{||||} \ |||\end{align*} | 8 |
6 | \begin{align*}\cancel{||||}\end{align*} | 5 |
7 | \begin{align*}\cancel{||||} \ |\end{align*} | 6 |
8 | \begin{align*}|||\end{align*} | 3 |
Now that the data has been organized, the numbers of books read and the numbers of students who read the books are evident. The mean can now be calculated as it was above.
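The tallying step can also be sketched in Python with `collections.Counter`, which counts how many times each value occurs, much as the tally column does. The text tally printed here ("||||/" for each group of five) is only an approximation of the crossed-out tally marks in the table.

```python
from collections import Counter

books = [0, 5, 1, 4, 4, 6, 7, 2, 4, 3, 7, 2, 6, 4, 2,
         8, 5, 8, 3, 4, 3, 6, 4, 5, 6, 1, 1, 3, 5, 4,
         1, 5, 4, 1, 7, 3, 5, 4, 3, 8, 7, 2, 4, 7, 2,
         1, 4, 6, 3, 2, 3, 5, 3, 2, 4, 7, 2, 5, 4, 3]

# Counter plays the role of the tally column: one count per occurrence
freq = Counter(books)

for value in sorted(freq):
    f = freq[value]
    # Groups of five marks, then the leftover single marks
    tally = "||||/ " * (f // 5) + "|" * (f % 5)
    print(value, tally.strip(), f)

# Same weighted mean as before: sum of x*f divided by sum of f
x_bar = sum(x * f for x, f in freq.items()) / sum(freq.values())
print(x_bar)  # 4.0
```

The counts match the frequency column of the table, so the mean agrees with the earlier calculation.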
Example 5
The following data shows the heights in centimeters of a group of grade 10 students:
\begin{align*}& 183 \quad 171 \quad 158 \quad 171 \quad 182 \quad 158 \quad 164 \quad 183\\ & 179 \quad 170 \quad 182 \quad 183 \quad 170 \quad 171 \quad 167 \quad 176\\ & 176 \quad 164 \quad 176 \quad 179 \quad 183 \quad 176 \quad 170 \quad 183\\ & 183 \quad 167 \quad 167 \quad 176 \quad 171 \quad 182 \quad 179 \quad 170\end{align*}
Organize the data in a frequency distribution table and calculate the mean height of the students.
Solution:
Height of Students (cm) | Tally | Number of Students (Frequency) |
---|---|---|
171 | \begin{align*}||||\end{align*} | 4 |
158 | \begin{align*}||\end{align*} | 2 |
176 | \begin{align*}\cancel{||||}\end{align*} | 5 |
182 | \begin{align*}|||\end{align*} | 3 |
164 | \begin{align*}||\end{align*} | 2 |
179 | \begin{align*}|||\end{align*} | 3 |
170 | \begin{align*}||||\end{align*} | 4 |
183 | \begin{align*}\cancel{||||} \ |\end{align*} | 6 |
167 | \begin{align*}|||\end{align*} | 3 |
\begin{align*}\overline{x} &= \frac{x_1f_1+x_2f_2+x_3f_3+ \ldots + x_nf_n}{f_1+f_2+f_3+ \ldots + f_n}\\ \overline{x} &= \frac{(171)(4)+(158)(2)+(176)(5)+(182)(3)+(164)(2)+(179)(3)+(170)(4)+(183)(6)+(167)(3)}{4+2+5+3+2+3+4+6+3}\\ \overline{x} &= \frac{684+316+880+546+328+537+680+1098+501}{32}\\ \overline{x} &= \frac{5570}{32} \approx 174.1 \ \text{cm}\end{align*}
The mean height of the students is approximately 174.1 cm.
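Here is a minimal Python sketch of the same two steps: build the frequency table from the raw heights, then apply the weighted-mean formula. As in the table above, the order of the values in the frequency table does not affect the result.

```python
from collections import Counter

heights = [183, 171, 158, 171, 182, 158, 164, 183,
           179, 170, 182, 183, 170, 171, 167, 176,
           176, 164, 176, 179, 183, 176, 170, 183,
           183, 167, 167, 176, 171, 182, 179, 170]

freq = Counter(heights)  # height (cm) -> number of students

sum_xf = sum(x * f for x, f in freq.items())  # sum of the products x*f
n = sum(freq.values())                         # number of students
x_bar = sum_xf / n
print(sum_xf, n, round(x_bar, 1))  # 5570 32 174.1
```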
The mean is often used as a summary statistic. However, it is affected by extreme values, or outliers. This means that when there are extreme values at one end of a data set, the mean is not a very good summary statistic. For example, if you were employed by a company that paid all of its employees a salary between $60,000 and $70,000, you could probably estimate the mean salary to be about $65,000. However, if you had to include the $150,000 salary of the CEO when calculating the mean, that one salary would pull the mean upward, and in a small company the increase could be large. The result would, in fact, be the mean of the employees' salaries, but it probably would not be a good measure of the central tendency of the salaries.
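To see the effect of an outlier, the following Python sketch uses a hypothetical company of 9 employees. The individual salaries below are invented for illustration (the lesson gives only the $60,000 to $70,000 range and the CEO's $150,000); they were chosen so that the mean without the CEO is exactly $65,000.

```python
# Hypothetical salaries for 9 employees, all between $60,000 and $70,000
salaries = [60_000, 62_000, 63_000, 64_000, 65_000,
            66_000, 67_000, 68_000, 70_000]

mean_without_ceo = sum(salaries) / len(salaries)

# Add the CEO's $150,000 salary, an outlier at the high end
with_ceo = salaries + [150_000]
mean_with_ceo = sum(with_ceo) / len(with_ceo)

print(mean_without_ceo)  # 65000.0
print(mean_with_ceo)     # 73500.0

# The new mean is higher than every salary except the CEO's
print(all(s < mean_with_ceo for s in salaries))  # True
```

In this small hypothetical company, one extreme value raises the mean above what every other employee earns, which is why the mean can be a poor summary when outliers are present.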
Technology is a major tool that is available for you to use when doing mathematical calculations, and its use goes beyond entering numbers to perform simple arithmetic operations. For example, the TI-83 calculator can be used to determine the mean of a set of given data values. You will first learn to calculate the mean by simply entering the data values into a list and determining the mean. The second method that you will learn about utilizes the frequency table feature of the TI-83.
Example 6
Using technology, determine the mean of the following set of numbers:
\begin{align*}24, 25, 25, 25, 26, 26, 27, 27, 28, 28, 31, 32\end{align*}
Solution:
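Step 1: Press \begin{align*}\boxed{\text{STAT}}\end{align*}, choose 1:Edit, and enter the data values into list L1.
Step 2: Press \begin{align*}\boxed{\text{STAT}}\end{align*}, move to the CALC menu, choose 1:1-Var Stats, and press \begin{align*}\boxed{\text{ENTER}}\end{align*}.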
Notice that the sum of the data values is 324 \begin{align*}(\sum x = 324)\end{align*}.
Notice that the number of data values is 12 \begin{align*}\left (n=12 \right )\end{align*}.
Notice the mean of the data values is 27 \begin{align*}\left (\overline{x}=27 \right )\end{align*}.
Now we will use the same data values and use the TI-83 to create a frequency table.
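Step 1: Press \begin{align*}\boxed{\text{STAT}}\end{align*}, choose 1:Edit, and enter each different data value (24, 25, 26, 27, 28, 31, 32) into list L1.
Step 2: Enter the frequency of each value (1, 3, 2, 2, 2, 1, 1) into list L2.
Step 3: Move the cursor to the L3 heading, enter L1 × L2, and press \begin{align*}\boxed{\text{ENTER}}\end{align*}. List L3 now holds the product of each value and its frequency.
Step 4: Press \begin{align*}\boxed{\text{2ND}}\end{align*} \begin{align*}\boxed{\text{MODE}}\end{align*} to return to the home screen.
Press \begin{align*}\boxed{\text{2ND}}\end{align*} \begin{align*}\boxed{\text{0}}\end{align*} to obtain the CATALOG menu of the calculator. Scroll down to the sum( function and enter sum(L3) to find the sum of the products, which is 324.
You can repeat this step with sum(L2) to find the sum of the frequencies, which is the number of data values, 12.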
Now the mean of the data can be calculated as follows:
\begin{align*}\overline{x}=\frac{324}{12}=27\end{align*}
Another way to solve this problem with the calculator is to enter the data values into L1 and the frequencies into L2, press \begin{align*}\boxed{\text{2ND}}\end{align*} \begin{align*}\boxed{\text{MODE}}\end{align*} to return to the home screen, and then press \begin{align*}\boxed{\text{2ND}}\end{align*} \begin{align*}\boxed{\text{STAT}}\end{align*}. Go to the MATH menu, choose option 3, and enter L1 and L2 separated by a comma so that you have mean(L1, L2). Then press \begin{align*}\boxed{\text{ENTER}}\end{align*} to get the answer. This way, the calculator does all the calculations for you.
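If a TI-83 is not available, both approaches from Example 6 can be sketched in Python. This is plain Python, not calculator code: the lists `values`, `freqs`, and `products` are illustrative stand-ins for the calculator lists L1, L2, and L3.

```python
from collections import Counter

data = [24, 25, 25, 25, 26, 26, 27, 27, 28, 28, 31, 32]

# Method 1: list every value, then divide the sum by n (like 1-Var Stats)
print(sum(data), len(data), sum(data) / len(data))  # 324 12 27.0

# Method 2: frequency table, mirroring L1 (values) and L2 (frequencies)
freq = Counter(data)
values = sorted(freq)                  # stands in for L1
freqs = [freq[v] for v in values]      # stands in for L2
products = [x * f for x, f in zip(values, freqs)]  # stands in for L3 = L1*L2

print(values)  # [24, 25, 26, 27, 28, 31, 32]
print(freqs)   # [1, 3, 2, 2, 2, 1, 1]
print(sum(products), sum(freqs), sum(products) / sum(freqs))  # 324 12 27.0
```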
In addition to calculating the mean for a given set of data values, you can also apply your understanding of the mean to determine other information that may be asked for in everyday problems.
Example 7
During his final season with the Cadillac Selects, Joe Sure Shot played 14 regular season basketball games and had an average of 24.5 points per game. In the first 2 playoff games, Joe scored 18 and 26 points, respectively. Determine his new average for the season.
Solution:
Step 1: Multiply the given average by 14 to determine the total number of points he had scored before the playoff games.
\begin{align*}24.5 \times 14=343\end{align*}
Step 2: Add the points from the 2 playoff games to this total.
\begin{align*}343+18+26=387\end{align*}
Step 3: Divide this new total by 16 to determine the new average.
\begin{align*}\overline{x}=\frac{387}{16} \approx 24.19\end{align*}
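The three steps can be written as a short Python sketch. The key idea is that a mean times its count recovers the total. The result is printed unrounded, because Python's `round` uses round-half-to-even and would turn the exact tie 24.1875 into 24.18 rather than the 24.19 shown above.

```python
games_before = 14
average_before = 24.5
playoff_points = [18, 26]

# Step 1: recover the total from the average (total = mean x count)
total_before = average_before * games_before  # 343.0

# Step 2: add the points from the new games
total_after = total_before + sum(playoff_points)  # 387.0

# Step 3: divide by the new number of games
games_after = games_before + len(playoff_points)  # 16
new_average = total_after / games_after
print(new_average)  # 24.1875, which rounds to about 24.19
```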
All of the values for the means that you have calculated so far have been for ungrouped, or listed, data. A mean can also be determined for data that is grouped, or placed in intervals. Unlike listed data, the individual values for grouped data are not available, so you cannot calculate their sum directly. To calculate the mean of grouped data, the first step is to determine the midpoint of each interval, or class. These midpoints must then be multiplied by the frequencies of the corresponding classes. The sum of the products divided by the total number of values gives the mean. Because each value is represented by the midpoint of its class, this result is an estimate of the mean of the original values. The following example will show how the mean value for grouped data can be calculated.
Example 8
In Tim's school, there are 25 teachers. Each teacher travels to school every morning in his or her own car. The distribution of the driving times (in minutes) from home to school for the teachers is shown in the table below:
Driving Times (minutes) | Number of Teachers |
---|---|
0 to less than 10 | 3 |
10 to less than 20 | 10 |
20 to less than 30 | 6 |
30 to less than 40 | 4 |
40 to less than 50 | 2 |
The driving times are given for all 25 teachers, so the data is for a population. Calculate the mean of the driving times.
Solution:
Step 1: Determine the midpoint for each interval.
For 0 to less than 10, the midpoint is 5.
For 10 to less than 20, the midpoint is 15.
For 20 to less than 30, the midpoint is 25.
For 30 to less than 40, the midpoint is 35.
For 40 to less than 50, the midpoint is 45.
Step 2: Multiply each midpoint by the frequency for the class.
For 0 to less than 10, \begin{align*}(5)(3) = 15\end{align*}
For 10 to less than 20, \begin{align*}(15)(10) = 150\end{align*}
For 20 to less than 30, \begin{align*}(25)(6) = 150\end{align*}
For 30 to less than 40, \begin{align*}(35)(4) = 140\end{align*}
For 40 to less than 50, \begin{align*}(45)(2) = 90\end{align*}
Step 3: Add the results from Step 2 and divide the sum by 25.
\begin{align*}15+150+150+140+90 &= 545\\ \mu &= \frac{545}{25}=21.8\end{align*}
The mean time that the teachers spend driving from home to school each morning is 21.8 minutes.
To better represent the problem and its solution, a table can be drawn as follows:
Driving Times (minutes) | Number of Teachers \begin{align*}f\end{align*} | Midpoint of Class \begin{align*}m\end{align*} | Product \begin{align*}mf\end{align*} |
---|---|---|---|
0 to less than 10 | 3 | 5 | 15 |
10 to less than 20 | 10 | 15 | 150 |
20 to less than 30 | 6 | 25 | 150 |
30 to less than 40 | 4 | 35 | 140 |
40 to less than 50 | 2 | 45 | 90 |
For the population, \begin{align*}N = 25\end{align*} and \begin{align*}\sum mf=545\end{align*}, where \begin{align*}m\end{align*} is the midpoint of the class and \begin{align*}f\end{align*} is the frequency. The mean for the population was found by dividing \begin{align*}\sum mf\end{align*} by \begin{align*}N\end{align*}. As a result, the formula \begin{align*}\mu=\frac{\sum mf}{N}\end{align*} can be written to summarize the steps used to determine the value of the mean for a set of grouped data. If the set of data represented a sample instead of a population, the process would remain the same, and the formula would be written as \begin{align*}\overline{x}=\frac{\sum mf}{n}\end{align*}.
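The grouped-data formula can be sketched in Python as follows. Each class is stored as a (lower, upper) pair meaning "lower to less than upper", together with its frequency; this layout is just one convenient choice. The midpoint m of each class stands in for every value in that class.

```python
# (lower, upper) for "lower to less than upper", with the class frequency f
classes = [((0, 10), 3), ((10, 20), 10), ((20, 30), 6),
           ((30, 40), 4), ((40, 50), 2)]

sum_mf = 0
N = 0
for (lower, upper), f in classes:
    m = (lower + upper) / 2  # midpoint of the class
    sum_mf += m * f          # add this class's product mf
    N += f                   # count the teachers in this class

mu = sum_mf / N  # population mean for grouped data: sum of mf divided by N
print(sum_mf, N, mu)  # 545.0 25 21.8
```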
Example 9
The following table shows the frequency distribution of the number of hours spent per week texting messages on a cell phone by 60 grade 10 students at a local high school.
Time Per Week (Hours) | Number of Students |
---|---|
0 to less than 5 | 8 |
5 to less than 10 | 11 |
10 to less than 15 | 15 |
15 to less than 20 | 12 |
20 to less than 25 | 9 |
25 to less than 30 | 5 |
Calculate the mean number of hours per week spent by each student texting messages on a cell phone. Hint: A table may be useful.
Solution:
Time Per Week (Hours) | Number of Students \begin{align*}f\end{align*} | Midpoint of Class \begin{align*}m\end{align*} | Product \begin{align*}mf\end{align*} |
---|---|---|---|
0 to less than 5 | 8 | 2.5 | 20.0 |
5 to less than 10 | 11 | 7.5 | 82.5 |
10 to less than 15 | 15 | 12.5 | 187.5 |
15 to less than 20 | 12 | 17.5 | 210.0 |
20 to less than 25 | 9 | 22.5 | 202.5 |
25 to less than 30 | 5 | 27.5 | 137.5 |
\begin{align*}\overline{x} &= \frac{\sum mf}{n}\\ \overline{x} &= \frac{20.0+82.5+187.5+210.0+202.5+137.5}{60}\\ \overline{x} &= \frac{840}{60}\\ \overline{x} &= 14\end{align*}
The mean time spent per week by each student texting messages on a cell phone is 14 hours.
Now that you have created several distribution tables for grouped data, it's time to point out that the first column of the table can be written in another way. Instead of describing each interval, or class, in words, you can write it as [# - #). The square bracket at the front closes the class, so the first number is included in the interval, while the parenthesis at the end leaves the class open, so the last number is not included. Keeping this in mind, the table in Example 9 can be presented as follows:
Time Per Week (Hours) | Number of Students \begin{align*}f\end{align*} | Midpoint of Class \begin{align*}m\end{align*} | Product \begin{align*}mf\end{align*} |
---|---|---|---|
\begin{align*}[0-5)\end{align*} | 8 | 2.5 | 20.0 |
\begin{align*}[5-10)\end{align*} | 11 | 7.5 | 82.5 |
\begin{align*}[10-15)\end{align*} | 15 | 12.5 | 187.5 |
\begin{align*}[15-20)\end{align*} | 12 | 17.5 | 210.0 |
\begin{align*}[20-25)\end{align*} | 9 | 22.5 | 202.5 |
\begin{align*}[25-30)\end{align*} | 5 | 27.5 | 137.5 |
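Writing each class as a half-open interval [lower - upper) also suggests a compact representation in code. The following Python sketch uses a hypothetical helper, `grouped_mean`, which is not part of the lesson, to reproduce the Example 9 result from (lower, upper, frequency) triples.

```python
def grouped_mean(classes):
    """Estimate the mean of grouped data as sum(m * f) / sum(f).

    classes: list of (lower, upper, frequency) for [lower - upper) classes.
    """
    sum_mf = sum((lower + upper) / 2 * f for lower, upper, f in classes)
    n = sum(f for _, _, f in classes)
    return sum_mf / n


# Example 9: hours per week spent texting, 60 students (a sample)
texting = [(0, 5, 8), (5, 10, 11), (10, 15, 15),
           (15, 20, 12), (20, 25, 9), (25, 30, 5)]
print(grouped_mean(texting))  # 14.0
```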
Lesson Summary
You have learned the significance of the mean as it applies to a set of numerical data. You have also learned how to calculate the mean using appropriate formulas for the given data for both a population and a sample. When the data was presented as a list of numbers, you learned how to represent the values in a frequency distribution table, and when the data was grouped, you learned how to represent the data in a distribution table with appropriate intervals, as well as how to calculate the mean of this data. The use of technology in calculating the mean was also demonstrated in this lesson.
Points to Consider
- Is the mean only used as a measure of central tendency, or is it applied to other representations of data?
- If the mean is applied to other representations of data, can its value be calculated or estimated from this representation?
- What other measures of central tendency can be used as a statistical summary when the mean is not the best measure to use?