Chapter 1: An Introduction to Analyzing Statistical Data
Chapter Outline
- 1.1. Definitions of Statistical Terminology
- 1.2. An Overview of Data
- 1.3. Measures of Center
- 1.4. Measures of Spread
Chapter Summary
Part One: Multiple Choice
- Which of the following is true for any set of data?
- The range is a resistant measure of spread.
- The standard deviation is not resistant.
- The standard deviation can be greater than the range.
- The IQR is always greater than the range.
- The range can be negative.
- The following shows the mean number of days of precipitation by month in Juneau, Alaska in 2008:
Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
---|---|---|---|---|---|---|---|---|---|---|---|
18 | 17 | 18 | 17 | 17 | 15 | 17 | 18 | 20 | 24 | 20 | 21 |
Which month contains the median number of days of rain?
(a) January
(b) February
(c) June
(d) July
(e) September
- Given the data 2, 10, 14, 6, which of the following is equivalent to $\overline{x}$?
- mode
- median
- midrange
- range
- none of these
- Place the following in order from smallest to largest: I. Range, II. Standard Deviation, III. Variance
- I, II, III
- I, III, II
- II, III, I
- II, I, III
- It is not possible to determine the correct answer.
- On the first day of school, a teacher asks her students to fill out a survey with their name, gender, age, and homeroom number. How many quantitative variables are there in this example?
- 0
- 1
- 2
- 3
- 4
- You collect data on the shoe sizes of the students in your school by recording the sizes of 50 randomly selected males’ shoes. What is the highest level of measurement that you have demonstrated?
- nominal
- ordinal
- interval
- ratio
- According to a 2002 study, the mean height of Chinese men between the ages of 30 and 65 is 164.8 cm, with a standard deviation of 6.4 cm (http://aje.oxfordjournals.org/cgi/reprint/155/4/346.pdf accessed Feb 6, 2008). Which of the following statements is true based on this study?
- The interquartile range is 12.8 cm.
- All Chinese men are between 158.4 cm and 171.2 cm.
- At least 75% of Chinese men between 30 and 65 are between 158.4 and 171.2 cm.
- At least 75% of Chinese men between 30 and 65 are between 152 and 177.6 cm.
- All Chinese men between 30 and 65 are between 152 and 177.6 cm.
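The reasoning behind the Chebyshev-style options above can be sketched in a few lines of Python. This is only an illustration: the function name `chebyshev_interval` is hypothetical, and the sketch assumes the usual statement of Chebyshev's Theorem, that at least 1 - 1/k² of any data set lies within k standard deviations of the mean (for k > 1).

```python
def chebyshev_interval(mean, sd, k):
    """Return (low, high, min_fraction) for the interval mean +/- k*sd.

    min_fraction is the Chebyshev lower bound 1 - 1/k**2, which holds for
    any distribution but says nothing useful when k <= 1.
    """
    if k <= 1:
        raise ValueError("Chebyshev's Theorem is only informative for k > 1")
    low = mean - k * sd
    high = mean + k * sd
    min_fraction = 1 - 1 / k**2
    return low, high, min_fraction


# Values taken from the question: mean 164.8 cm, standard deviation 6.4 cm.
low, high, frac = chebyshev_interval(164.8, 6.4, 2)
print(f"At least {frac:.0%} of values lie between {low:.1f} and {high:.1f} cm")
```

Note that the bound is a minimum, not an exact proportion, and it makes no claim about "all" individuals.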
- Sampling error is best described as:
- The unintentional mistakes a researcher makes when collecting information
- The natural variation that is present when you do not get data from the entire population
- A researcher intentionally asking a misleading question, hoping for a particular response
- When a drug company does its own experiment that proves its medication is the best
- When individuals in a sample answer a survey untruthfully
- If the sum of the squared deviations for a sample of 20 individuals is 277, the standard deviation is closest to:
- 3.82
- 3.85
- 13.72
- 14.58
- 191.82
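One way to check a calculation like this is to go directly from the sum of squared deviations to the standard deviation. The sketch below assumes the sample formula (dividing by n - 1), since the question describes a sample; the helper name `sd_from_sum_of_squares` is made up for illustration.

```python
import math


def sd_from_sum_of_squares(sum_sq_dev, n, sample=True):
    """Standard deviation from a sum of squared deviations.

    Divides by n - 1 for a sample and by n for a whole population.
    """
    divisor = n - 1 if sample else n
    return math.sqrt(sum_sq_dev / divisor)


# Numbers from the question: sum of squared deviations 277, n = 20.
s = sd_from_sum_of_squares(277, 20)                    # sample version
sigma = sd_from_sum_of_squares(277, 20, sample=False)  # population version
print(round(s, 2), round(sigma, 2))
```

Comparing the two results shows how much the choice of divisor matters for a sample of this size.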
Part Two: Open-Ended Questions
- Erica’s grades in her statistics class are as follows: Quizzes: 62, 88, 82; Labs: 89, 96; Tests: 87, 99
- (a) In this class, quizzes count once, labs count twice as much as a quiz, and tests count three times as much as a quiz. Determine the following:
- mode
- mean
- median
- upper and lower quartiles
- midrange
- range
- (b) If Erica’s quiz grade of 62 were removed from the data, briefly describe (without recalculating) the anticipated effect on the statistics you calculated in part (a).
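A weighted data set like Erica's can be handled in Python roughly as follows. This is a sketch under one assumption that the problem does not state explicitly: a grade with weight w is treated as w copies of that grade when finding the median, mode, midrange, and range. The variable names are illustrative only, and `statistics.multimode` requires Python 3.8 or later.

```python
import statistics

# (scores, weight) pairs taken from the problem statement.
grades = {
    "quiz": ([62, 88, 82], 1),
    "lab": ([89, 96], 2),
    "test": ([87, 99], 3),
}

# Weighted mean: sum of weight * score divided by the sum of the weights.
weighted_sum = sum(w * sum(scores) for scores, w in grades.values())
total_weight = sum(w * len(scores) for scores, w in grades.values())
weighted_mean = weighted_sum / total_weight

# Assumption: repeat each score w times so the other statistics respect weights.
expanded = [s for scores, w in grades.values() for s in scores for _ in range(w)]

median = statistics.median(expanded)
modes = statistics.multimode(expanded)          # every value tied for most frequent
midrange = (min(expanded) + max(expanded)) / 2
data_range = max(expanded) - min(expanded)

print(round(weighted_mean, 2), median, modes, midrange, data_range)
```

Quartiles are left out on purpose, because textbooks and software use several different conventions for computing them.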
- Mr. Crunchy’s sells small bags of potato chips that are advertised to contain 12 ounces of potato chips. To minimize complaints from their customers, the factory sets the machines to fill bags with an average weight of 13 ounces. For an experiment in his statistics class, Spud goes to 5 different stores, purchases 1 bag from each store, and then weighs the contents. The weights of the bags are: 13.18, 12.65, 12.87, 13.32, and 12.93 ounces.
(a) Calculate the sample mean.
(b) Complete the chart below to calculate the standard deviation of Spud’s sample.
Observed Data | $(x-\overline{x})$ | $(x-\overline{x})^2$ |
---|---|---|
13.18 | | |
12.65 | | |
12.87 | | |
13.32 | | |
12.93 | | |
Sum of the squared deviations | | |
(c) Calculate the variance.
(d) Calculate the standard deviation.
(e) Explain what the standard deviation means in the context of the problem.
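The chart in part (b) follows the same steps a short program would take. The sketch below uses Spud's five weights and assumes the sample formulas (dividing by n - 1), since five bags are a sample of the factory's output; the variable names are illustrative.

```python
import math

weights = [13.18, 12.65, 12.87, 13.32, 12.93]  # ounces, from the problem
n = len(weights)

mean = sum(weights) / n
deviations = [x - mean for x in weights]        # the (x - mean) column
squared = [d**2 for d in deviations]            # the (x - mean)^2 column
sum_sq = sum(squared)                           # sum of the squared deviations

variance = sum_sq / (n - 1)                     # sample variance
sd = math.sqrt(variance)                        # sample standard deviation

# Print the rows of the chart, then the summary values.
for x, d, sq in zip(weights, deviations, squared):
    print(f"{x:6.2f} | {d:+.2f} | {sq:.4f}")
print(f"mean={mean:.2f}  sum_sq={sum_sq:.4f}  variance={variance:.5f}  sd={sd:.4f}")
```

The standard deviation comes out in ounces, the same unit as the data, which is what makes it interpretable in part (e).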
- The following table includes data on the number of square kilometers of the more substantial islands of the Galapagos Archipelago. (There are actually many more islands if you count all the small volcanic rock outcroppings as islands.)
Island | Approximate Area (sq. km) |
---|---|
Baltra | 8 |
Darwin | 1.1 |
Española | 60 |
Fernandina | 642 |
Floreana | 173 |
Genovesa | 14 |
Isabela | 4640 |
Marchena | 130 |
North Seymour | 1.9 |
Pinta | 60 |
Pinzón | 18 |
Rabida | 4.9 |
San Cristóbal | 558 |
Santa Cruz | 986 |
Santa Fe | 24 |
Santiago | 585 |
South Plaza | 0.13 |
Wolf | 1.3 |
Source: http://en.wikipedia.org/wiki/Gal%C3%A1pagos_Islands
(a) Calculate each of the following for the above data:
(i) mode
(ii) mean
(iii) median
(iv) upper quartile
(v) lower quartile
(vi) range
(vii) standard deviation
(b) Explain why the mean is so much larger than the median in the context of this data.
(c) Explain why the standard deviation is so large.
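To explore parts (b) and (c), it can help to compute the mean and median with and without the largest island. The following is a rough sketch using the areas from the table; dropping Isabela is just one illustrative way to see how a single extreme value affects each statistic.

```python
import statistics

# Approximate areas in square kilometers, copied from the table above.
areas = {
    "Baltra": 8, "Darwin": 1.1, "Española": 60, "Fernandina": 642,
    "Floreana": 173, "Genovesa": 14, "Isabela": 4640, "Marchena": 130,
    "North Seymour": 1.9, "Pinta": 60, "Pinzón": 18, "Rabida": 4.9,
    "San Cristóbal": 558, "Santa Cruz": 986, "Santa Fe": 24,
    "Santiago": 585, "South Plaza": 0.13, "Wolf": 1.3,
}

values = list(areas.values())
mean_all = statistics.mean(values)
median_all = statistics.median(values)

# Illustrative comparison: leave out the single largest island.
without = [a for name, a in areas.items() if name != "Isabela"]
mean_without = statistics.mean(without)
median_without = statistics.median(without)

print(f"all islands:     mean={mean_all:.1f}  median={median_all}")
print(f"without Isabela: mean={mean_without:.1f}  median={median_without}")
```

The mean shifts far more than the median when the one extreme value is removed, which is the idea behind a resistant measure.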
- At http://content.usatoday.com/sports/baseball/salaries/default.aspx, USA Today keeps a database of major league baseball salaries. Pick a team and look at the salary statistics for that team. Next to the average salary, you will see the median salary. If this site is not available, a web search will most likely locate similar data.
(a) Record the median and verify that it is correct by clicking on the team and looking at the salaries of the individual players.
(b) Find the other measures of center and record them.
(i) mean
(ii) mode
(iii) midrange
(iv) lower quartile
(v) upper quartile
(vi) IQR
(c) Explain the real-world meaning of each measure of center in the context of this data.
(i) mean
(ii) median
(iii) mode
(iv) midrange
(v) lower quartile
(vi) upper quartile
(vii) IQR
(d) Find the following measures of spread:
(i) range
(ii) standard deviation
(e) Explain the real-world meaning of each measure of spread in the context of this situation.
(i) range
(ii) standard deviation
(f) Write two sentences commenting on two interesting features about the way the salary data are distributed for this team.
Keywords
Bias
Bimodal
Categorical variable
Census
Chebyshev's Theorem
Deviation
Interquartile range (IQR)
Interval
Interval estimate
Levels of measurement
Lower quartile
Mean
Mean absolute deviation
Median
Midrange
Mode
$n\%$ trimmed mean
Nominal
Numerical variable
Ordinal
Outliers
Parameter
Percentile
Point estimate
Population
Qualitative variable
Quantitative variable
Range
Ratio
Resistant
Sample
Sampling error
Standard deviation
Statistic
Trimmed mean
Unit
Upper quartile
Variables
Variance
Weighted mean