An Introduction to Analyzing Statistical Data | CK-12 Foundation
You are reading an older version of this FlexBook® textbook: CK-12 Probability and Statistics - Advanced (Second Edition).

Chapter 1: An Introduction to Analyzing Statistical Data

Created by: CK-12


Chapter Summary

Part One: Multiple Choice

  1. Which of the following is true for any set of data?
    (a) The range is a resistant measure of spread.
    (b) The standard deviation is not resistant.
    (c) The range can be greater than the standard deviation.
    (d) The IQR is always greater than the range.
    (e) The range can be negative.
  2. The following shows the mean number of days of precipitation by month in Juneau, Alaska:
Mean Number of Days With Precipitation > 0.1 inches
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
18 17 18 17 17 15 17 18 20 24 20 21

Source: http://www.met.utah.edu/jhorel/html/wx/climate/daysrain.html (2/06/08)

Which month contains the median number of days of rain?

(a) January

(b) February

(c) June

(d) July

(e) September

  3. Given the data 2, 10, 14, 6, which of the following is equivalent to \overline{x}?
    (a) mode
    (b) median
    (c) midrange
    (d) range
    (e) none of these
  4. Place the following in order from smallest to largest: I. Range, II. Standard Deviation, III. Variance.
    (a) I, II, III
    (b) I, III, II
    (c) II, III, I
    (d) II, I, III
    (e) It is not possible to determine the correct answer.
  5. On the first day of school, a teacher asks her students to fill out a survey with their name, gender, age, and homeroom number. How many quantitative variables are there in this example?
    (a) 0
    (b) 1
    (c) 2
    (d) 3
    (e) 4
  6. You collect data on the shoe sizes of the students in your school by recording the sizes of 50 randomly selected males’ shoes. What is the highest level of measurement that you have demonstrated?
    (a) nominal
    (b) ordinal
    (c) interval
    (d) ratio
  7. According to a 2002 study, the mean height of Chinese men between the ages of 30 and 65 is 164.8 cm, with a standard deviation of 6.4 cm (http://aje.oxfordjournals.org/cgi/reprint/155/4/346.pdf accessed Feb 6, 2008). Which of the following statements is true based on this study?
    (a) The interquartile range is 12.8 cm.
    (b) All Chinese men are between 158.4 cm and 171.2 cm.
    (c) At least 75% of Chinese men between 30 and 65 are between 158.4 and 171.2 cm.
    (d) At least 75% of Chinese men between 30 and 65 are between 152 and 177.6 cm.
    (e) All Chinese men between 30 and 65 are between 152 and 177.6 cm.
  8. Sampling error is best described as:
    (a) The unintentional mistakes a researcher makes when collecting information
    (b) The natural variation that is present when you do not get data from the entire population
    (c) A researcher intentionally asking a misleading question, hoping for a particular response
    (d) When a drug company does its own experiment that proves its medication is the best
    (e) When individuals in a sample answer a survey untruthfully
  9. If the sum of the squared deviations for a sample of 20 individuals is 277, the standard deviation is closest to:
    (a) 3.82
    (b) 3.85
    (c) 13.72
    (d) 14.58
    (e) 191.82
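Several of the questions above rely on Chebyshev's Theorem and on the fact that a sample standard deviation divides the sum of the squared deviations by n-1 (a population standard deviation divides by N) before taking the square root. The Python sketch below illustrates both ideas; the function names and all of the numbers in it are hypothetical and are not taken from the questions.

```python
import math


def chebyshev_bound(k: float) -> float:
    """Smallest guaranteed proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's Theorem gives a useful bound only for k > 1")
    return 1 - 1 / k**2


def sample_sd_from_ss(sum_sq_dev: float, n: int) -> float:
    """Sample standard deviation: divide the sum of squared deviations by n - 1."""
    return math.sqrt(sum_sq_dev / (n - 1))


def population_sd_from_ss(sum_sq_dev: float, n: int) -> float:
    """Population standard deviation: divide the sum of squared deviations by N."""
    return math.sqrt(sum_sq_dev / n)


# Hypothetical distribution with mean 50 and standard deviation 4
mean, sd = 50.0, 4.0
for k in (2, 3):
    low, high = mean - k * sd, mean + k * sd
    print(f"k={k}: at least {chebyshev_bound(k):.1%} of values lie between {low} and {high}")

# Hypothetical sum of squared deviations (90) for 10 values
print(round(sample_sd_from_ss(90.0, 10), 2))      # sqrt(90 / 9) = sqrt(10), about 3.16
print(round(population_sd_from_ss(90.0, 10), 2))  # sqrt(90 / 10) = 3.0
```

Because Chebyshev's Theorem holds for any distribution, its bound is deliberately conservative: it guarantees "at least" a proportion, not an exact one.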

Part Two: Open-Ended Questions

  1. Erica’s grades in her statistics class are as follows: Quizzes: 62, 88, 82; Labs: 89, 96; Tests: 87, 99.
    (a) In this class, quizzes count once, labs count twice as much as a quiz, and tests count three times as much as a quiz. Determine the following:
      (i) mode
      (ii) mean
      (iii) median
      (iv) upper and lower quartiles
      (v) midrange
      (vi) range
    (b) If Erica’s quiz grade of 62 were removed from the data, briefly describe (without recalculating) the anticipated effect on the statistics you calculated in part (a).
  2. Mr. Crunchy’s sells small bags of potato chips that are advertised to contain 12 ounces of potato chips. To minimize complaints from its customers, the factory sets the machines to fill bags with an average weight of 13 ounces. For an experiment in his statistics class, Spud goes to 5 different stores, purchases 1 bag from each store, and then weighs the contents. The weights of the bags are: 13.18, 12.65, 12.87, 13.32, and 12.93 ounces.

(a) Calculate the sample mean.

(b) Complete the chart below to calculate the standard deviation of Spud’s sample.

Observed Data (x-\overline{x}) (x-\overline{x})^2
13.18
12.65
12.87
13.32
12.93
Sum of the squared deviations

(c) Calculate the variance.

(d) Calculate the standard deviation.

(e) Explain what the standard deviation means in the context of the problem.

  3. The following table gives the approximate area, in square kilometers, of the more substantial islands of the Galapagos Archipelago. (There are actually many more islands if you count all the small volcanic rock outcroppings as islands.)
Island Approximate Area (sq. km)
Baltra 8
Darwin 1.1
Española 60
Fernandina 642
Floreana 173
Genovesa 14
Isabela 4640
Marchena 130
North Seymour 1.9
Pinta 60
Pinzón 18
Rabida 4.9
San Cristóbal 558
Santa Cruz 986
Santa Fe 24
Santiago 585
South Plaza 0.13
Wolf 1.3

Source: http://en.wikipedia.org/wiki/Gal%C3%A1pagos_Islands

(a) Calculate each of the following for the above data:

(i) mode

(ii) mean

(iii) median

(iv) upper quartile

(v) lower quartile

(vi) range

(vii) standard deviation

(b) Explain why the mean is so much larger than the median in the context of this data.

(c) Explain why the standard deviation is so large.

  4. At http://content.usatoday.com/sports/baseball/salaries/default.aspx, USA Today keeps a database of major league baseball salaries. Pick a team and look at the salary statistics for that team. Next to the average salary, you will see the median salary. If this site is not available, a web search will most likely locate similar data.

(a) Record the median and verify that it is correct by clicking on the team and looking at the salaries of the individual players.

(b) Find the following statistics and record them.

(i) mean

(ii) mode

(iii) midrange

(iv) lower quartile

(v) upper quartile

(vi) IQR

(c) Explain the real-world meaning of each of the following statistics in the context of this data.

(i) mean

(ii) median

(iii) mode

(iv) midrange

(v) lower quartile

(vi) upper quartile

(vii) IQR

(d) Find the following measures of spread:

(i) range

(ii) standard deviation

(e) Explain the real-world meaning of each measure of spread in the context of this situation.

(i) range

(ii) standard deviation

(f) Write two sentences commenting on two interesting features about the way the salary data are distributed for this team.
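The weighted mean and the table of deviations used in this part can be sketched in Python as follows. The function names and all of the numbers are hypothetical illustrations, not the data from the questions, and the standard deviation treats the data as a sample (dividing by n-1).

```python
import math


def weighted_mean(values, weights):
    """Weighted mean: sum of weight * value divided by the sum of the weights."""
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


def deviation_table(data):
    """Return (mean, rows, sum of squared deviations); each row is (x, x - mean, (x - mean)**2)."""
    mean = sum(data) / len(data)
    rows = [(x, x - mean, (x - mean) ** 2) for x in data]
    return mean, rows, sum(sq for _, _, sq in rows)


# Hypothetical scores: a quiz counted once, a lab twice, a test three times
print(round(weighted_mean([70, 80, 90], [1, 2, 3]), 2))  # 500 / 6

# Hypothetical sample of five measurements
mean, rows, ss = deviation_table([4, 6, 8, 10, 12])
for x, dev, sq in rows:
    print(f"{x:>6} {dev:>8.2f} {sq:>8.2f}")
variance = ss / (len(rows) - 1)  # sample variance divides by n - 1
print(f"mean={mean}, sum of squares={ss}, variance={variance}, sd={math.sqrt(variance):.3f}")
```

Using weights gives the same result as writing each score out as many times as it counts (here, the mean of 70, 80, 80, 90, 90, 90).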

Keywords

Bias
The systematic error in sampling is called bias.
Bimodal
When a data set is clustered about two different modes, it is described as being bimodal.
Categorical variable
When a characteristic can be neatly placed into well-defined groups, or categories, that do not depend on order, it is called a categorical variable, or qualitative variable.
Census
A complete counting of every member of a population is called a census. In the United States, a census is attempted only every ten years, to get accurate and complete information about all residents and help effectively address the needs of a changing population.
Chebyshev's Theorem
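The probability that any random variable lies within k standard deviations of its mean is at least 1-\frac{1}{k^2}, for k > 1. Equivalently, for any data set, at least 1-\frac{1}{k^2} of the values lie within k standard deviations of the mean. It emphasizes the fact that the variance and the standard deviation measure the variability of a random variable about its mean.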
Deviation
The difference between a data value and the mean, x-\overline{x}.
Interquartile range (IQR)
The interquartile range is the difference between the upper and lower quartiles, Q_3-Q_1. Whereas the range measures the difference between the smallest and largest numbers in a data set, the IQR measures the spread of the middle 50% of the data.
Interval
Interval data is measured on a scale on which the distance between any two values is meaningful, but there is no true zero point.
Interval estimate
An interval estimate gives a range of values that is likely to contain the parameter. A statistician would report the estimate of a parameter in two ways: as a point estimate (e.g., 915) and also as an interval estimate.
Levels of measurement
Some researchers and social scientists use a more detailed distinction among types of data, called the levels of measurement: nominal, ordinal, interval, and ratio.
Lower quartile
The 25^{th} percentile is notated as Q_1 and is called the lower quartile.
Mean
The mean is the numerical balancing point of the data set.
Mean absolute deviation
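The mean absolute deviation is the mean of the absolute values of the deviations, \frac{\sum|x-\overline{x}|}{n}. It is a measure of spread similar to the standard deviation, but it takes absolute values of the deviations instead of squaring them.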
Median
The median is simply the middle number in an ordered set of data.
Midrange
The midrange (sometimes called the midextreme) is found by taking the mean of the maximum and minimum values of the data set.
Mode
The mode is defined as the most frequently occurring number in a data set.
n\% trimmed mean
To reduce the effect of outliers, a statistician may choose to remove a certain percentage of the extreme values and then calculate the mean of the remaining data. This is called an n\% trimmed mean.
Nominal
Nominal data is measured by classification or categories.
Numerical variable
When the measurement, or quantity, of a characteristic is most important (for example, how many individuals there are per square kilometer), it is called a numerical variable, or quantitative variable.
Ordinal
Ordinal data uses numerical categories that convey a meaningful order.
Outliers
Extreme values in a data set are referred to as outliers. The mean is affected by the presence of an outlier, so it is not a resistant measure.
Parameter
An actual value of a population variable is called a parameter.
Percentile
A percentile is a data value for which the specified percentage of the data is below that value.
Point estimate
A point estimate is a single number used to estimate a parameter (e.g., 915). A statistician would often report it together with an interval estimate.
Population
The total group being studied is called the population.
Qualitative variable
A variable whose values can be neatly placed into well-defined groups, or categories, that do not depend on order. It is also called a categorical variable.
Quantitative variable
A variable for which the measurement, or quantity, of the characteristic is most important, such as the number of individuals per square kilometer. It is also called a numerical variable.
Range
The range is the difference between the smallest value (minimum) and the largest value (maximum) in the data.
Ratio
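Ratio data is measured on a scale with a true zero point, so both the differences and the ratios between values are meaningful. For example, estimates of population sizes are measured on a ratio level.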
Resistant
A statistic that is not affected by outliers is called resistant.
Sample
A representative group selected from the population is called a sample.
Sampling error
The difference between the true parameter and the statistic obtained by sampling is called sampling error.
Standard deviation
The standard deviation is an extremely important measure of spread that is based on the mean. It is the square root of the variance.
Statistic
Any number calculated from a sample that describes its individuals (for example, their mean length, weight, or age) is called a statistic.
Trimmed mean
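A trimmed mean is the mean calculated after removing a certain percentage of the extreme values from the data set. It is useful because the mean is not resistant to the effects of outliers.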
Unit
Each member of the population is called a unit.
Upper quartile
The 75^{th} percentile is notated as Q_3 and is called the upper quartile.
Variables
A researcher studying Galapagos Tortoises would be interested in collecting information about different characteristics of the tortoises. Those characteristics are called variables.
Variance
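When we have the entire population, the sum of the squared deviations is divided by the population size. This value is called the variance. For a sample, the sum of the squared deviations is instead divided by one less than the sample size, n-1.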
Weighted mean
The weighted mean is a method of calculating the mean where instead of each data point contributing equally to the mean, some data points contribute more than others.
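Several of the measures defined above can be computed with a short Python sketch. The quartile rule used here (the median of the lower and upper halves, leaving out the overall median when n is odd) is one common textbook convention and is an assumption; calculators and software use other rules (Python's own statistics.quantiles, for instance, interpolates between values) and can give slightly different quartiles. The trimmed mean also assumes that the stated proportion is removed from each end of the ordered data. All function names and data are hypothetical.

```python
import statistics


def median_of(sorted_data):
    """Median of a list that is already sorted in increasing order."""
    n = len(sorted_data)
    mid = n // 2
    if n % 2 == 1:
        return sorted_data[mid]
    return (sorted_data[mid - 1] + sorted_data[mid]) / 2


def quartiles(data):
    """Q1 and Q3 as the medians of the lower and upper halves.

    Assumed convention: when n is odd, the overall median is left out of both halves.
    """
    s = sorted(data)
    if len(s) < 2:
        raise ValueError("need at least two values")
    half = len(s) // 2
    lower = s[:half]
    upper = s[half + 1:] if len(s) % 2 == 1 else s[half:]
    return median_of(lower), median_of(upper)


def trimmed_mean(data, proportion_each_end):
    """Mean after dropping the given proportion of values from EACH end (assumed convention)."""
    s = sorted(data)
    k = int(len(s) * proportion_each_end)  # number of values removed from each end
    kept = s[k:len(s) - k]
    return sum(kept) / len(kept)


def summarize(data):
    s = sorted(data)
    n = len(s)
    mean = sum(s) / n
    q1, q3 = quartiles(s)
    return {
        "mean": mean,
        "median": median_of(s),
        "modes": statistics.multimode(s),  # every value tied for most frequent
        "midrange": (s[0] + s[-1]) / 2,
        "range": s[-1] - s[0],
        "Q1": q1,
        "Q3": q3,
        "IQR": q3 - q1,
        "mean absolute deviation": sum(abs(x - mean) for x in s) / n,
        "10% trimmed mean": trimmed_mean(s, 0.10),
    }


# Hypothetical data set with one large outlier (95)
data = [2, 3, 3, 5, 6, 7, 8, 9, 10, 95]
for name, value in summarize(data).items():
    print(f"{name}: {value}")
```

With this data, the outlier pulls the mean up to 14.8, while the median (6.5) and the 10% trimmed mean (6.375) stay close to the bulk of the values. Likewise, the range (93) is inflated by the outlier, whereas the IQR (6) is not.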


Date Created: Feb 23, 2012

Last Modified: Apr 29, 2014
CK.MAT.ENG.SE.2.Prob-&-Stats-Adv.1