
# Introduction to Data and Measurement Issues

## A glimpse at studying different types of data from a sample to verify characteristics of a population


In this Concept, you will learn many definitions of statistical terminology in order to begin talking about statistics. We will demonstrate the reason for using a sample to learn about a population.

### Watch This

For an introduction to the importance of statistics, see onlinestatbook, Introduction to Statistics: Importance of Statistics (2:45).

Citation: Online Statistics Education: A Multimedia Course of Study (http://onlinestatbook.com/). Project Leader: David M. Lane, Rice University.

For a discussion of populations and samples, as well as parameters and statistics, see onlinestatbook, Introduction to Statistics: Inferential Statistics (6:39).

### Guidance

In order to learn some basic vocabulary of statistics and learn how to distinguish between different types of variables, we will use the example of information about the Giant Galapagos Tortoise.

#### Example A

The Galapagos Islands, off the coast of Ecuador in South America, are famous for the amazing diversity and uniqueness of life they possess. One of the most famous Galapagos residents is the Galapagos Giant Tortoise, which is found nowhere else on earth. Charles Darwin’s visit to the islands in the $19^{\text{th}}$ Century and his observations of the tortoises were extremely important in the development of his theory of evolution.

The tortoises lived on nine of the Galapagos Islands, and each island developed its own unique species of tortoise. In fact, on the largest island, there are four volcanoes, and each volcano has its own species. When first discovered, it was estimated that the tortoise population of the islands was around 250,000. Unfortunately, once European ships and settlers started arriving, those numbers began to plummet. Because the tortoises could survive for long periods of time without food or water, expeditions would stop at the islands and take the tortoises to sustain their crews with fresh meat and other supplies for the long voyages. Also, settlers brought in domesticated animals like goats and pigs that destroyed the tortoises' habitat. Today, two of the islands have lost their species, a third island has no remaining tortoises in the wild, and the total tortoise population is estimated to be around 15,000. The good news is there have been massive efforts to protect the tortoises. Extensive programs to eliminate the threats to their habitat, as well as breed and reintroduce populations into the wild, have shown some promise.

Approximate distribution of Giant Galapagos Tortoises in 2004, Estado Actual De Las Poblaciones de Tortugas Terrestres Gigantes en las Islas Galápagos, Marquez, Wiedenfeld, Snell, Fritts, MacFarland, Tapia, y Nanjoa, Ecologia Aplicada, Vol. 3, Num. 1,2, pp. 98 11.

| Island or Volcano | Species | Climate Type | Shell Shape | Estimate of Total Population | Population Density (per $km^2$) | Number of Individuals Repatriated$^*$ |
|---|---|---|---|---|---|---|
| Wolf | becki | semi-arid | intermediate | 1,139 | 228 | 40 |
| Darwin | microphyes | semi-arid | dome | 818 | 205 | 0 |
| Alcedo | vandenburghi | humid | dome | 6,320 | 799 | 0 |
| Sierra Negra | guntheri | humid | flat | 694 | 122 | 286 |
| Cerro Azul | vicina | humid | dome | 2,574 | 155 | 357 |
| Santa Cruz | nigrita | humid | dome | 3,391 | 730 | 210 |
| Española | hoodensis | arid | saddle | 869 | 200 | 1,293 |
| San Cristóbal | chathamensis | semi-arid | dome | 1,824 | 559 | 55 |
| Santiago | darwini | humid | intermediate | 1,165 | 124 | 498 |
| Pinzón | ephippium | arid | saddle | 532 | 134 | 552 |
| Pinta | abingdoni | arid | saddle | 1 | Does not apply | 0 |

$^*$ Repatriation is the process of raising tortoises and releasing them into the wild when they are grown to avoid local predators that prey on the hatchlings.

Classifying Variables

Statisticians refer to an entire group that is being studied as a population. Each member of the population is called a unit. In this example, the population is all Galapagos Tortoises, and the units are the individual tortoises. It is not necessary for a population or the units to be living things, like tortoises or people. For example, an airline employee could be studying the population of jet planes in her company by studying individual planes.

A researcher studying Galapagos Tortoises would be interested in collecting information about different characteristics of the tortoises. Those characteristics are called variables. Each column of the table above contains a variable. In the first column, the tortoises are labeled according to the island (or volcano) where they live, and in the second column, by the scientific name for their species. When a characteristic can be neatly placed into well-defined groups, or categories, that do not depend on order, it is called a categorical variable, or qualitative variable.

The last three columns of the table above provide information in which the count, or quantity, of the characteristic is most important. We are interested in the total number of each species of tortoise, or how many individuals there are per square kilometer. This type of variable is called a numerical variable, or quantitative variable.

#### Example B

Determine whether each of the variables Climate Type, Shell Shape, Number of Tagged Individuals, and Number of Individuals Repatriated is a numerical or a categorical variable.

| Variable | Explanation | Type |
|---|---|---|
| Climate Type | Many of the islands and volcanic habitats have three distinct climate types. | Categorical |
| Shell Shape | Over many years, the different species of tortoises have developed different shaped shells as an adaptation to assist them in eating vegetation that varies in height from island to island. | Categorical |
| Number of Tagged Individuals | Tortoises were captured and marked by scientists to study their health and assist in estimating the total population. | Numerical |
| Number of Individuals Repatriated | There are two tortoise breeding centers on the islands. Through these programs, many tortoises have been raised and then reintroduced into the wild. | Numerical |
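The classification in Example B can also be sketched in code. The following is a minimal illustration, not a general rule: it uses three rows taken from the tortoise table, and the names `tortoises`, `classify`, and `variable_types` are hypothetical helpers introduced only for this example. It assumes a variable is numerical when every recorded value is a number.

```python
# Three rows from the tortoise table; each dictionary key is a variable.
tortoises = [
    {"island": "Wolf", "climate": "semi-arid", "shell": "intermediate",
     "population": 1139, "repatriated": 40},
    {"island": "Alcedo", "climate": "humid", "shell": "dome",
     "population": 6320, "repatriated": 0},
    {"island": "Española", "climate": "arid", "shell": "saddle",
     "population": 869, "repatriated": 1293},
]


def classify(values):
    """Label a variable 'numerical' if every value is a number, else 'categorical'."""
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "numerical"
    return "categorical"


# Apply the rule to each column (variable) of the small table.
variable_types = {
    name: classify([row[name] for row in tortoises])
    for name in tortoises[0]
}

for name, kind in variable_types.items():
    print(f"{name}: {kind}")
```

A rule based only on how values are stored is a simplification. Numbers that serve as labels, such as team numbers, are still categorical, so the final decision depends on what the values mean.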

Population vs. Sample

We have already defined a population as the total group being studied. Most of the time, it is extremely difficult or very costly to collect all the information about a population. In the Galapagos, it would be very difficult and perhaps even destructive to search every square meter of the habitat to be sure that you counted every tortoise. In an example closer to home, it is very expensive to get accurate and complete information about all the residents of the United States to help effectively address the needs of a changing population. This is why a complete counting, or census, is only attempted every ten years. Because of these problems, it is common to use a smaller, representative group from the population, called a sample.

You may recall the tortoise data included a variable for the estimate of the population size. This number was found using a sample and is actually just an approximation of the true number of tortoises. If a researcher wanted to find an estimate for the population of a species of tortoises, she would go into the field and locate and mark a number of tortoises. She would then use statistical techniques that we will discuss later in this text to obtain an estimate for the total number of tortoises in the population. In statistics, we call the actual number of tortoises a parameter. Any number that describes the individuals in a sample (length, weight, age) is called a statistic. Each statistic is an estimate of a parameter, whose value may or may not be known.
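The distinction between a parameter and a statistic can be illustrated with a small simulation. This is only a sketch under stated assumptions: the shell lengths below are invented values generated by the program, not real measurements, and the sample size of 50 is arbitrary.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: shell lengths (cm) of 869 tortoises.
# These values are invented for illustration only.
population = [rng.gauss(80, 10) for _ in range(869)]

# Parameter: a number describing the whole population
# (in practice this is usually unknown).
parameter = statistics.mean(population)

# Statistic: the same quantity computed from a sample of 50 units.
sample = rng.sample(population, 50)
statistic = statistics.mean(sample)

print(f"parameter (population mean): {parameter:.1f} cm")
print(f"statistic (sample mean):     {statistic:.1f} cm")
```

In a real study only the statistic would be available. The simulation can show both numbers because it creates the entire population itself.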

Errors in Sampling

We have to accept that estimates derived from using a sample have a chance of being inaccurate. This cannot be avoided unless we measure the entire population. The researcher has to accept that there could be variations in the sample due to chance that lead to changes in the population estimate. A statistician would report the estimate of the parameter in two ways: as a point estimate (e.g., 915) and also as an interval estimate. For example, a statistician would report: “I am fairly confident that the true number of tortoises is actually between 561 and 1075.” This range of values is the unavoidable result of using a sample, and not due to some mistake that was made in the process of collecting and analyzing the sample. The difference between the true parameter and the statistic obtained by sampling is called sampling error. It is also possible that the researcher made mistakes in her sampling methods in a way that led to a sample that does not accurately represent the true population.
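A short simulation can make sampling error concrete. The sketch below again assumes an invented population of shell lengths. It draws several random samples, computes the difference between each sample mean and the population mean, and then repeats the calculation with a "sample" that contains every unit. The name `sampling_error` is a hypothetical helper defined here.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical population of 869 shell lengths (cm); invented values.
population = [rng.gauss(80, 10) for _ in range(869)]
parameter = statistics.mean(population)


def sampling_error(sample_size):
    """Return (sample mean - population mean) for one random sample."""
    sample = rng.sample(population, sample_size)
    return statistics.mean(sample) - parameter


# Five different samples of 50 units give five different errors.
errors = [sampling_error(50) for _ in range(5)]
for i, err in enumerate(errors, start=1):
    print(f"sample {i}: sampling error = {err:+.2f} cm")

# Measuring every unit (a census) leaves no sampling error.
census_error = sampling_error(len(population))
print(f"census: sampling error = {census_error:+.2f} cm")
```

Each sample misses the parameter by a different amount even though no mistake was made in collecting it. Only the census, which measures the entire population, removes the error.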

#### Example C

What are some possible errors that could be involved in the study of the Galapagos tortoises?

Solution: The researcher could have picked an area to search for tortoises where a large number tend to congregate (near a food or water source, perhaps). If this sample were used to estimate the number of tortoises in all locations, it may lead to a population estimate that is too high.

This type of systematic error in sampling is called bias. Statisticians go to great lengths to avoid the many potential sources of bias. We will investigate this in more detail in a later chapter.
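The situation in Example C can be sketched as a simulation. Everything in it is an assumption made for illustration: the habitat is divided into 100 hypothetical plots, the 20 plots near water hold 8 to 12 tortoises each, and the 80 dry plots hold 0 to 4 each. The helper `estimate_total` simply scales the average count per searched plot up to all plots. It is not the estimation technique used by the researchers.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical habitat: invented tortoise counts per plot.
near_water = [rng.randint(8, 12) for _ in range(20)]  # crowded plots
dry = [rng.randint(0, 4) for _ in range(80)]          # sparse plots
plots = near_water + dry
true_total = sum(plots)


def estimate_total(sampled_plots):
    """Scale the average count per sampled plot up to the whole habitat."""
    return sum(sampled_plots) / len(sampled_plots) * len(plots)


# Biased sample: the researcher searches only plots near water.
biased_estimate = estimate_total(near_water[:10])

# Random sample: every plot has the same chance of being searched.
random_estimate = estimate_total(rng.sample(plots, 10))

print(f"true total:      {true_total}")
print(f"biased estimate: {biased_estimate:.0f}")
print(f"random estimate: {random_estimate:.0f}")
```

In this setup the biased estimate is always too high, because every plot near water holds more tortoises than any dry plot. The random sample can still miss the true total, but only through chance, not through a systematic error.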

On the Web

Charles Darwin Research Center and Foundation: http://www.darwinfoundation.org

### Vocabulary

In statistics, the total group being studied is called the population. The individuals (people, animals, or things) in the population are called units. The characteristics of those individuals of interest to us are called variables. Those variables are of two types: numerical, or quantitative, and categorical, or qualitative.

Because of the difficulties of obtaining information about all units in a population, it is common to use a small, representative subset of the population, called a sample. An actual value of a population variable (for example, number of tortoises, average weight of all tortoises, etc.) is called a parameter. An estimate of a parameter derived from a sample is called a statistic.

Whenever a sample is used instead of the entire population, we have to accept that our results are merely estimates, and therefore, have some chance of being incorrect. This is called sampling error.

### Guided Practice

For each of the following variables, indicate whether the variable is categorical or quantitative (numerical).

a. Importance of political party affiliation to people (very, somewhat, or not very important).

c. Weights of adult men, in pounds.

d. Favorite type of book (fiction, nonfiction).

Solutions:

a. This is categorical data because the information collected will fall into one of the three categories: very, somewhat, or not very important.

b. This is measured by numbers of hours, so it is quantitative data.

c. This is measured in pounds, so it is quantitative data.

d. This is categorical data because the information collected will fall into one of the many categories: fiction, nonfiction, et cetera.

### Practice

For 1-3, identify the population, the units, and each variable, and tell if the variable is categorical or quantitative.

1. A quality control worker with Sweet-Tooth Candy weighs every $100^{\text{th}}$ candy bar to make sure it is very close to the published weight.
2. Doris decides to clean her sock drawer out and sorts her socks into piles by color.
3. A researcher is studying the effect of a new drug treatment for diabetes patients. She performs an experiment on 200 randomly chosen individuals with type II diabetes. Because she believes that men and women may respond differently, she records each person’s gender, as well as the person's change in blood sugar level after taking the drug for a month.

For 4-6, indicate for each of the following characteristics of an individual whether the variable is categorical or quantitative (numerical):

4. Length of arm from elbow to shoulder (in inches)
5. Number of DVDs the person owns.
6. In Physical Education class, the teacher has the students count off by twos to divide them into teams. Is this a categorical or quantitative variable?
7. A school is studying its students' test scores by grade. Explain how the characteristic 'grade' could be considered either a categorical or a numerical variable.
8. What are the best ways to display categorical and numerical data?
9. Is it possible for a variable to be considered both categorical and numerical?
10. How can you compare the effects of one categorical variable on another or one quantitative variable on another?

### Vocabulary

categorical variable

A categorical variable is a variable that can take on one of a limited number of values. Examples of categorical variables are TV stations, the state someone lives in, and eye color.

Estimate

To estimate is to find an approximate answer that is reasonable or makes sense given the problem.

Numerical variables

Numerical variables are quantitative variables.

parameter

An actual value of a population variable is called a parameter.

Population

In statistics, the population is the entire group of interest from which the sample is drawn.

qualitative variable

A qualitative variable is one that cannot be measured numerically but can be placed in a category.

quantitative variable

A quantitative variable is a variable that takes on numerical values that represent a measurable quantity. Examples of quantitative variables are the height of students or the population of a city.

Sample

A sample is a specified part of a population, intended to represent the population as a whole.

Sampling error (random variation)

Sampling error occurs whenever a sample is used instead of the entire population, where we have to accept that our results are merely estimates, and therefore, have some chance of being incorrect.

statistic

An estimate of a parameter derived from a sample is called a statistic.

Statistics

Statistics is a branch of mathematics that involves collecting, analyzing and displaying data.

units

The individuals (people, animals, or things) in the population are called units.

variable

In statistics, a variable is simply a characteristic that is being studied.