# 1.2: An Overview of Data

**At Grade**

Created by: CK-12

## Learning Objectives

- Understand the difference between the levels of measurement: nominal, ordinal, interval, and ratio.
- Identify the general elements that characterize a study.
- Understand the fundamentals of experimental design.
- Understand the basic concept of measures of center and variation and their uses for statistical analysis.

## Introduction

This lesson is an overview of the basic considerations involved in collecting and analyzing data. All of these concepts will be examined in greater detail in later chapters, but it is important to be familiar with the ideas before studying them in depth.

## Levels of Measurement

In the first lesson, you learned about the different types of variables that statisticians use to describe the characteristics of a population. Some researchers and social scientists use a more detailed distinction when examining the information that is collected for a variable, called the **levels of measurement**. This widely accepted (though not universally used) theory was first proposed by the American psychologist Stanley Smith Stevens in 1946 (see the links at the end of this section). According to Stevens’ theory, the four levels of measurement are:

- nominal
- ordinal
- interval
- ratio

Each of these four levels refers to the relationship between the values of the variable.

### Nominal Measurement

It is easiest to think of nominal measurement in terms of discrete, categorical variables. This is the type of measurement in which the values of the variable are names, and not numerical at all. The names of the different species of Galapagos tortoises would be a nominal measurement.

### Ordinal Measurement

This type of measurement involves collecting information in which the order is somehow significant. The name of this level is derived from the use of ordinal numbers for ranking (\begin{align*}1^{st}, 2^{nd}, 3^{rd},\end{align*} etc.). If we ranked the different species of tortoise from the largest population to the smallest, this would be an example of ordinal measurement. In ordinal measurement, the distance between two consecutive values does not have meaning. The \begin{align*}1^{st}\end{align*} and \begin{align*}2^{nd}\end{align*} largest tortoise populations by species may differ by a few thousand individuals, while the \begin{align*}7^{th}\end{align*} and \begin{align*}8^{th}\end{align*} may only differ by a few hundred.
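The loss of distance information in a ranking can be sketched in a few lines of Python. The species labels and population counts below are hypothetical values chosen for illustration; they are not taken from the tortoise table.

```python
# Hypothetical population estimates (illustrative values only).
populations = {"A": 6000, "B": 3000, "C": 2800, "D": 500}

# Rank species from largest population (rank 1) to smallest.
ordered = sorted(populations, key=populations.get, reverse=True)
ranks = {species: position for position, species in enumerate(ordered, start=1)}

# Consecutive ranks always differ by exactly 1,
# but the underlying population gaps are very different.
gaps = [populations[a] - populations[b] for a, b in zip(ordered, ordered[1:])]

print(ranks)  # {'A': 1, 'B': 2, 'C': 3, 'D': 4}
print(gaps)   # [3000, 200, 2300]
```

The ranks tell us which population is larger, but only the original counts tell us by how much.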

### Interval Measurement

In interval measurement, we add to the ranking of ordinal measurement by collecting data in which there is significance to the distance between any two values. An example commonly cited for interval measurement is temperature (either Celsius or Fahrenheit degrees). A change of \begin{align*}1\;\mathrm{degree}\end{align*} is the same if the temperature goes from \begin{align*}0^\circ C\end{align*} to \begin{align*}1^\circ C\end{align*}, as it is when the temperature goes from \begin{align*}40^\circ C\end{align*} to \begin{align*}41^\circ C\end{align*}. Additionally, there is meaning to the values between the whole-number values (i.e. \begin{align*}\frac{1}{2}\end{align*} a degree can be interpreted).

### Ratio Measurement

Ratio measurement gets its name from the fact that a meaningful fraction (or ratio) can be constructed with a ratio variable. Ratio is the deepest, most meaningful level of measurement, and consequently, the most useful. A variable measured at this level not only includes the concepts of order and interval, but also adds the idea of “nothingness,” or absolute zero. In the temperature scale of the previous example, \begin{align*}0^\circ C\end{align*} is really an arbitrarily chosen number (the temperature at which water freezes) and does not represent the absence of temperature. As a result, the ratio between two temperatures depends on the scale chosen, and \begin{align*}40^\circ C\end{align*}, for example, is not really “twice” as hot as \begin{align*}20^\circ C\end{align*}. On the other hand, for the Galapagos tortoises the idea of a species having a population of \begin{align*}0\end{align*} individuals is all too real! As a result, the estimates of the populations are measured on a ratio level, and a species with a population of about \begin{align*}3300\end{align*} really is approximately three times as large as one with a population near \begin{align*}1100\end{align*}.
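To see why the Celsius ratio is misleading, we can redo the arithmetic on a scale that does have an absolute zero. The sketch below assumes the standard conversion to kelvins (add 273.15); the helper name `celsius_to_kelvin` is made up for this illustration.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert a Celsius temperature to kelvins (absolute zero is 0 K)."""
    return celsius + 273.15

# Dividing Celsius readings gives 2.0, but 0 °C is an arbitrary zero point.
naive_ratio = 40 / 20
# On the Kelvin scale the same two temperatures are much closer in ratio.
kelvin_ratio = celsius_to_kelvin(40) / celsius_to_kelvin(20)

print(naive_ratio)             # 2.0
print(round(kelvin_ratio, 3))  # 1.068

# Population counts have a true zero, so their ratio is meaningful directly.
population_ratio = 3300 / 1100
print(population_ratio)        # 3.0
```

Measured from absolute zero, \begin{align*}40^\circ C\end{align*} is only about \begin{align*}7\%\end{align*} hotter than \begin{align*}20^\circ C\end{align*}, while the population ratio of three needs no such correction.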

### Comparing the Levels of Measurement

Stevens’ theory can help draw distinctions between types of data that the numerical/categorical classification cannot. Let’s use an example from the previous section to show how you could collect data at different levels of measurement from the same population. Assume your school wants to collect data about all the students in the school (as schools frequently do):

**Nominal:** We could collect information about the students’ gender, the town or sub-division in which they live, race, or political opinions.

**Ordinal:** If we collect data about the students’ year in school, we are now ordering that data numerically (\begin{align*}9^{th}, 10^{th}, 11^{th},\end{align*} or \begin{align*}12^{th}\end{align*} grade).

**Interval:** If we gather data for students’ SAT math scores, we have interval measurement. There is no absolute \begin{align*}0\end{align*}, as SAT scores are scaled. The ratio between two scores is also meaningless (i.e. a person who scores a \begin{align*}600\end{align*} did not necessarily do “twice as well” as a student who scored a \begin{align*}300\end{align*}).

**Ratio:** Data about a student’s age, height, weight, and grades will be measured on the ratio level. In each of these cases there is an absolute zero that has real meaning. Someone who is \begin{align*}18\end{align*} really is twice as old as a \begin{align*}9\end{align*}-year-old.

It is also helpful to think of the levels of measurement as building in complexity, from the most basic (nominal) to the most complex (ratio). Each higher level of measurement includes aspects of those before it. The diagram below is a useful way to visualize the different levels of measurement.
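The idea that each level includes the ones before it can also be sketched in Python. The property descriptions below are informal labels chosen for this illustration, and the function name `properties_of` is hypothetical.

```python
# Property each level adds, ordered from most basic to most complex.
ADDED_PROPERTY = [
    ("nominal", "named categories"),
    ("ordinal", "meaningful order"),
    ("interval", "meaningful distance between values"),
    ("ratio", "absolute zero"),
]

def properties_of(level: str) -> list[str]:
    """Return every property a level has, including those it inherits."""
    collected = []
    for name, added in ADDED_PROPERTY:
        collected.append(added)
        if name == level:
            return collected
    raise ValueError(f"unknown level: {level}")

# Print the cumulative properties of each level.
for name, _ in ADDED_PROPERTY:
    print(f"{name}: {', '.join(properties_of(name))}")
```

Each printed line is one item longer than the line before it, which is the sense in which the levels build on one another.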

## Observational Studies

Small Ground Finch, Santa Cruz, Galapagos Islands.

Darwin's Finches

Some of the other famous residents of the Galapagos that have provided scientists with a wealth of information and opportunities for study are the so-called Darwin’s finches. Each of the numerous species of finches has developed special adaptations that allow it to survive in a particular area. There are ground finches, tree finches, cactus finches, medium-billed, small-billed, and large-billed finches, just to name a few. One particular variety has even learned to use a stick as a tool to dig for bugs. To the untrained observer, it is almost impossible to tell them all apart, and on a visit to the islands you will see them everywhere!

Two researchers from Princeton University, Peter and Rosemary Grant, spent over \begin{align*}30\;\mathrm{years}\end{align*} studying the adaptations of finches to environmental conditions on a small island in the Galapagos called Daphne Major.

Daphne Major, Galapagos Islands.

The Grants spent up to \begin{align*}6\;\mathrm{months}\end{align*} a year on this “rock” documenting how species with certain beak sizes and shapes would thrive in years when vegetation that suited those species grew well, and would dramatically decrease in numbers in years when that vegetation was sparse. This type of long-term approach to collecting data by making detailed observations is called an **observational study**, and is a widely used method of gathering data. In an observational study, the researcher observes the population of interest and records the results without making an attempt to control the outcomes.

Another famous observational study in the United States is the Framingham Heart Study. Researchers have followed the lives of people from the town of Framingham, Massachusetts for \begin{align*}60\;\mathrm{years}\end{align*}, and the information gathered has led to many of the current approaches to treating and preventing heart disease. This type of long-term observational study in which the same group of subjects is observed for very long periods of time is also called a **longitudinal study**.

## Experiments

The other widely used method for conducting research is called an **experiment**. In an experiment, the researcher imposes a treatment on a group of subjects in an effort to determine a “cause and effect” relationship between variables. While observational studies could appear to show a relationship between diet and heart disease, for example, there could be another factor that is actually causing an individual’s heart condition. An experiment designed to investigate this relationship might take two groups of similar subjects, impose a different diet on each group, and then record any differences in the condition of their hearts. What makes this difficult, and in some instances impossible, is that the researcher would then need to make sure that anything else that might have an influence on a subject’s heart health (e.g. exercise, genetics, stress level) is controlled, or exactly the same for each individual in the study. One of the ways that statisticians ensure this control is by randomly assigning subjects to treatments, thereby using the laws of probability to help support the validity of the results. Designing experiments can be difficult and costly, but a well-designed experiment is the most reliable way to establish meaningful cause and effect relationships. We will study the elements of designing experiments in more detail in later chapters.
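Random assignment is simple to sketch in Python. The example below is only one possible approach (shuffle the subjects, then deal them out to the treatments in turn); the function name `randomly_assign`, the subject IDs, and the diet labels are all hypothetical.

```python
import random

def randomly_assign(subjects, treatments, seed=None):
    """Shuffle the subjects, then deal them out to treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    groups = {treatment: [] for treatment in treatments}
    for index, subject in enumerate(shuffled):
        groups[treatments[index % len(treatments)]].append(subject)
    return groups

# Hypothetical subject IDs and diet labels, for illustration only.
subjects = [f"subject_{i}" for i in range(1, 21)]
groups = randomly_assign(subjects, ["diet_A", "diet_B"], seed=1)

for treatment, members in groups.items():
    print(treatment, len(members))
```

Because chance alone decides who receives which diet, factors such as exercise or genetics tend to balance out between the two groups rather than lining up with one treatment.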

## Measures of Center and Spread

Let us assume that you have collected some data on one of the various levels of measurement (nominal, ordinal, interval, or ratio) using a statistically valid procedure (observational study or experiment). How do you summarize this information? One of the most important tools for summarizing data is to display it visually, and the various methods for doing so will be covered in later chapters. If we want to use one number or value to summarize the data, we can look at where the data is centered. Data measured at different levels can be characterized by different summaries. Look back at the Tortoise data. This data was collected through an observational study. The variable “Climate Type” is a categorical variable that has been measured at the nominal level. The easiest way to summarize this variable is to identify the most common value (**mode**), which is “humid.” For variables that are measured at the ratio level, like “population density,” we might find the **average** (**mean**) or the middle number (**median**) in the data to summarize it.
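These three summaries are available in Python's standard `statistics` module. The observations below are hypothetical and are not the values from the tortoise table; they only show which summary suits which level of measurement.

```python
from statistics import mean, median, mode

# Hypothetical observations, for illustration only.
climate_types = ["humid", "arid", "humid", "semi-arid", "humid", "arid"]
densities = [0.5, 1.0, 2.0, 3.5, 8.0]  # hypothetical individuals per km^2

# Nominal data: only the most common category is meaningful.
print(mode(climate_types))  # humid

# Ratio data: both the middle value and the average are meaningful.
print(median(densities))    # 2.0
print(mean(densities))      # 3.0
```

Notice that the single large value pulls the mean above the median, which is one reason the two measures can tell different stories about the same data.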

Another important element of a data set is how it is spread. In the tortoise population estimate data, the numbers per species range from \begin{align*}6320\end{align*} down to \begin{align*}1\end{align*}, a spread (range) of \begin{align*}6319\end{align*}, or roughly \begin{align*}6300\end{align*} tortoises. However, the population of the Alcedo tortoises is much larger than the other species, so this number might not give a true indication of how most of the other populations vary. We have other measures that might help shed some light on the spread of the typical tortoise species, such as the **interquartile range** and the **standard deviation**, which we will cover in detail in the following lessons.
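As a preview, the sketch below computes all three measures of spread with Python's standard `statistics` module. Only the largest and smallest values (6320 and 1) come from the text; the values in between are invented for illustration, so the interquartile range and standard deviation it prints do not describe the real tortoise data.

```python
from statistics import quantiles, stdev

# Hypothetical population estimates: only the extremes (6320 and 1)
# come from the lesson; the other values are invented for illustration.
populations = [6320, 3000, 2500, 1800, 1200, 700, 400, 1]

# Range: the largest value minus the smallest.
data_range = max(populations) - min(populations)

# Interquartile range: distance between the first and third quartiles.
q1, _, q3 = quantiles(populations, n=4)
iqr = q3 - q1

print(data_range)  # 6319
print(iqr)
print(round(stdev(populations)))
```

The range depends entirely on the two most extreme species, while the interquartile range describes the middle half of the data and is therefore less affected by one unusually large population.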

## Lesson Summary

Data can be measured at different levels depending on the type of variable and amount of detail that is collected. A widely used method for categorizing the different types of measurement breaks them down into four groups. **Nominal** data is measured by classification or categories. **Ordinal** data uses categories that convey a meaningful order. **Interval** measurements show order, and the spaces between the values also have significant meaning. In **ratio** measurement, the ratio between any two values has meaning because the data includes an absolute zero value.

Statisticians and researchers use two main techniques to form important conclusions about the relationships between variables. An **observational study** is when a researcher observes the subjects in the real world without manipulating them. An **experiment** is the way to establish true cause-and-effect relationships. It involves the researcher imposing some randomly assigned treatment(s) on the subjects in an effort to isolate the effect of a single variable.

In order to summarize a set of data, we often look to a single quantity to describe where it is centered. There are various measures that are used for this summary, including the **mean**, **median**, and **mode**. These will be covered in detail in later sections, but they are generally referred to as **measures of center**. Similarly, for information about how the data is spread out, we investigate **measures of spread** that include the **range**, **interquartile range**, and **standard deviation**.

## Points to Consider

- How do we summarize, display, and compare data measured at different levels?
- What are the differences between an observational study and an experiment?
- What are the advantages/disadvantages of observational studies and experiments?
- How do you determine which measure of center or spread best describes a particular data set?

## Review Questions

- In each of the following situations, identify the level(s) at which each of these measurements has been collected.
  - Lois surveys her classmates about their eating preferences by asking them to rank a list of foods from least favorite to most favorite.
  - Lois collects similar data, but asks each student what their favorite thing to eat is.
  - In math class, Noam collects data on the Celsius temperature of his cup of coffee over a period of several minutes.
  - Noam collects the same data, only this time using the Kelvin scale.

- Which of the following statements is *not* true?
  - All ordinal measurements are also nominal.
  - All interval measurements are also ordinal.
  - All ratio measurements are also interval.
  - Stevens’ levels of measurement is the one theory of measurement that all researchers agree on.

- Look at Table 3 in Section 1. What is the highest level of measurement that could be correctly applied to the variable “Population Density”?
  - Nominal
  - Ordinal
  - Interval
  - Ratio

*Note:* If you are curious about the “does not apply” in the last row of Table 3, then read on! There is only one known individual Pinta tortoise, and he lives at the Charles Darwin Research Station. He is affectionately known as Lonesome George. He is probably well over \begin{align*}100\;\mathrm{years}\end{align*} old and will most likely signal the end of the species, as attempts to breed him have been unsuccessful. Here is a picture of poor George!

*Lonesome George*, the Last Pinta tortoise, Charles Darwin Research Station, Santa Cruz, Galapagos Islands.

- In each of the following situations, identify if it is an observational study or an experiment.
  - In an attempt to determine if students prefer bottled water to tap water, you set up a table in the cafeteria at lunchtime and have students sample some of each and ask them which they prefer.
  - Researchers collect data over \begin{align*}15\;\mathrm{years}\end{align*} about \begin{align*}100\end{align*} sets of identical twins to see how their personalities develop similar or different characteristics.
  - Cloned mice are put into different colored cage environments to see if there is an effect on their temperaments.
  - Researchers find that babies who were exposed to lead paint have a high risk of brain damage.

## Review Answers

- Ordinal
- Nominal
- Interval. Even though Celsius has a “\begin{align*}0\end{align*}”, this is a completely arbitrary decision to set the freezing point of water and not the “absence” of temperature.
- Ratio. The Kelvin scale is based on an absolute zero, the theoretical temperature at which molecules stop moving.

- The last statement is not true. The levels of measurement theory is a useful tool to help categorize data, but like much of statistics, it is not an absolute “rule” that applies easily to every situation, and several statisticians have pointed out some of the difficulties with the theory. See: http://en.wikipedia.org/wiki/Level_of_measurement

- Ratio. Population densities are certainly measured up to the interval level, as there is meaning to the values and to the distance between two observations. To decide if the variable is measured at the ratio level, we need to establish a meaning for absolute zero. In this case, it would be \begin{align*}0\end{align*} individuals per \begin{align*}km^2\end{align*}. This is possible and indeed represents the extinct populations.

- This is an experiment as each subject is drinking both waters (the imposed treatment). However, it will have to be designed properly. Students should not know which water is bottled and which is tap (this is called a “blind” experiment) and they should be randomly assigned the order in which they drink the water. Other conditions such as the appearance, amount, and temperature would also need to be tightly controlled.
- Observational study.
- Experiment. The researcher is imposing a treatment (different colored cages) on the mice.
- Observational study. It would be unacceptable to intentionally expose a baby to potentially harmful substances. The dangers of lead paint were discovered through years of careful observational studies.

## Further Reading

- Levels of Measurement: http://en.wikipedia.org/wiki/Level_of_measurement; http://www.socialresearchmethods.net/kb/measlevl.php
- Peter and Rosemary Grant: http://en.wikipedia.org/wiki/Peter_and_Rosemary_Grant
- Framingham Heart Study: http://en.wikipedia.org/wiki/Framingham_Heart_Study