### **Introduction to Data and Measurement**

In statistics, the total group being studied is called the **population**. The individuals (people, animals, or things) in the population are called **units**. The characteristics of those individuals of interest to us are called **variables**. Those variables are of two types: **numerical**, or **quantitative**, and **categorical**, or **qualitative**.

Because of the difficulties of obtaining information about all units in a population, it is common to use a small, representative subset of the population, called a **sample**. An actual value of a population variable (for example, number of tortoises, average weight of all tortoises, etc.) is called a **parameter**. An estimate of a parameter derived from a sample is called a **statistic**.

Whenever a **sample** is used instead of the entire **population**, we have to accept that our results are merely **estimates** and therefore have some chance of being incorrect. This difference between a sample estimate and the true population value is called **sampling error**.
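To make the distinction between a parameter and a statistic concrete, here is a minimal Python sketch. The population of tortoise weights is simulated, and the numbers used (1,000 tortoises, a typical weight of 150 kg, a sample of 30) are illustrative assumptions, not real data.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: simulated weights (kg) of 1,000 tortoises.
population = [random.gauss(150, 20) for _ in range(1000)]
mu = statistics.mean(population)        # parameter: the true population mean

# A sample is a subset of the population's units.
sample = random.sample(population, 30)
x_bar = statistics.mean(sample)         # statistic: an estimate of mu

# The sampling error is the gap between the estimate and the parameter.
sampling_error = x_bar - mu

print(f"population mean (parameter): {mu:.2f}")
print(f"sample mean (statistic):     {x_bar:.2f}")
print(f"sampling error:              {sampling_error:.2f}")
```

Drawing a different sample (for example, by changing the seed) would give a different statistic and a different sampling error, while the parameter stays the same.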

### **Levels of Measurement**

**Nominal data** is measured by classification into categories.

**Ordinal data** uses numerical categories that convey a meaningful order.

**Interval measurements** show order, and the spaces between the values also have significant meaning.

In **ratio measurement**, the ratio between any two values has meaning, because the data include an absolute zero value.
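The difference between interval and ratio measurement can be illustrated with a short Python sketch. Temperature is used here as an assumed example (it is not taken from the text above): the Celsius scale has no absolute zero and is commonly treated as interval data, while the Kelvin scale starts at absolute zero and is commonly treated as ratio data.

```python
# Two temperatures measured in degrees Celsius (interval level).
celsius_a, celsius_b = 10.0, 20.0

# Differences are meaningful at the interval level.
difference = celsius_b - celsius_a

# The ratio 20 / 10 = 2 can be computed, but it does not mean
# "twice as hot", because 0 degrees Celsius is not an absolute zero.
naive_ratio = celsius_b / celsius_a

# Converting to Kelvin (ratio level) gives a scale with an absolute zero,
# so the ratio between the two values has meaning.
kelvin_a = celsius_a + 273.15
kelvin_b = celsius_b + 273.15
meaningful_ratio = kelvin_b / kelvin_a

print(f"difference: {difference} degrees")
print(f"naive Celsius ratio: {naive_ratio}")
print(f"Kelvin ratio: {meaningful_ratio:.3f}")
```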

### **Descriptive Statistics to Measure Center and Spread**

**Measures of Center:**

The **mean**, or average, is the sum of the data points divided by the total number of data points in the set. In a data set that is a sample from a population, the **sample mean** is denoted by \begin{align*}\overline{x}\end{align*}. The **population mean** is denoted by \begin{align*}\mu\end{align*}.

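- In an \begin{align*}n\%\end{align*} **trimmed mean**, you remove \begin{align*}n\end{align*} percent of the data (half from each end of the ordered data) before calculating the mean.
- A **weighted mean** assigns each data value a weight that reflects its relative importance. It is the sum of the weighted values divided by the sum of the weights.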
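Both variations on the mean can be sketched in a few lines of Python. The function names `trimmed_mean` and `weighted_mean` and the sample data are hypothetical. The sketch assumes the count removed from each end is rounded down (conventions vary), and it uses the standard definition of a weighted mean: the sum of each value times its weight, divided by the sum of the weights.

```python
def trimmed_mean(data, percent):
    """Mean after removing `percent` of the data, half from each end."""
    ordered = sorted(data)
    # Number of values dropped from each end, rounded down (an assumption).
    k = int(len(ordered) * percent / 100 / 2)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


scores = [2, 4, 5, 5, 6, 6, 7, 7, 8, 40]   # 40 is an outlier
plain = sum(scores) / len(scores)           # 9.0, pulled up by the outlier
trimmed = trimmed_mean(scores, 20)          # drops 2 and 40, giving 6.0

# Hypothetical course grade: homework, quiz, and exam scores
# with weights 2, 3, and 5.
grade = weighted_mean([80, 90, 70], [2, 3, 5])   # 780 / 10 = 78.0

print(plain, trimmed, grade)
```

The trimmed mean is much closer to the bulk of the data than the ordinary mean, which is the usual reason for trimming.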

The **median** is the numeric middle of a data set once the values are arranged in order. If there is an odd number of data points, the median is the single middle value. If there is an even number of data values, the median is the mean of the middle two values.

The **mode** is the most frequently occurring value in a data set. It is most useful for categorical data and data measured at the nominal level.

Another measure of center is the **midrange**, which is the mean of the **maximum and minimum values**.
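The four measures of center described above can be compared on one small data set. This sketch uses Python's standard `statistics` module, and the data values are made up for illustration.

```python
import statistics

data = [2, 3, 5, 6, 8, 8, 10, 14]

mean = statistics.mean(data)             # 56 / 8 = 7
median = statistics.median(data)         # even count: (6 + 8) / 2 = 7
mode = statistics.mode(data)             # 8 occurs most often
midrange = (max(data) + min(data)) / 2   # (14 + 2) / 2 = 8

print(mean, median, mode, midrange)

# The mode also works for categorical (nominal) data,
# where a mean or median would not make sense.
colors = ["red", "blue", "red", "green"]
favorite = statistics.mode(colors)       # "red"
print(favorite)
```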

**Measures of Spread:**

The **range** is the difference between the largest and smallest values in a data set.

The **interquartile range** is the difference between the upper and lower quartiles.
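As a rough illustration, the range and interquartile range can be computed as follows. The helper `quartiles` is hypothetical and assumes one common textbook convention: the lower and upper quartiles are the medians of the lower and upper halves of the ordered data, leaving out the middle value when the count is odd. Other conventions (including those used by some software) can give slightly different quartiles.

```python
import statistics


def quartiles(data):
    """Lower and upper quartiles as medians of the two halves of the data."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    upper_half = ordered[half + (n % 2):]   # skip the middle value if n is odd
    return statistics.median(lower_half), statistics.median(upper_half)


data = [1, 3, 4, 6, 7, 8, 10, 12, 15]

data_range = max(data) - min(data)   # 15 - 1 = 14
q1, q3 = quartiles(data)             # 3.5 and 11.0
iqr = q3 - q1                        # 7.5

print(data_range, q1, q3, iqr)
```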

**Variance** and **standard deviation** are two other measures of spread for a data set. They are calculated using the following formulas:

**Variance of a Sample:**

\begin{align*}s^2= \frac{\sum_{i=1}^n (x_i-\overline{x})^2}{n-1}\end{align*}

where:

\begin{align*}x_i\end{align*} is the \begin{align*}i^{\text{th}}\end{align*} data value.

\begin{align*}\overline{x}\end{align*} is the mean of the sample.

\begin{align*}n\end{align*} is the sample size.

**Sample Standard Deviation:**

\begin{align*}s=\sqrt{\frac{\sum_{i=1}^n (x_i-\overline{x})^2}{n-1}}\end{align*}

where:

\begin{align*}x_i\end{align*} is the \begin{align*}i^{\text{th}}\end{align*} data value.

\begin{align*}\overline{x}\end{align*} is the mean of the sample.

\begin{align*}n\end{align*} is the sample size.
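The two formulas above translate directly into code. The following sketch, with a made-up data set and a hypothetical helper named `sample_variance`, computes the sample variance step by step and checks it against `statistics.variance` and `statistics.stdev` from Python's standard library, which also divide by \begin{align*}n-1\end{align*}.

```python
import math
import statistics


def sample_variance(data):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]

# Sample mean is 40 / 8 = 5; squared deviations sum to 32.
s_squared = sample_variance(data)    # 32 / 7
s = math.sqrt(s_squared)             # sample standard deviation

print(f"s^2 = {s_squared:.4f}, s = {s:.4f}")

# The standard library implements the same n - 1 formulas.
print(statistics.variance(data), statistics.stdev(data))
```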