
# Introduction to Data and Measurement Issues

## A glimpse at studying different types of data from a sample to verify characteristics of a population


### Introduction to Data and Measurement

In statistics, the total group being studied is called the population. The individuals (people, animals, or things) in the population are called units. The characteristics of those individuals of interest to us are called variables. Those variables are of two types: numerical, or quantitative, and categorical, or qualitative.

Because of the difficulties of obtaining information about all units in a population, it is common to use a small, representative subset of the population, called a sample. An actual value of a population variable (for example, number of tortoises, average weight of all tortoises, etc.) is called a parameter. An estimate of a parameter derived from a sample is called a statistic.

Whenever a sample is used instead of the entire population, we have to accept that our results are merely estimates and therefore have some chance of being incorrect. The difference between a sample statistic and the population parameter it estimates is called sampling error.
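The relationship between a parameter, a statistic, and sampling error can be sketched in a few lines of Python. The tortoise weights below are simulated purely for illustration (the mean of 150 kg, the spread, and the sample size of 30 are all assumptions, not values from the text).

```python
import random
from statistics import mean

# Hypothetical population: simulated weights (kg) of 1,000 tortoises.
rng = random.Random(42)  # fixed seed so the sketch is repeatable
population = [rng.gauss(150, 20) for _ in range(1000)]

# Parameter: the true population mean (usually unknown in practice).
parameter = mean(population)

# Statistic: the mean of a simple random sample of 30 units.
sample = rng.sample(population, 30)
statistic = mean(sample)

# Sampling error: how far the estimate falls from the true value.
sampling_error = statistic - parameter

print(f"parameter (population mean): {parameter:.2f}")
print(f"statistic (sample mean):     {statistic:.2f}")
print(f"sampling error:              {sampling_error:.2f}")
```

Drawing a different sample would give a different statistic, and so a different sampling error, while the parameter stays fixed.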

### Levels of Measurement

Nominal data is measured by classification into categories that have no inherent order.

Ordinal data uses categories (often coded as numbers) that convey a meaningful order, although the distances between categories are not necessarily equal.

Interval measurements show order, and the differences between values are also meaningful, but the scale has no absolute zero.

In ratio measurement, the ratio between any two values has meaning, because the data include an absolute zero value.
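The distinction between interval and ratio measurement can be illustrated with temperature, a standard example not taken from the text above: Celsius is an interval scale (its zero is arbitrary), while Kelvin is a ratio scale (its zero is absolute). The sketch below, with arbitrarily chosen temperatures, suggests why differences carry over between the two scales but ratios do not.

```python
def celsius_to_kelvin(c):
    """Convert a Celsius temperature to Kelvin."""
    return c + 273.15

# Two illustrative temperatures in degrees Celsius.
a_c, b_c = 20.0, 10.0
a_k, b_k = celsius_to_kelvin(a_c), celsius_to_kelvin(b_c)

# Differences are meaningful on both scales and agree.
diff_c = a_c - b_c
diff_k = a_k - b_k

# Ratios only have physical meaning on the scale with an absolute zero.
ratio_c = a_c / b_c  # 2.0, but 20 °C is not "twice as hot" as 10 °C
ratio_k = a_k / b_k  # roughly 1.035

print(f"difference: {diff_c:.2f} °C vs {diff_k:.2f} K")
print(f"ratio:      {ratio_c:.3f} (Celsius) vs {ratio_k:.3f} (Kelvin)")
```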

### Descriptive Statistics to Measure Center and Spread

Measures of Center:

The mean, or average, is the sum of the data points divided by the total number of data points in the set. In a data set that is a sample from a population, the sample mean is denoted by $\overline{x}$. The population mean is denoted by $\mu$.

• In an $n\%$ trimmed mean, you remove $n\%$ of the data (half from each end of the ordered data) before calculating the mean, which reduces the influence of extreme values.
• A weighted mean is calculated by multiplying each data value by its weight (its frequency or percentage), adding the products, and then dividing by the sum of the weights.
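The three kinds of mean described above can be sketched as small Python functions. The function names and the data are made up for illustration, and the trimmed mean rounds the number of dropped values down when $n\%$ of the data is not a whole number, which is one possible convention among several.

```python
def mean(values):
    """Arithmetic mean: sum of the data points divided by their count."""
    return sum(values) / len(values)

def trimmed_mean(values, percent):
    """Drop `percent`% of the data in total (half from each end), then average."""
    ordered = sorted(values)
    # Number of values dropped from EACH end (rounded down; an assumption).
    k = int(len(ordered) * percent / 100 / 2)
    return mean(ordered[k:len(ordered) - k])

def weighted_mean(values, weights):
    """Sum of value * weight, divided by the total of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Illustrative data with one extreme value (44).
data = [2, 4, 5, 5, 6, 7, 8, 9, 10, 44]
print(mean(data))               # 10.0
print(trimmed_mean(data, 20))   # 6.75 (drops 2 and 44)

# Values 1, 2, 3 occurring with frequencies 2, 5, 3.
print(weighted_mean([1, 2, 3], [2, 5, 3]))
```

In this made-up data set, trimming removes the outlier 44 and pulls the mean from 10.0 down to 6.75, much closer to the bulk of the values.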

The median is the middle value of a data set once the values are arranged in order. If there is an odd number of data points, the median is the single middle value. If there is an even number of data points, the median is the mean of the middle two values.
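The odd and even cases can be made concrete with a minimal sketch; the function name and the sample values are illustrative only.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even: average the middle pair

print(median([7, 1, 3]))     # 3
print(median([7, 1, 3, 5]))  # 4.0
```

Note that the data must be sorted first; taking the middle element of unsorted data is a common mistake.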

The mode is the most frequently occurring value in a data set. It is most useful for categorical data and data measured at the nominal level, where the mean and median cannot be computed.

Another measure of center is the midrange, which is the mean of the maximum and minimum values.
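As a rough sketch, the mode and the midrange might be computed as follows. The helper names are hypothetical, and returning every tied value as a list is a design choice made here so that data sets with more than one mode are handled.

```python
from collections import Counter

def modes(values):
    """Return all values that occur with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]

def midrange(values):
    """Mean of the maximum and minimum values."""
    return (max(values) + min(values)) / 2

# The mode works on categorical (nominal) data as well as numbers.
print(modes(["red", "blue", "red", "green"]))  # ['red']
print(modes([1, 1, 2, 2, 3]))                  # [1, 2]
print(midrange([2, 4, 9, 10]))                 # 6.0
```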

Measures of Spread:

The range is the difference between the largest and smallest values in a data set.

The interquartile range (IQR) is the difference between the upper and lower quartiles, $Q_3 - Q_1$, so it describes the spread of the middle half of the data.
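A minimal sketch of both measures follows. Quartiles can be defined in several slightly different ways; this version assumes the "median of each half" convention, leaving the overall median out of both halves when the number of data points is odd. The function names and data are illustrative.

```python
from statistics import median

def data_range(values):
    """Difference between the largest and smallest values."""
    return max(values) - min(values)

def interquartile_range(values):
    """Q3 - Q1, with quartiles taken as the medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For odd n, skip the middle value so it belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(upper) - median(lower)

data = [1, 3, 4, 6, 7, 8, 10]
print(data_range(data))           # 9
print(interquartile_range(data))  # 5  (Q1 = 3, Q3 = 8)
```

Other conventions, such as the interpolation methods offered by `statistics.quantiles`, may give slightly different quartiles for the same data.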

Variance and standard deviation are two other measures of spread for a data set. They are calculated using the following formulas:

Variance of a Sample:

$s^2= \frac{\sum_{i=1}^n (x_i-\overline{x})^2}{n-1}$

where:

$x_i$ is the $i^{\text{th}}$ data value.

$\overline{x}$ is the mean of the sample.

$n$ is the sample size.
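
As a worked illustration, consider a made-up sample of four values, 3, 5, 7, and 9 (chosen only to keep the arithmetic simple). The sample mean is:

$\overline{x} = \frac{3+5+7+9}{4} = 6$

Substituting into the variance formula with $n = 4$:

$s^2 = \frac{(3-6)^2+(5-6)^2+(7-6)^2+(9-6)^2}{4-1} = \frac{9+1+1+9}{3} = \frac{20}{3} \approx 6.67$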

Sample Standard Deviation:

$s=\sqrt{\frac{\sum_{i=1}^n (x_i-\overline{x})^2}{n-1}}$

where $x_i$, $\overline{x}$, and $n$ have the same meanings as in the sample variance formula, so $s$ is simply the square root of $s^2$.
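
Both formulas translate almost directly into code. The sketch below uses hypothetical function names and a small made-up data set, and checks the results against Python's standard `statistics` module, whose `variance` and `stdev` functions also use the $n-1$ denominator.

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def sample_std_dev(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))

data = [3, 5, 7, 9]  # illustrative sample, mean = 6
s2 = sample_variance(data)
s = sample_std_dev(data)
print(f"s^2 = {s2:.4f}")  # s^2 = 6.6667
print(f"s   = {s:.4f}")   # s   = 2.5820

# Cross-check against the standard library.
print(math.isclose(s2, statistics.variance(data)))
print(math.isclose(s, statistics.stdev(data)))
```

Because the standard deviation is expressed in the same units as the data, it is often easier to interpret than the variance, which is in squared units.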