95% Confidence Statement - Page 136, Section 4.4
"We are 95% confident that the true proportion of ___(parameter of interest)___ will be between ___(low value of conf. int.)___ and ___(high value of conf. int.)___."
Back to Back Stem Plots - Page 223, Section 5.6
A stem plot in which two sets of numerical data share the stems in the middle, with one set's leaves going to the right and the other set's leaves going to the left.
Bar Graph - Page 157, Section 5.1
A graph in which each bar shows how frequently a given category occurs. The bars can go either horizontally or vertically. Bars should be of consistent width and need to be equally spaced apart. The categories may be placed in any order along the axis.
Bias - Page 106, 115, Section 4.1, 4.2
A measurement that is repeatedly either too high or too low.
Bin Width
See Class Size
Bi-variate Data - Page 247, Section 6.1
Numerical data that measures two variables.
Blinded Study - Page 143, Section 4.5
A study in which the subject does not know exactly what treatment they are getting.
Block Design - Page 145, Section 4.5
A study in which subjects are divided into distinct categories with certain characteristics (for example, males and females) before being randomly assigned treatments.
Box Plot (Box and Whisker Plot) - Page 205, Section 5.5
A display in which a numerical data set is divided into quarters. The 'box' marks the middle 50% of the data and the 'whiskers' mark the upper 25% and lower 25% of the data.
Categorical Variable - Page 105, 156, Section 4.1, 5.1
Variables that can be put into categories, like favorite color, type of car you own, your sports jersey number, etc...
Census - Page 108, 113, Section 4.1, 4.2
A special type of study in which data is gathered from every single member of the population.
Center - Page 171, 182, Section 5.2, 5.3
Typically, it is the mean, median, or the mode of a data set. In a normal distribution curve the mean, median, and mode all mark the center.
Chance Behavior - Page 28, Section 2.1
Events whose outcomes are not predictable in the short term, but have long term predictability.
Class Size (Bin Width) - Page 195, Section 5.4
A consistent width that all bars on a histogram have. A quick estimation of a reasonable class size is to roughly divide the range by a value from about 7 to 10.
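The quick estimate described above can be sketched in a few lines. This is only an illustration: the function name `estimate_class_size` and the sample scores are made up, and in practice the result is usually rounded to a convenient number.

```python
def estimate_class_size(data, num_classes=8):
    """Rough bin width: the range divided by a class count of about 7 to 10."""
    data_range = max(data) - min(data)
    return data_range / num_classes

# Made-up test scores, used only for illustration.
scores = [52, 61, 67, 70, 74, 78, 81, 85, 90, 100]
print(estimate_class_size(scores, num_classes=8))  # 6.0
```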
Coincidence - Page 267, Section 6.2
A relationship between two variables that simply occurs by chance.
Combination - Page 16, Section 1.4
An arrangement of a set of objects in which the order does not matter. $_nC_r = \dfrac{n!}{r!\,(n-r)!}$
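As a small check, the count of combinations (n! divided by the product of r! and (n−r)!) can be computed directly and compared with Python's built-in `math.comb`. The helper name `combinations_count` is an illustrative choice.

```python
import math

def combinations_count(n, r):
    """Number of ways to choose r objects from n when order does not matter."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

print(combinations_count(5, 2))  # 10
print(math.comb(5, 2))           # 10, the standard-library equivalent
```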
Common Response - Page 266, Section 6.2
A situation in which two variables have similar behaviors but are actually both responding to an additional lurking variable.
Complement of an Event - Page 29, Section 2.1
The event that 'A' does NOT occur. It can be thought of as the opposite of an event and can be notated as A^c or ~A. Its probability is P(A^c) = 1 − P(A).
Compound Event - Page 38, Section 2.2
An event with two or more steps such as drawing a card and then rolling a die.
Conditional Probability - Page 61, Section 2.5
The probability of a particular outcome happening assuming a certain prerequisite condition has already been met. A clue that a conditional probability is being considered is the word 'given' or the vertical bar symbol, |.
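A conditional probability can be read off a table of counts by first restricting to the individuals that meet the condition. The sketch below assumes a made-up table of students; the function name and the counts are hypothetical.

```python
# Hypothetical counts: (sport status, grade) -> number of students.
counts = {
    ("plays sport", "junior"): 30,
    ("plays sport", "senior"): 20,
    ("no sport", "junior"): 45,
    ("no sport", "senior"): 55,
}

def conditional_probability(counts, outcome, given):
    """P(outcome | given): keep only the 'given' group, then take the share with the outcome."""
    given_total = sum(n for (_, grade), n in counts.items() if grade == given)
    both = counts.get((outcome, given), 0)
    return both / given_total

# P(plays sport | senior) = 20 / (20 + 55)
print(conditional_probability(counts, "plays sport", "senior"))
```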
Confidence Interval - Page 135, 136, Section 4.4
The range of values that lie within the margin of error of the sample result. Typically, we use a 95% confidence interval, meaning we are 95% confident that the parameter lies within this range.
Confounding - Page 267, Section 6.2
Occurs when two variables are related, but there is no clear cause/effect relationship because other variables are also influencing the situation.
Context - Page 182, Section 5.3
The specific realities of the situation we are considering. We often consider the labels and units when defining the context.
Contingency Table
See Two-Way Table
Control Group - Page 142, Section 4.5
A group in an experiment that does not receive the actual treatment, but rather receives a placebo, a known treatment, or no treatment at all.
Convenience Sample - Page 118, Section 4.2
A biased sampling method in which data is only gathered from those individuals who are easy to ask or are conveniently located.
Correlation (r) - Page 262-265, Section 6.2
A statistic that measures the strength and direction of a linear relationship between two numerical variables. Its values range from -1 to 1. The sign of the correlation (+/-) matches the sign of the slope of the regression equation.
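One common way to compute r is from the deviations of each variable from its mean. The sketch below uses that standard formula; the function name and data are illustrative.

```python
import math

def correlation(xs, ys):
    """Pearson correlation r between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

print(correlation([1, 2, 3, 4], [2, 4, 6, 8]))  # perfectly linear, positive slope
print(correlation([1, 2, 3, 4], [8, 6, 4, 2]))  # perfectly linear, negative slope
```

The first call gives a value of 1 and the second a value of -1, matching the signs of the slopes.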
Data - Page 104, Section 4.1
A collection of facts, measurements, or observations about a set of individuals.
Density Curve - Page 296, Section 7.1
A curve that gives a rough description of a distribution. The curve is smooth and the area under it always equals 1 whole or 100%.
Dependent Events - Page 39, Section 2.2
A situation in which one event changes the probability of another event.
Direct Cause and Effect - Page 266, Section 6.2
A situation in which one variable causes a specific effect to occur with no lurking variables.
Direction - Page 262, Section 6.2
One of three general results reported for a linear regression. It will be reported as either positive, negative, or 0.
Disjoint Events
See Mutually Exclusive Events
Dot Plot - Page 180, Section 5.3
A simple display that places a dot above the axis for each value. There is a dot for each value, so values that occur more than once will be shown by stacked dots.
Double Blind - Page 143, Section 4.5
A study in which neither the experimenter nor the subject knows which treatment is being given.
Empirical Rule (68-95-99.7 Rule) - Page 299, Section 7.1
A rule that states that in a normal distribution, 68% of the data is located within one standard deviation from the mean, 95% of the data is located within two standard deviations from the mean, and 99.7% of the data is located within three standard deviations from the mean.
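The three percentages can be checked numerically with the standard library's `statistics.NormalDist`. The helper name `share_within` is an illustrative choice.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```

The printed shares are roughly 0.68, 0.95, and 0.997, which is where the rule's name comes from.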
Event - Page 1, Section 1.1
Any action from which a result will be recorded or measured.
Expected Value - Page 78, Section 3.1
The average result over the long run for an event if repeated a large number of times.
Experiment - Page 108, 141, Section 4.1, 4.5
A study in which the researchers impose a treatment on the subjects.
Explanatory Variable - Page 142, 248, Section 4.5, 6.1
The x-axis variable. It can often be viewed as the 'cause' variable or the independent variable.
Factorial - Page 8, Section 1.2
A number followed by an exclamation point indicates repeated multiplication down to 1. For example, 4! = 4×3×2×1 = 24.
Fair Game - Page 87, Section 3.2
A game in which neither the player nor the house has an advantage. An average player over the long run will neither gain nor lose money. In other words, the expected value of the game is the same as the cost to play the game.
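To test whether a game is fair, compute its expected value (each payout times its probability, summed) and compare it with the cost to play. The game below is a made-up example and the function name is hypothetical.

```python
def expected_value(outcomes):
    """outcomes: list of (payout, probability) pairs whose probabilities sum to 1."""
    return sum(payout * prob for payout, prob in outcomes)

# Hypothetical game: roll one fair die; win $6 on a six, nothing otherwise.
game = [(6, 1 / 6), (0, 5 / 6)]
cost_to_play = 1

ev = expected_value(game)
is_fair = abs(ev - cost_to_play) < 1e-9  # tolerance guards against rounding
print(ev, is_fair)
```

Here the expected payout equals the $1 cost, so the game is fair; charging $2 for the same game would favor the house.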
Five-Number Summary - Page 212, Section 5.5
A description of data that includes the minimum, first quartile, median, third quartile, and maximum numbers which can be used to create a box plot.
Form - Page 253, Section 6.1
A general description of the pattern in a scatterplot. Typical descriptions include linear, curved, or random (no specific form).
Frequency Table - Page 157, Section 5.1
A table that shows the number of occurrences in each category.
Fundamental Counting Principle - Page 5, Section 1.2
A rule that states that, to find the number of outcomes for a given situation, you multiply the number of outcomes for each individual event.
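The principle can be verified by listing every outcome and counting. The clothing choices below are made up for illustration.

```python
from itertools import product

# Hypothetical wardrobe: 3 shirts, 2 pants, 2 pairs of shoes.
shirts = ["red", "blue", "green"]
pants = ["jeans", "khakis"]
shoes = ["sneakers", "boots"]

predicted = len(shirts) * len(pants) * len(shoes)  # multiply the counts
outfits = list(product(shirts, pants, shoes))      # list every outfit

print(predicted, len(outfits))  # 12 12
```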
Histogram - Page 195, Section 5.4
A special bar graph for a numerical data set. In a histogram, each bar has the same width and there is no space between the bars. The height of each bar tracks the frequency of results in its given range.
Independent Events - Page 38, Section 2.2
Two events in which the outcome of one event does not change the probabilities for the outcome for the other event.
Individual - Page 105, Section 4.1
The subject being studied. This can be a person, an animal or an object.
Inter-Quartile Range (IQR) - Page 208, Section 5.5
The distance between the lower and upper quartiles. IQR = Q3 - Q1
Instrument of Measurement - Page 106, Section 4.1
Tool used to make measurements. Typical instruments are tools like rulers, scales, thermometers, or speedometers.
Intersection of Sets - Page 47, Section 2.3
In a Venn Diagram, it includes the results that are members of more than one group simultaneously. We use the symbol ∩ to indicate the intersection and think of the intersection as those parts of the diagram that include both A and B.
Law of Large Numbers - Page 28, 95, Section 2.1, 3.3
A rule that states that we will eventually get closer to the theoretical probability as we greatly increase the number of times an event is repeated.
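A coin-flip simulation gives a feel for this rule. The sketch below assumes a fair coin modeled with Python's pseudo-random generator; the function name and seed are arbitrary choices.

```python
import random

def heads_proportion(num_flips, seed=0):
    """Simulate num_flips fair coin flips and return the share that land heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(num_flips))
    return heads / num_flips

for n in (10, 100, 10_000, 100_000):
    print(n, heads_proportion(n))
```

With only a few flips the proportion can be far from 0.5, while the larger runs tend to settle close to the theoretical probability.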
Line Graph
See Time Plot
Lurking Variable - Page 141, 266, Section 4.5, 6.2
An additional variable that was not taken into account in a particular situation.
Margin of Error - Page 135, Section 4.4
A range of results, often spanning from 2 standard deviations below to 2 standard deviations above the mean, in which we are 95% confident that the true parameter is located. A quick approximation of the margin of error for a 95% confidence interval is $\text{M.O.E.} = \dfrac{1}{\sqrt{n}}$, where n is the sample size.
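The quick method (one divided by the square root of the sample size) is easy to apply in code. The function names and the sample figures below are illustrative assumptions.

```python
import math

def quick_margin_of_error(n):
    """Quick 95% margin of error for a sample of size n: 1 / sqrt(n)."""
    return 1 / math.sqrt(n)

def confidence_interval(sample_proportion, n):
    """Approximate 95% interval: the sample proportion plus or minus the margin of error."""
    moe = quick_margin_of_error(n)
    return sample_proportion - moe, sample_proportion + moe

print(quick_margin_of_error(400))      # 0.05
print(confidence_interval(0.60, 400))  # roughly 0.55 to 0.65
```

Quadrupling the sample size halves the margin of error, which is why very precise polls need large samples.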
Mean (Average) - Page 171, 297, Section 5.2, 7.1
The sum of all the numbers divided by the number of values in the data set. It is also located at the center of a normal distribution and is a good measure of center for symmetric data sets.
Median - Page 171, Section 5.2
The data result in the middle of a data list that has been organized smallest to largest. If there are two middle data values, then the median is located halfway between those two values. In a visual distribution, it marks the 50/50 area point on the graph. Use for skewed data sets.
Mode - Page 172, Section 5.2
The result that appears most frequently in a data set. It also occurs at the highest point of a density curve.
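The three measures of center (mean, median, and mode) are all available in Python's `statistics` module. The data set is made up.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # made-up data, already sorted

print(statistics.mean(data))    # 5   (30 divided by 6 values)
print(statistics.median(data))  # 4.0 (halfway between the middle values 3 and 5)
print(statistics.mode(data))    # 3   (appears most often)
```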
Multistage Random Sample - Page 117, Section 4.2
A sampling technique that uses randomly selected sub-groups of a population before random selection of individuals occurs.
Mutually Exclusive Events (Disjoint) - Page 47, Section 2.3
Events that cannot occur at the same time.
Negative Linear Association - Page 254, Section 6.1
A situation such that as one numerical variable increases, another numerical variable decreases.
Non-Response - Page 120, Section 4.2
A non-sampling error in which subjects do not participate or do not answer questions in a survey.
Normal Distribution Curve - Page 297-298, Section 7.1
A bell-shaped curve that describes a symmetrical data set such that the most frequent results occur near the mean and results become less frequent as you move further from the mean.
Numerical Variable - Page 105, Section 4.1
A variable that can be assigned a numerical value, such as a height, a distance, a temperature, etc...
Observational Study - Page 108, 141, Section 4.1, 4.5
A study in which researchers do not impose a treatment on the subjects. Data is collected by watching the subjects or from information already available. (Observe but do not disturb)
Outcome - Page 1, Section 1.1
A possible result of an event.
Outlier - Page 181, 213, 252, Section 5.3, 5.5, 6.1
A value that is unusual when compared to the rest of a data set. High outliers will be greater than Q3+1.5IQR. Low outliers will be below Q1-1.5IQR.
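The 1.5 × IQR cutoffs can be sketched as follows, assuming Q1 and Q3 have already been found. The function names and data are illustrative.

```python
def outlier_fences(q1, q3):
    """Return the (low, high) cutoffs: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(data, q1, q3):
    """Values below the low cutoff or above the high cutoff."""
    low, high = outlier_fences(q1, q3)
    return [x for x in data if x < low or x > high]

# Made-up data with Q1 = 13 and Q3 = 18.5 (medians of the lower and upper halves).
data = [1, 12, 14, 15, 16, 18, 19, 40]
print(outlier_fences(13, 18.5))       # (4.75, 26.75)
print(find_outliers(data, 13, 18.5))  # [1, 40]
```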
Parallel Box Plots - Page 222, Section 5.6
Multiple box plots graphed on the same axes to compare multiple data sets.
Parameter - Page 114, Section 4.2
A value that describes a truth about a population. Sometimes, the value is unknown so a parameter is often given as a description of truth.
Permutation - Page 11, Section 1.3
A specific order or arrangement of a set of objects or items. In a permutation, the order in which the items are selected matters.
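Because order matters, permutations can be counted by listing every ordered arrangement. The sketch below also uses `math.perm`, which computes n!/(n−r)!; the runner names are made up.

```python
import math
from itertools import permutations

# Hypothetical race: how many ways can 4 runners fill 1st, 2nd, and 3rd place?
runners = ["Ana", "Ben", "Cal", "Dee"]
podiums = list(permutations(runners, 3))

print(len(podiums))     # 24
print(math.perm(4, 3))  # 24
```

Swapping first and second place gives a different permutation, which is why this count is larger than the 4 combinations of 3 runners.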
Pictograph - Page 163, Section 5.1
A bar graph that uses pictures instead of bars. These graphs can be misleading because pictures measure height and width, whereas bar graphs measure only height. To be effective, all the pictures used must be the same size.
Pie Chart - Page 159, Section 5.1
A graph which shows each category as a part of the whole in a circle graph. Pie charts can be used if exactly 100% of the results for a particular situation are known.
Placebo - Page 143, Section 4.5
A fake treatment that is similar in appearance to the real treatment.
Placebo Effect - Page 143, Section 4.5
The placebo effect occurs when a subject starts to experience changes simply because they believe they are receiving a treatment.
Population - Page 113, Section 4.2
The entire group of individuals we are interested in.
Positive Linear Association - Page 254, Section 6.1
A situation such that as one numerical variable increases, the other numerical variable also increases.
Prime Number - Page 48, Section 2.3
A number that is divisible only by 1 and itself. Remember, 1 is not a prime number!
Probability - Page 27, Section 2.1
The likelihood of a particular outcome occurring.
Probability Model - Page 56, Section 2.4
A table that lists all outcomes of an event and their respective probabilities. The sum of all the probabilities in a probability model must equal 1.
Processing Errors - Page 121, Section 4.2
An error commonly made due to issues like poor calculations or inaccurate recording of results.
Prospective Studies - Page 141, Section 4.5
A study which follows up with study subjects in the future in an effort to see if there were any long-term effects.
Quartile 1 - Page 206, Section 5.5
The median of all the values to the left of the median. Do not include the median itself.
Quartile 3 - Page 206, Section 5.5
The median of all the values to the right of the median. Do not include the median itself.
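The quartile definitions above (the median of each half, leaving out the median itself) can be sketched as a five-number summary. Note that statistical software often uses slightly different quartile rules, so results may not match this method exactly. The function name and data are illustrative.

```python
import statistics

def five_number_summary(data):
    """Minimum, Q1, median, Q3, maximum, with quartiles as medians of each half."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    # With an odd count, skip the median itself when forming the upper half.
    upper = values[half + 1:] if n % 2 else values[half:]
    return (values[0], statistics.median(lower), statistics.median(values),
            statistics.median(upper), values[-1])

print(five_number_summary([3, 7, 8, 5, 12, 14, 21, 13, 18]))
# (3, 6.0, 12, 16.0, 21)
```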
Random Digit Table - Page 94, 128, 324, Section 3.3, 4.3, 8.1
A long list of randomly chosen digits from 0 to 9, usually generated by computer software or calculators. A table of random digits can be found in Appendix A, Part 1.
Random Event - Page 28, Section 2.1
An event for which we cannot be certain of the outcome.
Random Sampling Error - Page 120, Section 4.2
Even though a sample is randomly selected, it is entirely possible that a particular result within the population will be over-represented. Larger sample sizes reduce random sampling error. The margin of error is stated with most studies to account for random sampling error.
Range - Page 172, 208, Section 5.2, 5.5
A basic description of how spread out a data set is. It is calculated by subtracting the smallest number in a data set from the largest number in the data set.
Reliability - Page 106, Section 4.1
How consistently a particular measurement technique gives the same, or nearly the same measurement.
Response Bias - Page 121, Section 4.2
Occurs when an individual responds to a survey with an incorrect or untruthful answer. This type of bias can frequently happen when questions are potentially sensitive or embarrassing.
Response Variable - Page 142, 248, Section 4.5, 6.1
The y-axis variable. It can often be thought of as the 'effect' variable or dependent variable.
Retrospective Study - Page 141, Section 4.5
A study in which information about a subject's past is used in the study.
Sample - Page 114, Section 4.2
A representative subset of a population.
Sample Space - Page 1, Section 1.1
A list of all the possible outcomes that may occur.
Sample Survey - Page 108, Section 4.1
A survey that uses a subset of the population in order to try to make predictions about the entire population.
Sampling Frame - Page 115, Section 4.2
A list of all members of a population.
Scatterplot - Page 248, Section 6.1
Graphs that represent a relationship between two numerical variables where each data point is shown as a coordinate point on a scaled grid.
SCOFD - Page 250, Section 6.1
This is used for the description of a scatterplot and stands for Strength, Context, Outliers, Form, and Direction.
Simple Random Sample (SRS) - Page 116, Section 4.2
A sample where all possible groups of a particular size are equally possible. It can be thought of as putting all members of the population in a hat and randomly drawing until the desired sample size is reached.
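The "names in a hat" idea maps directly onto `random.sample`, which draws without replacement so that every group of the requested size is equally likely. The roster and function name below are hypothetical.

```python
import random

def simple_random_sample(population, sample_size, seed=None):
    """Draw sample_size distinct members; every group of that size is equally likely."""
    rng = random.Random(seed)  # pass a seed only to make the draw repeatable
    return rng.sample(population, sample_size)

# Hypothetical roster of 30 students, labeled 1 through 30.
roster = list(range(1, 31))
print(simple_random_sample(roster, 5, seed=42))
```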
Simulation - Page 94, Section 3.3
A model of a real situation that can be used to make predictions about what might really happen. Often, tables of random digits are used to carry out simulations.
Skewed Distribution - Page 181, 297, Section 5.3, 7.1
A distribution in which the majority of the data is concentrated on one end of the distribution. Visually, there is a 'tail' on the side with less data and this is the direction of the skew.
SOCCS - Page 180-182, Section 5.3
A way to remember the key information to discuss for a distribution: Shape, Outliers, Center, Context, and Spread.
Spread - Page 182, Section 5.3
A way to measure variability of a data set. Common measures of spread are the range, standard deviation, and IQR.
Standard Deviation - Page 208, 298, Section 5.5, 7.1
A measure of spread relative to the mean of a data set. Use this measurement for any data set which is approximately normally distributed.
Statistic - Page 114, Section 4.2
A number that describes results from a sample. This number is often used to make an approximation of the parameter.
Stem Plot - Page 184, Section 5.3
A method of organizing data that sorts the data in a visual fashion. The stem is made up of all the leading digits of a piece of data and the leaf is the final digit.
Stratified Random Sample - Page 116, Section 4.2
A sample in which the population is divided into distinct groups called strata before a random sample is chosen from each stratum.
Strength - Page 251, 262, Section 6.1, 6.2
One of three measurements reported for a best-fit line that describes how closely the data matches a perfect line.
Subjects - Page 142, Section 4.5
The individuals that are being studied in an experiment.
Symmetrical Distribution - Page 181, Section 5.3
A distribution in which the left side of the distribution looks like the mirror image of the right side of the distribution.
Systematic Random Sample - Page 116, Section 4.2
A sampling method in which the first selection is made randomly and then a 'system' is used to make the remaining selections.
Theoretical Model - Page 28, Section 2.1
A model that gives a picture of exactly the frequencies of what should happen in a situation involving probability.
Theoretical Probability - Page 28, Section 2.1
A mathematical calculation of the likelihood that an event will occur.
Time Plot (Line Graph) - Page 168, Section 5.2
A graph that shows how a variable changes over time.
Tree Diagram - Page 3, 5, 55, Section 1.1, 1.2, 2.4
A visual representation of a series of events where each successive event branches off from the previous event.
Two-Way Table (Contingency Table) - Page 62, Section 2.5
A table which tracks two characteristics from a set of individuals. For example, we might track gender and grade of all the students in your high school.
Undercoverage - Page 120, Section 4.2
A sampling error in which an entire group or groups of subjects are left out or underrepresented in a study.
Union of Sets - Page 47, Section 2.3
A union includes all results that are in either one category, another category, or both categories in a Venn diagram. We use the symbol ∪ and can think of a union as either A or B (or both).
Validity - Page 106, Section 4.1
A measurement technique is valid if it is an appropriate way to collect data.
Variables - Page 105, Section 4.1
Characteristics about the individuals that the researchers might be interested in.
Venn Diagrams - Page 31, 47, Section 2.1, 2.3
Diagrams that represent outcomes using intersecting circles.
Voluntary Response Sample - Page 118, Section 4.2
A biased sampling method in which participants get to choose whether or not to participate in the survey. The bias occurs because those who are most passionate about an issue will be more likely to respond.
Wording of a Question - Page 121, Section 4.2
The wording of a question can be used to manipulate subjects to make them more likely to respond a certain way in a survey causing bias.
Z-Score - Page 307, Section 7.2
A measure of the number of standard deviations a particular data point is away from the mean in a normal distribution. If a z-score is positive, the value is larger than the mean and if it is negative, it is less than the mean.
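A z-score is found by subtracting the mean from the value and dividing by the standard deviation. The test-score numbers below are made up for illustration.

```python
def z_score(value, mean, standard_deviation):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / standard_deviation

# Hypothetical test with mean 70 and standard deviation 10.
print(z_score(85, 70, 10))  # 1.5
print(z_score(60, 70, 10))  # -1.0
```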