# 1.1: An Introduction to Analyzing Statistical Data

## Definitions of Statistical Terminology

This Probability and Statistics Teaching Tips FlexBook is one of seven Teacher's Edition FlexBooks that accompany the CK-12 Foundation's Probability and Statistics Student Edition.

To receive information regarding upcoming FlexBooks, or to receive the available Assessment and Solution Key FlexBooks for this program, please write to us at teacher-requests@ck12.org.

Probability and Statistics is by far the most text-based subject in mathematics. Students new to statistics get a real baptism by total immersion in the early sections, as we very quickly attempt to bring them up to speed on all of the terminology that will be used. Because statistics differs so much from traditional math classes, some care is needed to teach students how to read the textbook, as well as how to read problems.

If your school has a method of taking notes, you should first consider using that method, if you aren’t already. If your school doesn’t have a prescribed method, it’s worth checking with the humanities and possibly the science teachers to find out whether your students are being asked to use, or have been asked to use, a particular method; if so, that would also be a good choice. If your students have not been asked to learn or use a note-taking method, it will be worth your time to teach one and enforce its use. There will likely be some resistance; AP stats students tend to be very skilled and resistant to note-taking systems. However, even very skilled students can quickly become bogged down and lost in the amount of language and vocabulary needed for understanding.

My favorite method is Cornell Notes, or a slightly modified version thereof. Many of my past students hated it at first, but began to believe in its value when they were able to reference and study from a concise source as the topics became more challenging. I don’t always have my students attempt to recall entire lectures or pages of reading from their notes, but recording key vocabulary and topics, and then writing the summary at a later time, are very important. A PDF from Cornell University can be found at: http://lsc.sas.cornell.edu/Sidebars/Study_Skills_Resources/cornellsystem.pdf, and the system is described in many other sources as well. Other note-taking systems can be found in various academic literacy guides and texts.

Another key is being very careful about the language used in class, and making sure that words with special meanings are reserved for those meanings. As mathematicians we are frequently sloppy with our language, letting the context determine whether we intend the special meaning of a word or a more general one. One big one is “variable.” With continuous, discrete, quantitative, qualitative, random, and other kinds of variables all being very important in statistics, students can get confused quickly, especially as the treatment of these variables is slightly different from what they may have seen in the past.

## An Overview of Data

With the exception of a few theoretical situations, all of statistics is based on data. As students will see later, the importance of clear, appropriately gathered data can’t be overstated. The first step is almost always designing the method of gathering the data. Most of the topics presented here are going to be examined in depth in later sections, so there is no need to spend lots of time teaching the specifics of each at this time.

This is a good time to look at some of the terms and relate them to students’ experiences. Studies and experiments are a large part of today’s news and popular culture. Have students find news articles citing studies and bring them in for discussion. It will be especially powerful if students choose items that have some pertinence to their lives. Studies about school violence, teen health, standardized testing, and similar topics will have more meaning for students than ones on heart disease, home prices, and other favorites of the media. Current events can be a huge part of the stats classroom, and they help make it a rich and memorable class. News articles are wonderful resources because they usually include incomplete or incorrect information when it comes to the math involved. Students can be led in a discussion of each study: what the article has included, what it has omitted, and what the omitted details likely are. The examination of sample studies in the text provides the template for such examinations.

## Measures of Center

The measures of center are frequently the only statistics that the vast majority of the population will ever use. However, the treatment of these in previous classes is very incomplete, often beginning and ending with mean, median, and mode. This is a great place to begin with students, as the task of a first-year statistics teacher is frequently to show that common perceptions are not always meaningful in statistics. The association of the “average” with the mean is easily discredited, as is discussed in the text. A fun exercise is to look at alumni lists for either a high school or a university. If there is a superstar athlete or a top executive on that list, the mean and median incomes of graduates will be quite different. This shows clearly why someone promoting a particular idea may find it advantageous to make a poor statistical choice, in this case claiming that the “average” alumnus of a school can expect to earn well above what is realistic.
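The alumni exercise can be previewed with a small made-up data set. The sketch below uses Python's standard `statistics` module; the income figures are hypothetical and chosen only so that one extreme value stands in for the superstar athlete or top executive.

```python
from statistics import mean, median

# Hypothetical annual incomes (in dollars) for ten alumni.
# The last value stands in for a superstar athlete or top executive.
incomes = [42_000, 45_000, 48_000, 50_000, 52_000,
           55_000, 58_000, 60_000, 65_000, 20_000_000]

mean_income = mean(incomes)      # pulled far upward by the single outlier
median_income = median(incomes)  # middle of the sorted list, barely affected

print(f"mean:   {mean_income:,.0f}")    # mean:   2,047,500
print(f"median: {median_income:,.0f}")  # median: 53,500
```

With these invented numbers, nine of the ten alumni earn less than 3% of the "average" income when the mean is used, which makes the point of the exercise concrete.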

There are many more measures of center presented, and each has its appropriate place. The text gives a brief nod to “It depends” as the mantra of statisticians, but at some point students are likely to struggle with the apparently arbitrary nature of statistics. The purpose of statistics is rarely to nail down truths (although with careful practice, stats can yield surprisingly close results), but to inform and to give a clear picture of trends when the data set is too complex to use directly. The different choices among the mean, weighted means, and trimmed means show that statisticians have options, that there is frequently no clear direction on which to use, and that this is OK. In the early stages students need to stay calm and just roll with it, as much of the confusion is cleared up with practice and experience.
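To show students that these really are different choices applied to the same kind of data, here is a minimal sketch of a weighted mean and a trimmed mean. The helper names `weighted_mean` and `trimmed_mean`, the scores, and the grade weights are all hypothetical and are not taken from the student text; the trimming rule used here (drop the same fraction of values from each end) is one common convention.

```python
from statistics import mean

def weighted_mean(values, weights):
    """Mean in which each value counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def trimmed_mean(values, proportion):
    """Mean after dropping the given proportion of values from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number of values dropped per end
    return mean(ordered[k:len(ordered) - k])

# Hypothetical test scores with one unusually low and one unusually high value.
scores = [55, 70, 72, 75, 78, 80, 82, 85, 88, 100]

plain = mean(scores)                  # 78.5
trimmed = trimmed_mean(scores, 0.10)  # drops 55 and 100, giving 78.75

# Hypothetical course grade: homework, quizzes, exam weighted 2 : 3 : 5.
course = weighted_mean([90, 80, 70], [2, 3, 5])  # 77.0

print(plain, trimmed, course)
```

None of the three results is "the" right answer; which one is appropriate depends on the question being asked of the data.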

One key topic here is the difference between the Population Mean and Sample Mean. There isn’t much more to say about it at this point, but when we get into continuous distributions the difference between the two becomes really important. Make sure students are aware of the difference and are careful about using the correct label and terms for each.
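As a reminder of the labels involved, the conventional notation is sketched below. The symbols are the standard ones and are assumed here rather than quoted from the student text: $\mu$ is the population mean over all $N$ members of the population, and $\bar{x}$ is the sample mean over the $n$ members of a sample.

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i
\qquad\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
```

The arithmetic is identical; what differs is the set of values being averaged and therefore what the result describes.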

## Measures of Spread

The second of the two major base topics for stats is Spread (the other being Center, as discussed in the previous section). These two topics will return over and over again as we begin to look at different distributions. As such, these sections should not be glossed over quickly. Trying to go back and understand where variance comes from is far more difficult when you are also trying to learn about the motivation behind continuous distributions, like the normal distribution.
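One way to make "where variance comes from" concrete is to build it up step by step from deviations and then compare with a library result. The sketch below uses a small hypothetical data set and Python's standard `statistics` module; the variable names are illustrative only.

```python
from math import sqrt
from statistics import pvariance, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set

n = len(data)
center = sum(data) / n                            # the mean, 5.0
deviations = [x - center for x in data]           # signed distances from the mean
squared_devs = [d ** 2 for d in deviations]       # squaring removes the signs

pop_var = sum(squared_devs) / n         # population variance: divide by n
samp_var = sum(squared_devs) / (n - 1)  # sample variance: divide by n - 1

pop_sd = sqrt(pop_var)                  # standard deviation, in the data's units

print(sum(deviations))     # 0.0, which is why raw deviations cannot measure spread
print(pop_var, pop_sd)     # 4.0 2.0
print(samp_var)            # 32 / 7, a little larger than the population variance

# The library functions agree with the hand computation.
print(pvariance(data), variance(data))
```

Having students confirm that the raw deviations sum to zero before squaring them tends to motivate the squaring step better than presenting the formula outright.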

At the top of page 32 there is a little mention of an important idea:

Even though we are doing easy calculations, statistics is never about meaningless arithmetic and you should always be thinking about what a particular statistical measure means in the real context of the data.

Highlight it! Make a poster of it! AP stats is frequently derided as not being as tough, or as mathematically intensive, as its sibling, AP Calculus. I am not sure exactly where this prejudice comes from, but I suspect it has a lot to do with the fact that very, very few preliminary skills from algebra and geometry are needed for success in statistics, while calculus will expose every hole in one’s high school math experience. This does not, however, make statistics easier in a mathematical sense. Statistics requires a level of attentiveness that other classes do not. There are plenty of “cookie-cutter” problems in calculus where, once the correct method is chosen, the remaining steps follow a reliable pattern and it is only a matter of following an algorithm that has been used a hundred times before. Statistics requires a new level of understanding of the nature of the question, the process, and then the results. The AP exam rewards such understanding. The free response sections are graded specifically to reward conceptual understanding and deemphasize rote algorithmic procedures. If a student were to make a small arithmetic mistake, get an incorrect value for the standard deviation, and then interpret that value correctly, the penalty is minimal. However, if a student were to make a small arithmetic mistake, get a probability greater than 1, and leave that answer as correct, the penalty is substantial. The arithmetic mistake is not so bad, but failing to recognize that a probability greater than 1 necessarily signals an error betrays a fundamental lack of understanding of statistics.

You may notice that both this guide and the text are constantly mentioning “but later.” This is bad form, in my opinion, but it speaks to how easy it is to overlook details now and how stiff the penalty is once the topics become complex. Sometimes having a former student speak at the start of a school year is helpful, not only for alerting students to the challenges ahead, but also for advice on time management, the format of the test, and other topics useful for success in an advanced placement class. Students will likely get tired of hearing it all from the teacher, but by the time they realize they should have been paying closer attention, it will be too late. I also tend to be ruthless with grading at this point, creating the expectation of extreme attention to detail. Later on I will relax a bit.