# 1.2: Visualizations of Data

## Histograms and Frequency Distributions

Statistics classes are dangerous places for students who try to outsmart math problems. You probably know the type: the kid who always brings up some reason a particular study is nonsense, or offers data, or worse, anecdotes, that pull the class discussion off track. For example, the bottled water discussion will likely have students bringing up all kinds of strange numbers, reasoning, and justifications for bottled water use, or lack thereof. There is also a regional aspect to this. My former students living in unincorporated parts of the Santa Cruz Mountains with untreated well water, and my former students from New Orleans, where the Mississippi River is the water source, have different considerations than my former students in San Francisco (which pipes in amazing water from Yosemite National Park).

As an instructor, you have to decide how much “digression” to allow. Sometimes there are teachable moments about using the data presented and not bringing in personal bias, and sometimes students have valid questions about how data is collected. Students, especially those who have had success in school, really want to participate, but in the early chapters they often don’t have much to contribute to problems because the subject is new to them. Simply dismissing students’ contributions can leave some students with a sour attitude, but certainly not every slightly off-topic contribution can be entertained. I try to listen, and if the comment is not productive for my lesson, I let the class know what I would like to hear and what isn’t going to help us reach the ultimate goal. This becomes less of a problem as the year progresses, students learn more stats, and the topics get tougher.

Most of these topics will be review from previous classes. I wouldn’t spend a ton of class time on them, but would use them as warm-ups. Give students a topic, like hours of TV watched daily, and have them collect the data and chart it in the first few minutes of class. Making clear and accurate graphical representations is a skill that can be practiced without much new content. It is also OK to be critical of “artistic” skills here, because presenting data for use by others requires clear charts. Making the histograms and other charts look good is part of the job.
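For a warm-up like the TV-hours example, the step students most often skip is choosing equal-width bins and keeping the empty ones. The minimal Python sketch below (the text itself uses no code) tallies a made-up class data set into equal-width bins and prints a text histogram. The data values and the helper name `frequency_table` are hypothetical, and it assumes every value is at least `start`.

```python
from collections import Counter

# Hypothetical warm-up data: hours of TV watched per day (whole hours).
tv_hours = [0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 8]

def frequency_table(data, bin_width, start=0):
    """Tally data into equal-width bins [low, high).

    Every bin from `start` up to the last occupied bin is returned,
    including empty ones, so the histogram keeps an even scale.
    Assumes all values are >= start.
    """
    counts = Counter(int((x - start) // bin_width) for x in data)
    last = max(counts)
    return [
        (start + k * bin_width, start + (k + 1) * bin_width, counts.get(k, 0))
        for k in range(last + 1)
    ]

# Text histogram: one '#' per observation in each bin.
for low, high, count in frequency_table(tv_hours, bin_width=2):
    print(f"[{low}, {high}) | {'#' * count} ({count})")
```

Keeping zero-count bins matters for the same reason an even scale matters later in the section: a gap in the data should show up as a gap in the chart.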

## Common Graphs and Data Plots

This is again a chapter for review and practice, to make sure that everyone is on the same page. I would decide how to rank the most important representations of data; that ranking is subject to some debate. I deemphasize pie charts and stem-and-leaf plots. It is true that the general public loves pie charts, and while I find them easy to read, they are of limited use for extended work. Stem-and-leaf plots are really difficult to read, and in many cases tough to teach. The idea of using different place values to categorize and split up the numbers seems to confuse students, and the result is tough for an end user of the chart to read. Scatterplots for bivariate data and dot plots are more useful, especially for later units, and I make sure my students are comfortable and proficient with making these charts.
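To make the place-value splitting in a stem-and-leaf plot concrete, here is a minimal Python sketch, assuming nonnegative whole-number data: each value is split into a stem (the tens digit and up) and a leaf (the ones digit). The scores and the helper name `stem_and_leaf` are hypothetical.

```python
from collections import defaultdict

# Hypothetical quiz scores; assumes nonnegative whole numbers.
scores = [71, 58, 88, 62, 94, 71, 79, 83, 67, 90, 74, 85]

def stem_and_leaf(data):
    """Split each value by place value: stem = tens and up, leaf = ones digit."""
    plot = defaultdict(list)
    for x in sorted(data):          # sorting first keeps each row's leaves in order
        stem, leaf = divmod(x, 10)
        plot[stem].append(leaf)
    # Keep every stem in the range, even empty ones, so gaps in the data show.
    return {s: plot.get(s, []) for s in range(min(plot), max(plot) + 1)}

for stem, leaves in stem_and_leaf(scores).items():
    print(f"{stem:>2} | {' '.join(str(leaf) for leaf in leaves)}")
```

Reading a row such as `7 | 1 1 4 9` back as 71, 71, 74, 79 is the decoding step that an end user of the chart has to do every time.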

## Box and Whisker Plot

Box and whisker plots are great, especially for comparing two or more sets of data graphically. Students will have the most success creating these graphs by following a somewhat algorithmic process. The steps outlined in the text are exactly how I work these problems, and I would teach and stress the same. The biggest problems I observe are students drawing the graph first and adding numbers afterward, or not using an even scale. The best part about the box and whisker plot is its clear representation of center and spread, so if an even scale is not used, no reliable information can be read from the graph. I stress that the number line must be drawn and labeled before any part of the box and whisker plot is drawn. This is even more important when comparing two sets of data.
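The algorithmic process can be sketched in Python as two steps: compute the five-number summary, then build the evenly scaled number line before placing anything on it. This is an illustration under stated assumptions, not the text's procedure. Quartiles here use the median-of-halves convention (the middle value is left out of both halves when n is odd); other conventions that interpolate can give slightly different quartiles. The data and the names `five_number_summary` and `box_plot_lines` are hypothetical.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max).

    Assumes the median-of-halves convention: when n is odd, the middle
    value is left out of both halves. Requires at least two values.
    """
    xs = sorted(data)
    n = len(xs)
    lower, upper = xs[: n // 2], xs[(n + 1) // 2 :]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

def box_plot_lines(summary, lo, hi, step, width=41):
    """Build the evenly scaled number line FIRST, then place the plot on it.

    lo, hi, and step are integers, and [lo, hi] must cover the data.
    """
    def col(v):
        # Linear map: equal distances in the data -> equal distances on screen.
        return round((v - lo) / (hi - lo) * (width - 1))

    # 1. Number line with evenly spaced, labeled ticks.
    axis, labels = ["-"] * width, [" "] * (width + 3)
    for tick in range(lo, hi + 1, step):
        c = col(tick)
        axis[c] = "+"
        labels[c : c + len(str(tick))] = str(tick)

    # 2. Box and whiskers drawn against that scale.
    mn, q1, med, q3, mx = (col(v) for v in summary)
    plot = [" "] * width
    for c in range(mn, mx + 1):
        plot[c] = "-"            # whiskers
    for c in range(q1, q3 + 1):
        plot[c] = "="            # box spans Q1..Q3
    plot[mn] = plot[mx] = "|"    # min and max
    plot[q1], plot[q3] = "[", "]"
    plot[med] = "M"              # median
    return ["".join(plot).rstrip(), "".join(axis), "".join(labels).rstrip()]

sample = [2, 4, 5, 7, 8, 9, 11, 12, 15, 20]  # hypothetical data
for line in box_plot_lines(five_number_summary(sample), lo=0, hi=20, step=5):
    print(line)
```

To compare two data sets, call `box_plot_lines` with the same `lo`, `hi`, and `step` for both, so the plots share one even scale.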

The idea of the middle 50% of the data (the interquartile range) is an important one. It is worth spending some focused time on, as it is not only useful for finding outliers but also an important statistic on its own. One thing students may be unsure of is whether the five-number summary is recalculated after outliers are “thrown out.” The answer is no: the median and quartiles that form the box are resistant to outliers, so recalculating would not make the summary more descriptive. Only the graph changes, for clarity’s sake. For this reason, and some others, I never say “throw out the outliers,” as it implies that they aren’t an important part of the data set. Outliers are still important and have to be treated carefully rather than simply discarded. This is most evident in the analysis of important but unusually spread data, such as air pollutant levels by California county. Los Angeles County is going to be way far out there, but you can’t accurately represent the air quality situation in the state without considering Los Angeles. List it, but it is most helpful to show it as a point outside of the whisker, because that shows how far away that single county is, not because it is unimportant.
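One common way to use the interquartile range (IQR) to find outliers is the 1.5 × IQR rule; the text may state its own convention, so treat this Python sketch as an illustration under that assumption. It flags points beyond the fences without deleting them, stops the whiskers at the most extreme values still inside the fences, and checks that the quartiles do not move when the outlying value becomes even more extreme. The readings are made up (not real county data), and the names `quartiles` and `outlier_report` are hypothetical.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves convention
    (an assumption; interpolating conventions give slightly different values)."""
    xs = sorted(data)
    n = len(xs)
    return median(xs[: n // 2]), median(xs), median(xs[(n + 1) // 2 :])

def outlier_report(data, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (the common 1.5 x IQR rule).

    Outliers are reported, never deleted. The whiskers end at the most
    extreme values that are still inside the fences.
    """
    q1, med, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "quartiles": (q1, med, q3),
        "fences": (low_fence, high_fence),
        "whiskers": (min(inside), max(inside)),
        "outliers": sorted(x for x in data if not low_fence <= x <= high_fence),
    }

# Hypothetical readings with one far-out value.
readings = [3, 4, 4, 5, 6, 6, 7, 8, 9, 40]
report = outlier_report(readings)
print(report["outliers"], report["whiskers"])  # [40] (3, 9)

# Resistance: push the outlier much farther out; Q1, median, Q3 do not move.
print(quartiles(readings) == quartiles(readings[:-1] + [400]))  # True
```

Only the maximum changes when 40 becomes 400, which is why the far-out point is drawn on its own beyond the whisker instead of stretching the whisker to reach it.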