
# Chapter 2: Visualizations of Data

Difficulty Level: Advanced. Created by: CK-12.

## Part One: Questions

1. Which of the following can be inferred from this histogram?
   (a) The mode is \begin{align*}1\end{align*}.
   (b) mean \begin{align*}<\end{align*} median
   (c) median \begin{align*}<\end{align*} mean
   (d) The distribution is skewed left.
   (e) None of the above can be inferred from this histogram.
2. Sean was given the following relative frequency histogram to read. Unfortunately, the copier cut off the bin with the highest frequency. Which of the following could possibly be the relative frequency of the cut-off bin?
   (a) \begin{align*}16\end{align*}
   (b) \begin{align*}24\end{align*}
   (c) \begin{align*}32\end{align*}
   (d) \begin{align*}68\end{align*}
3. Tianna was given a graph for a homework question in her statistics class, but she forgot to label the graph or the axes and couldn’t remember whether it was a frequency polygon or an ogive plot. Here is her graph:

Identify which of the two graphs she has and briefly explain why.

In questions 4-7, match each distribution with the real-world situation that best fits the graph.

(a) Endy collected and graphed the heights of all the \begin{align*}12^{th}\end{align*} grade students in his high school.

(b) Brittany asked each of the students in her statistics class to bring in \begin{align*}20 \;\mathrm{pennies}\end{align*} selected at random from their pocket or bank change. She created a plot of the dates of the pennies.

(d) Jeno bought a large box of doughnut holes at the local pastry shop, weighed each of them, and then plotted their weights to the nearest tenth of a gram.

8. Which of the following box plots matches the histogram?
9. If a data set is roughly symmetric with no skewing or outliers, which of the following would be an appropriate sketch of the shape of the corresponding ogive plot?
10. Which of the following scatterplots shows a strong, negative association?

1. b
2. c
3. It must be a frequency polygon. At one point in the graph, there is a decreasing line. An ogive plot represents the cumulative data up to that point, so it can never decrease.
4. b
5. d
6. a
7. c
8. a
9. a
10. d
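
The reasoning behind answer 3 can be checked numerically. The following is a minimal sketch in Python that uses made-up bin frequencies (an assumption for illustration only, not data from this chapter): the frequencies themselves rise and fall, as a frequency polygon would, while their running total, which is what an ogive plots, never decreases.

```python
from itertools import accumulate

# Hypothetical bin frequencies, chosen only for illustration.
frequencies = [2, 5, 9, 4, 1]

# An ogive plots the running total of the frequencies.
cumulative = list(accumulate(frequencies))


def is_non_decreasing(values):
    """Return True if no value is smaller than the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))


print(cumulative)                      # [2, 7, 16, 20, 21]
print(is_non_decreasing(cumulative))   # True: an ogive never falls
print(is_non_decreasing(frequencies))  # False: a frequency polygon can fall
```

Because frequencies cannot be negative, the running total can stay flat (an empty bin) but can never drop, which is why a decreasing segment rules out an ogive.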

## Part Two: Open-Ended Questions

1. The Burj Dubai will become the world’s tallest building when it is completed. It will be twice the height of the Empire State Building in New York.

| Building | City | Height (ft) |
|---|---|---|
| Taipei 101 | Taipei | 1671 |
| Shanghai World Financial Center | Shanghai | 1614 |
| Petronas Tower | Kuala Lumpur | 1483 |
| Sears Tower | Chicago | 1451 |
| Jin Mao Tower | Shanghai | 1380 |
| Two International Finance Center | Hong Kong | 1362 |
| CITIC Plaza | Guangzhou | 1283 |
| Shun Hing Square | Shenzhen | 1260 |
| Empire State Building | New York | 1250 |
| Central Plaza | Hong Kong | 1227 |
| Bank of China Tower | Hong Kong | 1205 |
| Bank of America Tower | New York | 1200 |
| Emirates Office Tower | Dubai | 1163 |
| Tuntex Sky Tower | Kaohsiung | 1140 |

The chart lists the \begin{align*}15\end{align*} tallest buildings in the world (as of 12/2007).

(a) Complete the table below and draw an ogive plot of the resulting data.

| Class | Frequency | Relative Frequency | Cumulative Frequency | Relative Cumulative Frequency |
|---|---|---|---|---|

(b) Use your ogive plot to approximate the median height for this data.

(c) Use your ogive plot to approximate the upper and lower quartiles.

(d) Find the \begin{align*}90^{th}\end{align*} percentile for this data (i.e., the height below which \begin{align*}90 \%\end{align*} of the data fall).

2. Recent reports have called attention to an inexplicable collapse of the Chinook salmon population in western rivers (see http://www.nytimes.com/2008/03/17/science/earth/17salmon.html). The following data tracks the fall salmon population in the Sacramento River from 1971 to 2007.

| Year* | Adults | Jacks |
|---|---|---|
| 1971-1975 | 164,947 | 37,409 |
| 1976-1980 | 154,059 | 29,117 |
| 1981-1985 | 169,034 | 45,464 |
| 1986-1990 | 182,815 | 35,021 |
| 1991-1995 | 158,485 | 28,639 |
| 1996 | 299,590 | 40,078 |
| 1997 | 342,876 | 38,352 |
| 1998 | 238,059 | 31,701 |
| 1999 | 395,942 | 37,567 |
| 2000 | 416,789 | 21,994 |
| 2001 | 546,056 | 33,439 |
| 2002 | 775,499 | 46,526 |
| 2003 | 521,636 | 29,806 |
| 2004 | 283,554 | 67,660 |
| 2005 | 394,007 | 18,115 |
| 2006 | 267,908 | 8,048 |
| 2007 | 87,966 | 1,897 |

Figure: Total Fall Salmon Escapement in the Sacramento River. source: http://www.pcouncil.org/newsreleases/Sacto_adult_and_jack_escapement_thru%202007.pdf

\* During the years from 1971 to 1995, only 5-year averages are available.

In case you are not up on your salmon facts, two terms in this chart may be unfamiliar. Fish escapement refers to the number of fish that “escape” the hazards of the open ocean and return to their freshwater streams and rivers to spawn. A “Jack” salmon is a fish that returns to spawn before reaching full adulthood.

(a) Create one line graph that shows both the adult and jack populations for those years. The data from 1971 to 1995 represents the five-year averages. Devise an appropriate method for displaying this on your line plot while maintaining consistency.

(b) Write at least two complete sentences that explain what this graph tells you about the change in the salmon population over time.

3. The following data set about Galapagos land area was used in the first chapter.

| Island | Approximate Area (sq. km) |
|---|---|
| Baltra | 8 |
| Darwin | 1.1 |
| Española | 60 |
| Fernandina | 642 |
| Floreana | 173 |
| Genovesa | 14 |
| Isabela | 4640 |
| Marchena | 130 |
| North Seymour | 1.9 |
| Pinta | 60 |
| Pinzón | 18 |
| Rabida | 4.9 |
| San Cristóbal | 558 |
| Santa Cruz | 986 |
| Santa Fe | 24 |
| Santiago | 585 |
| South Plaza | 0.13 |
| Wolf | 1.3 |

Figure: Land Area of Major Islands in the Galapagos Archipelago. Source: http://en.wikipedia.org/wiki/Gal%C3%A1pagos_Islands

(a) Choose two methods for representing this data, one categorical, and one numerical, and draw the plot using your chosen method.

(b) Write a few sentences commenting on the shape, spread, and center of the distribution in the context of the original data. You may use summary statistics to back up your statements.

4. Investigation: The National Weather Service maintains a vast array of data on a variety of topics. Go to: http://lwf.ncdc.noaa.gov/oa/climate/online/ccd/snowfall.html. You will find records for the mean snowfall for various cities across the US.

(a) Create a back-to-back stem-and-leaf plot for all the cities located in each of two geographic regions. (Use the simplistic breakdown found at the following page http://library.thinkquest.org/4552/ to classify the states by region).

(b) Write a few sentences that compare the two distributions, commenting on the shape, spread, and center in the context of the original data. You may use summary statistics to back up your statements.

1. (a)

| Class | Frequency | Relative Frequency (%) | Cumulative Frequency | Relative Cumulative Frequency (%) |
|---|---|---|---|---|
| [1100-1150) | 1 | 7.1 | 1 | 7.1 |
| [1150-1200) | 1 | 7.1 | 2 | 14.3 |
| [1200-1250) | 3 | 21.4 | 5 | 35.7 |
| [1250-1300) | 3 | 21.4 | 8 | 57.1 |
| [1300-1350) | 0 | 0 | 8 | 57.1 |
| [1350-1400) | 2 | 14.3 | 10 | 71.4 |
| [1400-1450) | 0 | 0 | 10 | 71.4 |
| [1450-1500) | 2 | 14.3 | 12 | 85.7 |
| [1500-1550) | 0 | 0 | 12 | 85.7 |
| [1550-1600) | 0 | 0 | 12 | 85.7 |
| [1600-1650) | 1 | 7.1 | 13 | 92.9 |
| [1650-1700) | 1 | 7.1 | 14 | 100 |

(b)

(c)

(d) approximately \begin{align*}1625 \;\mathrm{ft}\end{align*}
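
The table in part (a) and the percentile in part (d) can be cross-checked with a short script. The following is a minimal sketch in Python, assuming classes of width 50 ft starting at 1100 ft as in the answer table. The function names `frequency_table` and `ogive_percentile` are hypothetical helpers written for this illustration, and the percentile is found by linear interpolation along the ogive, which is one reasonable way to mimic reading a value off the plot.

```python
# Building heights in feet, copied from the table in question 1.
heights = [1671, 1614, 1483, 1451, 1380, 1362, 1283,
           1260, 1250, 1227, 1205, 1200, 1163, 1140]


def frequency_table(data, start, stop, width):
    """Return one row per class [low, low + width)."""
    rows = []
    running = 0
    n = len(data)
    for low in range(start, stop, width):
        high = low + width
        freq = sum(1 for x in data if low <= x < high)
        running += freq
        rows.append({
            "class": (low, high),
            "freq": freq,
            "rel": round(100 * freq / n, 1),
            "cum": running,
            "rel_cum": round(100 * running / n, 1),
        })
    return rows


def ogive_percentile(rows, percent):
    """Interpolate linearly along the ogive to estimate a percentile."""
    prev_pct = 0.0
    for row in rows:
        low, high = row["class"]
        # Use the first class whose upper end reaches the target percent.
        if row["rel_cum"] >= percent and row["rel_cum"] > prev_pct:
            fraction = (percent - prev_pct) / (row["rel_cum"] - prev_pct)
            return low + fraction * (high - low)
        prev_pct = row["rel_cum"]
    raise ValueError("percent must be between 0 and 100")


table = frequency_table(heights, 1100, 1700, 50)
for row in table:
    print(row)

# Estimated 90th percentile, in feet.
print(round(ogive_percentile(table, 90)))
```

The interpolated estimate lands in the [1600-1650) class, in line with the value of approximately 1625 ft read from the plot. Small differences are expected because a reading from a graph is only approximate.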

2. (a) There isn’t necessarily a right or wrong way to create this graph and to interpret the different time intervals, but a year should span the same distance along the entire axis so that the slope of the lines means the same thing across the whole plot. In this case, we plotted each average as a point in the middle of its five-year interval. It is possible that a student could devise a better representation, as long as the relationship in the data is clearly and correctly represented.

(b) Answers will vary, but comments should focus on features of the plot that are placed in the context of the actual situation. For example, the plot of adult salmon increases dramatically after 1995 to a peak in 2002. This could be due to many factors, one of which was the inclusion of the Chinook salmon under the Endangered Species Act. The plot for the Jack salmon stays relatively horizontal, indicating that the Jack population remained relatively constant until the most recent downturn. Other comments could be made, and interested students might be encouraged to research things such as climate conditions or changes in the management of the salmon populations that may have led to the increases or decreases.
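
The midpoint approach described in the answer above can be sketched in a few lines. This is a minimal illustration in Python, not the only valid method: each five-year average is placed at the middle year of its interval so that every year occupies the same distance on the horizontal axis. Only the adult counts for the five averaged intervals and the first two single years are included, and `to_points` is a hypothetical helper name.

```python
# Adult counts copied from the table: (first_year, last_year, adults).
# Five-year averages cover 1971-1995; 1996 and 1997 are single years.
adult_rows = [
    (1971, 1975, 164_947),
    (1976, 1980, 154_059),
    (1981, 1985, 169_034),
    (1986, 1990, 182_815),
    (1991, 1995, 158_485),
    (1996, 1996, 299_590),
    (1997, 1997, 342_876),
]


def to_points(rows):
    """Place each value at the midpoint of its interval on a uniform year axis."""
    return [((first + last) / 2, value) for first, last, value in rows]


points = to_points(adult_rows)
for x, y in points:
    print(x, y)
```

The resulting x-coordinates are 1973, 1978, 1983, 1988, and 1993 for the averages, followed by 1996 and 1997, so the spacing between plotted points reflects the actual number of years between them.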
3. (a) The various plots are shown below. The only plot that does not seem to be a good fit is a stem-and-leaf plot. The outlier makes the spread extremely wide, so creating meaningful stems would be difficult.

(b) The plot is spread very widely, extending from a group of islands with almost no significant area to the largest island, Isabela, which is so large at \begin{align*}4640 \;\mathrm{km}^2\end{align*} that it is an extreme outlier. Even without the outlier, there is still significant variation among the remaining islands. Ignoring Isabela, the distribution is still significantly skewed right. You can see this in all three graphs, and it shows that most of the islands in the archipelago are small. The box plot does not appear to have a left whisker, but the whisker is in fact so small in relation to the scale of the graph that it is indistinguishable. Here is a box-and-whisker plot without the outlier that has been rescaled. The center is most appropriately measured by the median because the extreme skewing and outliers raise the mean substantially. The median island size is approximately \begin{align*}42\end{align*} square kilometers.
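
The summary statistics mentioned in the Galapagos answer can be reproduced as follows. This is a minimal sketch in Python using the standard `statistics` module. The \begin{align*}1.5 \times \mathrm{IQR}\end{align*} rule used to flag outliers is a common convention and an assumption here, and quartile values depend on the interpolation method, so a calculator or textbook method may give slightly different quartiles.

```python
import statistics

# Island areas in square kilometers, copied from the table in the question.
areas_sq_km = {
    "Baltra": 8, "Darwin": 1.1, "Española": 60, "Fernandina": 642,
    "Floreana": 173, "Genovesa": 14, "Isabela": 4640, "Marchena": 130,
    "North Seymour": 1.9, "Pinta": 60, "Pinzón": 18, "Rabida": 4.9,
    "San Cristóbal": 558, "Santa Cruz": 986, "Santa Fe": 24,
    "Santiago": 585, "South Plaza": 0.13, "Wolf": 1.3,
}

values = sorted(areas_sq_km.values())
median = statistics.median(values)
mean = statistics.mean(values)

# Quartiles using the module's default ("exclusive") method.
q1, _, q3 = statistics.quantiles(values, n=4)
upper_fence = q3 + 1.5 * (q3 - q1)

# Islands whose area lies above the upper fence are flagged as outliers.
outliers = [name for name, area in areas_sq_km.items() if area > upper_fence]

print("median:", median)
print("mean:", round(mean, 1))
print("outliers:", outliers)
```

The mean comes out far larger than the median, which is consistent with a right-skewed distribution, and Isabela is the only island flagged as an outlier.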
