Introduction to Non-Parametric Statistics
Non-parametric statistics are a more exotic family of tests. Situations in which the data encountered are clearly not normal are few, and as a consequence this topic is very rarely covered in any statistics course, let alone a high school AP course. These sections are advisable only for truly exceptional students. Furthermore, these tests are almost never run by hand. Every statistics-specific package includes them, and those packages should be used.
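Even though these tests are best left to software, a short sketch shows what one of them actually computes. The following is a minimal Python sketch of the runs test in its common Wald-Wolfowitz form, assuming that each value is coded as above or below the median, that values equal to the median are discarded, and that the sample is large enough for the normal approximation. The function name `runs_test` is hypothetical; for real work, use one of the packages in Appendix B.

```python
import math
import statistics


def runs_test(values):
    """Runs test for randomness (Wald-Wolfowitz, large-sample normal approximation).

    Returns (runs, n_above, n_below, z, two_sided_p).
    """
    med = statistics.median(values)
    # Code each observation as True (above the median) or False (below);
    # observations equal to the median are dropped (an assumed convention).
    signs = [v > med for v in values if v != med]
    n1 = sum(signs)           # count above the median
    n2 = len(signs) - n1      # count below the median
    if n1 == 0 or n2 == 0:
        raise ValueError("need observations on both sides of the median")

    # A new run starts every time the sign changes between neighbors.
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)

    n = n1 + n2
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    if variance <= 0:
        raise ValueError("sample too small for the normal approximation")

    z = (runs - expected) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from the normal curve
    return runs, n1, n2, z, p
```

For example, the alternating sequence 1, 2, 1, 2, ... (ten values) has too many runs (z of about 2.68), while 1, 1, 1, 1, 1, 2, 2, 2, 2, 2 has too few (z of about -2.68); both would be flagged as unlikely to be random.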
An interesting note is that a common use of the runs test is to test for randomness. This makes sense, as testing for randomness is the flip side of testing for correlation: truly random numbers should show no correlation or pattern in the order they appear.
Appendix A: Online Data Sources
There are a number of sources of real data that are pertinent and interesting for student projects and assignments. Here is a short list of sources I have used.
The central clearinghouse for raw data for all of the federal government agencies. Data is available in XML or CSV files.
US Census: http://www2.census.gov/census_2000/datasets/
Raw data from the 2000 census. There are other tools on the census site, but most of them manipulate the data too much rather than just presenting numbers.
UN Data: http://data.un.org/
A clearinghouse for the data collected from various UN organizations including the WHO, WTO, UNICEF and UNESCO.
Baseball Reference: http://www.baseball-reference.com/
While other sports keep statistics, none do it with the verve of baseball. This is the most complete resource for all kinds of baseball statistics.
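Several of these sources offer raw downloads as CSV files, which can be loaded without any special statistics software. Below is a minimal Python sketch using only the standard library; the helper `numeric_column`, the column names, and the sample rows are all made up for illustration, and it assumes US-style thousands separators (commas) and that blank or text cells such as "N/A" should simply be skipped.

```python
import csv
import io
import statistics


def numeric_column(csv_file, column):
    """Return the numeric values in `column`, skipping blank or non-numeric cells."""
    values = []
    for row in csv.DictReader(csv_file):
        # Assumes US-style thousands separators, e.g. "1,200".
        cell = (row.get(column) or "").strip().replace(",", "")
        try:
            values.append(float(cell))
        except ValueError:
            continue  # skip missing or text entries such as "N/A"
    return values


# Hypothetical sample standing in for a downloaded file; for a real file use
# open("downloaded.csv", newline="", encoding="utf-8") instead of StringIO.
sample = io.StringIO(
    "state,population\n"
    "Alpha,\"1,200\"\n"
    "Beta,800\n"
    "Gamma,N/A\n"
    "Delta,1000\n"
)
pops = numeric_column(sample, "population")
print(len(pops), statistics.median(pops))
```

Skipping unreadable cells keeps the sketch short, but it also hides missing data, so it is worth checking how many rows were dropped before analyzing the result.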
Appendix B: Statistical Software Packages
This list is not exhaustive, but it should give a brief overview of the software available to support classroom activities.
Fathom is published by a textbook publisher, so it has a wide following in schools. It is easy to use, powerful enough for most users, and has many activities and lesson plans available. It is also one of the cheaper packages. Windows and MacOS.
SPSS has outgrown its original acronym, Statistical Package for the Social Sciences. It is now a huge data mining and statistics suite owned by IBM. It is easy to get started with, as it uses a spreadsheet-type interface for most of its data. Expensive. All platforms.
SAS is geared more towards enterprise uses. I have never been in a school lab with SAS installed, but it remains a popular choice for many, with lots of online community help and resources. Expensive. All platforms.
R is a FREE, open-source clone of the functionality of the legendary S language from AT&T Bell Labs and its commercial version, S-PLUS. It is huge, the documentation is hard to read, it has few graphical interface elements, and it is truthfully more of a programming language/environment. It's what I use. It does everything. Did I mention it's free? All platforms.
Minitab is, in my experience, somewhere between SAS and SPSS in usability. It is a favorite of many business schools. There is a significant group of AP teachers using Minitab in their classes, so lesson help should be easy to find. Expensive. Windows only.
An all-around math software and programming environment. For a stats class, it's using a cannon to kill a fly. Tough to use, but has incredible documentation. Very, very expensive, but massive discounts are available to teachers and students. All platforms.
You were waiting for this one... Well, the good news is that you probably have it already, and everyone is familiar with at least the basics. The bad news is that the range of tests is very limited, and there is an incredible number of documented errors in various distributions and functions. I can't recommend it, but it has worked for some in the past and will continue to work for many in the future. You probably don't need to buy it. Windows and MacOS.
Not software, but a website committed to providing random numbers about as good as is possible. There is also a ton of info on randomness, why it's elusive, and some other techniques for generating better randomness. A great resource for teachers and students. Free!