Although random walks can be challenging to understand using math alone, they can be easily simulated on computers. Random walk simulations can be written in practically any programing language. In this chapter, I show how to write all the programs I mention in Python, a free, open source programming language used by scientists throughout the world. It can be downloaded at http://www.python.org/download/ (this chapter is written for the 2.xx version of Python, which should be supported into the mid 2010's). I also use the Pylab package, which adds extra functionality to Python (and is also freely available); it us used by scientists throughout the world for mathematical computing. It's available at http://www.scipy.org/PyLab.
Words like 'programing', 'computationally', and 'simulations' make this subject seem much more difficult than it is. In reality, it requires very little programing knowledge. The following section goes over the basics we need. All you need to start is a computer with the above software installed.
You are probably familiar with computer programs, like browsers, mp3 players, and games. On a very fundamental level, these can be described as tools that let you access the processing power of a computer in specific ways. Computer programing languages allow one to create her own programs and have more direct control over how the processing power is used. Writing complex programs like mp3 players is a pretty involved task, but we only need basics for random walk simulations.
Good (very brief) outlines for beginning Python can be found at http://www.ibm.com/developerworks/library/l-cheatsheet3.html and at http://docs.python.org/tutorial/introduction.html. More about Pylab, which adds a lot of computational functionality to Python, can be found at http://www.scipy.org/Cookbook.
What follows is not a general introduction to programing, but an outline --- even briefer than those mentioned above --- of the concepts and methods needed to run simulations that illustrate the results from the first part of the chapter in the Python programing language. Because of Python's interactive nature, the simulations we run should be relatively familiar to anyone who has used a graphing calculator for a math class, though that is certainly not a requirement.
I hope to provide enough of an introduction that a motivated student can figure the rest out through interactive exploration. That said, interested readers can resolve their problems, find more information on Python, and learn more about computational programing in general by following the links above.
Object, Variables, and Functions
Python, the language I use to write the simulations below, is an 'object oriented' language. Loosely interpreted, this means that it can be understood in terms of two categories: objects (things that store data) and functions (ways to create and modify data).
In everyday use, the word 'objects' can be used like interchangeably with 'stuff', or 'things' in reference to anything that exists, has a name, takes up space, or contains information. This isn't far from the way programing languages interpret 'objects' as well. In python objects can be things like numbers, lists of numbers, pictures, files, and other entities that store data.
Going along with the real world analogy, let's think of something that is typically referred to as an 'object', like a table. When we speak of a particular table, we imply that it is an object that belongs to the category of things that can be called 'tables'. In the same way, objects in programing languages are concrete instances of broader categories called 'types'. In Python, things that belong to the type 'integers', for instance, are numbers like '2', '-4', '5', etc. We will need several types of data structures to store the results of our simulations later.
A concrete object (instance of a type) can have a name, or reference; no two objects in use at the same time can have the same name (otherwise the computer would not be able to tell them apart). We can create new references to objects using variables --- more on this later.
If objects' main purpose is to store data, functions allow us to create and manipulate this data. Functions generally take one or more objects or parameters as input, perform some procedure on or with them, and return one or more objects as output. For instance, in Python, the 'square root' function can take a number as an input, take its square root, and return this number as an output. We will need several different functions for our random walk simulation.
Python has some built in types of objects and some built in functions, such as those mentioned above. Additionally, like all object oriented language, it provides the framework for creating new, user defined, types and functions. These can, in turn, use the pre-existing ones. There are also many libraries of functions and types --- made by people throughout the world --- useful for specific purposes freely available on the internet.
Here's a summary of the important terms:
- Objects: Entities (of many different types) that contains data.
- Variables: Strings or characters that reference, or name, objects.
- Functions: Algorithms for manipulating data.
The paragraphs above were pretty abstract; now we translate that theory into actual programing. Before we can run any simulations, we need to understand how to interact with the computer. Luckily, this is easy in Python due to its interactive interface.
One of the nice features about Python is that it can be run interactively using a 'command line': in other words, you can see results in real time rather than having to type all your programs in a separate text file that needs to be translated for the computer to run it. Most computer languages require this translation, called 'compiling'. In that situation, running a simulation is difficult: you have to write a complete set of instructions before starting, since once the program is compiled it generally can't be altered. These sets of instructions are called 'computer programs'. Due to Python's interactive compiler, we can just perform the steps one by one ourselves.
At this point, let's start the Python interface and explore some of the functionality that we will need. Note: in the screenshots, the lines starting with 'In [#]:' are input (typed in), while those with 'Out [#]' are output returned by the computer.
Creating and Manipulating Objects
First, we type 'from pylab import *' and 'from random import *' to gain access to the libraries we will be using for this project.
The equal sign ('=') assigns variables values in Python. When I type 'a = 5', I am creating a number object which stores the value '5' and referencing it with the letter 'a'. We assign variables to objects we are using in order to keep track of them and access them more easily (each object has a unique identity but can have many variables refer to it, like 'a','b', and 'c' do below). If I input a variable name (a quick warning: variables are case sensitive, and can include more than one letter ant number), Python prints a description of what it references (not all are just one number, some are much more complicated sets of data):
A useful data type that we will need for our simulations is the 'list.' Like their name suggests, list objects can store multiple entries of data. Lists can be created using the '[a,b,c,…]' construction. To access the nth element of list b, we us the construction 'b[n]':
Lists can be nested:
When accessing lists above, what we usually call the 'first' element is indexed with a '0', and this is important to keep in mind. Finally, a note on referencing: certain characters and strings, such as all numbers, are related to pre-defined functions or types and cannot be used as variables (see error message below).
Many objects in Python can be altered after they are created. For instance, one can change the elements of a list object one by one:
A slightly obscure point about the last example is that certain objects can't be altered after they are created, such as numbers. But what does this mean practically? Simply that if you use the same variable for two different numbers, two different number objects are created in succession instead of one object being changed; this distinction won't have many implications for us.
Functions, broadly speaking, offer various ways to process or modify data. The general method of calling functions is 'FunctionName(Inputs)'. For instance, the square root 'sqrt()' function takes a number and returns its square root.
The 'zeros()' function and the 'range()' functions both take a number as an argument and return lists. The first returns an array (type of list) of zeros of a given length, while the second returns a list of numbers that starts with zero and ends one below the given number. We will use these functions to help us create structures that will store the data in our simulation.
The choice() function takes a list as an input and returns an element, chosen at random. We will use this to pick steps randomly from the list [-1,1].
Arithmetic, Operators, and Loops
The �Python interactive mode can be used much like a calculator --- it understands most arithmetic symbols in their common meanings. These symbols, along with a few others, are known as 'operators' and are somewhat like streamlined functions. The ones we will use are fairly self-explanatory:
A quick note on Python arithmetic: Python has several types devoted to numbers. When 'arithmetical' symbols and functions are used, results are often determined by the types of the arguments. For instance, when one integer is divided by another, an integer is returned: not necessarily the actual result. Converting integers to floating point numbers using the float() operation solves this issue, but this problem does not affect the simulations below anyway.
Finally, we will need a construction called a 'for loop'. It is a structure that allows us to perform some action for every element of a list. So, to use it, we first need a list to iterate over. The loop is then defined in terms of a variable name. Below, we use 'x'. This variable is only defined for the duration of the loop, and takes value equal to the elements of the list. The loop basically does the following:
- Sets the variable equal to the first element of the list.
- Performs some action
- Sets the variable equal to the second element of the list
- Performs some action again
The print function, as you may have guessed, finds the value of an expression and then writes it on the screen.
Modeling Random Walks
We now have all the elements we need to write a random walk simulation. We need to combine them in a way that will exactly mimic the random walker described in the first section of the chapter. Let's consider how this can be done, and what we would need:
- A list of steps to iterate over. If we want to model an N-step random walk, we will use a list given by range(N), or [0,1,2,…,N-1], which has N entries.
- At any step, we can use the choice([1,-1]) function to pick a step at random. By setting the step lengths equal to one (as we did in the first part of the chapter), we allow '-1' to represent a step left and '1' --- a step right.
- A list object, initially all zeros (created by the zeros() function), can be used to store data about positions. At any step, we will modify the corresponding element of the list to hold the location at that step. In other words, a list for a three-step walk could be (0,1,2,1) --- which would correspond to a step right, another step right, and then a step left. If we plan on modeling an N step random walk, this list should be of length N+1, since the first location is 0 by default (starting location).
According to the outline above, our simulation for a random walk of N steps has to start by defining the two lists described above. For this example, N = 10:
Now we have to write our for loop as follows:
Let's consider what this loop does over its first two iterations:
- Set x equal to the first element of the Steps list, specifically, x = 0.
- Sets Positions[0+1] --- the second (why?) element of that list --- equal to ((either 1 or -1) + Positions). Positions is equal to 0, though, because that is the starting location).
- Sets x equal to the second element of the Steps list, x = 1.
- Sets Positions[1+1] --- the third element of that list --- by adding 1 or -1 (picked randomly) to the previous location, Positions.
- So on…
Accordingly, after running the loop, the kth element of the Positions list should correspond to the location of the walker after k steps. Indeed:
This case corresponds to steps right, right, left, right, left, left, right, right, right, right. Of course, others could have been possible, as we can see by running the loop again (doing this resets the simulation --- why?).
Visualizing the Random Walk
Now that we have an array of locations, we can create a graph of the random walk using Python's built in plot() function. Here's more information on plotting: http://www.scipy.org/Plotting_Tutorial.
On this graph, the x-axis will correspond to the number of steps, whereas the y-axis will be the location of the walker (in other words, what we called 'right' before will be 'up', left --- 'down'). The Positions list from above provides our y-values. At first it might seem like the Steps list could be used for x-values, but remember: it is one shorter than the positions list (there is a starting position, but no starting step). So, if there are N+1 positions, we have to use range(N+1) for a set of x-values. The following graph is produced when we input plot(range(N+1), Positions):
This image corresponds to the Positions list given above (why?). By default, when Python plots a two sets of points, it will connect them with lines. To overcome this (if we want) we can use this construction: plot(range(N+1), Positions, 'o'). This will use dots ('o's) instead of lines:
We can add more lines to a plot once it is created. Python will do this automatically (and using different colors). Below, I run the random walk simulation twice more and plot them one by one:
Histograms and Probability Distributions
In the section on the theory behind random walks, we were interested in finding the probability distribution of end locations of a random walk, not just visualizing it. We can never find the exact probability distribution using simulations, but if we run them enough times, we should be able to approximate it with a histogram of end locations. Histograms take a list of values and bin them according to their magnitude. They then graph the bins on the x-axis and the number in each bin on the y-axis.
To plot a histogram, we need to have a list of end locations; we don't care about the intermediate values. There are several ways to obtain such a list (problem 3), but the most basic is:
- Start with a list of zeros.
- Run the random walk simulation, but only save the last location.
- Store it as an element of the list.
- Repeat many times.
For instance, let's say we have a list with 1000 elements called 'data' which contains 1000 end locations from random walks with 10 steps. We can create a histogram of the values using the function hist(data,20). The 20 refers to the number of bins that the computer will use. Using twice the number of steps for the number of bins makes sense because it scales the x-axis in the same way our probability distributions did in the first part. Here's the aforementioned graph (compare to the 10-step case as it was derived earlier):
- What sequences of steps lead to the following 'Positions' lists:
- [0, 1, 0, -1, -2, -3, -2, -1, 0, -1, 0]
- [0, -1, -2, -1, -2, -1, -2, -3, -2, -1, 0]
- [0, 1, 2, 1, 0, -1, -2, -1, 0, 1, 2]
- Look at the wikipedia page on random walks. Now, look at the figure included: http://en.wikipedia.org/wiki/File:Random_Walk_example.svg. This figure was generated in Python. Explain how the program that creates it works, and run it yourself.
- Find a way to obtain lists of end locations needed to make histograms. You will have to use nested for loops in one of three ways (in each case, make sure to store the end location at the end of each run in some list):
- Alter the program used on the wikipedia page.
- Repeat the for loop we used to generate Positions lists several times.
- Finally, the quickest way is probably to find a way of simulating a random walk that only results in the end location, then run that simulation several times.
- One way to model biased random walks is to increase the number of arguments in the choice() function we use. Think about how this relates to altering the parameter P in random walks from the first part of the chapter. Is there a one to one correspondence, or is one method more comprehensive?
- Find equivalent ways to represent several biased random walks using the two methods above. Then, use the method from the first part of the chapter to graph the probability mass function, and the method from the second half to create a histogram. Do they look alike? How do histograms relate to probability mass functions?