# 11.2: Theory

**At Grade**Created by: CK-12

## Two Sides of the Same Coin

Since this chapter is about statistical physics, it's good to start by considering how this field differs from traditional statistics.

In statistics, scientists use empirical data to estimate some unknown number or set of numbers (called parameters). Let's say we have a coin and we don't know what the probability of it landing on heads after during a flip is, but we do know that this probability remains constant between flips. A question that a statistician might ask is, 'How can we best estimate, based on repeated trials, this probability --- our parameter for this particular situation?'

Imagine we flipped this coin 1,000 times and found that it landed on heads on 715 of those. In this case, our best estimate for the probability --- let's call it \begin{align*} P \end{align*}

Broadly, the problem facing statisticians can be summarized as, 'How can we translate collected data (like the results of the 1000 flips above) into an estimate of the unknown parameter (\begin{align*} P \end{align*}

As simple as the coin example is, our treatment of it shows two major purposes of statistics: to find ways of estimating parameters from some data and to study the efficiency and accuracy of such methods.

Statistical physics (part of it, at least), on the other hand, uses the coin as the basis of a model --- the random walk. We are no longer interested in estimating parameters, but in trying to model some more complicated situation by reducing it to a series of coin flips. Like the example above for statistics, the random walk model described below gives a simple glimpse into the methods and results of statistical physics.

## Introduction to One-Dimensional Random Walks

Imagine that each time the coin is flipped, the person flipping it takes a step --- to the right if it lands on heads, and to the left if it lands on tails. This kind of behavior is called a random walk. It doesn't matter if there is actually a person flipping a coin for the results below; many processes in the natural and human worlds can be modeled as aggregates of individual steps whose directions are determined randomly: stock prices, diffusion, and gambling winnings or losses are just a few applications among many.

If the steps are taken along the same line --- as described above --- the random walk is called one-dimensional; two and three-dimensional random walks are more difficult to describe mathematically, so we'll focus on the one-dimensional case in this chapter. For similar reasons, we will only look at random walks with constant step sizes (i.e. one unit to the left or right, rather than a varying amount). Interestingly enough, a lot of the relevant behavior found for this case can generalize to more complicated random walks in higher dimensions and with varying step sizes.

Statistical physicists often hope to use results from simple, idealized models, to gain intuition about more complicated systems. In this case, the relevant question is, 'During a random walk, how likely is a walker to be in a given location after some given number of steps?'

## Mathematics of One-Dimensional Random Walks

Let's analyze this in more mathematical detail --- we would like to find is the probability that after \begin{align*} N \end{align*}

### Three-Step Case

In physics derivations, it's often possible to obtain an intuition about the right way to find a general result or formula by considering simple specific cases first. We will use this method here --- let's look at the possible outcomes of a three step random walk where \begin{align*} P = \frac{1}{2} \end{align*}

The first case is equivalent to flipping three heads in a row and therefore taking three right step, the second to flipping the sequence heads, tails, heads, and so on. Since in this case the coin is fair, the eight outcomes (step combinations) shown above are equally likely to occur: they each have a probability of \begin{align*} {\frac{1}{2}}^3 = \frac{1}{8} \end{align*}

These outcomes, however, do not all result in different end locations (the four low arrows) for the walker: this is determined by the difference between the number of steps taken to the right and the number taken to the left. So while only one outcome corresponds to an end location of three steps to the right or three to the left, three outcomes correspond to an end location of one step to the right or one to the left. So the eight equally likely outcomes result in four possible end locations that are clearly not equally likely.

Since the outcomes are mutually exclusive (you can't have more than one of them occur at the same time), the probability that a particular end location (such as one step to the right of the starting point) occurs will equal to the sum of the probabilities of the outcomes that lead to it. Since all eight outcomes are equally likely, this will also equal the number of relevant outcomes multiplied by the probability of a single one. Therefore, the probability of ending one step right is:

\begin{align*}
\frac{1}{8} + \frac{1}{8} + \frac{1}{8} = 3 \times \frac{1}{8} = \frac{3}{8}
\end{align*}

This reasoning allows us to find the probabilities of the other possible end locations as well, noted below the arrows on the graph above. This grouping of possible end locations and their respective likelihoods is an example of a probability mass function, which in general is a list of events with associated probabilities. Therefore, the upper list of outcomes that happen with probability \begin{align*} \frac{1}{8} \end{align*}

In more mathematical language, we can represent of a set of events as \begin{align*} ({x_1,x_2,x_3,\dots,x_i}) \end{align*}

Above, we have answered the question posed at the beginning of this section for the three-step case. That is, we have completely determined what the likelihood of the walker being in any possible location is at the end of this walk.

### General Case

Now let us try to generalize these results to to a random walk with \begin{align*} P = \frac{1}{2} \end{align*} and \begin{align*} N \end{align*} steps. The intuition we obtained from considering the simple case above can be summarized as follows: to find the probability mass function of the end location of a random walker, one should first consider all the possible outcomes, find the ones that lead to the same end locations, and multiply their number by the probability of a single outcome.

The diagram below is analogous to the one for three steps, but now with \begin{align*} N \end{align*} steps. We can divide the possibilities into \begin{align*} 2^N\end{align*}equally likely outcomes, this time each with probability \begin{align*} \frac{1}{2^N} \end{align*}. The question is, 'How many outcomes lead to a given end location, say, \begin{align*} D \end{align*} steps to the right (as posed above)?'

There is still only one outcome that leads to each of the two 'extreme' locations, when all steps are taken either right or left. Therefore, their probabilities are \begin{align*} 2^{-N} \end{align*} --- but what about the other locations? To find their respective probabilities, we first use the fact that end locations depend on the difference between the number of steps taken to the left and right (and not their order) to pose the problem in a slightly different way.

Let \begin{align*} L \end{align*} be the number of steps taken to the left, and \begin{align*} R \end{align*} to the right. Since the total number of steps is \begin{align*} N \end{align*},

\begin{align*} N = L + R && \text{Total steps} \end{align*}

If the walkers winds up \begin{align*} D \end{align*} to the right of the origin, she must have taken \begin{align*} D \end{align*} more steps to the right than to the left:

\begin{align*} D = R - L && \text{Total distance traveled} \end{align*}

Solving these two equations, we find that:

\begin{align*} R = \frac{1}{2} (D+N) && \text{Steps to the right}\\ L = N - R = \frac{1}{2}(N-D) && \text{Steps to the left} \end{align*}

We have solved for the necessary number of steps left and right in terms of known quantities, \begin{align*} N \end{align*} and \begin{align*} D \end{align*}. At this point all that remains is finding how many ways there are to take \begin{align*} \frac{1}{2} (D+N) \end{align*} steps to the right out of a total of \begin{align*} N \end{align*} steps: this will give us the number of outcomes that lead to end location of \begin{align*} D \end{align*} steps to the right.

In the three step case, for instance, ending one space right of the origin required taking two steps right and one step left; there are three discrete ways to take two achieve this (the left step can be first, second, or third), and so three outcomes that lead to that location.

For the case of \begin{align*} N \end{align*} total steps and \begin{align*} \frac{1}{2} (D+N) \end{align*} steps to the right, the correct result will be given by the 'ways of choosing' formula from combinatorics: the number of ways to choose \begin{align*} \frac{1}{2} (D+N) \end{align*} positions for the right steps out of a total of \begin{align*} N \end{align*} positions. This is written as

\begin{align*} \binom{N}{\frac{1}{2}(N+D)} && \text{Number of outcomes} \intertext{where} \binom{n}{k} = \frac{n!}{k!(n-k)!} \end{align*}

Since each outcome has probability \begin{align*} 2^{-N} \end{align*}, the probability of finding the walker a distance \begin{align*} D \end{align*} steps to the right of the origin is given by the following formula:

\begin{align*} P(D) = 2^{-N}\times \binom{N}{\frac{1}{2}(N+D)} && \text{Probability of being D steps away after N steps} \end{align*}

We have now answered our original question (finding the probabilities of various end locations) for all unbiased (\begin{align*} P = \frac{1}{2}) \end{align*}) random walks with constant step lengths. Again, we can plot this distribution for several different cases:

When the number of steps becomes large, the distribution begins to look like a bell curve; here is the plot for \begin{align*} N = 100 \end{align*}:

## Problems

- What is the difference, in terms of end probability distributions, between random walks with even and odd numbers of steps)
- Why are we justified in setting the step lengths equal to 1 in modeling
*all*constant step length random walks? - Solve for the probability mass function of end locations for a four-step random walk analogously to the three-step example above (illustrating it also). Then, graph this probability mass function.
- Our proof for the general case can be called 'right-biased' in two ways. This question settles both:
- We found the probability of being \begin{align*} D \end{align*} to the right of the origin, but the probability distributions were graphed as symmetrical. First, explain why this must be true in terms of possible outcomes and end locations. Then, show that the formula for \begin{align*} P(D) \end{align*} can be used to find probabilities to the left also, that is, prove that \begin{align*} P(D) = P(-D) \end{align*} using the formula above and the definition of factorials.
- We also found the number of outcomes that lead to an end displacement of \begin{align*} D \end{align*} in terms of steps taken to the right. Use the result from the previous part to show that using \begin{align*} \frac{1}{2}(N-D) \end{align*} --- the number of steps to the left corresponding to a final distance of \begin{align*} D \end{align*} steps to the right --- in the derivation of the general result would not have changed it.

- Derive the probability mass function for a biased random walk (that is, steps in one direction are more likely than in the other, or the coin has a higher probability of landing heads than tails). Hint: the outcomes will no longer be equally likely, but what about outcomes that lead to specific end locations?
- Graph a few of these distributions.