Random walks¶

This demonstration was generated from a Jupyter notebook. You can download the notebook here. Note that not all interactive plotting will work in HTML; you need to run the Jupyter notebook to get that functionality.

# Standard modules for scientific computing and data analysis
import numpy as np
import pandas as pd

# Comment out the datashader imports unless you have installed version 0.6.5 or higher
import datashader as ds
import datashader.bokeh_ext
import datashader.transfer_functions as dstf

# Intractive plotting utilities
import bokeh.io
import bokeh.plotting
bokeh.io.output_notebook()

Random walks defined¶

A random walk is defined as a sequence of random variables such that

\begin{align} X_{n+1} = X_n + \epsilon_n, \end{align}

where $\epsilon_n$ is randomly generated independent of $X_n,X_{n-1},\ldots$. In many biological contexts, random walks are used to describe movement in space. For concreteness, that is what we will have in mind presently.

Thinking about a walker in space, this just means that each step is drawn from some probability distribution such that previous steps the walker took have no influence on it.

In this tutorial, we will explore properties of random walks computationally. Importantly, we will visualize what random walks actually look like. The first step in this endeavor is to write a function to generate random walks.

A random walk generator¶

We will consider two-dimensional random walks. The direction that the walker steps each time is uniformly distributed, i.e., no direction is favored over another. The size of a step, though, may be generated from any distribution we wish.

So, our strategy for generating each step is to chose an angle $\theta \in [0, 2\pi)$ which defines a direction, and then draw the length of the step from a prescribed distribution.

The function below accomplishes this, and is our main workhorse for generating random walks. Interestingly, the function is quite simple, but we will see that we can get rich behavior.

As a technical note, we return the walk as a Pandas DataFrame, since we will need this format for some of the interactive plotting later.

def random_walk(n_steps, p=None, args=()):
    """
    Perform 2D random walk with a given step size distribution.
    
    Parameters
    ----------
    n_steps : int
        Number of steps to be taken in random walk
    p : method, default gives all unit steps
        A method that returns draws from a probability distribution.
        The call signature is p(*args, size=n).
    args : tuple, default ()
        Arguments to be passed to `p`.
    columns : list, default ['x', 'y']
        The column labels on the outputted DataFrame

    Returns
    -------
    x : array_like
        x-positions of random walk
    y : array_like
        y-positions of random walk
    """
    
    # Initialize position arrays
    x = np.empty(n_steps)
    y = np.empty(n_steps)
    
    # Start at the origin
    x[0] = 0.0
    y[0] = 0.0
    
    # Generate random angles of steps
    theta = np.random.uniform(low=0.0, high=2.0*np.pi, size=n_steps-1)
    
    # Generate step lengths
    if p is None:
        step_lengths = np.ones(n_steps-1)
    else:
        step_lengths = p(*args, size=n_steps-1)
    
    # Generate steps, putting zero in front to start at origin
    dx = np.concatenate(((0,), step_lengths * np.cos(theta)))
    dy = np.concatenate(((0,), step_lengths * np.sin(theta)))
    
    # Take the walk!
    return np.cumsum(dx), np.cumsum(dy)

Taking and plotting our first walk¶

Now let's take a random walk of 10,000 steps with unit step sizes.

# Take a short random walk
x, y = random_walk(10000)

Now that we have the random walk, we would like to plot it. So as not to skew physical space, we want the plot to have the same scale on each axis. We therefore need to determine the $x$ and $y$ ranges for the plot such that the random walk is more or less centered.

def plot_ranges(x, y, margin=0.02):
    """
    Determine ranges axes for an x-y plot such that axes are
    on equal scale and walk is roughly centered.
    
    Parameters
    ----------
    x : array_like
        x-positions of random walk
    y : array_like
        y-positions of random walk
    margin : float, default 0.02
        Margin to have around plot of walk as fraction of total
        figure size.
        
    Returns
    -------
    x_range : list, shape (2,)
        Min and max x-values
    y_range : list, shape (2,)
        Min and max y-values
    """

    # Get limits of data
    xmin, xmax, ymin, ymax = x.min(), x.max(), y.min(), y.max()
    
    # Determine total length of axes
    total_range = (1 + 2*margin) * max(xmax-xmin, ymax-ymin)
    
    # Compute start of axes
    x0 = (xmax + xmin - total_range) / 2
    y0 = (ymax + ymin - total_range) / 2
    
    return (x0, x0 + total_range), (y0, y0 + total_range)

Now that we have this function, we can write a function that gives us a blank canvas to paint our plot on. We will turn off borders and grids, since we're looking at qualitative features of random walks. I also always set the background to be transparent in case I want to export the plot for use in a presentation.

def random_walk_blank_plot(x_range, y_range, plot_dim=400):
    """
    Generate blank plot for random walk.
    
    Parameters
    ----------
    x_range : list, shape (2,)
        Min and max x-values
    y_range : list, shape (2,)
        Min and max y-values
    plot_dim : int, default 600
        Height and width of plot in pixels

    Returns
    -------
    output : bokeh.plotting.Figure instance
        Figure instance to which glyphs may be added
    """
    # Incremebt width by 80 to account for toolbar
    p = bokeh.plotting.figure(plot_width=plot_dim+80,
                              plot_height=plot_dim,
                              x_range=x_range, 
                              y_range=y_range)   
    p.xgrid.grid_line_color = None
    p.ygrid.grid_line_color = None
    
    return p

Now, we just have to add the glyph to make the plot. We will use a gray line. Note that for displaying plots of this type, it is important to use the line_join='bevel' keyword argument in for the line glyph. The default ('mitre') leads to sharp points.

# Set up plot
p = random_walk_blank_plot(*plot_ranges(x, y))

# Populate glyph
p.line(x, y, color='darkgray', line_width=2, line_join='bevel')

# Display the result
bokeh.io.show(p);

Note that Bokeh plots are interactive. You can select tools in from the menu in the upper right corner that allow you to box zoom, wheel zoom, pan, resize, and others.

Variable step sizes give similar qualitative walks¶

We can repeat the above analysis, except with variable step sizes. We will try exponentially and Gaussian distributed steps. Note that we could have step lengths that are negative for a Gaussian distribution of step sizes. This is not an issue because it just results in shifting the angle of the step by $\pi/2$.

# Take a walk with Gaussian-distributed steps
x_gauss, y_gauss = random_walk(10000, p=np.random.normal, args=(1.0, 0.1))

# Take a walk with exponentially-distributed steps
x_exp, y_exp = random_walk(10000, p=np.random.exponential, args=(1/np.sqrt(2),))

Now we can plot all three on the same plot. To get the plot range, we need to consider all points considered by the walks.

# Set up plot
p = random_walk_blank_plot(*plot_ranges(np.concatenate((x, x_gauss, x_exp)), 
                                        np.concatenate((y, y_gauss, y_exp))))

# Populate glyphs
p.line(x, y, color='darkgray', line_width=2, line_join='bevel', legend='unit step')
p.line(x_gauss, y_gauss, color='dodgerblue', line_width=2, line_join='bevel', 
       legend='gaussian')
p.line(x_exp, y_exp, color='tomato', line_width=2, line_join='bevel',
       legend='exponential')

# Display the result
bokeh.io.show(p);

Random walks are scale invariant¶

This means that they have the same fundamental features whatever scale you look at them. This can be seen by considering the moments of the end-to-end displacement vector of a random walk. Let $\mathbf{x}$ by the end-to-end vector of a random walk of $N$ steps and $\mathbf{s}_i$ be the vector describing step $i$ such that

\begin{align} \mathbf{x} = \sum_{i=1}^N \mathbf{s}_i. \end{align}

Then,

\begin{align} \left\langle \mathbf{x} \right\rangle = \left\langle\sum_{i=1}^N \mathbf{s}_i\right\rangle = \sum_{i=1}^N \left\langle\mathbf{s}_i\right\rangle = N \left\langle\mathbf{s}_i \right\rangle, \end{align}

since all $\mathbf{s}_i$'s are indepndent and identically distributed.

We can also compute higher moments. Let's start with the second.

\begin{align} \left\langle \mathbf{x}^2 \right\rangle = \left\langle\left(\sum_{i=1}^N \mathbf{s}_i\right)^2\right\rangle = \left\langle\sum_{i=1}^N \mathbf{s}_i^2 + \sum_{i=1}^N\sum_{\stackrel{j=1}{j\ne i}}^N \mathbf{s}_i\cdot\mathbf{s}_j\right\rangle = \sum_{i=1}^N \left\langle\mathbf{s}_i^2\right\rangle + \sum_{i=1}^N\sum_{\stackrel{j=1}{j\ne i}}^N \left\langle\mathbf{s}_i\cdot\mathbf{s}_j\right\rangle. \end{align}

By definition, the steps of the random walk are uncorrelated; i.e., the walker has no memory of previous steps. This means that $\left\langle\mathbf{s}_i\cdot\mathbf{s}_j\right\rangle = 0$. Thus, we have

\begin{align} \left\langle \mathbf{x}^2 \right\rangle = \sum_{i=1}^N \left\langle\mathbf{s}_i^2\right\rangle = N\left\langle\mathbf{s}_i^2\right\rangle. \end{align}

For the $n$th moment, we have, using the multinomial theorem,

\begin{align} \left\langle \mathbf{x}^n \right\rangle &= \left\langle\left(\sum_{i=1}^N \mathbf{s}_i\right)^n\right\rangle = \left\langle \sum_{i=1}^N \left(\mathbf{s}_i^n + \text{terms containing } \mathbf{s}_i \cdot \mathbf{s}_{j\ne i} \right)\right\rangle = N\left\langle\mathbf{s}_i^n\right>. \end{align}

So, all moments of the probability distribution are linear in $N$, depending only on the moments of an individual step of the random walk.

To see scale invariance computationally, we need to take many many steps, and then zoom in. A million steps suffices, but this is too many to easily render a plot. We will therefore employ Datashader, which enables Google Maps-style viewing of plots. That is, it only renders to the appropriate detail of the level of zoom you have for your graphic.

First, though, we have to take our walk. We'll take 10 million steps.

# Do a long random walk
x, y = random_walk(10000000)

# DataShader requires input as a Pandas DataFrame
df = pd.DataFrame(dict(x=x, y=y))

In order to use Datashader, we just need to create a function that creates an image that is called every time the Bokeh plot is updated.

def create_image(x_range, y_range, w, h):
    cvs = ds.Canvas(x_range=x_range,
                    y_range=y_range,
                    plot_height=int(h), 
                    plot_width=int(w))
    agg = cvs.line(df, 'x', 'y', agg=ds.any())
    return dstf.shade(agg)

Now, we simply make a blank Bokeh plot and then use Datashader to paint the image appropriate for our level of zoom.

p = random_walk_blank_plot(*plot_ranges(x, y))
ds.bokeh_ext.InteractiveImage(p, create_image)