MCMC and Stan

Introduction

Welcome back!

Plan for today:

  1. MCMC
  2. Getting started with Stan in Python

MCMC

The big picture

In: a rule for evaluating the target density function and maybe its gradients.

Out: a Markov chain of numbers that you can do Monte Carlo integration with.

Simple case

Problem

One random variable \(X\) with probability density function \(density\).

Aka a “one-dimensional parameter space”.

Evaluating \(density(x)\) for a given value \(x\) (aka “point in parameter space”) is easy.

Calculating the area under a region of the \(density\) curve (aka “the probability mass”) is expensive.
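
Concretely, the probability mass between two points \(a\) and \(b\) is \(P(a < X < b) = \int_a^b density(x)\,dx\), an integral that for most interesting densities can't be computed in closed form.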

This is annoying, we need to know that!

Solution

Generate a series \(x_1, \ldots, x_i, \ldots, x_n\) where every number depends on the previous number(s), i.e. a Markov chain.

To calculate \(x_{i+1}\), generate a random number and take into account \(density(x_i)\). 1

1 This is the interesting and tricky bit!

If this works, then with a long enough series of numbers, the samples trace out the shape of the target density function.

Success condition for MCMC

The numbers from the Markov chain have to approximately agree with the target density function, i.e. in any region, the number of samples is approximately proportional to the area under the curve.

Now we can do Monte Carlo integration, i.e. approximate the area under a region of the curve by counting the samples that land in that region.
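
The counting step is cheap; here is a sketch in Python, using numpy draws as a stand-in for the output of a Markov chain:

import numpy as np

rng = np.random.default_rng(1234)
draws = rng.normal(size=10_000)  # stand-in for draws from a Markov chain
# P(0 < X < 1) is estimated by the fraction of draws landing in the region
mass_estimate = ((draws > 0) & (draws < 1)).mean()  # ~0.34 for a standard normal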

Metropolis-Hastings

The first (I think?) MCMC algorithm. Original paper: Metropolis et al. (1953).2

2 Metropolis was first author but didn’t do any work! That was Arianna Rosenbluth (programming) plus Marshall Rosenbluth & Edward Teller (maths)

Generates Markov chains that provably agree with arbitrary target density functions (in the asymptotic limit).

Roughly how it works:

  1. Choose a candidate by randomly perturbing the previous point \(x_i\)
  2. Accept or reject the candidate randomly, with acceptance probability \(\min\left(1, \frac{density(candidate)}{density(x_i)}\right)\)
  3. Set \(x_{i+1}\) to the candidate if accepted, else to \(x_i\)
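
A minimal sketch of these three steps in Python; the symmetric normal perturbation and the step size are illustrative choices, not part of the algorithm's definition:

import numpy as np

def metropolis(density, x0, n_samples, step_size=1.0, seed=1234):
    rng = np.random.default_rng(seed)
    samples = np.empty(n_samples)
    x = x0
    for i in range(n_samples):
        candidate = x + step_size * rng.normal()  # 1. perturb the previous point
        ratio = density(candidate) / density(x)   # 2. density ratio
        if rng.uniform() < ratio:                 #    accept with prob min(1, ratio)
            x = candidate                         # 3. next point is the candidate...
        samples[i] = x                            #    ...if accepted, else x_i
    return samples

# e.g. an unnormalised standard normal target:
samples = metropolis(lambda x: np.exp(-0.5 * x**2), x0=0.0, n_samples=5000)

A nice property: only the ratio of densities appears, so any normalising constant cancels and the target density only needs to be known up to a constant.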

Visualisation

In practice it doesn’t work in more than roughly ten-dimensional parameter spaces: random-walk perturbations explore bigger spaces too slowly.

Hamiltonian Monte Carlo

Big picture: MCMC that works for large parameter spaces.

Key innovation: travel through parameter space quickly using gradients.

Illustration: imagine giving a ball a random flick and letting it roll over a surface shaped like the negative log density; wherever it ends up after a fixed time is the next candidate point.

A small but important detail:

To decide how hard to flick the ball and how precisely to calculate its trajectory for a particular target, adaptation is required, i.e. running the algorithm in warm-up mode for a while and tuning by trial and error. How best to do adaptation is an important open question.
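
Here is a minimal sketch of one HMC transition in Python, assuming a one-dimensional target whose log density and gradient we can evaluate; the step size and number of leapfrog steps are exactly the quantities adaptation has to tune:

import numpy as np

rng = np.random.default_rng(1234)

def hmc_step(x, log_density, grad, step_size=0.1, n_leapfrog=20):
    p = rng.normal()  # random momentum: the "flick"
    x_new, p_new = x, p
    # simulate the ball's trajectory with leapfrog steps (uses gradients!)
    p_new += 0.5 * step_size * grad(x_new)
    for _ in range(n_leapfrog - 1):
        x_new += step_size * p_new
        p_new += step_size * grad(x_new)
    x_new += step_size * p_new
    p_new += 0.5 * step_size * grad(x_new)
    # accept/reject corrects the numerical error in the simulated trajectory
    log_accept = (log_density(x_new) - 0.5 * p_new**2) - (log_density(x) - 0.5 * p**2)
    return x_new if np.log(rng.uniform()) < log_accept else x

# Example: standard normal target, log density -0.5 * x**2, gradient -x
samples = [0.0]
for _ in range(2000):
    samples.append(hmc_step(samples[-1], lambda x: -0.5 * x**2, lambda x: -x))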

Limitations:

  • No discrete parameters
  • Performs badly when the log-scale target density function is wiggly.

Reading

Betancourt (2018b)

Betancourt (2018a)

Beskos et al. (2010)

Andrieu et al. (2003)

Stan

Stan is:

  • A language for specifying probability density functions as Stan programs.

  • A compiler that turns Stan programs into instructions for inference engines.

  • An inference engine implementing adaptive HMC and some other algorithms.

  • A library of functions for calculating the gradients of interesting probability density functions.

  • Some interfaces for popular computer tools, e.g. Python (cmdstanpy), R (cmdstanr) and the command line (cmdstan).

Why use Stan?

Alternatives: PyMC, BlackJAX, Turing.jl, TensorFlow Probability.

Overview as of 2023: Štrumbelj et al. (2023).

Why I like Stan:

  • Big, active and knowledgeable community (most important reason)

  • Featureful (complex numbers, fast solvers, up-to-date diagnostics)

  • Fast (for CPU-bound, general purpose adaptive HMC)

Getting started with Stan in Python

Install cmdstanpy

pip install cmdstanpy

Use cmdstanpy to install the rest of Stan

python -m cmdstanpy.install_cmdstan --cores 2

I like to store Stan outputs using the library arviz. It also makes nice plots.

pip install arviz

Write a Stan program

A Stan program consists of function definitions, variable declarations and statements, organised into {...} delimited blocks, e.g.

data {
  real y;  // a variable declaration
}
model {
  y ~ normal(0, 1.4);  // a statement
}

The purpose of a Stan program is to define the probability density for any combination of data and parameters.

It is ok for there to be no parameters:

transformed data {
  real y = 2;  // this is both a statement and a declaration!
}
model {
  y ~ normal(0, 1.4);  // the total density is N(2 | 0, 1.4) = 0.103
}

or no data:

parameters {
  real alpha;
}
model {
  alpha ~ normal(0, 1.4);  // Stan can find the density for any alpha
}

Cmdstanpy workflow

Step 1

Use standard Python tools to make a dictionary mapping data variables to inputs, e.g.

my_stan_input = {"y": 2}

(Optional) Save the input as a json file:

import json
with open("my_stan_input.json", "w") as f:
    json.dump(my_stan_input, f)

Step 2

Instantiate a CmdStanModel

from cmdstanpy import CmdStanModel
my_model = CmdStanModel(stan_file="my_stan_program.stan")

Cmdstanpy will use Stan’s compiler to create .hpp and executable files.

Step 3

Use the method CmdStanModel.sample to trigger adaptive HMC.

my_mcmc_results = my_model.sample(data=my_stan_input)
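
sample accepts many optional keyword arguments; for instance chains, seed and iter_sampling control the number of chains, the random seed and the number of post-warmup draws:

my_mcmc_results = my_model.sample(
    data=my_stan_input, chains=4, seed=1234, iter_sampling=1000
)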

Step 4

Use the methods CmdStanMCMC.diagnose and CmdStanMCMC.summary for quick diagnostics.

summary = my_mcmc_results.summary()
diagnostics = my_mcmc_results.diagnose()
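
diagnose returns a human-readable report string and summary returns a pandas DataFrame, so both are easy to inspect:

print(diagnostics)  # reports divergences, R-hat problems, etc.
print(summary)      # one row per quantity: mean, quantiles, N_Eff, R_hat, ...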

Step 5

Convert to arviz InferenceData and save:

import arviz
my_idata = arviz.from_cmdstanpy(my_mcmc_results)
my_idata.to_json("my_arviz_idata.json")
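
The saved file can be loaded again with arviz.from_json, e.g. in a later session:

my_idata = arviz.from_json("my_arviz_idata.json")
arviz.plot_posterior(my_idata)  # one of arviz's many plotting functions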

Stan references

Cmdstanpy docs

Stan reference manual

Stan functions reference

Stan User’s guide

stan-dev github organisation

Next time / homework

  • How to program your own MCMC!

  • I ran MCMC, now what?

References

Andrieu, Christophe, Nando de Freitas, Arnaud Doucet, and Michael I. Jordan. 2003. “An Introduction to MCMC for Machine Learning.” Machine Learning 50: 5–43.
Beskos, Alexandros, Natesh S. Pillai, Gareth O. Roberts, Jesus M. Sanz-Serna, and Andrew M. Stuart. 2010. “Optimal Tuning of the Hybrid Monte-Carlo Algorithm.” arXiv. https://doi.org/10.48550/arXiv.1001.4460.
Betancourt, Michael. 2018a. “The Convergence of Markov Chain Monte Carlo Methods: From the Metropolis Method to Hamiltonian Monte Carlo.” arXiv. https://doi.org/10.48550/arXiv.1706.01520.
———. 2018b. “A Conceptual Introduction to Hamiltonian Monte Carlo.” arXiv. https://doi.org/10.48550/arXiv.1701.02434.
Metropolis, Nicholas, Arianna W. Rosenbluth, Marshall N. Rosenbluth, Augusta H. Teller, and Edward Teller. 1953. “Equation of State Calculations by Fast Computing Machines.” The Journal of Chemical Physics 21 (6): 1087–92. https://doi.org/10.1063/1.1699114.
Štrumbelj, Erik, Alexandre Bouchard-Côté, Jukka Corander, Andrew Gelman, Håvard Rue, Lawrence Murray, Henri Pesonen, Martyn Plummer, and Aki Vehtari. 2023. “Past, Present, and Future of Software for Bayesian Inference.”