MCMC and Stan
Introduction
Welcome back!
Plan for today:
MCMC
The big picture
In: a rule for evaluating the target function and maybe its gradients.
Out: a Markov chain of numbers that you can do Monte Carlo integration with.
Simple case
Problem
One random variable \(X\) with probability density function \(density\).
Aka a “one-dimensional parameter space”.
Evaluating \(density(x)\) for a given value \(x\) (aka “point in parameter space”) is easy.
Calculating the area under a region of the \(density\) curve (aka “the probability mass”) is expensive.
This is annoying: that area is exactly what we need to know!
Solution
Generate a series \(x_1, ..., x_i,..., x_n\) where every number depends on the previous number(s), i.e. a Markov chain.
To calculate \(x_{i+1}\), generate a random number and take into account \(density(x_i)\).¹
¹ This is the interesting and tricky bit!
If this works, with a long enough series of numbers we get something like this:
The numbers from the Markov chain have to approximately agree with the target density function, i.e. in any region the number of dots is approximately proportional to the area under the curve.
Now we can do Monte Carlo integration, i.e. approximate the area under a region of curve by counting the red dots in that region.
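In Python, the counting step might look like this (a toy sketch: numpy random draws stand in for a real Markov chain, and the names are made up for illustration):

import numpy as np

rng = np.random.default_rng(1234)
draws = rng.normal(size=10_000)  # stand-in for Markov chain draws from the target

# Monte Carlo integration: estimate P(0 < X < 1) as the share of draws in that region
prob_estimate = ((draws > 0) & (draws < 1)).mean()
print(prob_estimate)  # roughly 0.34, the exact answer for a standard normal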
Metropolis Hastings
The first (I think?) MCMC algorithm. Original paper: Metropolis et al. (1953).²
² Metropolis was first author but didn’t do any work! That was Arianna Rosenbluth (programming) plus Marshall Rosenbluth & Edward Teller (maths).
Generates Markov chains that provably agree with arbitrary target density functions (in the asymptotic limit).
Roughly how it works (there is a code sketch after this list):
- Choose a candidate by randomly perturbing the previous point \(x_i\)
- Accept or reject the candidate randomly, with acceptance probability \(\min\left(1, \frac{density(candidate)}{density(x_i)}\right)\)
- \(x_{i+1}\) is the candidate if accepted, else \(x_i\)
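A minimal sketch of this recipe in Python, for a one-dimensional target (the names metropolis, step_size etc. are illustrative, not from any particular library):

import numpy as np

def metropolis(density, x0, n_samples, step_size=1.0, seed=1234):
    """Random-walk Metropolis for a 1D target density (normalisation not needed)."""
    rng = np.random.default_rng(seed)
    samples = np.empty(n_samples)
    x = x0
    for i in range(n_samples):
        candidate = x + step_size * rng.normal()  # perturb the previous point
        # accept with probability min(1, density(candidate) / density(x))
        if rng.uniform() < density(candidate) / density(x):
            x = candidate
        samples[i] = x
    return samples

# example: sample from an unnormalised standard normal density
draws = metropolis(lambda x: np.exp(-0.5 * x**2), x0=0.0, n_samples=10_000)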
Doesn’t work well in parameter spaces with more than ~10 dimensions: random-walk proposals are either rejected or barely move.
Hamiltonian Monte Carlo
Big picture: MCMC that works for large parameter spaces.
Key innovation: travel through parameter space quickly using gradients.
Illustration: imagine the current sample as a ball sitting on a surface shaped like the negative log density. Flick it in a random direction, follow its trajectory using the gradients, and take the place where it stops as the next sample.
To decide how hard to flick the ball and how precisely to calculate its trajectory for a particular case, adaptation is required, i.e. running the algorithm in warm-up mode for a bit and learning by trial and error. How best to do adaptation is an important open question.
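As a rough sketch (the basic idea, not Stan’s actual implementation), a single HMC transition in Python might look like this; step_size (how precisely the trajectory is computed) and n_steps (how long the ball rolls) are the kinds of knobs adaptation has to tune:

import numpy as np

def leapfrog(grad_log_density, q, p, step_size, n_steps):
    """Approximate the ball's trajectory with the leapfrog integrator."""
    p = p + 0.5 * step_size * grad_log_density(q)  # first half step for momentum
    for _ in range(n_steps - 1):
        q = q + step_size * p                      # full step for position
        p = p + step_size * grad_log_density(q)    # full step for momentum
    q = q + step_size * p                          # last position step
    p = p + 0.5 * step_size * grad_log_density(q)  # final half step for momentum
    return q, p

def hmc_step(log_density, grad_log_density, q, step_size, n_steps, rng):
    """One HMC transition: flick the ball, follow it, accept or reject."""
    p = rng.normal(size=np.shape(q))  # random momentum: "flick the ball"
    q_new, p_new = leapfrog(grad_log_density, q, p, step_size, n_steps)
    # accept or reject based on the change in total (potential + kinetic) energy
    energy = -log_density(q) + 0.5 * np.sum(p**2)
    energy_new = -log_density(q_new) + 0.5 * np.sum(p_new**2)
    return q_new if rng.uniform() < np.exp(energy - energy_new) else q

# example: one transition targeting a 2D standard normal
rng = np.random.default_rng(1234)
q = hmc_step(lambda q: -0.5 * np.sum(q**2),  # log density, up to a constant
             lambda q: -q,                   # its gradient
             np.zeros(2), step_size=0.1, n_steps=20, rng=rng)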
Limitations:
- No discrete parameters
- Performs badly when the (log-scale) target density function is wiggly, i.e. its curvature varies a lot.
Reading
Betancourt (2018b)
Betancourt (2018a)
Beskos et al. (2010)
Andrieu et al. (2003)
Stan
Stan is:
- A language for specifying probability density functions as Stan programs.
- A compiler that turns Stan programs into instructions for inference engines.
- An inference engine implementing adaptive HMC and some other algorithms.
- A library of functions for calculating the gradients of interesting probability density functions.
- Interfaces for popular computer tools, e.g. cmdstanpy for Python and cmdstanr for R.
Why use Stan?
Alternatives: PyMC, BlackJAX, Turing.jl, TensorFlow Probability.
Overview as of 2023: Štrumbelj et al. (2023).
Why I like Stan:
- Big, active and knowledgeable community (most important reason)
- Featureful (complex numbers, fast solvers, up-to-date diagnostics)
- Fast (for CPU-bound, general purpose adaptive HMC)
Getting started with Stan in Python
Install cmdstanpy
pip install cmdstanpy
Use cmdstanpy to install the rest of Stan
python -m cmdstanpy.install_cmdstan --cores 2
I like to store Stan outputs using the library arviz. It also makes nice plots.
pip install arviz
Write a Stan program
A Stan program consists of function definitions, variable declarations and statements, organised into {...}-delimited blocks, e.g.

data {
  real y; // a variable declaration
}
model {
  y ~ normal(0, 1.4); // a statement
}
The purpose of a Stan program is to define the probability density for any combination of data and parameters.
It is ok for there to be no parameters:
transformed data {
  real y = 2; // this is both a statement and a declaration!
}
model {
  y ~ normal(0, 1.4); // the total density is N(2 | 0, 1.4) = 0.103
}
or no data:
parameters {
  real alpha;
}
model {
  alpha ~ normal(0, 1.4); // Stan can find the density for any alpha
}
Cmdstanpy workflow
Step 1
Use standard Python tools to make a dictionary mapping data variables to inputs, e.g.

my_stan_input = {"y": 2}
(Optional) Save the input as a json file:
import json
with open("my_stan_input.json", "w") as f:
    json.dump(my_stan_input, f)
Step 2
Instantiate a CmdStanModel:

from cmdstanpy import CmdStanModel

my_model = CmdStanModel(stan_file="my_stan_program.stan")

Cmdstanpy will use Stan’s compiler to create .hpp and executable files.
Step 3
Use the method CmdStanModel.sample to trigger adaptive HMC:

my_mcmc_results = my_model.sample(data=my_stan_input)
Step 4
Use the methods CmdStanMCMC.diagnose and CmdStanMCMC.summary for quick diagnostics:

summary = my_mcmc_results.summary()
diagnostics = my_mcmc_results.diagnose()
Step 5
Convert to arviz InferenceData and save:

import arviz

my_idata = arviz.from_cmdstanpy(my_mcmc_results)
my_idata.to_json("my_arviz_idata.json")
Stan references
Next time / homework
How to program your own MCMC!
I ran MCMC, now what??