Introduction to Bayesian inference

Introduction

Today’s topics

  1. What is Bayesian statistical inference?
  2. Why is it useful?
  3. The big challenge

Computer goals

Set up git/ssh, Python, cmdstanpy and cmdstan

What is Bayesian statistical inference?

Probability function

Figure 1: A jug of water

A function that can measure the water in a jug.

i.e.

\(p: S \rightarrow [0,1]\) where

  • \(S\) is an event space
  • \(p(S) = 1\)
  • If \(A, B \in S\) are disjoint, then \(p(A\cup B) = p(A) + p(B)\)
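For example, a minimal sketch in Python of a probability function on a finite event space (the outcomes and weights are invented for illustration):

atoms = {"x": 0.25, "y": 0.5, "z": 0.25}  # invented atomic outcomes and weights
def p(event):
    """Probability of an event, i.e. a set of atomic outcomes."""
    return sum(atoms[a] for a in event)
A, B = {"x"}, {"y", "z"}          # disjoint events
assert p(A | B) == p(A) + p(B)    # additivity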

Bayesian epistemology

Probability functions can describe belief, e.g.

“Definitely B”:

“Not sure if A or B”:

“B a bit more plausible than A”:
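As a sketch with invented numbers, those three belief states could correspond to probability functions like:

definitely_b = {"A": 0.0, "B": 1.0}
not_sure_if_a_or_b = {"A": 0.5, "B": 0.5}
b_a_bit_more_plausible = {"A": 0.4, "B": 0.6}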

Statistical Inference

Figure 2: A nice soup: here is the recipe

In: facts about a spoonful sample

Out: propositions about a soup population

e.g.

  • spoonful not salty \(\rightarrow\) soup not salty
  • no carrots in spoon \(\rightarrow\) no carrots in soup

Bayesian statistical inference

Figure 3: A jug of soup

Statistical inference resulting in a probability.

e.g.

  • spoon \(\rightarrow\) \(p(\text{soup not salty})\) = 99.9%
  • spoon \(\rightarrow\) \(p(\text{no carrots in soup})\) = 95.1%

Non-Bayesian inferences:

  • spoon \(\rightarrow\) best estimate of [salt] is 0.1 mol/l
  • \(p_{null}(\text{spoon}) = 4.9\%\) \(\rightarrow\) no carrots (\(p = 0.049\))
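To make the salt example concrete, here is a sketch with invented numbers, assuming a normal prior for [salt], a normal measurement model for the spoonful (so the posterior is also normal by conjugate updating) and scipy installed:

from scipy.stats import norm
prior_mean, prior_sd = 0.08, 0.03  # background information about [salt] (mol/l)
measurement_sd = 0.02              # how noisy a spoonful measurement is
y = 0.06                           # observed spoonful concentration
# Conjugate normal update: combine prior and measurement precisions
post_var = 1 / (1 / prior_sd**2 + 1 / measurement_sd**2)
post_mean = post_var * (prior_mean / prior_sd**2 + y / measurement_sd**2)
# Bayesian output: a probability, e.g. p([salt] < 0.1 | spoon)
print(norm.cdf(0.1, loc=post_mean, scale=post_var**0.5))
# A non-Bayesian output: a single best estimate, e.g. the observed value itself
print(y)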

Why is Bayesian statistical inference useful?

General reasons

Easy to interpret

Figure 4: It’s a good book!

Bayesian inference produces probabilities, which can be interpreted in terms of information and plausible reasoning.

e.g. “According to the model…”

  • “…x is highly plausible.”
  • “…x is more plausible than y.”
  • “…the data doesn’t contain enough information for firm conclusions about x.”

Old

Figure 5: Laplace, who did Bayesian inference in the 1780s (https://en.wikipedia.org/wiki/Pierre-Simon_Laplace)

Bayesian inference is old!

This means

  • it is well understood mathematically.
  • conceptual surprises are relatively rare.
  • there are many compatible frameworks.

An easy way to represent your information

Probabilities decompose nicely:

\[ p(\theta, y) = p(\theta)p(y\mid\hat{y}(\theta)) \]

  • \(p(\theta)\): nice form for background information, e.g. anything non-experimental
  • \(\hat{y}(\theta)\): nice form for structural information, e.g. physical laws
  • \(p(y\mid\hat{y}(\theta))\): nice form for measurement information, e.g. instrument accuracy
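Here is a sketch of how this decomposition can look in code, for an invented one-parameter model (the structural function, the numbers and the use of scipy are all illustrative choices):

from scipy.stats import norm
def y_hat(theta):
    """Structural information, e.g. a physical law (here an invented toy)."""
    return 2.0 * theta
def log_prior(theta):
    """Background information about theta."""
    return norm.logpdf(theta, loc=0.0, scale=1.0)
def log_likelihood(y, theta, sigma=0.5):
    """Measurement information: y is a noisy observation of y_hat(theta)."""
    return norm.logpdf(y, loc=y_hat(theta), scale=sigma)
def log_joint(y, theta):
    """log p(theta, y) = log p(theta) + log p(y | y_hat(theta))."""
    return log_prior(theta) + log_likelihood(y, theta)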

Reasons specific to computational biology

Regression models: good for describing measurements

Regression: the measured value depends noisily on the true value, e.g. \(y \sim N(\hat{y}, \sigma)\).

Biology experiments often have measurement processes with awkward features, e.g.

  • heteroskedasticity (amount of noise depends on measured value)
  • constraints (e.g. non-negativity, compositionality)
  • unknown latent bias (e.g. the pump is supposed to add \(0.05cm^3\) per min, but does it?)

Bayesian inference is good at describing these.
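For example, two of these features written down as measurement models (a sketch; the parameterisations and default numbers are invented):

import numpy as np
from scipy.stats import norm
def loglik_heteroskedastic(y_obs, y_true, cv=0.1):
    """Noise proportional to the true value: y_obs ~ N(y_true, cv * y_true)."""
    return norm.logpdf(y_obs, loc=y_true, scale=cv * y_true)
def loglik_nonnegative(y_obs, y_true, sigma=0.1):
    """Non-negative measurement described on the log scale:
    log(y_obs) ~ N(log(y_true), sigma)."""
    return norm.logpdf(np.log(y_obs), loc=np.log(y_true), scale=sigma)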

Multi-level models: good for describing sources of variation

Measurement model:

\(y \sim \text{binomial}(K, \text{logit}^{-1}(ability))\)

GPareto model:

\(ability \sim GPareto(m, k, s)\)

Normal model:

\(ability \sim N(\mu, \tau)\)
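A sketch of how the normal version could generate data, using scipy's expit as the inverse logit (all numbers are invented):

import numpy as np
from scipy.special import expit  # inverse logit
rng = np.random.default_rng(1234)
mu, tau, n_subjects, K = 0.0, 1.0, 8, 20
ability = rng.normal(mu, tau, size=n_subjects)  # population model
y = rng.binomial(K, expit(ability))             # measurement model: K trials each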

Generative models: good for representing structural information

Figure 7: From a Stan case study

Information about hares (\(u\)) and lynxes (\(v\)):

\[\begin{align*} \frac{d}{dt}u &= (\alpha - \beta v)u \\ \frac{d}{dt}v &= (-\gamma + \delta u)v \end{align*}\]

i.e. a deterministic function turning \(\alpha\), \(\beta\), \(\gamma\), \(\delta\), \(u(0)\) and \(v(0)\) into \(u(t)\) and \(v(t)\).
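A sketch of that deterministic function using scipy's ODE solver (the parameter values and initial conditions are invented):

import numpy as np
from scipy.integrate import solve_ivp
def lotka_volterra(t, uv, alpha, beta, gamma, delta):
    """du/dt = (alpha - beta*v)*u,  dv/dt = (-gamma + delta*u)*v."""
    u, v = uv
    return [(alpha - beta * v) * u, (-gamma + delta * u) * v]
theta = (0.55, 0.028, 0.84, 0.026)  # alpha, beta, gamma, delta (invented)
u0, v0 = 30.0, 4.0                  # initial hare and lynx populations (invented)
times = np.linspace(0, 20, 100)
solution = solve_ivp(lotka_volterra, (0, 20), [u0, v0], t_eval=times, args=theta)
u_t, v_t = solution.y               # u(t) and v(t) at the requested times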

The big challenge

\(p(\theta \mid y)\) is easy to evaluate (at least up to a constant) but hard to integrate.

This is bad, as we typically want something like

\[ p([\text{salt}] < 0.1 \mid \text{spoon} = s) \]

which is equivalent to

\[ \int_{0}^{0.1}p([\text{salt}] \mid \text{spoon} = s)\,d[\text{salt}] \]

\(p(\theta \mid y)\) is defined on a space with one dimension per model parameter, so these integrals are typically high-dimensional.
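In one dimension a quadrature rule would be fine (a sketch below, with an invented posterior standing in for \(p([\text{salt}] \mid \text{spoon})\)), but grid-style methods need exponentially many density evaluations as the number of parameters grows:

from scipy.integrate import quad
from scipy.stats import norm
posterior = norm(loc=0.08, scale=0.02)  # invented 1d posterior for [salt]
prob, _ = quad(posterior.pdf, 0, 0.1)   # p([salt] < 0.1 | spoon), easy in 1d
# A grid with m points per dimension needs m**d evaluations in d dimensions.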

The solution: MCMC

Figure 8: An image I found online

Strategy:

  1. Find a series of numbers that
    • quickly finds the high-probability region in parameter space
    • reliably matches its statistical properties
  2. Do sample-based approximate integration.
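For illustration, a minimal random-walk Metropolis sketch of this strategy (not the algorithm Stan actually uses, which is Hamiltonian Monte Carlo, covered next time; the target density is invented):

import numpy as np
from scipy.stats import norm
rng = np.random.default_rng(1234)
def log_density(theta):
    """Invented stand-in for log p(theta | y)."""
    return norm.logpdf(theta, loc=0.08, scale=0.02)
theta, draws = 0.5, []  # deliberately bad starting point
for _ in range(10_000):
    proposal = theta + 0.01 * rng.normal()
    if np.log(rng.uniform()) < log_density(proposal) - log_density(theta):
        theta = proposal                 # accept the proposal
    draws.append(theta)
draws = np.array(draws[1_000:])          # discard warmup draws
print((draws < 0.1).mean())              # sample-based estimate of p(theta < 0.1 | y)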

It (often) works!

We can tell when it doesn’t work!

Homework

Things to read

Box and Tiao (1992, Ch. 1.1) (available via DTU Findit) gives a nice explanation of statistical inference in general and of why the Bayesian approach is useful.

Historical interest: Laplace (1986) and Stigler (1986).

Things to set up

Python

First get a recent (ideally 3.11+) version of Python. This can be very annoying, so talk to me if necessary!

Next get used to Python virtual environments.

The method I like is to put the virtual environment in a folder .venv inside the root of my project:

$ python -m venv .venv --prompt=bscb

Then to use:

$ source .venv/bin/activate
# ... do work
$ deactivate

Tip: use an ergonomic alias to activate venvs, e.g. alias va="source .venv/bin/activate"

Git and ssh

$ git clone git@github.com:teddygroves/bayesian_statistics_for_systems_biologists.git

Cmdstanpy and cmdstan

First install them:

$ pip install cmdstanpy
$ python -m cmdstanpy.install_cmdstan

Now test that they work:

from cmdstanpy import CmdStanModel
# Write a minimal Stan program to a file...
filename = "example_stan_program.stan"
code = "data {} parameters {real t;} model {t ~ std_normal();}"
with open(filename, "w") as f:
    f.write(code)
# ...then compile it and run MCMC sampling.
model = CmdStanModel(stan_file=filename)
mcmc = model.sample()

Next time

Theory

Hamiltonian Monte Carlo:

  • what?
  • why?

MCMC diagnostics

Computer

Stan, cmdstanpy, arviz:

  • formats
  • workflow
  • write a model

References

Box, George E. P., and George C. Tiao. 1992. “Bayesian Inference in Statistical Analysis.” A Wiley-Interscience Publication. New York: Wiley. https://onlinelibrary-wiley-com.proxy.findit.cvt.dk/doi/epdf/10.1002/9781118033197.
Jaynes, E. T. 2003. “Probability Theory: The Logic of Science.” Edited by G. Larry Bretthorst. Cambridge, UK: Cambridge University Press. https://readyforai.com/download/probability-theory-the-logic-of-science-pdf/.
Laplace, Pierre Simon. 1986. “Memoir on the Probability of the Causes of Events.” Statistical Science 1 (3). https://doi.org/10.1214/ss/1177013621.
Stigler, Stephen M. 1986. “Laplace’s 1774 Memoir on Inverse Probability.” Statistical Science 1 (3). https://doi.org/10.1214/ss/1177013620.