Welcome!

This is a course about Bayesian statistics, targeted at systems biologists.

There are three intended learning outcomes:

  1. Understand the theoretical basis for applying Bayesian data analysis to practical scientific problems

  2. Develop a familiarity with implementing Bayesian data analysis using modern software tools

  3. Gain a deep understanding of both the theory and practice of the elements of Bayesian data analysis that are particularly relevant to computational biology, including custom hierarchical models, large-scale analyses and statistical models with embedded ODE systems.

General format

Each week we have a one-hour seminar. The goal is to spend the time approximately as follows:

  1. 25-35 minutes on ‘theory’: material from the book, plus pointers to further reading

  2. 25-35 minutes on practical computer work

Plan

Week 1: What is Bayesian inference?

Theory

Statistical inference in general

Bayesian statistical inference

The big challenge: dimensionality

Practice

Set up development environment

git basics

Install Stan and cmdstanpy
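For reference, here is a minimal sketch of the installation step, assuming a working Python environment; exact commands and toolchain details may differ on your system:

```python
# Shell step first: pip install cmdstanpy
import cmdstanpy

# Download and build CmdStan, the command-line backend that cmdstanpy drives.
cmdstanpy.install_cmdstan()

# Confirm that cmdstanpy can locate the installation.
print(cmdstanpy.cmdstan_path())
```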

Reading

Jaynes (2003, Ch. 1)

Laplace (1986)

Box and Tiao (1992, Ch. 1.1)

Week 2: MCMC and Stan

Theory

What is MCMC?

Hamiltonian Monte Carlo

Probabilistic programming

Practice

Run an MCMC algorithm and inspect the results
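As a preview, a sketch of this exercise using cmdstanpy; the bernoulli.stan file and the data dictionary below are the standard introductory example, not a course-specific model:

```python
from cmdstanpy import CmdStanModel

# Compile the Stan program (happens automatically on first use) and sample.
model = CmdStanModel(stan_file="bernoulli.stan")
fit = model.sample(
    data={"N": 10, "y": [0, 1, 0, 0, 0, 0, 0, 0, 0, 1]},
    chains=4,
    iter_sampling=1000,
)

print(fit.summary())   # posterior means, sds, quantiles, ESS, R-hat
print(fit.diagnose())  # textual report: divergences, tree depth, E-BFMI
```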

Reading

Betancourt (2018)

Week 3: Metropolis-Hastings
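To make the accept/reject logic concrete, here is a minimal random-walk Metropolis sampler in plain numpy; the standard-normal target and the step size are chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1234)

def log_target(x):
    return -0.5 * x**2  # unnormalised log density of N(0, 1)

x = 0.0
samples = []
for _ in range(10_000):
    proposal = x + rng.normal(scale=0.5)  # symmetric proposal, so the
    # Hastings correction cancels and only the target ratio remains
    log_accept_ratio = log_target(proposal) - log_target(x)
    if np.log(rng.uniform()) < log_accept_ratio:
        x = proposal  # accept; otherwise keep the current state
    samples.append(x)

print(np.mean(samples), np.std(samples))  # should be close to 0 and 1
```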

Week 4: After MCMC: diagnostics and decisions

Theory

Diagnostics: convergence, divergent transitions, effective sample size

Model evaluation as decision theory

Why negative log likelihood is a good default loss function

Practice

Diagnose some good and bad MCMC runs
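A sketch of the kinds of checks this exercise involves, assuming fit is a CmdStanMCMC object from an earlier model.sample() call and that arviz is installed:

```python
import arviz as az

idata = az.from_cmdstanpy(fit)

# The summary includes the diagnostics from this week's theory session.
summary = az.summary(idata)
print(summary[["ess_bulk", "ess_tail", "r_hat"]])

# Divergent transitions are recorded per draw in the sample stats.
n_divergent = int(idata.sample_stats["diverging"].sum())
print(f"{n_divergent} divergent transitions")
```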

Reading

Vehtari et al. (2021)

Vehtari, Gelman, and Gabry (2017)

Week 5: Regression models in biology

Theory

Generalised linear models

Prior elicitation

Hierarchical models

Practice

Compare some statistical models of a simulated biological dataset
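A sketch of the comparison step using approximate leave-one-out cross-validation (Vehtari, Gelman, and Gabry 2017). The fit objects fit_linear and fit_hier are illustrative placeholders, and both Stan programs are assumed to store pointwise log likelihoods in a generated quantity called log_lik:

```python
import arviz as az

# fit_linear and fit_hier are hypothetical CmdStanMCMC objects from
# two competing models fitted to the same simulated dataset.
idata_linear = az.from_cmdstanpy(fit_linear, log_likelihood="log_lik")
idata_hier = az.from_cmdstanpy(fit_hier, log_likelihood="log_lik")

# Rank the models by estimated out-of-sample predictive performance.
comparison = az.compare(
    {"linear": idata_linear, "hierarchical": idata_hier},
    ic="loo",
)
print(comparison)
```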

Reading

Betancourt (2024)

Week 6: Hierarchical models

Week 7: ODEs

Theory

What is an ODE?

ODE solvers

ODE solvers inside probabilistic programs

Practice

Fit a model with an ODE
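Before embedding an ODE in a Stan program, it helps to simulate the system in Python. Here is a sketch that generates noisy measurements from a one-state exponential-decay ODE, the kind of dataset one might then fit with Stan's ODE solvers; the model and parameter values are illustrative only:

```python
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(1234)

def dydt(t, y, k):
    return -k * y  # dy/dt = -k*y, so y(t) = y0 * exp(-k*t)

k_true, y0, ts = 0.3, 10.0, np.linspace(0.5, 10, 20)
solution = solve_ivp(dydt, t_span=(0, 10), y0=[y0], args=(k_true,), t_eval=ts)

# Add lognormal measurement noise to mimic, e.g., concentration data.
y_obs = solution.y[0] * rng.lognormal(sigma=0.1, size=len(ts))
print(dict(zip(ts.round(2), y_obs.round(2))))
```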

Reading

Timonen et al. (2022)

Week 8: Bayesian workflow

Theory

Parts of a statistical analysis (not just inference!)

Why Bayesian workflow is complex: non-linearity and plurality

Writing scalable statistical programming projects

Practice

Write a scalable statistical analysis with bibat

Reading

Gelman et al. (2020)

Weeks 9-10: Project

Format: a one-hour joint feedback and help session

References

Betancourt, Michael. 2018. “A Conceptual Introduction to Hamiltonian Monte Carlo.” arXiv:1701.02434 [Stat], July. http://arxiv.org/abs/1701.02434.
———. 2024. “Hierarchical Modeling.” https://betanalpha.github.io/assets/case_studies/hierarchical_modeling.html.
Box, George E. P., and George C. Tiao. 1992. Bayesian Inference in Statistical Analysis. New York: Wiley. https://doi.org/10.1002/9781118033197.
Gelman, Andrew, Aki Vehtari, Daniel Simpson, Charles C. Margossian, Bob Carpenter, Yuling Yao, Lauren Kennedy, Jonah Gabry, Paul-Christian Bürkner, and Martin Modrák. 2020. “Bayesian Workflow.” arXiv:2011.01808 [Stat], November. http://arxiv.org/abs/2011.01808.
Jaynes, E. T. 2003. Probability Theory: The Logic of Science. Edited by G. Larry Bretthorst. Cambridge, UK: Cambridge University Press.
Laplace, Pierre Simon. 1986. “Memoir on the Probability of the Causes of Events.” Statistical Science 1 (3). https://doi.org/10.1214/ss/1177013621.
Timonen, Juho, Nikolas Siccha, Ben Bales, Harri Lähdesmäki, and Aki Vehtari. 2022. “An Importance Sampling Approach for Reliable and Efficient Inference in Bayesian Ordinary Differential Equation Models.” arXiv. https://doi.org/10.48550/arXiv.2205.09059.
Vehtari, Aki, Andrew Gelman, and Jonah Gabry. 2017. “Practical Bayesian Model Evaluation Using Leave-One-Out Cross-Validation and WAIC.” Statistics and Computing 27 (5): 1413–32. https://doi.org/10.1007/s11222-016-9696-4.
Vehtari, Aki, Andrew Gelman, Daniel Simpson, Bob Carpenter, and Paul-Christian Bürkner. 2021. “Rank-Normalization, Folding, and Localization: An Improved R̂ for Assessing Convergence of MCMC (with Discussion).” Bayesian Analysis 16 (2): 667–718. https://doi.org/10.1214/20-BA1221.