import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import stanio
from cmdstanpy import CmdStanModel
Regression models for describing measurements
What is regression?
Regression is a nice way of modelling measurements.
Maybe you have seen this regression model before:
\[ y = \alpha + X\cdot\beta + \epsilon \]
Components:
- \(y\): a measurement
- \(\alpha\) and \(\beta\) some unknown parameters, the same for every measurement
- \(X\) some real-valued features
- \(\epsilon\) an unknown number quantifying the difference between the prediction and the observation, different for each measurement
Often \(\epsilon\) is assumed to have a normal distribution with location zero and scale parameter \(\sigma\).
Another way of saying the same thing:
\[ y \sim N(\alpha + X\cdot\beta, \sigma) \]
I prefer this notation because it nicely separates the different components, and doesn’t hide the error parameter \(\sigma\). It also makes it clear what you might change. For example \(\alpha + X\cdot\beta\) is just one option: there are many ways in which measurements and predictors can be related. The normal distribution is also not required, in fact it is often inappropriate.
The key features of a regression model:
- a predictor that is some function of the parameters and features
- a probabilistic relationship between the predictor and the measurement
In the context of regression modelling, Bayesian inference lets you give free rein to your creativity when designing regressions, so you can make models that represent what you know about the measurement process.
Generalised linear models
This is a popular and powerful class of regression model with the following distinguishing characteristics:
The predictor is a linear function of the parameters and features, e.g. \(\alpha + X\cdot\beta\)
The probability distribution describing measurement errors is not necessarily the normal distribution, e.g. log-normal, Poisson, Dirichlet, Gamma, …
There is a link function that connects the linear predictor with the probability distribution, e.g. \(\ln(\alpha + X\cdot\beta)\).
An example GLM for positive-constrained measurements:
\[ y \sim LN(\ln(\alpha+X\cdot\beta), \sigma) \]
Tips for designing regression models
- Start with the simplest GLM that obeys all known data constraints.
- Try log-transforming things.
- Aim to explicitly represent how you think the measurement apparatus works.
- Heavy tailed distributions are often better than the normal distribution.
- Remember that varying measurement error is an option.
Reading
A practical guide to regression modelling: Gelman, Hill, and Vehtari (2020)
The section of the Stan user’s guide on regression models is really nice.
Modern Statistics for Modern Biology is an online (and physical) textbook with some very good material about biology-relevant regression models: Susan Holmes and Wolfgang Huber (2019).
Example
Teddy has never been in the lab and is bad at pipetting. Unfortunately, some label needed to be put in some Eppendorf tubes, and noone else was available to do it themselves or even supervise.
Each tube had a required volume of label which Teddy tried to hit, but probably there was some inaccuracy due to bad eyesight, poor hand control or something.
In addition, there was also probably some consistent bias, as the pipette was consistently adding too much or too little liquid. It seemed pretty old.
Luckily, someone was able to measure how much label ended up in 8 out of the 24 tubes with pretty good accuracy. Now we want to estimate how much label there is in the other 16 tubes, taking into account these measurements as well as what we know about the likely biased pipette and inconsistent pipetter.
To describe this situation we’ll first think of a regression model that describes the measurement setup, then use Python to simulate data from the model given some plausible parameter values. Next we’ll implement the model in Stan, then fit the simulated data using MCMC and then analyse the results.
Regression model
To model the noise that Teddy introduced by being bad at pipetting and the bias introduced by the bad pipette, we need some parameters that connect the known target volumes with the unknown true volumes. Let’s call them \(noise\) and \(bias\). Since the volumes are constrained positive, a distribution that automatically excludes negative numbers is probably a good idea: the log-normal distribution is usually a good starting point. This equation describes a plausible relationship:
\[ volume \sim LN(\ln{(target\cdot bias)}, noise) \]
To model the helpful measurements, we use another log-normal distribution and assume the measuring device is unbiased and has known log-scale standard error \(cal\ error\):1
1 NB the scale parameter of a lognormal distribution represents multiplicative error
\[ measurements \sim LN(\ln{volume}, cal\ error) \]
To round off the model we can think about the likely values of the unknown parameters \(bias\) and \(noise\). \(bias\) is likely not to be too far away from 1, otherwise someone would have probably thrown the pipette away already. A prior distribution that puts most of its mass between 0.75 and 1.25 therefore seems reasonable. Similarly, a prior for \(noise\) should probably not imply that Teddy’s pipetting errors regularly exceeded 30%. This consideration motivates a prior for \(noise\) that puts most of its mass below 0.15.
Simulating fake data
First some imports: as usual for this course we’ll be using arviz
, matplotlib
, cmdstanpy
, pandas
and numpy
. stanio
is a handy utility library for Stan: it is a dependency of cmdstanpy so you shouldn’t need to install it explicitly.
Now some hardcoded numbers, including true values for the parameters here: \(bias\) is 0.88 and \(noise\) is 0.1. Note that \(cal\ error\) is much smaller than \(bias\).
= 24
N = 8
N_CAL = np.array(
TARGET_VOLUMES
[*([200] * 8),
*([400] * 8),
*([800] * 8),
]
)= np.array([1, 4, 9, 11, 15, 19, 21, 22])
MEASUREMENT_IX = 0.02
CAL_ERROR = 0.88
BIAS_FACTOR = 0.1
NOISE = 12345 RNG_SEED
Simulate the true volumes
= np.random.default_rng(seed=RNG_SEED)
rng = [
ln_mean * BIAS_FACTOR)
np.log(target for target in TARGET_VOLUMES
]= rng.lognormal(mean=ln_mean, sigma=NOISE)
volumes volumes
array([152.64294349, 199.70810807, 161.32449302, 171.49715397,
174.67894069, 163.43175945, 153.50063823, 187.79919405,
364.94147086, 289.55488638, 445.13256694, 387.79655766,
326.2592979 , 385.23402356, 335.94110379, 349.87019831,
761.78380169, 620.86368119, 745.73037525, 809.71007835,
803.52489045, 683.21425317, 770.52360505, 598.61585552])
Plot the volumes and the targets.
= plt.subplots()
f, ax = np.logspace(np.log10(100), np.log10(1000), 30)
bins =bins)
ax.hist(volumes, binsfor t in (200, 400, 800):
="red")
ax.axvline(t, color
ax.semilogx()200, 400, 800], [200, 400, 800]);
ax.set_xticks([set(
ax.="volume ($\\mu$l)",
xlabel="Frequency",
ylabel="How much label ended up in the tubes"
title; )
Simulate measurements for tubes in the MEASUREMENT_IX
.
= [
measurements
rng.lognormal(np.log(vol), CAL_ERROR)for vol in volumes[MEASUREMENT_IX]
]
pd.DataFrame({"target volume": np.array(TARGET_VOLUMES)[MEASUREMENT_IX],
"actual volume": volumes[MEASUREMENT_IX],
"measured volume": measurements
})
target volume | actual volume | measured volume | |
---|---|---|---|
0 | 200 | 199.708108 | 199.077273 |
1 | 200 | 174.678941 | 176.256328 |
2 | 400 | 289.554886 | 281.877576 |
3 | 400 | 387.796558 | 387.163512 |
4 | 400 | 349.870198 | 362.149468 |
5 | 800 | 809.710078 | 853.238785 |
6 | 800 | 683.214253 | 693.919342 |
7 | 800 | 770.523605 | 783.399634 |
Writing the model in Stan and sampling the simulated data
I wrote up the implied statistical model in a Stan file at src/stan/ pipette.stan
. This code loads this Stan file as a CmdStanModel
object, checks its formatting and prints it out.
Note that the model internally puts the data on log scale and then standardises it: this is a bit annoying but makes it way easier to set priors and can ultimately save you a lot of trouble.
= CmdStanModel(stan_file="../src/stan/pipette.stan")
model format(overwrite_file=True, canonicalize=True, backup=False)
model.print(model.code())
17:34:52 - cmdstanpy - INFO - compiling stan file /Users/tedgro/repos/dtu-qmcm/bayesian_statistics_for_computational_biology/src/stan/pipette.stan to exe file /Users/tedgro/repos/dtu-qmcm/bayesian_statistics_for_computational_biology/src/stan/pipette
17:34:58 - cmdstanpy - INFO - compiled model executable: /Users/tedgro/repos/dtu-qmcm/bayesian_statistics_for_computational_biology/src/stan/pipette
functions {
vector standardise(vector v, real m, real s) {
return (v - m) / s;
}
real standardise(real v, real m, real s) {
return (v - m) / s;
}
vector unstandardise(vector u, real m, real s) {
return m + u * s;
}
real unstandardise(real u, real m, real s) {
return m + u * s;
}
}
data {
int<lower=1> N;
int<lower=0> N_cal;
vector<lower=0>[N] target_volume;
vector<lower=0>[N_cal] y;
array[N_cal] int<lower=1, upper=N> measurement_ix;
real<lower=0> cal_error;
int<lower=0, upper=1> likelihood;
}
transformed data {
vector[N_cal] y_ls = standardise(log(y), mean(log(y)), sd(log(y)));
vector[N] target_volume_ls = standardise(log(target_volume), mean(log(y)),
sd(log(y)));
real cal_error_s = cal_error / sd(log(y));
}
parameters {
real<lower=0> volume_noise_s;
real bias_factor_l;
vector[N] volume_ls;
}
model {
volume_noise_s ~ lognormal(log(0.1), 0.5);
bias_factor_l ~ normal(0, 0.15);
volume_ls ~ normal(target_volume_ls + bias_factor_l, volume_noise_s);
if (likelihood) {
for (i in 1 : N_cal) {
y_ls[i] ~ normal(volume_ls[measurement_ix[i]], cal_error_s);
}
}
}
generated quantities {
real bias_factor = exp(bias_factor_l);
real volume_noise = volume_noise_s * sd(log(y));
vector[N] volume = exp(unstandardise(volume_ls, mean(log(y)), sd(log(y))));
vector[N_cal] y_rep;
vector[N_cal] llik;
for (i in 1 : N_cal) {
int ms_ix = measurement_ix[i];
y_rep[i] = lognormal_rng(log(volume[ms_ix]), cal_error);
llik[i] = lognormal_lpdf(y[i] | log(volume[ms_ix]), cal_error);
}
}
This code loads some data into a dictionary that is compatible with Stan and carries out two MCMC runs, one in prior mode and one in posterior mode.
= stanio.json.process_dictionary(
stan_input_posterior
{"N": N,
"N_cal": N_CAL,
"target_volume": TARGET_VOLUMES,
"y": measurements,
"measurement_ix": MEASUREMENT_IX + 1,
"cal_error": CAL_ERROR,
"likelihood": 1,
}
)= stan_input_posterior | {"likelihood": 0}
stan_input_prior = model.sample(
mcmc_prior =stan_input_prior,
data=0.999,
adapt_delta=12,
max_treedepth=RNG_SEED,
seed
)= model.sample(data=stan_input_posterior, seed=RNG_SEED)
mcmc_posterior
mcmc_prior.diagnose() mcmc_posterior.diagnose()
17:34:58 - cmdstanpy - INFO - CmdStan start processing
17:34:58 - cmdstanpy - INFO - CmdStan done processing.
17:34:58 - cmdstanpy - INFO - CmdStan start processing
17:34:58 - cmdstanpy - INFO - CmdStan done processing.
17:34:58 - cmdstanpy - WARNING - Non-fatal error during sampling:
Exception: normal_lpdf: Scale parameter is 0, but must be positive! (in 'pipette.stan', line 38, column 2 to column 71)
Exception: normal_lpdf: Scale parameter is 0, but must be positive! (in 'pipette.stan', line 38, column 2 to column 71)
Exception: normal_lpdf: Scale parameter is 0, but must be positive! (in 'pipette.stan', line 38, column 2 to column 71)
Exception: normal_lpdf: Scale parameter is 0, but must be positive! (in 'pipette.stan', line 38, column 2 to column 71)
Consider re-running with show_console=True if the above output is unclear!
'Processing csv files: /var/folders/ql/z_6fb5792v1_8tscf4hg5byc0000gp/T/tmpmwrg_bjx/pipette4esu09xh/pipette-20240503173458_1.csv, /var/folders/ql/z_6fb5792v1_8tscf4hg5byc0000gp/T/tmpmwrg_bjx/pipette4esu09xh/pipette-20240503173458_2.csv, /var/folders/ql/z_6fb5792v1_8tscf4hg5byc0000gp/T/tmpmwrg_bjx/pipette4esu09xh/pipette-20240503173458_3.csv, /var/folders/ql/z_6fb5792v1_8tscf4hg5byc0000gp/T/tmpmwrg_bjx/pipette4esu09xh/pipette-20240503173458_4.csv\n\nChecking sampler transitions treedepth.\nTreedepth satisfactory for all transitions.\n\nChecking sampler transitions for divergences.\nNo divergent transitions found.\n\nChecking E-BFMI - sampler transitions HMC potential energy.\nE-BFMI satisfactory.\n\nEffective sample size satisfactory.\n\nSplit R-hat values satisfactory all parameters.\n\nProcessing complete, no problems detected.\n'
The diagnostics seem ok, though interestingly the prior was pretty tricky to sample accurately.
Loading the MCMC results with arviz
This code loads both MCMC runs into an arviz InferenceData
object.
= {"obs": MEASUREMENT_IX, "tube": range(N)}
coords ={
dims"y": ["obs"],
"y_rep": ["obs"],
"target_volume": ["tube"],
"true_volume": ["tube"],
"volume": ["tube"],
"tube": ["tube"]
}= az.from_cmdstanpy(
idata =mcmc_posterior,
posterior=mcmc_prior,
prior="llik",
log_likelihood=stan_input_posterior | {
observed_data"true_volume": volumes, "tube": range(N)
},={"y": "y_rep"},
posterior_predictive=coords,
coords=dims
dims
) idata
-
<xarray.Dataset> Size: 2MB Dimensions: (chain: 4, draw: 1000, volume_ls_dim_0: 24, tube: 24) Coordinates: * chain (chain) int64 32B 0 1 2 3 * draw (draw) int64 8kB 0 1 2 3 4 5 6 ... 994 995 996 997 998 999 * volume_ls_dim_0 (volume_ls_dim_0) int64 192B 0 1 2 3 4 5 ... 19 20 21 22 23 * tube (tube) int64 192B 0 1 2 3 4 5 6 7 ... 17 18 19 20 21 22 23 Data variables: volume_noise_s (chain, draw) float64 32kB 0.149 0.2378 ... 0.1396 0.1959 bias_factor_l (chain, draw) float64 32kB -0.2014 -0.07906 ... -0.07839 volume_ls (chain, draw, volume_ls_dim_0) float64 768kB -1.275 ... ... bias_factor (chain, draw) float64 32kB 0.8176 0.924 ... 0.9842 0.9246 volume_noise (chain, draw) float64 32kB 0.09116 0.1455 ... 0.1199 volume (chain, draw, tube) float64 768kB 182.9 192.1 ... 855.2 Attributes: created_at: 2024-05-03T15:34:59.287263 arviz_version: 0.17.1 inference_library: cmdstanpy inference_library_version: 1.2.1
-
<xarray.Dataset> Size: 264kB Dimensions: (chain: 4, draw: 1000, obs: 8) Coordinates: * chain (chain) int64 32B 0 1 2 3 * draw (draw) int64 8kB 0 1 2 3 4 5 6 7 ... 993 994 995 996 997 998 999 * obs (obs) int64 64B 1 4 9 11 15 19 21 22 Data variables: y_rep (chain, draw, obs) float64 256kB 197.1 177.7 290.6 ... 693.9 779.7 Attributes: created_at: 2024-05-03T15:34:59.293103 arviz_version: 0.17.1 inference_library: cmdstanpy inference_library_version: 1.2.1
-
<xarray.Dataset> Size: 264kB Dimensions: (chain: 4, draw: 1000, llik_dim_0: 8) Coordinates: * chain (chain) int64 32B 0 1 2 3 * draw (draw) int64 8kB 0 1 2 3 4 5 6 7 ... 993 994 995 996 997 998 999 * llik_dim_0 (llik_dim_0) int64 64B 0 1 2 3 4 5 6 7 Data variables: llik (chain, draw, llik_dim_0) float64 256kB -3.875 -2.193 ... -3.675 Attributes: created_at: 2024-05-03T15:34:59.337397 arviz_version: 0.17.1 inference_library: cmdstanpy inference_library_version: 1.2.1
-
<xarray.Dataset> Size: 204kB Dimensions: (chain: 4, draw: 1000) Coordinates: * chain (chain) int64 32B 0 1 2 3 * draw (draw) int64 8kB 0 1 2 3 4 5 6 ... 994 995 996 997 998 999 Data variables: lp (chain, draw) float64 32kB 21.01 18.71 ... 24.94 22.19 acceptance_rate (chain, draw) float64 32kB 0.9988 0.7904 ... 0.8739 0.9229 step_size (chain, draw) float64 32kB 0.3503 0.3503 ... 0.4368 0.4368 tree_depth (chain, draw) int64 32kB 3 4 3 3 3 3 3 3 ... 3 3 3 3 3 3 4 n_steps (chain, draw) int64 32kB 7 31 7 15 7 7 7 ... 7 7 7 7 7 7 15 diverging (chain, draw) bool 4kB False False False ... False False energy (chain, draw) float64 32kB -6.192 -8.972 ... -15.39 -11.6 Attributes: created_at: 2024-05-03T15:34:59.291527 arviz_version: 0.17.1 inference_library: cmdstanpy inference_library_version: 1.2.1
-
<xarray.Dataset> Size: 2MB Dimensions: (chain: 4, draw: 1000, volume_ls_dim_0: 24, tube: 24, obs: 8, llik_dim_0: 8) Coordinates: * chain (chain) int64 32B 0 1 2 3 * draw (draw) int64 8kB 0 1 2 3 4 5 6 ... 994 995 996 997 998 999 * volume_ls_dim_0 (volume_ls_dim_0) int64 192B 0 1 2 3 4 5 ... 19 20 21 22 23 * tube (tube) int64 192B 0 1 2 3 4 5 6 7 ... 17 18 19 20 21 22 23 * obs (obs) int64 64B 1 4 9 11 15 19 21 22 * llik_dim_0 (llik_dim_0) int64 64B 0 1 2 3 4 5 6 7 Data variables: volume_noise_s (chain, draw) float64 32kB 0.02974 0.02728 ... 0.2746 bias_factor_l (chain, draw) float64 32kB -0.1025 -0.07485 ... -0.1921 volume_ls (chain, draw, volume_ls_dim_0) float64 768kB -1.18 ... 0... bias_factor (chain, draw) float64 32kB 0.9026 0.9279 ... 0.9219 0.8252 volume_noise (chain, draw) float64 32kB 0.0182 0.01669 ... 0.1236 0.168 volume (chain, draw, tube) float64 768kB 193.8 185.0 ... 693.1 y_rep (chain, draw, obs) float64 256kB 184.8 189.1 ... 1.002e+03 llik (chain, draw, llik_dim_0) float64 256kB -8.981 ... -87.32 Attributes: created_at: 2024-05-03T15:34:59.331496 arviz_version: 0.17.1 inference_library: cmdstanpy inference_library_version: 1.2.1
-
<xarray.Dataset> Size: 204kB Dimensions: (chain: 4, draw: 1000) Coordinates: * chain (chain) int64 32B 0 1 2 3 * draw (draw) int64 8kB 0 1 2 3 4 5 6 ... 994 995 996 997 998 999 Data variables: lp (chain, draw) float64 32kB 72.09 71.09 ... 23.28 14.77 acceptance_rate (chain, draw) float64 32kB 0.9856 0.9735 ... 0.9808 0.99 step_size (chain, draw) float64 32kB 0.03423 0.03423 ... 0.04274 tree_depth (chain, draw) int64 32kB 4 4 4 4 4 4 4 4 ... 7 6 7 7 6 6 7 n_steps (chain, draw) int64 32kB 15 31 15 15 15 ... 127 63 63 127 diverging (chain, draw) bool 4kB False False False ... False False energy (chain, draw) float64 32kB -58.38 -61.21 ... -14.09 -3.941 Attributes: created_at: 2024-05-03T15:34:59.334204 arviz_version: 0.17.1 inference_library: cmdstanpy inference_library_version: 1.2.1
-
<xarray.Dataset> Size: 896B Dimensions: (N_dim_0: 1, N_cal_dim_0: 1, tube: 24, obs: 8, measurement_ix_dim_0: 8, cal_error_dim_0: 1, likelihood_dim_0: 1) Coordinates: * N_dim_0 (N_dim_0) int64 8B 0 * N_cal_dim_0 (N_cal_dim_0) int64 8B 0 * tube (tube) int64 192B 0 1 2 3 4 5 6 ... 18 19 20 21 22 23 * obs (obs) int64 64B 1 4 9 11 15 19 21 22 * measurement_ix_dim_0 (measurement_ix_dim_0) int64 64B 0 1 2 3 4 5 6 7 * cal_error_dim_0 (cal_error_dim_0) int64 8B 0 * likelihood_dim_0 (likelihood_dim_0) int64 8B 0 Data variables: N (N_dim_0) int64 8B 24 N_cal (N_cal_dim_0) int64 8B 8 target_volume (tube) int64 192B 200 200 200 200 ... 800 800 800 800 y (obs) float64 64B 199.1 176.3 281.9 ... 693.9 783.4 measurement_ix (measurement_ix_dim_0) int64 64B 2 5 10 12 16 20 22 23 cal_error (cal_error_dim_0) float64 8B 0.02 likelihood (likelihood_dim_0) int64 8B 1 true_volume (tube) float64 192B 152.6 199.7 161.3 ... 770.5 598.6 Attributes: created_at: 2024-05-03T15:34:59.336259 arviz_version: 0.17.1 inference_library: cmdstanpy inference_library_version: 1.2.1
Next we look at the summaries of both the posterior and prior.
for group_name in ["prior", "posterior"]:
= idata.get(group_name)
group = az.summary(
group_summary
group,=[
var_names"volume_noise", "bias_factor", "volume_noise_s", "bias_factor_l"
]
) display(group_summary)
mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat | |
---|---|---|---|---|---|---|---|---|---|
volume_noise | 0.066 | 0.034 | 0.016 | 0.128 | 0.002 | 0.001 | 224.0 | 380.0 | 1.01 |
bias_factor | 0.994 | 0.143 | 0.731 | 1.252 | 0.015 | 0.010 | 80.0 | 120.0 | 1.06 |
volume_noise_s | 0.108 | 0.055 | 0.026 | 0.210 | 0.003 | 0.002 | 224.0 | 380.0 | 1.01 |
bias_factor_l | -0.016 | 0.141 | -0.286 | 0.242 | 0.015 | 0.011 | 80.0 | 120.0 | 1.06 |
mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat | |
---|---|---|---|---|---|---|---|---|---|
volume_noise | 0.113 | 0.026 | 0.068 | 0.160 | 0.001 | 0.001 | 1303.0 | 1943.0 | 1.0 |
bias_factor | 0.887 | 0.055 | 0.781 | 0.986 | 0.001 | 0.001 | 2493.0 | 2380.0 | 1.0 |
volume_noise_s | 0.185 | 0.043 | 0.111 | 0.261 | 0.001 | 0.001 | 1303.0 | 1943.0 | 1.0 |
bias_factor_l | -0.122 | 0.061 | -0.229 | 0.002 | 0.001 | 0.001 | 2493.0 | 2380.0 | 1.0 |
Investigating the results
This plot compares the measurements with the observations.
az.plot_lm(=idata.observed_data["y"],
y=idata.observed_data["tube"].sel(tube=MEASUREMENT_IX + 1),
x=idata.posterior_predictive["y_rep"]
y_hat
)= plt.gca()
ax ax.semilogy()
This plot compares the volume_noise
and bias_factor
samples with the true values that we used to simulate the data.
az.plot_posterior(
idata.prior,=["volume_noise", "bias_factor"],
var_names="hist",
kind="hide",
hdi_prob=None,
point_estimate=[12, 4]
figsize
)= plt.gcf()
f = f.axes
axes
az.plot_posterior(
idata.posterior,=["volume_noise", "bias_factor"],
var_names="hist",
kind="hide",
hdi_prob=None,
point_estimate=[12, 4],
figsize=axes,
ax="tab:orange"
color
)for ax, truth in zip(f.axes, [NOISE, BIAS_FACTOR]):
="red") ax.axvline(truth, color
This plot shows the samples for all the tubes’ volumes, including those that weren’t measured, alongside the true volumes.
az.plot_lm(=idata.observed_data["tube"],
x=idata.observed_data["true_volume"],
y=idata.posterior["volume"],
y_hat=False,
grid={"label": "true volume"},
y_kwargs=[10, 5],
figsize=False,
legend
)= plt.gca()
ax for i in MEASUREMENT_IX:
+0.1, volumes[i], "obs", zorder=1000)
ax.text(iset(xlabel="tube", ylabel="volume ($\\mu$l)");
ax. ax.semilogy()
So, what is the probability that Teddy put less than 350 \(\mu\)l of label into tube 10, even though the target amount was 400\(\mu\)l?
az.plot_posterior(
idata.prior,=["volume"],
var_names={"tube": [10]},
coords="hist",
kind="hide",
hdi_prob=None,
point_estimate=350,
ref_val=[12, 4],
figsize=np.linspace(250, 600, 30),
bins )
Phew, only about 13%, that’s probably fine right?