import numpy as np

T = np.array([[0.1, 0.1, 0.8], [0.5, 0.1, 0.4], [0.5, 0.2, 0.3]])
print(T)
[[0.1 0.1 0.8]
 [0.5 0.1 0.4]
 [0.5 0.2 0.3]]
Welcome back!
Plan for today:
Recap from last week:
A Markov Chain is any memoryless process, i.e. one such that
\[ p(x^{i} | x^{i-1},...,x^{1}) = p(x^{i} | x^{i-1}). \]
For example:
The transitions between states at discrete time steps can be represented with the following matrix
[[0.1 0.1 0.8]
[0.5 0.1 0.4]
[0.5 0.2 0.3]]
Given an initial starting state, we can simulate the probabilities of the next step using the transition matrix.
Following this again we can simulate the states after two steps
There’s a convenient way to calculate the next step given the starting condition.
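As a minimal sketch of that convenient calculation (variable names here are illustrative, not from the notes): with a one-hot row vector for the starting state, one step is a vector-matrix product, and two steps are a product with the matrix power \(T^2\).

```python
import numpy as np

T = np.array([[0.1, 0.1, 0.8],
              [0.5, 0.1, 0.4],
              [0.5, 0.2, 0.3]])

x0 = np.array([1.0, 0.0, 0.0])  # start in state 0 with certainty

x1 = x0 @ T  # distribution over states after one step
x2 = x1 @ T  # distribution after two steps
# Equivalently, two steps at once via the matrix power:
x2_direct = x0 @ np.linalg.matrix_power(T, 2)
print(x1)  # [0.1 0.1 0.8]
```

Starting in state 0, the one-step distribution is simply the first row of \(T\).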
What happens if we continue doing this for a long time?
And how does this change when taking the next step?
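One way to see the long-run behaviour (a sketch continuing from the matrix above): iterate the product many times and watch the distribution stop changing.

```python
import numpy as np

T = np.array([[0.1, 0.1, 0.8],
              [0.5, 0.1, 0.4],
              [0.5, 0.2, 0.3]])

x = np.array([1.0, 0.0, 0.0])
for _ in range(50):
    x = x @ T

print(x)      # the long-run distribution over states
print(x @ T)  # taking the next step leaves it unchanged
```

After many steps every row of `np.linalg.matrix_power(T, 50)` is (close to) this same vector, whatever the starting state.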
This Markov Chain has a stationary distribution \(\pi\), which satisfies the following
\[ \pi = \pi T. \]
Our objective is to estimate \(\pi\), which represents the target distribution.
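For a discrete chain we can compute \(\pi\) directly (a sketch, not from the notes): \(\pi = \pi T\) means \(\pi\) is a left eigenvector of \(T\) with eigenvalue 1.

```python
import numpy as np

T = np.array([[0.1, 0.1, 0.8],
              [0.5, 0.1, 0.4],
              [0.5, 0.2, 0.3]])

# A left eigenvector of T is a right eigenvector of T transposed.
vals, vecs = np.linalg.eig(T.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
pi = pi / pi.sum()  # normalise to a probability distribution
print(pi)
```

This agrees with the vector the repeated-multiplication experiment converges to.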
This behaviour of the Markov Chain is only guaranteed if two conditions are met.
The chain must be irreducible, meaning any point in the Markov chain can be reached from any other point, and aperiodic, meaning it does not cycle between states at fixed intervals.
If we are able to draw samples from a Markov Chain that satisfies these properties, we can generate samples from the stationary target distribution. After drawing samples from this distribution we can investigate the quantities of interest using Monte Carlo integration (read: counting samples).
One property that is not required of a Markov Chain, but which guarantees the two properties above, is detailed balance. It is not a requirement, but it is usually much easier to establish detailed balance than general balance. It ensures that the Markov chain is reversible; in other words
\[ \pi(x)\,T(x'|x) = \pi(x')\,T(x|x'). \]
If we define a reversible process, it is irreducible and aperiodic by default. It is aperiodic because you can always go back, and irreducible because a region cannot be entered if there is no way of returning from it.
The process of generating a Markov Chain with these properties means that we know we are sampling from a stationary target distribution, if we have enough samples.
Rather than the discrete graph networks described before, we can expand this to the continuous number line.
Note: it isn’t always possible to move between the discrete and continuous settings; it just works out for this case.
Rather than sampling from \(\pi(x)\) as in the discrete case, we will change the notation to \(p(x)\). And the transition kernel, rather than a matrix \(T(x'|x)\), will be represented by \(K(x'|x)\).
Metropolis-Hastings enforces the reversibility constraint using the accept-reject function
\[ A(x,x') = \min\left(1, \frac{p(x')\,g(x|x')}{p(x)\,g(x'|x)}\right) \]
and often, a symmetric proposal distribution, e.g.
\[ g(x'|x) = N(x, \sigma). \]
The resulting kernel is represented as
\[ K(x'|x) = g(x'|x)\,A(x,x'). \]
The accept-reject function and the symmetric proposal distribution were chosen to satisfy the detailed balance condition
\[ p(x)\,g(x'|x)\,A(x,x') = p(x')\,g(x|x')\,A(x',x). \]
Therefore, if we draw samples using the Metropolis-Hastings algorithm, we draw samples from the target distribution \(p(x)\).
Given \(p(x) = N(2, 0.5)\), how would you draw samples using the Metropolis-Hastings algorithm?
I suggest using logs due to numerical issues. Here’s an example function which you can use to evaluate the probability of the data.
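The original helper isn’t shown here, but a minimal stand-in (the name `log_prob` is my choice) that evaluates the log-probability of the data under \(N(\mu, \sigma)\) could look like this:

```python
import numpy as np

def log_prob(data, mu, sigma):
    """Log-likelihood of the data under N(mu, sigma), summed over points.

    Working in log space avoids numerical underflow when many
    small probabilities are multiplied together.
    """
    if sigma <= 0:
        return -np.inf  # an impossible proposal, never accepted
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (data - mu)**2 / (2 * sigma**2))
```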
Here’s also a multivariate random number generator to generate proposals.
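Again the original generator isn’t shown; a stand-in matching the call below, which jointly perturbs \((\mu, \sigma)\) with a small isotropic Gaussian step (the step size is my assumption), could be:

```python
import numpy as np

def proposal_multi(mu, sigma, step=0.05):
    """Propose a new (mu, sigma) pair via a joint Gaussian perturbation.

    `step` is a tuning parameter (an assumption, not from the notes).
    """
    prop_mu, prop_sigma = np.random.multivariate_normal(
        mean=[mu, sigma], cov=step**2 * np.eye(2))
    return prop_mu, prop_sigma
```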
Here is how you’d call the proposal function
prop_mu, prop_sigma = proposal_multi(current_mu, current_sigma)
The accept_reject probability should also be updated to account for the log-probability:
accept_reject = np.exp(prob_prop - prob_current)
You should sample a 95% interval including \(\mu = 5\) and \(\sigma = 0.2\). This may be difficult to sample at first, and I would recommend initialising the chain at these values.
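Putting the pieces together, here is a sketch of the full sampler (helper names and tuning constants are my assumptions; the chain is initialised at the recommended values):

```python
import numpy as np

def log_prob(data, mu, sigma):
    # Log-likelihood of the data under N(mu, sigma), in log space
    # to avoid numerical underflow.
    if sigma <= 0:
        return -np.inf
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (data - mu)**2 / (2 * sigma**2))

def proposal_multi(mu, sigma, step=0.05):
    # Jointly perturb (mu, sigma) with a symmetric Gaussian step.
    return np.random.multivariate_normal([mu, sigma], step**2 * np.eye(2))

def metropolis_hastings(data, n_samples=5000, mu=5.0, sigma=0.2):
    samples = []
    prob_current = log_prob(data, mu, sigma)
    for _ in range(n_samples):
        prop_mu, prop_sigma = proposal_multi(mu, sigma)
        prob_prop = log_prob(data, prop_mu, prop_sigma)
        # Accept-reject in log space, as suggested above.
        accept_reject = np.exp(prob_prop - prob_current)
        if np.random.rand() < accept_reject:
            mu, sigma, prob_current = prop_mu, prop_sigma, prob_prop
        samples.append((mu, sigma))
    return np.array(samples)

# Example: recover mu = 5, sigma = 0.2 from simulated data.
data = np.random.normal(5.0, 0.2, size=200)
samples = metropolis_hastings(data)
print(np.percentile(samples[1000:], [2.5, 97.5], axis=0))
```

Discarding the first chunk of samples as burn-in, the percentiles give the 95% interval asked for above.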
Given an n-dimensional cube of side length 2, you place spheres of diameter 1 in each of the corners, and then place another sphere in the middle that touches the corner spheres.
How does the size of the middle sphere change as dimensions increase?
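If you want to check your answer numerically, a sketch of the standard setup for this puzzle: the middle sphere touches the corner spheres, so its radius is the distance from the cube's centre to a corner-sphere centre, minus the corner-sphere radius.

```python
import numpy as np

def middle_radius(n):
    # Cube of side 2 centred at the origin; each corner sphere has
    # radius 1/2 with its centre at (+/-1/2, ..., +/-1/2).
    # The middle sphere touches them, so its radius is
    # ||corner centre|| - 1/2 = sqrt(n)/2 - 1/2.
    return 0.5 * np.sqrt(n) - 0.5

for n in (1, 2, 3, 9, 10):
    print(n, middle_radius(n))
```

The radius grows without bound: at n = 9 the middle sphere already touches the faces of the cube, and beyond that it pokes outside it.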