Understanding the Metropolis-Hastings Algorithm: A Step-by-Step Guide with Python Examples

Utpal Kumar   4 minute read      

The Metropolis-Hastings (MH) algorithm is a cornerstone of Markov Chain Monte Carlo (MCMC) methods. It lets us draw samples from complex probability distributions — even ones we can only evaluate up to an unknown constant — which is exactly what Bayesian inference and many geophysical problems demand. In this post we break the algorithm down and implement it in Python.

The one mental model

You can’t sample the target distribution $\pi(x)$ directly, so instead you take a guided random walk: from where you are, propose a nearby step, then accept or reject it with a probability that biases the walk toward high-density regions. Do this long enough and the states you visit are samples from $\pi$.

The magic: the accept/reject rule uses only a ratio of $\pi$ values, so any normalizing constant cancels — that’s why MH can sample unnormalized distributions.

What is the Metropolis-Hastings algorithm?

MH produces a sequence of random samples from a target distribution for which direct sampling is hard. It’s especially valuable in high-dimensional spaces or with complex distributions.

Key concepts

  1. Target distribution $\pi(x)$: the distribution we want to sample from.
  2. Proposal distribution $q(x’ \mid x)$: a simpler distribution used to propose new states.
  3. Markov chain: a sequence of states where each depends only on the previous one.

The algorithm

  1. Initialization — start with an initial state $x_0$.
  2. Proposal step — from the current state $x_t$, propose a new state $x’$ using $q(x’ \mid x_t)$.
  3. Acceptance probability — compute

    \[\alpha = \min\left(1, \frac{\pi(x')\, q(x_t \mid x')}{\pi(x_t)\, q(x' \mid x_t)}\right)\]

    This biases the chain toward states with higher probability under the target.

  4. Accept/reject — draw $u \sim U(0,1)$. If $u \leq \alpha$, accept ($x_{t+1} = x’$); otherwise reject and stay ($x_{t+1} = x_t$).
  5. Iterate — repeat steps 2–4 many times to collect samples.
Metropolis-Hastings accept/reject step Propose a new state, compute the acceptance probability, draw a uniform random number, then accept the proposal or keep the current state, and repeat. Propose x′ from q(·|x_t) Compute α min(1, ratio) u ≤ α ? u ~ U(0,1) yes no Accept x_(t+1) = x′ Reject x_(t+1) = x_t next iteration
One MH step: a better-fitting proposal is always accepted (α = 1); a worse one is accepted only sometimes — which is what lets the chain explore.

Why the normalizing constant doesn’t matter. Because $\alpha$ depends on the ratio $\pi(x’)/\pi(x_t)$, any constant factor in $\pi$ cancels top and bottom. So you can plug in an unnormalized density (e.g. a Bayesian posterior you know only up to evidence) and MH still samples the correct distribution — a huge practical convenience.

Check your understanding

The proposed state $x'$ has higher target density than the current state. What does MH do?

Python implementation

Example 1: sampling from a Gaussian distribution

import numpy as np
import matplotlib.pyplot as plt

def target_distribution(x):
    return np.exp(-0.5 * x**2)  # Gaussian distribution (unnormalized)

def proposal_distribution(x):
    return np.random.normal(x, 1)  # Propose a new state from a normal distribution centered at x

def metropolis_hastings(initial_state, target_distribution, proposal_distribution, iterations):
    samples = [initial_state]
    current_state = initial_state

    for _ in range(iterations):
        proposed_state = proposal_distribution(current_state)
        acceptance_probability = min(1, target_distribution(proposed_state) / target_distribution(current_state))

        if np.random.rand() < acceptance_probability:
            current_state = proposed_state
        
        samples.append(current_state)
    
    return np.array(samples)

# Parameters
initial_state = 0
iterations = 10000

# Generate samples
samples = metropolis_hastings(initial_state, target_distribution, proposal_distribution, iterations)

# Plotting the results
plt.figure(figsize=(10, 6))
plt.hist(samples, bins=50, density=True, alpha=0.6, color='g', label='Samples')
x = np.linspace(-5, 5, 1000)
plt.plot(x, target_distribution(x), 'r', lw=2, label='Target Distribution')
plt.title('Metropolis-Hastings Sampling from a Gaussian Distribution')
plt.xlabel('x')
plt.ylabel('Density')
plt.legend()
plt.show()

Notice the acceptance line uses only target(proposed) / target(current) — no q terms. That’s because the proposal here is a Gaussian centered on the current state, which is symmetric: $q(x’ \mid x_t) = q(x_t \mid x’)$, so the $q$ factors cancel and the general MH ratio collapses to the simpler Metropolis ratio. It also uses the unnormalized Gaussian — and still recovers the right distribution, exactly as the ratio argument promised.

Example 1
Example 1

Example 2: sampling from a multimodal distribution

def target_distribution(x):
    return 0.3 * np.exp(-0.2 * (x - 3)**2) + 0.7 * np.exp(-0.2 * (x + 3)**2)

def proposal_distribution(x):
    return np.random.normal(x, 1)

# Parameters
initial_state = 0
iterations = 10000

# Generate samples
samples = metropolis_hastings(initial_state, target_distribution, proposal_distribution, iterations)

# Plotting the results
plt.figure(figsize=(10, 6))
plt.hist(samples, bins=50, density=True, alpha=0.6, color='g', label='Samples')
x = np.linspace(-10, 10, 1000)
plt.plot(x, target_distribution(x), 'r', lw=2, label='Target Distribution')
plt.title('Metropolis-Hastings Sampling from a Multimodal Distribution')
plt.xlabel('x')
plt.ylabel('Density')
plt.legend()
plt.show()
Example 2
Example 2

Watch the proposal step size. In this two-mode example the peaks sit at $x = \pm 3$. With a small proposal standard deviation, the chain can get stuck in one mode for a long time because crossing the low-density valley between peaks requires a run of unlikely accepted moves. Too large a step, and almost every proposal is rejected. Tuning that step size — the classic exploration-vs-acceptance tradeoff — is central to making MCMC work.

Recap

Without scrolling up — can you state the loop? Metropolis-Hastings:

  • Proposes a new state $x’$ from $q(\cdot \mid x_t)$,
  • accepts it with probability $\alpha = \min!\left(1, \frac{\pi(x’)\,q(x_t\mid x’)}{\pi(x_t)\,q(x’\mid x_t)}\right)$, else stays put,
  • repeats to build a Markov chain whose stationary distribution is the target $\pi$,
  • and works on unnormalized densities because the constant cancels in the ratio.

Experiment with different targets and proposal step sizes to feel how the chain explores — that intuition is what makes the difference in real MCMC applications.

References

  1. Equation of State Calculations by Fast Computing Machines — Metropolis, Rosenbluth, Rosenbluth, Teller & Teller, 1953, The Journal of Chemical Physics, 21(6), 1087.
  2. Monte Carlo Sampling Methods Using Markov Chains and Their Applications — Hastings, 1970, Biometrika, 57(1), 97–109.

Where to go next

Related posts here: Monte Carlo methods and earthquake location problem and Hypothesis test for the significance of linear trend using Monte Carlo simulations.

Disclaimer of liability

The information provided by the Earth Inversion is made available for educational purposes only.

Whilst we endeavor to keep the information up-to-date and correct. Earth Inversion makes no representations or warranties of any kind, express or implied about the completeness, accuracy, reliability, suitability or availability with respect to the website or the information, products, services or related graphics content on the website for any purpose.

UNDER NO CIRCUMSTANCE SHALL WE HAVE ANY LIABILITY TO YOU FOR ANY LOSS OR DAMAGE OF ANY KIND INCURRED AS A RESULT OF THE USE OF THE SITE OR RELIANCE ON ANY INFORMATION PROVIDED ON THE SITE. ANY RELIANCE YOU PLACED ON SUCH MATERIAL IS THEREFORE STRICTLY AT YOUR OWN RISK.


Leave a comment