EES 4891-06/5891-01

Homework #8: Monte-Carlo sampling

Due Tue., Mar 17

Preliminary Information

This homework set gives you practice working with Monte Carlo sampling as a more sophisticated way of estimating Bayesian posterior probability distributions than the quadratic approximation we have been using with the quap function. For Monte Carlo sampling, we use the ulam function, which we have read about in Chapters 9–11. For these homework exercises, you will need the cmdstanr package for R and the cmdstan tool installed. See the Tools page on the course web site for detailed instructions on installing cmdstanr and cmdstan, and please come to my office hours for help if you can’t get these tools to work.
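
If you want a quick way to confirm that cmdstanr can find a working cmdstan installation before you start, a minimal check is the following (this is only a sanity check; follow the Tools page instructions for the actual installation):

library(cmdstanr)

# Check that the C++ toolchain that cmdstan needs is set up correctly
check_cmdstan_toolchain()

# Report which version of cmdstan (if any) cmdstanr can find
cmdstan_version()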

Homework Exercises:

Self-study: Work these exercises, but do not turn them in.

  • Exercises 9E3–9E6
  • Exercises 11E1–11E3

Turn in: Work these exercises and turn them in.

  • Exercises 9M1–9M2
  • Exercises 11M3–11M4 and 11M7

Notes on Homework:

You can download a PDF copy of the exercises from chapters 9–11 of Statistical Rethinking from https://ees5891.jgilligan.org/files/homework_docs/McElreath-Ch-9-11-homework.pdf

Exercise 9M2

For exercise 9M2, the question is ambiguous about whether you should use the original dexp(1) or the dunif(0, 1) from 9M1 as the prior for sigma. Use dexp(1), and change only the prior for b relative to m9.1 from the book (see the sketch after the m9.1 code in the notes below).

Exercises 9M1–9M2

For comparing the results, you should both look at the precis summaries of the models and plot the densities of the posteriors.
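
For the precis part, a minimal sketch (assuming the book's model is m9.1 and the exercise models are ex9m1 and ex9m2, as in the notes below; depth = 2 is needed so that the individual a[cid] and b[cid] parameters are shown):

# Means, standard deviations, and 89% intervals for every parameter
precis(m9.1, depth = 2)
precis(ex9m1, depth = 2)
precis(ex9m2, depth = 2)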

To easily plot the densities of the posteriors for one model, do the following (assume the model from the book is m9.1, and the models from the exercises are ex9m1 and ex9m2).

This is equivalent to the code from the textbook for m9.1:

library(cmdstanr)
library(rethinking)
library(tidyverse)

data(rugged)
d <- rugged %>% mutate(log_gdp = log(rgdppc_2000))

# Keep only the countries with GDP data, rescale the variables, and code the
# continent indicator (cid = 1 for Africa, 2 for everywhere else)
dd <- d %>% filter(complete.cases(rgdppc_2000)) %>%
  mutate(
    log_gdp_std = log_gdp / mean(log_gdp),
    rugged_std = rugged / max(rugged),
    cid = ifelse(cont_africa == 1, 1, 2)
  )

# Trim the data frame down to just the variables the model uses
dat_slim <- dd %>% select(log_gdp_std, rugged_std, cid) %>%
  mutate(cid = as.integer(cid))

m9.1 <- ulam(
  alist(
    log_gdp_std ~ dnorm(mu, sigma),
    mu <- a[cid] + b[cid] * (rugged_std - 0.215),
    a[cid] ~ dnorm(1, 0.1),
    b[cid] ~ dnorm(0, 0.3),
    sigma ~ dexp(1)
  ), data = dat_slim, chains = 4, cores = 4, cmdstan = TRUE)
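
For the exercise models, you should only need to change the relevant prior line. Here is a sketch for ex9m2, assuming I am reading 9M2 correctly and the only change from m9.1 is the prior on b[cid] (double-check the exercise text for the exact prior):

ex9m2 <- ulam(
  alist(
    log_gdp_std ~ dnorm(mu, sigma),
    mu <- a[cid] + b[cid] * (rugged_std - 0.215),
    a[cid] ~ dnorm(1, 0.1),
    b[cid] ~ dexp(0.3),   # the only changed line; use the prior 9M2 specifies for b
    sigma ~ dexp(1)       # keep dexp(1), per the note on 9M2 above
  ), data = dat_slim, chains = 4, cores = 4, cmdstan = TRUE)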

This code will plot the posterior probability densities:

library(tidybayes)
library(tidybayes.rethinking)
library(bayesplot)

draws <- tidy_draws(m9.1)   # posterior draws as a tidy data frame, one column per parameter
mcmc_dens(draws)            # density plot for each column
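
The tidy_draws data frame also includes bookkeeping columns (such as .chain and lp__), so if you only want the model parameters in the plot, I believe you can name them explicitly with the pars argument, for example:

# Restrict the density plots to the named parameters (names assume the m9.1 model above)
mcmc_dens(draws, pars = c("a[1]", "a[2]", "b[1]", "b[2]", "sigma"))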

If you have two models, foo and bar, you can compare the posteriors of a parameter like sigma using code like this, from the textbook:

post_foo <- extract.samples(foo)   # posterior samples as a list, one element per parameter
post_bar <- extract.samples(bar)

dens(post_foo$sigma, lwd = 1)
dens(post_bar$sigma, lwd = 1, col = rangi2, add = TRUE)   # overlay the second model's density

Alternatively, you can do the same thing using the tidyverse dialect:

foo_draws <- tidy_draws(foo) %>% mutate(model = "foo")
bar_draws <- tidy_draws(bar) %>% mutate(model = "bar")

bind_rows(foo_draws, bar_draws) %>% 
  ggplot(aes(x = sigma, color = model, fill = model)) + 
  geom_density(size = 1, alpha = 0.3)

The first two lines extract the Monte Carlo samples of the posterior for each model and add a column, model, to indicate which model they came from. Then I use ggplot to plot the posterior density of sigma from each model, where the color of the line and the fill inside the density curve indicate which model is which.

The argument size = 1 tells R how thick to make the line, and alpha = 0.3 sets the transparency of the fill (1 = completely opaque and 0 = completely transparent).

If you want to make a single plot that shows comparisons of all the parameters for both models, the code gets a bit more complicated:

foo_draws <- tidy_draws(foo) %>%
  gather_draws(a[i], b[i], sigma) %>%   # long format: one row per draw per parameter
  mutate(model = "foo")
bar_draws <- tidy_draws(bar) %>%
  gather_draws(a[i], b[i], sigma) %>%
  mutate(model = "bar")

bind_rows(foo_draws, bar_draws) %>% 
  # turn the index i into labels like "a[1]" and "b[2]"; sigma has no index, so leave it alone
  mutate(.variable = ifelse(is.na(i), .variable,
                            str_c(.variable, "[", i, "]"))) %>%
  ggplot(aes(x = .value, col = model, fill = model)) + 
  geom_density(alpha = 0.3) + labs(x = NULL, y = NULL) + 
  facet_wrap(~.variable, scales = "free")   # one panel per parameter

Exercises 11E1–11E3

To check yourself: for 11E1 the log-odds are -0.619, for 11E2 the probability is 0.961, and for 11E3 the proportional change in the odds is 5.47.
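
A quick way to check the arithmetic in R (the input values 0.35, 3.2, and 1.7 are my reading of the exercise statements, so confirm them against the book):

log(0.35 / (1 - 0.35))   # 11E1: log-odds of an event with probability 0.35 (about -0.619)
plogis(3.2)              # 11E2: inverse-logit turns log-odds of 3.2 into a probability (about 0.961)
exp(1.7)                 # 11E3: a coefficient of 1.7 multiplies the odds by about 5.47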

Exercise 11M7

For 11M7, I recommend using the following code to compare the precis summaries of the two models (m11.4 is the ulam model on page 330, and m11.4q is the quap version of the model that you should write for this exercise). This code prints the two summaries side by side, with each row representing a different parameter.

pr <- precis( m11.4 , 2 )[,1:4]   # drop the n_eff and Rhat columns so the columns match the quap precis
prq <- precis( m11.4q , 2 )
round( cbind( pr , prq ) , 2 )    # print the two summaries side by side

Then, for parameters that are really different in the two models (differences of more than 0.1 in the means or in the 5.5% or 94.5% levels), make a density plot comparing the posteriors of the two models. You can use the code from my notes for exercises 9M1 and 9M2 (above) to do this. Think about what important differences you see in the posteriors from the two models, and how they relate to what I’ve been emphasizing in class about the differences between the quap and Monte Carlo methods for estimating posteriors.

You can also use the code in the notes for 9M1 and 9M2 above to make a single plot comparing the posteriors of all the parameters for both models, and then make individual comparison plots for any parameters that show significant differences between the quap and ulam models.

At a minimum, be sure to look carefully at a[2].
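
For example, a minimal sketch in the style of the textbook comparison code above, assuming m11.4 and m11.4q are the two fitted models:

# extract.samples() works the same way for quap and ulam models;
# post$a is a matrix with one column per actor, so a[, 2] is a[2]
post_u <- extract.samples(m11.4)    # ulam (Monte Carlo) version
post_q <- extract.samples(m11.4q)   # quap (quadratic approximation) version

dens(post_u$a[, 2], lwd = 1)
dens(post_q$a[, 2], lwd = 1, col = rangi2, add = TRUE)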