EES 4891-06/5891-01

Homework #5: Multivariable models

Due Thu., Feb 5


Homework

Preliminary Information

This homework set gives you practice working with multivariable regression models, in which you try to predict the value of a dependent variable based on multiple independent predictor variables.
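For a concrete picture of the model form, here is a minimal sketch of a multivariable linear regression \(y = \alpha + \beta_1 x_1 + \beta_2 x_2\), fitted by ordinary least squares with NumPy on simulated data. This is a swapped-in technique, not the book's method: Statistical Rethinking fits these models with Bayesian quadratic approximation. The least-squares estimates should be close to the posterior means you would get with weak priors and plenty of data. The data, the "true" coefficients, and the variable names are all hypothetical.

```python
import numpy as np

# Hypothetical simulated data: two predictors and a known linear relationship
rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.3, size=n)

# Design matrix: a column of ones for the intercept alpha, then one column per predictor
X = np.column_stack([np.ones(n), x1, x2])

# Ordinary least squares estimates of (alpha, beta_1, beta_2)
coef, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
alpha, beta1, beta2 = coef
print(f"alpha = {alpha:.2f}, beta1 = {beta1:.2f}, beta2 = {beta2:.2f}")
```

Each \(\beta\) is the expected change in \(y\) for a one-unit change in its predictor while the other predictors are held fixed. That is the key difference from fitting a separate bivariate regression for each predictor.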

Homework Exercises:

Self-study: Work these exercises, but do not turn them in.

  • Exercises 5E1–5E4

Turn in: Work these exercises and turn them in.

  • Exercises 5M3, 5M4, 5H3

Notes on Homework:

You can download a PDF copy of the chapter 5 exercises from Statistical Rethinking at https://ees5891.jgilligan.org/files/homework_docs/McElreath-Ch-5-homework.pdf

Exercise 5E4 gets at a subtle point about the independence of variables when you have indicators for categories. This connects to an important point about identifiability in models. When you can infer the exact value of a variable from other variables, including that exactly predictable variable in your model can make the model non-identifiable. A good example is if you have indicator variables male and female for biological sex (for simplicity, I am leaving out the possibility of intersex individuals). If you have a regression model \(y \sim \alpha + \beta_1 I_{\text{male}} + \beta_2 I_{\text{female}}\), then the model will predict the same result for any value of \(\delta\) if you use parameters \(\alpha' = \alpha + \delta\), \(\beta'_1 = \beta_1 - \delta\), and \(\beta'_2 = \beta_2 - \delta\). This works because every individual has exactly one indicator equal to 1, so the \(+\delta\) added to the intercept is always cancelled by the \(-\delta\) on that indicator's coefficient. If you omit \(I_{\text{male}}\) or \(I_{\text{female}}\) from your model (but not both), you will have a model that works just as well (because \(I_{\text{male}} = 1 - I_{\text{female}}\), the model still has all the same information). However, the model will now be fully identifiable, because no other combination of \(\alpha\) and \(\beta\) gives the same predictions. This is why the kind of analysis in this exercise, checking whether a model is fully identifiable, is important.
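The identifiability problem can be checked numerically. The sketch below uses NumPy and a small hypothetical sample, with made-up parameter values. It builds the design matrix for an intercept plus both indicators and shows that its rank is lower than its number of columns, which means the columns are linearly dependent. It then confirms that the \(\delta\) shift leaves every prediction unchanged, and that dropping one indicator restores full rank. Matrix rank is used here as a convenient stand-in for the identifiability analysis in the exercise, not as the book's own procedure.

```python
import numpy as np

# Hypothetical sample: 1 = male, 0 = female (binary sex, as in the text)
I_male = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=float)
I_female = 1.0 - I_male  # exactly predictable from I_male

# Intercept plus BOTH indicators: the intercept column equals I_male + I_female
X_both = np.column_stack([np.ones_like(I_male), I_male, I_female])
# Intercept plus only one indicator
X_one = np.column_stack([np.ones_like(I_male), I_male])

# Rank lower than the number of columns means the parameters are not identifiable
print(np.linalg.matrix_rank(X_both), X_both.shape[1])  # 2 3
print(np.linalg.matrix_rank(X_one), X_one.shape[1])    # 2 2

# Shifting alpha by +delta and both betas by -delta gives identical predictions
alpha, beta1, beta2, delta = 0.5, 1.2, -0.7, 3.0
pred = X_both @ np.array([alpha, beta1, beta2])
pred_shifted = X_both @ np.array([alpha + delta, beta1 - delta, beta2 - delta])
print(np.allclose(pred, pred_shifted))  # True
```

Any value of `delta` works here. That is exactly why the data alone cannot pin down \(\alpha\), \(\beta_1\), and \(\beta_2\) in the model with both indicators.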