EES 5891-03

Homework #4: Multivariable models

Due Tue., Sep 20


Homework

Homework Exercises:

Self-study: Work these exercises, but do not turn them in.

  • Exercises 5E1–5E4

Turn in: Work these exercises and turn them in.

  • Exercises 5M3, 5M4, 5H3

Notes on Homework:

Exercise 5E4 gets at a subtle point about independence of variables when you have indicators for categories. This connects to a subtle but important point about identifiability in models. When you can infer the exact value of one variable from the other variables in the model (including the constant intercept), then including that exactly predictable variable can make the model non-identifiable. Different combinations of parameter values then produce exactly the same predictions, so the data cannot tell them apart. A good example is if you have indicator variables male and female for biological sex (for simplicity, I am leaving out the possibility of intersex individuals). If you have a regression model \(y \sim \alpha + \beta_1 I_{\text{male}} + \beta_2 I_{\text{female}}\), then the model will predict the same result for any value of \(\delta\) if you use parameters \(\alpha' = \alpha + \delta\), \(\beta'_1 = \beta_1 - \delta\), and \(\beta'_2 = \beta_2 - \delta\). This works because every individual has exactly one of \(I_{\text{male}}\) and \(I_{\text{female}}\) equal to 1, so the \(+\delta\) added to \(\alpha\) is always cancelled by a \(-\delta\) from one of the \(\beta\)s. If you omit \(I_{\text{male}}\) or \(I_{\text{female}}\) from your model (but not both), you will have a model that works just as well (because \(I_{\text{male}} = 1 - I_{\text{female}}\), so the model has just as much information), but the model will now be fully identifiable because we can no longer get equivalent results by trading off \(\alpha\) against \(\beta\). This is why the kind of analysis in this exercise, checking whether a model is fully identifiable, is important.
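As a quick numerical check of the argument above, here is a minimal Python/NumPy sketch. It is not part of the assignment, and the toy data and parameter values are made up purely for illustration. It builds the design matrix for the model with an intercept and both indicators and shows that its rank is only 2 even though it has 3 columns. It then confirms that shifting \(\alpha\) by \(+\delta\) and both \(\beta\)s by \(-\delta\) leaves every prediction unchanged.

```python
import numpy as np

# Hypothetical toy data: six individuals, 1 = male, 0 = female
is_male = np.array([1, 0, 1, 1, 0, 0])
is_female = 1 - is_male  # exactly predictable from is_male

# Design matrix: intercept column plus BOTH indicators
X_full = np.column_stack([np.ones(6), is_male, is_female])
# Design matrix that drops the female indicator
X_reduced = np.column_stack([np.ones(6), is_male])

# X_full has 3 columns but only 2 linearly independent ones
print(X_full.shape[1], np.linalg.matrix_rank(X_full))        # 3 2
# X_reduced has full column rank
print(X_reduced.shape[1], np.linalg.matrix_rank(X_reduced))  # 2 2

# Illustrative parameter values (made up)
alpha, beta_1, beta_2, delta = 1.0, 0.5, -0.3, 2.7

# Predictions from the original and the shifted parameters
mu = X_full @ np.array([alpha, beta_1, beta_2])
mu_shifted = X_full @ np.array([alpha + delta, beta_1 - delta, beta_2 - delta])
print(np.allclose(mu, mu_shifted))  # True: the data cannot distinguish them

# The direction (+1, -1, -1) in parameter space changes nothing
null_direction = np.array([1.0, -1.0, -1.0])
print(X_full @ null_direction)  # all zeros
```

Dropping either indicator column restores full column rank, which is the linear-algebra counterpart of the model becoming identifiable: with `X_reduced` there is no longer any direction in parameter space along which the predictions stay fixed.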