Applied statistics


Log-linear models

Literature

[A] Section 15.6.

This lecture as: slideshow (html), Rmarkdown (Rmd), notes (pdf).

The entire module: notes (pdf).

Exercises

These exercises should be answered using the Rmarkdown file heart.Rmd, which is available on the server or for download.

  1. The data set concerns some risk factors for developing a certain type of heart disease.
    1. The data set consists of 6 variables measured on 1841 people:
      • a: Smoking (No, Yes)
      • b: Strenuous mental work (No, Yes)
      • c: Strenuous physical work (No, Yes)
      • d: Systolic blood pressure (<140, >140)
      • e: Ratio of alpha/beta lipoproteins (<3, >3)
      • f: Family anamnesis of heart disease (Negative, Positive)
      First focus on the variables [adef].
    2. Make a contingency table (or several) of the four variables using e.g. tally(~a+d+e+f, data = chdcoco). Try different orderings of the variables to get different tables.
    3. Make a new variable called freq with the value 1 for all people:
      chdcoco$freq <- 1
    4. Use aggregate to get the counts when only [adef] are included as factors:
      adef <- aggregate(freq ~ a+d+e+f, data=chdcoco, FUN = sum)
    5. Write down the formula for the logarithm of the expected frequency using the parameters of the saturated model for this dataset (adef).
    6. Show that the 3-way and 4-way interactions are not significant. If you are lazy and don't want to write out all the second-order interactions, use the shorthand
      twowaymodel <- glm(freq ~ .^2, data=adef, family = poisson)
      to define the model with only two-way interactions, and compare it with the saturated model
      fullmodel <- glm(freq ~ a*d*e*f, data=adef, family = poisson)
      using anova.
    7. If possible, reduce the model by successively leaving out second-order interactions based on the output of drop1. For example, if you want to leave out a:d from twowaymodel, you can get the reduced model by
      newmodel <- update(twowaymodel, .~.-a:d)
      Interpret the final model (use the graphical representation).
    8. What is the expected frequency of the combination a="Yes", d="<140", e="<3", f="Positive"?
  2. Return to the original data set and repeat exercises 1.4 and 1.5 (leaving out the 3- to 6-way interactions), but with all the variables, [abcdef], as the starting point.

  3. Finish exercises from previous lectures.
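
The workflow in exercise 1 (count the combinations, fit the saturated and two-way models, compare them, reduce, predict) can be tried out on simulated data before turning to chdcoco. The following is only a minimal sketch under stated assumptions: the data frame toy and the objects toy_counts, sat, two, red and new_cell are hypothetical names, the data are randomly generated rather than taken from the course data set, and base R's aggregate is used throughout so that no extra packages are needed.

```r
# Simulated stand-in for the course data (NOT chdcoco): four binary factors.
set.seed(1)
n <- 500
a <- factor(sample(c("No", "Yes"), n, replace = TRUE))
# d is simulated to depend on a, so an a:d association exists by construction
d <- factor(ifelse(runif(n) < ifelse(a == "Yes", 0.6, 0.3), ">140", "<140"))
e <- factor(sample(c("<3", ">3"), n, replace = TRUE))
f <- factor(sample(c("Negative", "Positive"), n, replace = TRUE))
toy <- data.frame(a, d, e, f)

# One row per observed combination of the factors, with its count in freq
toy$freq <- 1
toy_counts <- aggregate(freq ~ a + d + e + f, data = toy, FUN = sum)

# Saturated model and the model with all two-way interactions
sat <- glm(freq ~ a * d * e * f, data = toy_counts, family = poisson)
two <- glm(freq ~ .^2, data = toy_counts, family = poisson)

# Likelihood-ratio test: can the 3-way and 4-way terms be dropped?
print(anova(two, sat, test = "Chisq"))

# Candidate two-way terms to remove, each with its own test
print(drop1(two, test = "Chisq"))

# Illustration only: e and f were simulated independently, so e:f is removed.
# With real data, choose the term from the drop1 output instead.
red <- update(two, . ~ . - e:f)

# Expected frequency for one combination under the reduced model
new_cell <- data.frame(
  a = factor("Yes",      levels = levels(toy_counts$a)),
  d = factor("<140",     levels = levels(toy_counts$d)),
  e = factor("<3",       levels = levels(toy_counts$e)),
  f = factor("Positive", levels = levels(toy_counts$f))
)
expected <- predict(red, newdata = new_cell, type = "response")
print(expected)
```

Note that aggregate only returns combinations that actually occur in the data, so a cell with zero observations would be missing from the table rather than listed with a count of 0. In this simulation all 16 combinations are expected to be present, but this is worth checking when all six variables are used.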