Applied statistics
Log-linear models
Literature
[A] Section 15.6.
This lecture as: slideshow (html), Rmarkdown (Rmd), notes (pdf).
The entire module: notes (pdf).
Exercises
These exercises should be answered using the Rmarkdown file heart.Rmd
which is on the server or can be downloaded here.
- The data set concerns some risk factors for developing a certain type of heart disease.
- The data set consists of 6 variables measured on 1841 people:
- a: Smoking (No, Yes)
- b: Strenuous mental work (No, Yes)
- c: Strenuous physical work (No, Yes)
- d: Systolic blood pressure (<140, >140)
- e: Ratio of alpha/beta lipoproteins (<3, >3)
- f: Family anamnesis of heart disease (Negative, Positive)
First focus on the variables [adef].
- Make a contingency table (or several) of the four variables, e.g. using
tally(~a+d+e+f, data = chdcoco)
Try different orderings of the variables to get different tables.
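As a rough illustration of how the ordering changes the layout, here is a sketch in base R on simulated data. The data frame toy and its contents are made up for this example and are not the chdcoco data; xtabs and ftable are used here in place of tally.

```r
# Simulated stand-in for chdcoco (hypothetical data, not the real study)
set.seed(1)
n <- 400
toy <- data.frame(
  a = sample(c("No", "Yes"), n, replace = TRUE),
  d = sample(c("<140", ">140"), n, replace = TRUE),
  e = sample(c("<3", ">3"), n, replace = TRUE),
  f = sample(c("Negative", "Positive"), n, replace = TRUE),
  stringsAsFactors = TRUE
)

# Four-way table; ftable() flattens it with the last variable as columns
tab_adef <- xtabs(~ a + d + e + f, data = toy)
ftable(tab_adef)

# Same counts with the variables in reverse order, so a becomes the column variable
tab_feda <- xtabs(~ f + e + d + a, data = toy)
ftable(tab_feda)
```

Both tables hold the same 16 counts; only the arrangement of rows and columns differs.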
- Make a new variable called freq with the value 1 for all people:
chdcoco$freq <- 1
- Use aggregate to get the counts when only [adef] are included as factors:
adef <- aggregate(freq ~ a+d+e+f, data=chdcoco, FUN = sum)
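The counting step can be tried out on simulated data first. In the sketch below the data frame toy is made up for illustration (it is not the chdcoco data). One thing worth checking is that aggregate only lists combinations that actually occur, while a table converted with as.data.frame also keeps empty cells as zeros.

```r
# Simulated stand-in for chdcoco (hypothetical data, not the real study)
set.seed(1)
n <- 400
toy <- data.frame(
  a = sample(c("No", "Yes"), n, replace = TRUE),
  d = sample(c("<140", ">140"), n, replace = TRUE),
  e = sample(c("<3", ">3"), n, replace = TRUE),
  f = sample(c("Negative", "Positive"), n, replace = TRUE),
  stringsAsFactors = TRUE
)

# One row per person, each contributing a count of 1
toy$freq <- 1

# Sum the 1s within each observed combination of the four factors
counts <- aggregate(freq ~ a + d + e + f, data = toy, FUN = sum)
nrow(counts)       # at most 2^4 = 16 rows
sum(counts$freq)   # adds up to the number of people

# Alternative that always has all 16 cells, with zeros where nothing was observed
counts0 <- as.data.frame(xtabs(~ a + d + e + f, data = toy),
                         responseName = "freq")
nrow(counts0)
```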
- Write down the formula for the logarithm of the expected frequency using the parameters of the saturated model for this dataset (adef).
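For orientation, one common way of writing a saturated log-linear model for four factors is sketched below. The notation is an assumption and may differ from the one used in the lecture notes: \mu_{ijkl} denotes the expected frequency in the cell with level i of a, j of d, k of e and l of f, and each \lambda term is a parameter subject to the usual identifiability constraints (for example, terms involving a reference level set to zero).

```latex
\log \mu_{ijkl} =
  \lambda
  + \lambda^{a}_{i} + \lambda^{d}_{j} + \lambda^{e}_{k} + \lambda^{f}_{l}
  + \lambda^{ad}_{ij} + \lambda^{ae}_{ik} + \lambda^{af}_{il}
  + \lambda^{de}_{jk} + \lambda^{df}_{jl} + \lambda^{ef}_{kl}
  + \lambda^{ade}_{ijk} + \lambda^{adf}_{ijl} + \lambda^{aef}_{ikl} + \lambda^{def}_{jkl}
  + \lambda^{adef}_{ijkl}
```

With two levels per factor this gives 1 + 4 + 6 + 4 + 1 = 16 free parameters, one per cell of the table.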
- Show that 3-way and 4-way interactions are insignificant. If you don't want to write out all second-order interactions, use the shorthand
twowaymodel <- glm(freq ~ .^2, data=adef, family = poisson)
to define the model with only two-way interactions, and compare it with the saturated model
fullmodel <- glm(freq ~ a*d*e*f, data=adef, family = poisson)
using anova.
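A minimal sketch of this comparison on simulated counts is given below. The objects toy and adef_toy are made up for illustration and are not the chdcoco data, and the choice test = "Chisq" (a likelihood ratio test on the deviance difference) is an assumption about which test is intended.

```r
# Simulated stand-in for the aggregated adef table (hypothetical data)
set.seed(1)
n <- 400
toy <- data.frame(
  a = sample(c("No", "Yes"), n, replace = TRUE),
  d = sample(c("<140", ">140"), n, replace = TRUE),
  e = sample(c("<3", ">3"), n, replace = TRUE),
  f = sample(c("Negative", "Positive"), n, replace = TRUE),
  stringsAsFactors = TRUE
)
adef_toy <- as.data.frame(xtabs(~ a + d + e + f, data = toy),
                          responseName = "freq")

# Model with main effects and all two-way interactions
twoway <- glm(freq ~ .^2, data = adef_toy, family = poisson)

# Saturated model: one parameter per cell, so 0 residual degrees of freedom
full <- glm(freq ~ a * d * e * f, data = adef_toy, family = poisson)

# Likelihood ratio test of the 3-way and 4-way terms (4 + 1 = 5 parameters)
cmp <- anova(twoway, full, test = "Chisq")
cmp
```

A large p-value in the second row of the table would indicate that the higher-order interactions can be left out.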
- If possible, reduce the model by successively leaving out second-order interactions based on the output of drop1. If, for example, you want to leave out a:d from twowaymodel, you can get the reduced model by
newmodel <- update(twowaymodel, .~.-a:d)
Interpret the final model (use the graphical representation).
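One step of this backward elimination could look as follows on simulated counts. The data and the names toy, adef_toy, worst and reduced are made up for illustration, and picking the term with the largest p-value is just one possible rule for choosing what to drop.

```r
# Simulated stand-in for the aggregated adef table (hypothetical data)
set.seed(1)
n <- 400
toy <- data.frame(
  a = sample(c("No", "Yes"), n, replace = TRUE),
  d = sample(c("<140", ">140"), n, replace = TRUE),
  e = sample(c("<3", ">3"), n, replace = TRUE),
  f = sample(c("Negative", "Positive"), n, replace = TRUE),
  stringsAsFactors = TRUE
)
adef_toy <- as.data.frame(xtabs(~ a + d + e + f, data = toy),
                          responseName = "freq")
twoway <- glm(freq ~ .^2, data = adef_toy, family = poisson)

# Test each two-way interaction in turn; main effects are not offered for
# removal because they are contained in the interactions
d1 <- drop1(twoway, test = "Chisq")
d1

# Candidate for removal: the interaction with the largest p-value
worst <- rownames(d1)[which.max(d1[["Pr(>Chi)"]])]

# Refit without that term, then repeat drop1() on the reduced model
reduced <- update(twoway, as.formula(paste(". ~ . -", worst)))
drop1(reduced, test = "Chisq")
```

For the interpretation, the remaining two-way terms can be drawn as a graph with one node per variable and an edge for each interaction still in the model; two variables without an edge are conditionally independent given the other variables.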
- What is the expected frequency of the combination a="Yes", d="<140", e="<3", f="Positive" under the final model?
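Expected frequencies can be read off a fitted Poisson model with predict. The sketch below uses simulated counts and the two-way model rather than your final model. The objects toy, adef_toy and newcell are made up for illustration.

```r
# Simulated stand-in for the aggregated adef table (hypothetical data)
set.seed(1)
n <- 400
toy <- data.frame(
  a = sample(c("No", "Yes"), n, replace = TRUE),
  d = sample(c("<140", ">140"), n, replace = TRUE),
  e = sample(c("<3", ">3"), n, replace = TRUE),
  f = sample(c("Negative", "Positive"), n, replace = TRUE),
  stringsAsFactors = TRUE
)
adef_toy <- as.data.frame(xtabs(~ a + d + e + f, data = toy),
                          responseName = "freq")
twoway <- glm(freq ~ .^2, data = adef_toy, family = poisson)

# The cell of interest, written with the same level labels as in the data
newcell <- data.frame(a = "Yes", d = "<140", e = "<3", f = "Positive")

# type = "response" returns the expected count; type = "link" returns its log
mu  <- predict(twoway, newdata = newcell, type = "response")
eta <- predict(twoway, newdata = newcell, type = "link")
mu
exp(eta)   # same value, obtained from the log-linear predictor

# The same number appears among the fitted values of the model
idx <- with(adef_toy, a == "Yes" & d == "<140" & e == "<3" & f == "Positive")
fitted(twoway)[idx]
```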
Return to the original data set and repeat exercises 1.4 and 1.5 (leaving out the 3- to 6-way interactions), but with all the variables [abcdef] as the starting point.
Finish exercises from previous lectures.