Example
model <- lm(weight ~ feed, data = chickwts)
summary(model)
##
## Call:
## lm(formula = weight ~ feed, data = chickwts)
##
## Residuals:
## Min 1Q Median 3Q Max
## -123.909 -34.413 1.571 38.170 103.091
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 323.583 15.834 20.436 < 2e-16 ***
## feedhorsebean -163.383 23.485 -6.957 2.07e-09 ***
## feedlinseed -104.833 22.393 -4.682 1.49e-05 ***
## feedmeatmeal -46.674 22.896 -2.039 0.045567 *
## feedsoybean -77.155 21.578 -3.576 0.000665 ***
## feedsunflower 5.333 22.393 0.238 0.812495
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 54.85 on 65 degrees of freedom
## Multiple R-squared: 0.5417, Adjusted R-squared: 0.5064
## F-statistic: 15.36 on 5 and 65 DF, p-value: 5.936e-10
- We get information about contrasts and their significance:
Intercept
corresponding to casein
has weight
different from zero (p<2×10−16).
- Weight difference between
casein
and horsebean
is extremely significant (p=2×10−9).
- There is no significant weight difference between
casein
and sunflower
(p=81%).
Example
model <- lm(weight ~ feed, data = chickwts)
summary(model)
##
## Call:
## lm(formula = weight ~ feed, data = chickwts)
##
## Residuals:
## Min 1Q Median 3Q Max
## -123.909 -34.413 1.571 38.170 103.091
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 323.583 15.834 20.436 < 2e-16 ***
## feedhorsebean -163.383 23.485 -6.957 2.07e-09 ***
## feedlinseed -104.833 22.393 -4.682 1.49e-05 ***
## feedmeatmeal -46.674 22.896 -2.039 0.045567 *
## feedsoybean -77.155 21.578 -3.576 0.000665 ***
## feedsunflower 5.333 22.393 0.238 0.812495
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 54.85 on 65 degrees of freedom
## Multiple R-squared: 0.5417, Adjusted R-squared: 0.5064
## F-statistic: 15.36 on 5 and 65 DF, p-value: 5.936e-10
- The last line gives us the value of Fobs=15.36 and the corresponding p-value (5.9×10−10). Clearly there is a significant difference between the types of
feed
.
Main effect model in R
- The main effects model is fitted by
MainEff <- lm(len ~ supp + dose, data = ToothGrowth)
summary(MainEff)
##
## Call:
## lm(formula = len ~ supp + dose, data = ToothGrowth)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.085 -2.751 -0.800 2.446 9.650
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.4550 0.9883 12.603 < 2e-16 ***
## suppVC -3.7000 0.9883 -3.744 0.000429 ***
## doseME 9.1300 1.2104 7.543 4.38e-10 ***
## doseHI 15.4950 1.2104 12.802 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.828 on 56 degrees of freedom
## Multiple R-squared: 0.7623, Adjusted R-squared: 0.7496
## F-statistic: 59.88 on 3 and 56 DF, p-value: < 2.2e-16
- The model has 4 parameters.
- The F test at the end compares with the (null) model with only one overall mean parameter. Does it seem like
supp
and dose
has an additive effect?
Testing effect of supp
- Alternative model without effect of supp:
doseEff <- lm(len ~ dose, data = ToothGrowth)
- We can compare R2 to see if
doseEff
(Model 1) is sufficent to explain the data compared to MainEff
(Model 2). This is done by converting to F-statistic: Fobs=(R22−R21)/(df1−df2)(1−R22)/df2=(SSE1−SSE2)/(df1−df2)(SSE2)/df2.
- In R the calculations are done using
anova
:
## Analysis of Variance Table
##
## Model 1: len ~ dose
## Model 2: len ~ supp + dose
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 57 1025.78
## 2 56 820.43 1 205.35 14.017 0.0004293 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Testing effect of dose
- Alternative model without effect of dose:
suppEff <- lm(len ~ supp, data = ToothGrowth)
anova(suppEff, MainEff)
## Analysis of Variance Table
##
## Model 1: len ~ supp
## Model 2: len ~ supp + dose
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 58 3246.9
## 2 56 820.4 2 2426.4 82.811 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Example
- We fit the interaction model by changing plus to multiply in the model expression from before:
Interaction <- lm(len ~ supp*dose, data = ToothGrowth)
- Now we can think of an experiment with 6 groups corresponding to each combination of the predictors.
- Looking at the group averages it looks like, the supplement types behave quite differently depending on
dose
:
mean(len ~ supp + dose, data = ToothGrowth)
## OJ.LO VC.LO OJ.ME VC.ME OJ.HI VC.HI
## 13.23 7.98 22.70 16.77 26.06 26.14
anova(MainEff, Interaction)
## Analysis of Variance Table
##
## Model 1: len ~ supp + dose
## Model 2: len ~ supp * dose
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 56 820.43
## 2 54 712.11 2 108.32 4.107 0.02186 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
- With a p-value of 2.1860269% there is a significant interaction
supp:dose
, i.e. the lack of parallel curves in the interaction plot is significant.
##
## Call:
## lm(formula = len ~ supp * dose, data = ToothGrowth)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.20 -2.72 -0.27 2.65 8.27
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.230 1.148 11.521 3.60e-16 ***
## suppVC -5.250 1.624 -3.233 0.00209 **
## doseME 9.470 1.624 5.831 3.18e-07 ***
## doseHI 12.830 1.624 7.900 1.43e-10 ***
## suppVC:doseME -0.680 2.297 -0.296 0.76831
## suppVC:doseHI 5.330 2.297 2.321 0.02411 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.631 on 54 degrees of freedom
## Multiple R-squared: 0.7937, Adjusted R-squared: 0.7746
## F-statistic: 41.56 on 5 and 54 DF, p-value: < 2.2e-16
- The additional effect of both changing from
supp
OJ
to VC
and increasing dose
from LO
to HI
(β5=suppVC:doseHI
) is highly significant.