Example
model <- lm(weight ~ feed, data = chickwts)
summary(model)
##
## Call:
## lm(formula = weight ~ feed, data = chickwts)
##
## Residuals:
## Min 1Q Median 3Q Max
## -123.909 -34.413 1.571 38.170 103.091
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 323.583 15.834 20.436 < 2e-16 ***
## feedhorsebean -163.383 23.485 -6.957 2.07e-09 ***
## feedlinseed -104.833 22.393 -4.682 1.49e-05 ***
## feedmeatmeal -46.674 22.896 -2.039 0.045567 *
## feedsoybean -77.155 21.578 -3.576 0.000665 ***
## feedsunflower 5.333 22.393 0.238 0.812495
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 54.85 on 65 degrees of freedom
## Multiple R-squared: 0.5417, Adjusted R-squared: 0.5064
## F-statistic: 15.36 on 5 and 65 DF, p-value: 5.936e-10
- We get information about contrasts and their significance:
Intercept
corresponding to casein
has weight
different from zero (\(p < 2\times 10^{-16}\)).
- Weight difference between
casein
and horsebean
is extremely significant (p=\(2\times 10^{-9}\)).
- There is no significant weight difference between
casein
and sunflower
(p=\(81\)%).
Example
model <- lm(weight ~ feed, data = chickwts)
summary(model)
##
## Call:
## lm(formula = weight ~ feed, data = chickwts)
##
## Residuals:
## Min 1Q Median 3Q Max
## -123.909 -34.413 1.571 38.170 103.091
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 323.583 15.834 20.436 < 2e-16 ***
## feedhorsebean -163.383 23.485 -6.957 2.07e-09 ***
## feedlinseed -104.833 22.393 -4.682 1.49e-05 ***
## feedmeatmeal -46.674 22.896 -2.039 0.045567 *
## feedsoybean -77.155 21.578 -3.576 0.000665 ***
## feedsunflower 5.333 22.393 0.238 0.812495
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 54.85 on 65 degrees of freedom
## Multiple R-squared: 0.5417, Adjusted R-squared: 0.5064
## F-statistic: 15.36 on 5 and 65 DF, p-value: 5.936e-10
- The last line gives us the value of \(F_{obs} = 15.36\) and the corresponding \(p\)-value (\(5.9 \times 10^{-10}\)). Clearly there is a significant difference between the types of
feed
.
Main effect model in R
- The main effects model is fitted by
MainEff <- lm(len ~ supp + dose, data = ToothGrowth)
summary(MainEff)
##
## Call:
## lm(formula = len ~ supp + dose, data = ToothGrowth)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.085 -2.751 -0.800 2.446 9.650
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.4550 0.9883 12.603 < 2e-16 ***
## suppVC -3.7000 0.9883 -3.744 0.000429 ***
## doseME 9.1300 1.2104 7.543 4.38e-10 ***
## doseHI 15.4950 1.2104 12.802 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.828 on 56 degrees of freedom
## Multiple R-squared: 0.7623, Adjusted R-squared: 0.7496
## F-statistic: 59.88 on 3 and 56 DF, p-value: < 2.2e-16
- The model has 4 parameters.
- The \(F\) test at the end compares with the (null) model with only one overall mean parameter. Does it seem like
supp
and dose
has an additive effect?
Testing effect of supp
- Alternative model without effect of supp:
doseEff <- lm(len ~ dose, data = ToothGrowth)
- We can compare \(R^2\) to see if
doseEff
(Model 1) is sufficient to explain the data compared to MainEff
(Model 2). This is done by converting to \(F\)-statistic: \[
F_{obs} = \frac{(R_2^2 - R_1^2)/(df_1 - df_2)}{(1 - R_2^2)/df_2} = \frac{(SSE_1 - SSE_2)/(df_1 - df_2)}{(SSE_2)/df_2}.
\]
- In R the calculations are done using
anova
:
## Analysis of Variance Table
##
## Model 1: len ~ dose
## Model 2: len ~ supp + dose
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 57 1025.78
## 2 56 820.43 1 205.35 14.017 0.0004293 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Testing effect of dose
- Alternative model without effect of dose:
suppEff <- lm(len ~ supp, data = ToothGrowth)
anova(suppEff, MainEff)
## Analysis of Variance Table
##
## Model 1: len ~ supp
## Model 2: len ~ supp + dose
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 58 3246.9
## 2 56 820.4 2 2426.4 82.811 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Example
- We fit the interaction model by changing plus to multiply in the model expression from before:
Interaction <- lm(len ~ supp*dose, data = ToothGrowth)
- Now we can think of an experiment with 6 groups corresponding to each combination of the predictors.
- Looking at the group averages it looks like, the supplement types behave quite differently depending on
dose
:
mean(len ~ supp + dose, data = ToothGrowth)
## OJ.LO VC.LO OJ.ME VC.ME OJ.HI VC.HI
## 13.23 7.98 22.70 16.77 26.06 26.14
anova(MainEff, Interaction)
## Analysis of Variance Table
##
## Model 1: len ~ supp + dose
## Model 2: len ~ supp * dose
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 56 820.43
## 2 54 712.11 2 108.32 4.107 0.02186 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
- With a p-value of 2.1860269% there is a significant interaction
supp:dose
, i.e. the lack of parallel curves in the interaction plot is significant.
##
## Call:
## lm(formula = len ~ supp * dose, data = ToothGrowth)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.20 -2.72 -0.27 2.65 8.27
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.230 1.148 11.521 3.60e-16 ***
## suppVC -5.250 1.624 -3.233 0.00209 **
## doseME 9.470 1.624 5.831 3.18e-07 ***
## doseHI 12.830 1.624 7.900 1.43e-10 ***
## suppVC:doseME -0.680 2.297 -0.296 0.76831
## suppVC:doseHI 5.330 2.297 2.321 0.02411 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.631 on 54 degrees of freedom
## Multiple R-squared: 0.7937, Adjusted R-squared: 0.7746
## F-statistic: 41.56 on 5 and 54 DF, p-value: < 2.2e-16
- The additional effect of both changing from
supp
OJ
to VC
and increasing dose
from LO
to HI
(\(\beta_5\)=suppVC:doseHI
) is highly significant.