Multiple linear regression

The ASTA team

Example from last lecture

Crime data set

FL <- read.delim("https://asta.math.aau.dk/datasets?file=fl-crime.txt")
head(FL, n = 3)
##   Crime Education Urbanisation
## 1   104      82.7         73.2
## 2    20      64.1         21.5
## 3    64      74.7         85.0

Multiple regression model for crime data

model <- lm(Crime ~ Education + Urbanisation, data = FL)
summary(model)
## 
## Call:
## lm(formula = Crime ~ Education + Urbanisation, data = FL)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -34.693 -15.742  -6.226  15.812  50.678 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   59.1181    28.3653   2.084   0.0411 *  
## Education     -0.5834     0.4725  -1.235   0.2214    
## Urbanisation   0.6825     0.1232   5.539 6.11e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 20.82 on 64 degrees of freedom
## Multiple R-squared:  0.4714, Adjusted R-squared:  0.4549 
## F-statistic: 28.54 on 2 and 64 DF,  p-value: 1.379e-09

The general model

Regression model

Interpretation of parameters
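
As a rough illustration of how the coefficients are read, the sketch below plugs the estimates from the summary(model) output above into the prediction equation. The helper predict_crime and the input values (Education = 70, Urbanisation = 50) are hypothetical and chosen only for illustration.

```r
# Coefficient estimates copied from the summary(model) output above
b0    <- 59.1181   # intercept
b_edu <- -0.5834   # Education
b_urb <- 0.6825    # Urbanisation

# Hypothetical helper: predicted Crime for given predictor values
predict_crime <- function(education, urbanisation) {
  b0 + b_edu * education + b_urb * urbanisation
}

pred_a <- predict_crime(70, 50)
pred_b <- predict_crime(70, 51)  # Urbanisation one unit higher, Education held fixed

pred_a
pred_b - pred_a  # the difference equals the Urbanisation coefficient
```

Raising one predictor by one unit while holding the other fixed changes the prediction by exactly that predictor's coefficient.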

Estimation

Estimation of model parameters
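
A minimal sketch of least squares estimation, assuming the usual normal equations. It uses R's built-in mtcars data (the data set used in later examples) so that it runs without a download. Note that lm() itself uses a QR decomposition rather than solving the normal equations directly; the two agree up to numerical error.

```r
# Design matrix with an intercept column, wt, and a 0/1 column for vs
X <- model.matrix(~ wt + factor(vs), data = mtcars)
y <- mtcars$mpg

# Least squares estimate: solve (X'X) b = X'y
b_hand <- solve(crossprod(X), crossprod(X, y))

# Compare with lm()
fit <- lm(mpg ~ wt + factor(vs), data = mtcars)
cbind(by_hand = drop(b_hand), lm = coef(fit))
```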

Estimation of error variance
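
The residual standard error reported by summary() can be reproduced from the residual sum of squares. The sketch below uses numbers read off the R output in these slides (the RSS is taken from the anova() table for the crime models), so it assumes those rounded values.

```r
# Values read off the R output in these slides
rss <- 27730   # residual sum of squares for Crime ~ Education + Urbanisation
n   <- 67      # observations: 64 residual df + 3 estimated coefficients
k   <- 2       # number of predictors

s2 <- rss / (n - k - 1)  # estimate of the error variance
s  <- sqrt(s2)           # residual standard error
s
```

Up to rounding, this matches the "Residual standard error: 20.82 on 64 degrees of freedom" line in the summary output.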

Multiple R-squared

Multiple \(R^2\)
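
A small sketch of how multiple \(R^2\) can be computed by hand as the proportional reduction in squared error, assuming the usual definition (TSS - SSE)/TSS. It uses the built-in mtcars data so that it is self-contained; the variable names are illustrative.

```r
fit <- lm(mpg ~ wt + factor(vs), data = mtcars)

tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)  # total sum of squares
sse <- sum(residuals(fit)^2)                   # sum of squared errors

r2_hand <- (tss - sse) / tss
c(by_hand = r2_hand, summary = summary(fit)$r.squared)
```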

Example

summary(model)
## 
## Call:
## lm(formula = Crime ~ Education + Urbanisation, data = FL)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -34.693 -15.742  -6.226  15.812  50.678 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   59.1181    28.3653   2.084   0.0411 *  
## Education     -0.5834     0.4725  -1.235   0.2214    
## Urbanisation   0.6825     0.1232   5.539 6.11e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 20.82 on 64 degrees of freedom
## Multiple R-squared:  0.4714, Adjusted R-squared:  0.4549 
## F-statistic: 28.54 on 2 and 64 DF,  p-value: 1.379e-09

Example

model2 <- lm(Crime ~ Urbanisation, data = FL)
summary(model2)
## 
## Call:
## lm(formula = Crime ~ Urbanisation, data = FL)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -34.766 -16.541  -4.741  16.521  49.632 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  24.54125    4.53930   5.406 9.85e-07 ***
## Urbanisation  0.56220    0.07573   7.424 3.08e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 20.9 on 65 degrees of freedom
## Multiple R-squared:  0.4588, Adjusted R-squared:  0.4505 
## F-statistic: 55.11 on 1 and 65 DF,  p-value: 3.084e-10
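
The two crime models can be compared on adjusted \(R^2\), which penalises the extra predictor. The sketch below assumes the standard formula and uses the rounded \(R^2\) values printed above; adj_r2 is a hypothetical helper.

```r
# Hypothetical helper: adjusted R-squared from R-squared, n observations, k predictors
adj_r2 <- function(r2, n, k) 1 - (1 - r2) * (n - 1) / (n - k - 1)

n <- 67  # 65 residual df + 2 coefficients in the one-predictor model

adj_full  <- adj_r2(0.4714, n, k = 2)  # Crime ~ Education + Urbanisation
adj_small <- adj_r2(0.4588, n, k = 1)  # Crime ~ Urbanisation
c(full = adj_full, small = adj_small)
```

Adding Education raises \(R^2\) from 0.4588 to 0.4714, but adjusted \(R^2\) barely moves (about 0.4505 to 0.4549).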

F-test for comparing two models

anova(model2, model)
## Analysis of Variance Table
## 
## Model 1: Crime ~ Urbanisation
## Model 2: Crime ~ Education + Urbanisation
##   Res.Df   RSS Df Sum of Sq      F Pr(>F)
## 1     65 28391                           
## 2     64 27730  1    660.61 1.5247 0.2214
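
The F statistic in the table can be reproduced from the two residual sums of squares. The sketch below uses the rounded RSS values printed above, so the result differs slightly from the table in the last digits.

```r
rss_small <- 28391   # Crime ~ Urbanisation
rss_large <- 27730   # Crime ~ Education + Urbanisation
df_small  <- 65      # residual df, smaller model
df_large  <- 64      # residual df, larger model

# Reduction in RSS per extra parameter, relative to the error variance estimate
f_stat  <- ((rss_small - rss_large) / (df_small - df_large)) / (rss_large / df_large)
p_value <- pf(f_stat, df_small - df_large, df_large, lower.tail = FALSE)
c(F = f_stat, p = p_value)
```

With a single extra parameter, the p-value agrees with the t-test for Education in summary(model) (0.2214).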

Overall F-test for effect of predictors

F-test

Example

library(mosaic)  # pdist() comes from the mosaic package
1 - pdist("f", 28.54, df1 = 2, df2 = 64)
## [1] 1.378612e-09
summary(model)
## 
## Call:
## lm(formula = Crime ~ Education + Urbanisation, data = FL)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -34.693 -15.742  -6.226  15.812  50.678 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   59.1181    28.3653   2.084   0.0411 *  
## Education     -0.5834     0.4725  -1.235   0.2214    
## Urbanisation   0.6825     0.1232   5.539 6.11e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 20.82 on 64 degrees of freedom
## Multiple R-squared:  0.4714, Adjusted R-squared:  0.4549 
## F-statistic: 28.54 on 2 and 64 DF,  p-value: 1.379e-09
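
As a cross-check, the overall F statistic can be recovered from \(R^2\), and the p-value can be obtained with base R's pf() instead of mosaic. This is a sketch using the rounded values printed above, so the last digits may differ.

```r
r2 <- 0.4714  # Multiple R-squared from summary(model)
n  <- 67      # observations
k  <- 2       # predictors

f_stat  <- (r2 / k) / ((1 - r2) / (n - k - 1))
p_value <- pf(f_stat, df1 = k, df2 = n - k - 1, lower.tail = FALSE)
c(F = f_stat, p = p_value)
```

Using lower.tail = FALSE gives the upper-tail probability directly, which avoids the subtraction from 1 for very small p-values.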

Interaction model

Interaction between effects of predictors

Example - interaction model

model3 <- lm(Crime ~ Education * Urbanisation, data = FL)
summary(model3)
## 
## Call:
## lm(formula = Crime ~ Education * Urbanisation, data = FL)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -35.181 -15.207  -6.457  14.559  49.889 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)  
## (Intercept)            19.31754   49.95871   0.387    0.700  
## Education               0.03396    0.79381   0.043    0.966  
## Urbanisation            1.51431    0.86809   1.744    0.086 .
## Education:Urbanisation -0.01205    0.01245  -0.968    0.337  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 20.83 on 63 degrees of freedom
## Multiple R-squared:  0.4792, Adjusted R-squared:  0.4544 
## F-statistic: 19.32 on 3 and 63 DF,  p-value: 5.371e-09
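
In the interaction model the slope for Urbanisation depends on the level of Education. The sketch below evaluates that slope from the estimates printed above; urb_slope and the Education values 60, 70 and 80 are hypothetical choices for illustration.

```r
# Estimates copied from the summary(model3) output above
b_urb <- 1.51431    # Urbanisation
b_int <- -0.01205   # Education:Urbanisation

# Hypothetical helper: slope of Urbanisation at a given Education level
urb_slope <- function(education) b_urb + b_int * education

slopes <- urb_slope(c(60, 70, 80))
slopes
```

The estimated slope shrinks as Education increases, although the interaction term is not significant here (p = 0.337).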

Multiple linear regression with categorical predictors

Dummy variables
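
To see how R codes a two-level factor as a dummy variable, one can inspect the design matrix. This sketch uses the built-in mtcars data, where vs takes the values 0 and 1.

```r
# Design matrix for mpg ~ wt + factor(vs): the factor becomes a 0/1 column
X <- model.matrix(~ wt + factor(vs), data = mtcars)
head(X, 3)
```

The column factor(vs)1 is 1 when vs equals 1 and 0 otherwise; the level vs = 0 is the reference group absorbed into the intercept.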

Example

Example

library(mosaic)
gf_point(mpg ~ wt, color = ~factor(vs), group=~factor(vs), data = mtcars) %>% gf_lm()

Example

model1 <- lm(mpg ~ wt + factor(vs), data = mtcars)
summary(model1)
## 
## Call:
## lm(formula = mpg ~ wt + factor(vs), data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.7071 -2.4415 -0.3129  1.4319  6.0156 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  33.0042     2.3554  14.012 1.92e-14 ***
## wt           -4.4428     0.6134  -7.243 5.63e-08 ***
## factor(vs)1   3.1544     1.1907   2.649   0.0129 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.78 on 29 degrees of freedom
## Multiple R-squared:  0.801,  Adjusted R-squared:  0.7873 
## F-statistic: 58.36 on 2 and 29 DF,  p-value: 6.818e-11
plotModel(model1)

Example: Prediction equations

summary(model1)
## 
## Call:
## lm(formula = mpg ~ wt + factor(vs), data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.7071 -2.4415 -0.3129  1.4319  6.0156 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  33.0042     2.3554  14.012 1.92e-14 ***
## wt           -4.4428     0.6134  -7.243 5.63e-08 ***
## factor(vs)1   3.1544     1.1907   2.649   0.0129 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.78 on 29 degrees of freedom
## Multiple R-squared:  0.801,  Adjusted R-squared:  0.7873 
## F-statistic: 58.36 on 2 and 29 DF,  p-value: 6.818e-11
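
The model without interaction gives two parallel prediction lines, one per level of vs. A minimal sketch that extracts them from the fitted coefficients (the object names fit_add and lines_add are illustrative):

```r
fit_add <- lm(mpg ~ wt + factor(vs), data = mtcars)
b <- coef(fit_add)

# One row per group: the dummy coefficient shifts the intercept only
lines_add <- rbind(
  "vs = 0" = c(intercept = b[["(Intercept)"]], slope = b[["wt"]]),
  "vs = 1" = c(intercept = b[["(Intercept)"]] + b[["factor(vs)1"]], slope = b[["wt"]])
)
round(lines_add, 4)
```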

Interaction model

Example: Prediction equations

model2 <- lm(mpg ~ wt * factor(vs), data = mtcars)
summary(model2)
## 
## Call:
## lm(formula = mpg ~ wt * factor(vs), data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.9950 -1.7881 -0.3423  1.2935  5.2061 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     29.5314     2.6221  11.263 6.55e-12 ***
## wt              -3.5013     0.6915  -5.063 2.33e-05 ***
## factor(vs)1     11.7667     3.7638   3.126   0.0041 ** 
## wt:factor(vs)1  -2.9097     1.2157  -2.393   0.0236 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.578 on 28 degrees of freedom
## Multiple R-squared:  0.8348, Adjusted R-squared:  0.8171 
## F-statistic: 47.16 on 3 and 28 DF,  p-value: 4.497e-11
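
With the interaction term, both intercept and slope differ between the groups. The sketch below derives the two prediction lines from the coefficients and, as a check, compares the vs = 1 line with a separate regression on that subset (object names are illustrative).

```r
fit_int <- lm(mpg ~ wt * factor(vs), data = mtcars)
b <- coef(fit_int)

lines_int <- rbind(
  "vs = 0" = c(intercept = b[["(Intercept)"]],
               slope     = b[["wt"]]),
  "vs = 1" = c(intercept = b[["(Intercept)"]] + b[["factor(vs)1"]],
               slope     = b[["wt"]] + b[["wt:factor(vs)1"]])
)
round(lines_int, 4)

# Check: a separate simple regression for vs = 1 gives the same line
fit_vs1 <- lm(mpg ~ wt, data = subset(mtcars, vs == 1))
coef(fit_vs1)
```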

Example: Individual tests

summary(model2)
## 
## Call:
## lm(formula = mpg ~ wt * factor(vs), data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.9950 -1.7881 -0.3423  1.2935  5.2061 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     29.5314     2.6221  11.263 6.55e-12 ***
## wt              -3.5013     0.6915  -5.063 2.33e-05 ***
## factor(vs)1     11.7667     3.7638   3.126   0.0041 ** 
## wt:factor(vs)1  -2.9097     1.2157  -2.393   0.0236 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.578 on 28 degrees of freedom
## Multiple R-squared:  0.8348, Adjusted R-squared:  0.8171 
## F-statistic: 47.16 on 3 and 28 DF,  p-value: 4.497e-11
plotModel(model2)

Hierarchy of models

F-test

anova(model1, model2)
## Analysis of Variance Table
## 
## Model 1: mpg ~ wt + factor(vs)
## Model 2: mpg ~ wt * factor(vs)
##   Res.Df    RSS Df Sum of Sq      F  Pr(>F)  
## 1     29 224.09                              
## 2     28 186.03  1    38.062 5.7287 0.02363 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
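
Because the two models differ by a single parameter, the F-test is equivalent to the t-test for the interaction coefficient: the F statistic equals the squared t statistic. A short sketch of that check (object names are illustrative):

```r
fit_add <- lm(mpg ~ wt + factor(vs), data = mtcars)
fit_int <- lm(mpg ~ wt * factor(vs), data = mtcars)

f_stat <- anova(fit_add, fit_int)$F[2]
t_stat <- summary(fit_int)$coefficients["wt:factor(vs)1", "t value"]

c(F = f_stat, t_squared = t_stat^2)
```

This is why the p-value 0.0236 appears both in the anova table and in the coefficient table for wt:factor(vs)1.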