Multiple linear regression

The ASTA team

Multiple regression model

Multiple regression model

Example

FL <- read.delim("https://asta.math.aau.dk/datasets?file=fl-crime.txt")
head(FL, n = 3)
##   Crime Education Urbanisation
## 1   104      82.7         73.2
## 2    20      64.1         21.5
## 3    64      74.7         85.0
library(mosaic)
splom(FL) # Scatter PLOt Matrix

Correlations

cor(FL)
##                  Crime Education Urbanisation
## Crime        1.0000000 0.4669119    0.6773678
## Education    0.4669119 1.0000000    0.7907190
## Urbanisation 0.6773678 0.7907190    1.0000000
cor.test(~ Crime + Education, data = FL)
## 
##  Pearson's product-moment correlation
## 
## data:  Crime and Education
## t = 4.2569, df = 65, p-value = 6.806e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2553414 0.6358104
## sample estimates:
##       cor 
## 0.4669119
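The t statistic reported by cor.test() comes from the sample correlation alone: t = r * sqrt(n - 2) / sqrt(1 - r^2), compared with a t-distribution on n - 2 degrees of freedom. The sketch below recomputes it from the printed values, assuming n = 67 (the reported df = 65 plus 2); it does not reload the data.

```r
# Values copied from the cor.test() output above (not recomputed from FL)
r <- 0.4669119   # sample correlation between Crime and Education
n <- 67          # assumed sample size: df = n - 2 = 65
# Test statistic for H0: the population correlation is 0
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
# Two-sided p-value from the t-distribution with n - 2 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = n - 2)
round(t_stat, 4)
signif(p_value, 4)
```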

Several predictors

Example

model <- lm(Crime ~ Education + Urbanisation, data = FL)
summary(model)
## 
## Call:
## lm(formula = Crime ~ Education + Urbanisation, data = FL)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -34.693 -15.742  -6.226  15.812  50.678 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   59.1181    28.3653   2.084   0.0411 *  
## Education     -0.5834     0.4725  -1.235   0.2214    
## Urbanisation   0.6825     0.1232   5.539 6.11e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 20.82 on 64 degrees of freedom
## Multiple R-squared:  0.4714, Adjusted R-squared:  0.4549 
## F-statistic: 28.54 on 2 and 64 DF,  p-value: 1.379e-09
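Each "t value" in the coefficient table is the estimate divided by its standard error, and "Pr(>|t|)" is a two-sided p-value from a t-distribution with n - (k + 1) = 67 - 3 = 64 degrees of freedom. A small check using the rounded numbers printed above (so the last digit may differ slightly):

```r
# Estimates and standard errors copied from summary(model) above
est <- c(Education = -0.5834, Urbanisation = 0.6825)
se  <- c(Education =  0.4725, Urbanisation = 0.1232)
df_resid <- 64                       # n - (k + 1) = 67 - 3
t_val <- est / se                    # t value = Estimate / Std. Error
p_val <- 2 * pt(-abs(t_val), df = df_resid)   # two-sided p-values
round(t_val, 3)
signif(p_val, 3)
```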

Simpson's paradox
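In the FL data, Education is positively correlated with Crime (r = 0.47), yet its coefficient is negative (-0.58) once Urbanisation is in the model. The sketch below reproduces that pattern with simulated data (all numbers and variable names are hypothetical): urbanisation drives both education and crime, so the marginal slope of education is positive while the slope holding urbanisation fixed is negative.

```r
set.seed(1)
n <- 200
urban <- runif(n, 0, 100)                          # simulated urbanisation (%)
educ  <- 50 + 0.3 * urban + rnorm(n, sd = 3)       # education rises with urbanisation
crime <- 20 + 1.0 * urban - 0.8 * educ + rnorm(n, sd = 5)  # true effect of educ is negative
# Marginal slope of education (urbanisation ignored)
slope_marginal <- coef(lm(crime ~ educ))[["educ"]]
# Conditional slope of education (urbanisation held fixed)
slope_conditional <- coef(lm(crime ~ educ + urban))[["educ"]]
c(marginal = slope_marginal, conditional = slope_conditional)
```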

The general model

Regression model
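The formula for this slide did not survive the conversion. In a common notation (an assumption; the original slides may use other symbols), the multiple regression model with k predictors is

```latex
y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon,
\qquad \varepsilon \sim N(0, \sigma^2),
```

with the errors independent across observations. In the FL example, k = 2, with x_1 = Education and x_2 = Urbanisation.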

Interpretation of parameters
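A coefficient such as -0.5834 for Education is the estimated change in the mean of Crime when Education increases by one unit while Urbanisation is held fixed. The sketch below checks this interpretation on simulated data (hypothetical variables x1, x2): two predictions that differ by one unit in x1, with x2 fixed, differ by exactly the estimated coefficient of x1.

```r
set.seed(2)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 3 * x2 + rnorm(n)   # simulated response
fit <- lm(y ~ x1 + x2)
# Two hypothetical observations: x1 differs by one unit, x2 is the same
new_obs <- data.frame(x1 = c(0.5, 1.5), x2 = c(1, 1))
pred <- predict(fit, newdata = new_obs)
c(change_in_prediction = unname(diff(pred)), beta1_hat = unname(coef(fit)["x1"]))
```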

Estimation

Estimation of model
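lm() estimates the parameters by least squares, i.e. it minimises the sum of squared residuals. One way to see this is to solve the normal equations (X'X) b = X'y directly and compare with lm(); the residual standard error is then sqrt(SSE / (n - (k + 1))). A sketch on simulated data (values are hypothetical; lm() itself uses a QR decomposition rather than this formula, but the solution is the same):

```r
set.seed(3)
n <- 50
x1 <- runif(n, 60, 90)                       # simulated predictor
x2 <- runif(n, 0, 100)                       # simulated predictor
y  <- 60 - 0.5 * x1 + 0.7 * x2 + rnorm(n, sd = 20)
X <- cbind(Intercept = 1, x1, x2)            # design matrix with intercept column
beta_hat <- solve(crossprod(X), crossprod(X, y))   # solves (X'X) b = X'y
fit <- lm(y ~ x1 + x2)
cbind(normal_equations = drop(beta_hat), lm = coef(fit))
# Residual standard error with k = 2 predictors: sqrt(SSE / (n - 3))
sse <- sum(residuals(fit)^2)
s <- sqrt(sse / (n - 3))
c(by_hand = s, lm = sigma(fit))
```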

Multiple R-squared

Multiple \(R^2\)

gf_point(predict(model) ~ FL$Crime) %>% 
  gf_lm() %>%
  gf_labs(title = paste("Correlation between predicted and observed y ( r =", round(sqrt(summary(model)$r.squared),2), ")"),
          x = "Crime",
          y = expression(hat(y)))
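The plot above uses the fact that, for a least squares fit with an intercept, the multiple R^2 = 1 - SSE/SST equals the squared correlation between observed and predicted y. A quick check on simulated data (hypothetical variables):

```r
set.seed(4)
n <- 80
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 3 + x1 + 0.5 * x2 + rnorm(n)     # simulated response
fit <- lm(y ~ x1 + x2)
sse <- sum(residuals(fit)^2)           # residual sum of squares
sst <- sum((y - mean(y))^2)            # total sum of squares
r2_definition  <- 1 - sse / sst
r2_correlation <- cor(y, fitted(fit))^2
c(definition = r2_definition, squared_cor = r2_correlation,
  lm = summary(fit)$r.squared)
```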

Example

summary(model)
## 
## Call:
## lm(formula = Crime ~ Education + Urbanisation, data = FL)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -34.693 -15.742  -6.226  15.812  50.678 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   59.1181    28.3653   2.084   0.0411 *  
## Education     -0.5834     0.4725  -1.235   0.2214    
## Urbanisation   0.6825     0.1232   5.539 6.11e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 20.82 on 64 degrees of freedom
## Multiple R-squared:  0.4714, Adjusted R-squared:  0.4549 
## F-statistic: 28.54 on 2 and 64 DF,  p-value: 1.379e-09
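The adjusted R^2 penalises the number of predictors: adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1). Plugging in the printed values, with the assumed n = 67 and k = 2, reproduces the reported 0.4549:

```r
r2 <- 0.4714   # Multiple R-squared from summary(model)
n  <- 67       # assumed sample size (64 residual df + 3 parameters)
k  <- 2        # number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)
round(adj_r2, 4)
```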

Example

model2 <- lm(Crime ~ Urbanisation, data = FL)
summary(model2)
## 
## Call:
## lm(formula = Crime ~ Urbanisation, data = FL)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -34.766 -16.541  -4.741  16.521  49.632 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  24.54125    4.53930   5.406 9.85e-07 ***
## Urbanisation  0.56220    0.07573   7.424 3.08e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 20.9 on 65 degrees of freedom
## Multiple R-squared:  0.4588, Adjusted R-squared:  0.4505 
## F-statistic: 55.11 on 1 and 65 DF,  p-value: 3.084e-10
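model2 is model with Education removed, so the two models are nested and can be compared formally; with the FL data the call would be anova(model2, model). When only one predictor is dropped, this partial F statistic equals the square of that predictor's t value in the larger model, and the p-values agree. A sketch on simulated data (hypothetical variables x1, x2):

```r
set.seed(5)
n  <- 67
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)    # correlated predictors
y  <- 25 + 0.6 * x2 + rnorm(n, sd = 20)
full    <- lm(y ~ x1 + x2)
reduced <- lm(y ~ x2)                  # x1 dropped
cmp <- anova(reduced, full)            # partial F-test for x1
cmp
t_x1 <- coef(summary(full))["x1", "t value"]
c(F = cmp$F[2], t_squared = t_x1^2)
```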

F-test for effect of predictors

F-test

Example

1 - pdist("f", 28.54, df1 = 2, df2 = 64)
## [1] 1.378612e-09
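The F statistic can also be written in terms of R^2: F = (R^2 / k) / ((1 - R^2) / (n - k - 1)). The sketch below rebuilds it from the printed R^2 (with the assumed n = 67, k = 2) and computes the p-value with base R's pf(), which gives the upper tail directly instead of 1 minus the lower tail:

```r
r2 <- 0.4714   # Multiple R-squared from summary(model)
n  <- 67       # assumed sample size
k  <- 2        # number of predictors
F_stat  <- (r2 / k) / ((1 - r2) / (n - k - 1))
p_value <- pf(F_stat, df1 = k, df2 = n - k - 1, lower.tail = FALSE)
round(F_stat, 2)
signif(p_value, 4)
# Base R equivalent of the pdist() call above
p_direct <- pf(28.54, df1 = 2, df2 = 64, lower.tail = FALSE)
p_direct
```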
summary(model)
## 
## Call:
## lm(formula = Crime ~ Education + Urbanisation, data = FL)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -34.693 -15.742  -6.226  15.812  50.678 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   59.1181    28.3653   2.084   0.0411 *  
## Education     -0.5834     0.4725  -1.235   0.2214    
## Urbanisation   0.6825     0.1232   5.539 6.11e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 20.82 on 64 degrees of freedom
## Multiple R-squared:  0.4714, Adjusted R-squared:  0.4549 
## F-statistic: 28.54 on 2 and 64 DF,  p-value: 1.379e-09

Test for interaction

Interaction between effects of predictors

model3 <- lm(Crime ~ Education * Urbanisation, data = FL)
summary(model3)
## 
## Call:
## lm(formula = Crime ~ Education * Urbanisation, data = FL)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -35.181 -15.207  -6.457  14.559  49.889 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)  
## (Intercept)            19.31754   49.95871   0.387    0.700  
## Education               0.03396    0.79381   0.043    0.966  
## Urbanisation            1.51431    0.86809   1.744    0.086 .
## Education:Urbanisation -0.01205    0.01245  -0.968    0.337  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 20.83 on 63 degrees of freedom
## Multiple R-squared:  0.4792, Adjusted R-squared:  0.4544 
## F-statistic: 19.32 on 3 and 63 DF,  p-value: 5.371e-09
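The formula Education * Urbanisation expands to Education + Urbanisation + Education:Urbanisation. With the interaction term, the slope of Urbanisation is no longer a single number: it is 1.51431 - 0.01205 * Education. The sketch evaluates this at a few Education values chosen for illustration only; note that the interaction's p-value of 0.337 gives little evidence that the slope really varies.

```r
# Estimates copied from summary(model3) above
b_urban    <-  1.51431   # Urbanisation
b_interact <- -0.01205   # Education:Urbanisation
educ <- c(60, 70, 80)    # illustrative Education values (assumption)
slope_urban <- b_urban + b_interact * educ   # slope of Urbanisation at each Education level
data.frame(Education = educ, slope_Urbanisation = slope_urban)
```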