The ASTA team

The Florida crime data set FL contains the variables:

Crime, which is the crime rate
Education, which is the proportion of the population with a high school exam
Urbanisation, which is the proportion of the population living in urban areas

The first rows of the data:

##   Crime Education Urbanisation
## 1   104      82.7         73.2
## 2    20      64.1         21.5
## 3    64      74.7         85.0
We study how Education (x1) and Urbanisation (x2) are related to Crime (y) in the multiple regression model E(y|x1,x2) = α + β1x1 + β2x2:
##
## Call:
## lm(formula = Crime ~ Education + Urbanisation, data = FL)
##
## Residuals:
## Min 1Q Median 3Q Max
## -34.693 -15.742 -6.226 15.812 50.678
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 59.1181 28.3653 2.084 0.0411 *
## Education -0.5834 0.4725 -1.235 0.2214
## Urbanisation 0.6825 0.1232 5.539 6.11e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 20.82 on 64 degrees of freedom
## Multiple R-squared: 0.4714, Adjusted R-squared: 0.4549
## F-statistic: 28.54 on 2 and 64 DF, p-value: 1.379e-09
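The t value and Pr(>|t|) columns can be reproduced from the Estimate and Std. Error columns; a quick sketch using the printed values for Education:

```r
# t-statistic: estimate divided by its standard error
t_obs <- -0.5834 / 0.4725
# two-sided p-value from the t-distribution with 64 residual degrees of freedom
p_val <- 2 * pt(-abs(t_obs), df = 64)
round(c(t = t_obs, p = p_val), 4)
```

Up to rounding, this matches the −1.235 and 0.2214 reported in the summary.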
The (Intercept) estimate corresponds to the mean response when all predictors are equal to zero.
##
The estimate of the residual standard deviation (Residual standard error in R) is s = 20.82 with df = 67 − 3 = 64 degrees of freedom. For Education the estimated standard error (Std. Error) is se = 0.4725, with corresponding observed t-value (t value) tobs = −0.5834/0.4725 = −1.235. The p-value (Pr(>|t|)) is 22%, so the effect of Education is not significant and we can exclude Education as a predictor. We refit the model using Urbanisation alone:
##
## Call:
## lm(formula = Crime ~ Urbanisation, data = FL)
##
## Residuals:
## Min 1Q Median 3Q Max
## -34.766 -16.541 -4.741 16.521 49.632
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 24.54125 4.53930 5.406 9.85e-07 ***
## Urbanisation 0.56220 0.07573 7.424 3.08e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 20.9 on 65 degrees of freedom
## Multiple R-squared: 0.4588, Adjusted R-squared: 0.4505
## F-statistic: 55.11 on 1 and 65 DF, p-value: 3.084e-10
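As a side note, the printed estimate and standard error also give a 95% confidence interval for the Urbanisation slope (a sketch using the t-quantile with 65 residual degrees of freedom):

```r
# 95% confidence interval: estimate +/- t-quantile * standard error
est <- 0.56220
se  <- 0.07573
est + c(-1, 1) * qt(0.975, df = 65) * se
```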
We can compare two models, where one is obtained from the other by setting m parameters to zero, by an F-test.
We can compare R² for the two models via the F-statistic Fobs = ((R2² − R1²)/(df1 − df2)) / ((1 − R2²)/df2), where model 1 is the smaller model with squared correlation R1² and residual degrees of freedom df1, and model 2 is the larger model. In R the calculations are done using anova:
## Analysis of Variance Table
##
## Model 1: Crime ~ Urbanisation
## Model 2: Crime ~ Education + Urbanisation
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 65 28391
## 2 64 27730 1 660.61 1.5247 0.2214
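The same F-value can be computed directly from the two R² values with the formula above (R1² = 0.4588, df1 = 65 for the small model; R2² = 0.4714, df2 = 64 for the large model):

```r
R2_small <- 0.4588; df_small <- 65   # Crime ~ Urbanisation
R2_large <- 0.4714; df_large <- 64   # Crime ~ Education + Urbanisation
# F-statistic for the gain in R-squared per dropped parameter
F_obs <- ((R2_large - R2_small) / (df_small - df_large)) / ((1 - R2_large) / df_large)
F_obs
# p-value from the F-distribution
1 - pf(F_obs, df1 = df_small - df_large, df2 = df_large)
```

Small differences from the anova output are due to the R² values being rounded.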
Example: for the model of Crime with prediction equation ˆy = 59 − 0.58x1 + 0.68x2, where n = 67 and R² = 0.4714, the overall test of H0: β1 = β2 = 0 has observed F-statistic Fobs = (0.4714/2)/((1 − 0.4714)/64) = 28.54 on 2 and 64 degrees of freedom, and the corresponding p-value is:
## [1] 1.378612e-09
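That p-value can be reproduced from the upper tail of the F-distribution with 2 and 64 degrees of freedom:

```r
# upper-tail probability of F = 28.54 with df1 = 2 and df2 = 64
pf(28.54, df1 = 2, df2 = 64, lower.tail = FALSE)
```

The tiny difference from 1.378612e-09 comes from rounding the F-statistic to 28.54.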
Perhaps a combination (interaction) of Education and Urbanisation is good for prediction? We investigate this using the interaction model E(y|x1,x2) = α + β1x1 + β2x2 + β3x1x2, where we have extended the model with a possible effect of the product x1x2. We fit this model to the Crime data set:
##
## Call:
## lm(formula = Crime ~ Education * Urbanisation, data = FL)
##
## Residuals:
## Min 1Q Median 3Q Max
## -35.181 -15.207 -6.457 14.559 49.889
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 19.31754 49.95871 0.387 0.700
## Education 0.03396 0.79381 0.043 0.966
## Urbanisation 1.51431 0.86809 1.744 0.086 .
## Education:Urbanisation -0.01205 0.01245 -0.968 0.337
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 20.83 on 63 degrees of freedom
## Multiple R-squared: 0.4792, Adjusted R-squared: 0.4544
## F-statistic: 19.32 on 3 and 63 DF, p-value: 5.371e-09
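In the interaction model the effect of Education depends on the level of Urbanisation: the slope of x1 is β1 + β3·x2. A sketch using the printed estimates (the Urbanisation levels 50 and 90 are arbitrary illustrations):

```r
b1 <- 0.03396    # coefficient of Education
b3 <- -0.01205   # coefficient of Education:Urbanisation
# slope of Education at two illustrative levels of Urbanisation
c(at_50 = b1 + b3 * 50, at_90 = b1 + b3 * 90)
```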
Suppose we want to predict the response variable using a categorical predictor variable x with k categories.
We choose one group, say group k, as the reference category.
For the remaining groups 1,…,k−1, we define dummy variables zi, where zi = 1 if x = i and zi = 0 otherwise, for i = 1,…,k−1.
We can use the variables z1,…,zk−1 in a multiple regression along with other predictor variables.
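A small sketch of this dummy coding in R, with k = 3 categories and group 3 as the reference category:

```r
x  <- c(1, 2, 3, 2, 1, 3)    # categorical predictor with k = 3 groups
z1 <- as.numeric(x == 1)     # dummy for group 1
z2 <- as.numeric(x == 2)     # dummy for group 2
cbind(x, z1, z2)             # group 3 has z1 = z2 = 0
```

Note that when you give lm a factor, R constructs such dummy variables automatically (by default with the first level as reference).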
Consider the dataset mtcars. We are interested in how engine type vs (categorical) and weight of the car wt (quantitative, x1) are associated with fuel consumption mpg (y).
The variable vs is already coded as a dummy variable z in R, taking the value 1 if the engine is straight and 0 if it is V-shaped.
The multiple regression model becomes E(Y|x1,z)=α+β1x1+β2z.
So we get two regression lines for the two groups, with different intercepts (α and α + β2) but a common slope β1.
We plot the data with a regression line for each engine type (using gf_point for plotting points and gf_lm for adding a regression line):
library(mosaic)
gf_point(mpg ~ wt, color = ~factor(vs), group = ~factor(vs), data = mtcars) %>% gf_lm()
##
## Call:
## lm(formula = mpg ~ wt + factor(vs), data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.7071 -2.4415 -0.3129 1.4319 6.0156
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.0042 2.3554 14.012 1.92e-14 ***
## wt -4.4428 0.6134 -7.243 5.63e-08 ***
## factor(vs)1 3.1544 1.1907 2.649 0.0129 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.78 on 29 degrees of freedom
## Multiple R-squared: 0.801, Adjusted R-squared: 0.7873
## F-statistic: 58.36 on 2 and 29 DF, p-value: 6.818e-11
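The additive model thus gives two parallel lines with common slope −4.4428; reading the intercepts off the summary:

```r
a  <- 33.0042   # intercept for vs = 0 (V-shaped engines)
b2 <-  3.1544   # factor(vs)1: intercept shift for vs = 1 (straight engines)
# the two intercepts
c(vs0 = a, vs1 = a + b2)
# predicted mpg at wt = 3 (i.e. 3000 lbs) in each group
c(vs0 = a - 4.4428 * 3, vs1 = a + b2 - 4.4428 * 3)
```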
The effect of wt is estimated to be ˆβ1 = −4.44, with corresponding p-value 5.63⋅10−8, so the effect of wt is significantly different from zero.
In the extended model E(y|x1,z) = α + β1x1 + β2z + β3x1z, β2 is the difference in intercept between the two groups, while β3 is the difference in slope between the two groups. Using * in the model formula we include the interaction between vs and wt:
##
## Call:
## lm(formula = mpg ~ wt * factor(vs), data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.9950 -1.7881 -0.3423 1.2935 5.2061
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 29.5314 2.6221 11.263 6.55e-12 ***
## wt -3.5013 0.6915 -5.063 2.33e-05 ***
## factor(vs)1 11.7667 3.7638 3.126 0.0041 **
## wt:factor(vs)1 -2.9097 1.2157 -2.393 0.0236 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.578 on 28 degrees of freedom
## Multiple R-squared: 0.8348, Adjusted R-squared: 0.8171
## F-statistic: 47.16 on 3 and 28 DF, p-value: 4.497e-11
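With the interaction term each group gets both its own intercept and its own slope; from the printed estimates:

```r
a  <- 29.5314; b1 <- -3.5013   # vs = 0 line: intercept a, slope b1
b2 <- 11.7667; b3 <- -2.9097   # shifts in intercept and slope for vs = 1
# vs = 1 line: intercept a + b2, slope b1 + b3
c(intercept = a + b2, slope = b1 + b3)
```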
We can also compare the two models from the mtcars example, namely the model with and without interaction, via anova:
## Analysis of Variance Table
##
## Model 1: mpg ~ wt + factor(vs)
## Model 2: mpg ~ wt * factor(vs)
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 29 224.09
## 2 28 186.03 1 38.062 5.7287 0.02363 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
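The F-value in the table can be verified from the two residual sums of squares (RSS) and their degrees of freedom:

```r
RSS1 <- 224.09; df1 <- 29   # mpg ~ wt + factor(vs)
RSS2 <- 186.03; df2 <- 28   # mpg ~ wt * factor(vs)
# F-statistic: drop in RSS per dropped parameter, relative to the full-model variance
F_obs <- ((RSS1 - RSS2) / (df1 - df2)) / (RSS2 / df2)
F_obs
# p-value from the F-distribution with 1 and 28 degrees of freedom
pf(F_obs, df1 - df2, df2, lower.tail = FALSE)
```

Since the interaction is significant at the 5% level, we keep the model with separate slopes.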