The ASTA team
The data are from Table 9.15 in Agresti and consist of measurements in the 67 counties of Florida. The variables are Crime, which is the crime rate; Education, which is the proportion of the population with a high school exam; and Urbanisation, which is the proportion of the population living in urban areas.
FL <- read.delim("https://asta.math.aau.dk/datasets?file=fl-crime.txt")
head(FL, n = 3)
## Crime Education Urbanisation
## 1 104 82.7 73.2
## 2 20 64.1 21.5
## 3 64 74.7 85.0
library(mosaic)
splom(FL) # Scatter PLOt Matrix
We examine the correlation between Crime and Education and between Crime and Urbanisation:
cor(FL)
## Crime Education Urbanisation
## Crime 1.0000000 0.4669119 0.6773678
## Education 0.4669119 1.0000000 0.7907190
## Urbanisation 0.6773678 0.7907190 1.0000000
cor.test(~ Crime + Education, data = FL)
##
## Pearson's product-moment correlation
##
## data: Crime and Education
## t = 4.2569, df = 65, p-value = 6.806e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2553414 0.6358104
## sample estimates:
## cor
## 0.4669119
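The t statistic reported by cor.test can be recovered from the sample correlation alone. The following is a minimal sketch in base R, assuming the standard relation $t = r\sqrt{n-2}/\sqrt{1-r^2}$ for testing a zero correlation and using the rounded values printed above; the variable names are illustrative and not part of the original slides.

```r
# Values copied from the cor.test output above
r <- 0.4669119   # sample correlation between Crime and Education
n <- 67          # number of counties

# t statistic for H0: true correlation = 0, with n - 2 degrees of freedom
t_obs <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_obs), df = n - 2)

print(round(t_obs, 4))
print(signif(p_value, 4))
```

Both numbers should agree with the t and p-value lines of the printed output up to rounding.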
Both Education ($x_1$) and Urbanisation ($x_2$) are pretty good predictors for Crime ($y$).
model <- lm(Crime ~ Education + Urbanisation, data = FL)
summary(model)
##
## Call:
## lm(formula = Crime ~ Education + Urbanisation, data = FL)
##
## Residuals:
## Min 1Q Median 3Q Max
## -34.693 -15.742 -6.226 15.812 50.678
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 59.1181 28.3653 2.084 0.0411 *
## Education -0.5834 0.4725 -1.235 0.2214
## Urbanisation 0.6825 0.1232 5.539 6.11e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 20.82 on 64 degrees of freedom
## Multiple R-squared: 0.4714, Adjusted R-squared: 0.4549
## F-statistic: 28.54 on 2 and 64 DF, p-value: 1.379e-09
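The negative coefficient for Education, despite its positive correlation with Crime, can also be seen from the correlation matrix alone. The sketch below applies the standard first-order partial correlation formula (an addition, not something the slides compute) to the printed correlations, measuring the association between Crime and Education after adjusting for Urbanisation. The variable names are illustrative.

```r
# Pairwise correlations copied from the cor(FL) output
r_ce <- 0.4669119  # Crime, Education
r_cu <- 0.6773678  # Crime, Urbanisation
r_eu <- 0.7907190  # Education, Urbanisation
n <- 67

# Partial correlation of Crime and Education given Urbanisation
r_partial <- (r_ce - r_cu * r_eu) / sqrt((1 - r_cu^2) * (1 - r_eu^2))

# Corresponding t statistic with n - 3 degrees of freedom;
# it should be close to the t value for Education in the lm summary
t_partial <- r_partial * sqrt(n - 3) / sqrt(1 - r_partial^2)

print(round(r_partial, 3))
print(round(t_partial, 3))
```

The partial correlation comes out negative although the raw correlation is positive, and its t statistic should match the t value in the Education row of the coefficient table up to rounding.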
Considered alone, Education is a good predictor for Crime (with positive correlation).
When we add Urbanisation, Education has a negative effect on Crime (but not significant).
Urbanisation has a positive effect on both Education and Crime.
For a given level of urbanisation there is a (non-significant) negative association between Education and Crime.
Viewed alone, Education is a good predictor for Crime: if Education has a large value, this indicates a large value of Urbanisation and thereby a large value of Crime.
The (Intercept) row gives the estimated intercept, corresponding to the mean response when all predictors are equal to zero.
gf_point(predict(model) ~ FL$Crime) %>%
gf_lm() %>%
gf_labs(title = paste("Correlation between predicted and observed y ( r =", round(sqrt(summary(model)$r.squared),2), ")"),
x = "Crime",
y = expression(hat(y)))
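To make the predicted values in the plot concrete, the sketch below evaluates the fitted equation by hand for the three counties shown by head(FL, n = 3), and computes the multiple correlation used in the plot title. It only uses rounded numbers copied from the printed output, so results may differ slightly from predict(model); the object names are illustrative.

```r
# Rounded coefficients copied from the summary(model) output
a  <- 59.1181   # intercept
b1 <- -0.5834   # Education
b2 <- 0.6825    # Urbanisation

# First three counties, copied from the head(FL, n = 3) output
first3 <- data.frame(
  Crime        = c(104, 20, 64),
  Education    = c(82.7, 64.1, 74.7),
  Urbanisation = c(73.2, 21.5, 85.0)
)

# Predicted crime rate and residual (observed minus predicted)
first3$predicted <- a + b1 * first3$Education + b2 * first3$Urbanisation
first3$residual  <- first3$Crime - first3$predicted
print(first3)

# Multiple correlation: the square root of Multiple R-squared
R <- sqrt(0.4714)
print(round(R, 2))
```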
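The output of summary(model) gives the estimate $s = 20.82$ of the conditional standard deviation (Residual standard error in R) with $df = 67 - 3 = 64$ degrees of freedom.
For Education, the estimate is $-0.5834$ with standard error (Std. Error) $se = 0.4725$ and corresponding t-score (t value) $t_{obs} = \frac{-0.5834}{0.4725} = -1.235$.
The corresponding p-value (Pr(>|t|)) is 22%. Education is therefore not a significant predictor once Urbanisation is in the model, and we exclude it as a predictor.
model2 <- lm(Crime ~ Urbanisation, data = FL)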
summary(model2)
##
## Call:
## lm(formula = Crime ~ Urbanisation, data = FL)
##
## Residuals:
## Min 1Q Median 3Q Max
## -34.766 -16.541 -4.741 16.521 49.632
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 24.54125 4.53930 5.406 9.85e-07 ***
## Urbanisation 0.56220 0.07573 7.424 3.08e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 20.9 on 65 degrees of freedom
## Multiple R-squared: 0.4588, Adjusted R-squared: 0.4505
## F-statistic: 55.11 on 1 and 65 DF, p-value: 3.084e-10
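The test for Education can be checked by hand and related to the drop in R-squared between the two fitted models. The sketch below is in base R and uses only rounded numbers copied from the two summaries. The partial F statistic is a standard device added here for illustration and is not computed in the slides. For a single dropped predictor it should equal the squared t-score up to rounding.

```r
# Education row of the two-predictor summary
estimate <- -0.5834
se <- 0.4725
df_resid <- 67 - 3          # n minus the number of estimated coefficients

# t-score and two-sided p-value
t_obs <- estimate / se
p_value <- 2 * pt(-abs(t_obs), df = df_resid)

# Multiple R-squared with and without Education
r2_full <- 0.4714
r2_reduced <- 0.4588

# Partial F statistic for dropping one predictor from the full model
f_obs <- (r2_full - r2_reduced) / ((1 - r2_full) / df_resid)
p_value_f <- pf(f_obs, df1 = 1, df2 = df_resid, lower.tail = FALSE)

print(c(t = t_obs, t_squared = t_obs^2, F = f_obs))
print(c(p_t = p_value, p_F = p_value_f))
```

The small loss in R-squared when Education is dropped is what the non-significant t-test expresses.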
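For the model of Crime with both predictors, we have the prediction equation $\hat{y} = 59 - 0.58x_1 + 0.68x_2$, where $n = 67$ and $R^2 = 0.4714$. The observed F-statistic is 28.54 on 2 and 64 degrees of freedom, and its p-value is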
1 - pdist("f", 28.54, df1=2, df2=64)
## [1] 1.378612e-09
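The F-statistic itself can be reconstructed from $R^2$, $n$ and the number of predictors $k$. The sketch below assumes the usual formula $F = \frac{R^2/k}{(1-R^2)/(n-k-1)}$ and uses base R's pf, which gives the same tail probability as the mosaic pdist("f", ...) call above without drawing a plot. The variable names are illustrative.

```r
n  <- 67       # number of counties
k  <- 2        # number of predictors
r2 <- 0.4714   # Multiple R-squared from the summary output

df1 <- k
df2 <- n - k - 1

# F statistic for H0: all slope coefficients are zero
f_obs <- (r2 / df1) / ((1 - r2) / df2)

# Upper-tail probability of the F distribution
p_value <- pf(f_obs, df1 = df1, df2 = df2, lower.tail = FALSE)

print(round(f_obs, 2))
print(p_value)
```

Because $R^2$ is rounded to four decimals, the reconstructed F may differ from the printed 28.54 in the last digit.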
Could it be that a combination of Education and Urbanisation is good for prediction? We want to investigate this using the model $E(y \mid x_1, x_2) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$, where we have extended with a possible effect of the product $x_1 x_2$:
model3 <- lm(Crime ~ Urbanisation * Education, data = FL)
summary(model3)
##
## Call:
## lm(formula = Crime ~ Urbanisation * Education, data = FL)
##
## Residuals:
## Min 1Q Median 3Q Max
## -35.181 -15.207 -6.457 14.559 49.889
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 19.31754 49.95871 0.387 0.700
## Urbanisation 1.51431 0.86809 1.744 0.086 .
## Education 0.03396 0.79381 0.043 0.966
## Urbanisation:Education -0.01205 0.01245 -0.968 0.337
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 20.83 on 63 degrees of freedom
## Multiple R-squared: 0.4792, Adjusted R-squared: 0.4544
## F-statistic: 19.32 on 3 and 63 DF, p-value: 5.371e-09
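In the interaction model the slope for Urbanisation is no longer a single number but depends on the level of Education. The sketch below uses rounded coefficients copied from the output above to evaluate that slope at the three Education values shown by head(FL, n = 3), re-derive the test of the interaction term, and recompute the adjusted R-squared of the models with and without interaction. The helper adj_r2 is a hypothetical name introduced here for illustration.

```r
# Rounded estimates copied from the interaction model output
b_urb  <- 1.51431    # Urbanisation
b_int  <- -0.01205   # Urbanisation:Education
se_int <- 0.01245    # standard error of the interaction term

# Slope of Crime with respect to Urbanisation at given Education levels
education_levels <- c(64.1, 74.7, 82.7)
slope_urb <- b_urb + b_int * education_levels
print(data.frame(Education = education_levels, slope_urb = slope_urb))

# t-score and two-sided p-value for the interaction (63 residual df)
t_int <- b_int / se_int
p_int <- 2 * pt(-abs(t_int), df = 63)
print(c(t = t_int, p = p_int))

# Adjusted R-squared from R-squared, sample size n and k predictors
adj_r2 <- function(r2, n, k) 1 - (1 - r2) * (n - 1) / (n - k - 1)
adj_main <- adj_r2(0.4714, n = 67, k = 2)   # model without interaction
adj_int  <- adj_r2(0.4792, n = 67, k = 3)   # model with interaction
print(round(c(main = adj_main, interaction = adj_int), 4))
```

The adjusted R-squared does not improve when the product term is added, which agrees with the non-significant test of the interaction.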