---
title: "Solutions to exercises"
output:
  html_document: default
  pdf_document: default
---

```{r, setup, message=FALSE}
library(mosaic)
```

## Agresti exercise 13.1

(a) The mean for white people ($z=1$) is 
$$
E(y|z=1)=11+2*1=13.
$$
For other people ($z=0$), the mean is 
$$
E(y|z=0)=11+2*0=11.
$$
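As a quick check in R, the model $E(y)=11+2z$ used in (a) can be written as a function of the indicator $z$ (the name `mean_educ` is only illustrative). The difference between the two means is the coefficient of $z$.

```r
# Mean years of education under E(y|z) = 11 + 2z (z = 1 for white, z = 0 otherwise)
mean_educ <- function(z) 11 + 2 * z

means <- mean_educ(c(white = 1, other = 0))
means                                # 13 for whites, 11 for others
means[["white"]] - means[["other"]]  # equals the coefficient of z, i.e. 2
```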
(b) We plot the regression lines for the association between education and father's education for the two `race` groups (black: non-whites, $z=0$; red: whites, $z=1$):
```{r, echo=FALSE}
gf_fun(3+0.8*x ~ x, xlim = c(0,20)) %>% gf_fun(3-0.6+0.8*x ~ x, color = "red")
```
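(c) For a fixed value $x$ of father's education, the expected education is $3+0.8*x-0.6$ for whites and $3+0.8*x$ for non-whites. The difference is therefore $-0.6$ whatever the value of $x$: the expected education of whites is 0.6 lower than that of non-whites with the same father's education. 
For instance, for $x=12$, the expected education is  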
```{r}
3+0.8*12-0.6
```
for whites and 
```{r}
3+0.8*12
```
for others, so the difference is $-0.6$.  
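The same comparison can be repeated for several values of $x$ at once, writing the model implied by (c) as $E(y)=3+0.8x-0.6z$. This is a small sketch; the function name `expected_educ` and the chosen values of $x$ are illustrative.

```r
# Expected education given father's education x and race indicator z (1 = white),
# i.e. E(y) = 3 + 0.8x - 0.6z
expected_educ <- function(x, z) 3 + 0.8 * x - 0.6 * z

x <- c(0, 8, 12, 16, 20)  # a few illustrative values of father's education
diffs <- expected_educ(x, z = 1) - expected_educ(x, z = 0)
diffs  # -0.6 for every x, up to floating-point rounding
```

Because the model has no interaction between $x$ and $z$, the two lines are parallel and the gap does not depend on $x$.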

## Agresti exercise 13.5

(a) We get the prediction equation: 
$$
\hat{y}=8.3 + 9.8 \cdot f -5.3 \cdot s +7 \cdot m_1 + 2 \cdot m_2+1.2 \cdot m_3+0.501 \cdot x. 
$$
(b) The predicted alcohol consumption for divorced males whose father died in the past three years ($f=1$, $s=0$, $m_1=1$, $m_2=m_3=0$), with alcohol consumption three years previously ($x$) equal to
i) 0 drinks:
```{r}
8.3 + 9.8  + 7
```

ii) 10 drinks:
```{r}
8.3 + 9.8  + 7 + 0.501*10
```
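Both predictions can also be obtained as the product of a matrix of predictor values with the coefficient vector from (a). This assumes, as the computations above do, that $s=0$ for males and that $m_1$ is the indicator for divorced; the names `b`, `profiles` and `pred` are only illustrative.

```r
# Coefficients of the prediction equation in (a), ordered as
# intercept, f, s, m1, m2, m3, x
b <- c(8.3, 9.8, -5.3, 7, 2, 1.2, 0.501)

# Predictor values for a divorced (m1 = 1) male (s = 0) whose father died (f = 1),
# with 0 or 10 drinks three years previously (x)
profiles <- rbind(
  "0 drinks"  = c(1, 1, 0, 1, 0, 0, 0),
  "10 drinks" = c(1, 1, 0, 1, 0, 0, 10)
)

pred <- drop(profiles %*% b)
pred                                      # 25.1 and 30.11
pred[["10 drinks"]] - pred[["0 drinks"]]  # 10 * 0.501 = 5.01
```

The two predictions differ only through $x$, so their difference is $10 \cdot 0.501$.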

## Agresti exercise 13.7

Import data (this data set includes the variable `New`):
```{r}
HousePriceFull <- read.delim("https://asta.math.aau.dk/datasets?file=HousePriceFull.txt")
```

First interpret the following plots:
```{r}
gf_boxplot(Price ~ New, data = HousePriceFull)
gf_boxplot(Size ~ New, data = HousePriceFull)
gf_point(Price ~ Size, color = ~New, data = HousePriceFull)
```

- The house price seems to increase with size and new houses seem to be both bigger and more expensive.

Fit the linear model corresponding to Table 13.17:
```{r}
model <- lm( Price ~ Size + New, data = HousePriceFull )
summary(model)
```

Write the prediction equation with appropriate notation:
$$
\hat y = -40230.867 + 116.132*size + 57736.283*z,
$$
where $z$ is the dummy variable for `New` ($z=1$ for new houses and $z=0$ for old houses).
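Because this model has no interaction, the two regression lines are parallel: at every size, the predicted price of a new house exceeds that of an old house by the coefficient of $z$. A small sketch with the estimates copied from the equation above (the helper `pred_price` and the chosen sizes are illustrative):

```r
# Estimates copied from the prediction equation above (no interaction)
b0     <- -40230.867  # intercept
b_size <- 116.132     # slope for size
b_new  <- 57736.283   # shift for new houses (z = 1)

pred_price <- function(size, z) b0 + b_size * size + b_new * z

sizes <- c(1000, 2000, 3000)  # illustrative house sizes
gap <- pred_price(sizes, z = 1) - pred_price(sizes, z = 0)
gap  # the same at every size (b_new), so the two lines are parallel
```

Within this document, `predict(model, newdata = data.frame(Size = sizes, New = "yes"))` should give fitted prices for new houses directly, assuming the level of `New` for new houses is `"yes"` (as the `Size:Newyes` coefficient name in exercise 13.8 suggests).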

Plot the two regression lines:
```{r}
plotModel( model )
```


## Agresti exercise 13.8

Make the relevant plot(s) using `gf_point`:

```{r}
gf_point(Price ~ Size,  color = ~New, data = HousePriceFull) %>% gf_lm()
```

Fit the linear model corresponding to Table 13.18 in Agresti:
```{r}
model1 <- lm(Price ~ Size*New, data = HousePriceFull )
summary(model1)
```

Write the prediction equations for old and new houses:
$$
\begin{aligned}
\hat y_{old} &{}= -22227.808 + 104.438*size\\
\hat y_{new} &{}= (-22227.808 -78527.502) + (104.438 + 61.916)*size \\
&{}= -100755.310 + 166.354*size
\end{aligned}
$$
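The intercepts and slopes of the two lines can also be computed from the coefficient estimates. The sketch below copies the estimates from the equations above and labels them with the names R uses by default for `Price ~ Size*New` when the levels of `New` are `no`/`yes` (an assumption consistent with the `Size:Newyes` coefficient in the summary of `model1`).

```r
# Estimates from the interaction model, copied from the equations above;
# names follow R's default coefficient labels for Price ~ Size*New
b <- c("(Intercept)" = -22227.808, Size = 104.438,
       Newyes = -78527.502, "Size:Newyes" = 61.916)

# Old houses use the baseline coefficients; new houses add the two "yes" terms
fitted_lines <- rbind(
  old = c(intercept = b[["(Intercept)"]],
          slope     = b[["Size"]]),
  new = c(intercept = b[["(Intercept)"]] + b[["Newyes"]],
          slope     = b[["Size"]] + b[["Size:Newyes"]])
)
fitted_lines
```

With the fitted model at hand, `coef(model1)` should return the same four estimates under these names, so the same arithmetic can be applied to it directly.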

Is the interaction significant?

- We apply the `anova` function to the models with and without interaction:
```{r}
anova(model, model1)
```
This shows that the interaction is significant (p-value 0.005272). Alternatively, the test for interaction can be read off the `Size:Newyes` line in the summary of `model1`. This works only because the categorical variable has two levels: the model with interaction then contains just one extra parameter, so testing that single parameter is the same as comparing the two models.
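The equivalence between the `anova` comparison and the `Size:Newyes` line of the summary can be checked on simulated data. The data below are simulated (not the house-price data); the variable names are reused only for readability. With one extra parameter, the F statistic from `anova` equals the square of the t statistic, so the two p-values coincide.

```r
set.seed(1)  # simulated data, only to illustrate the equivalence
n <- 60
sim <- data.frame(
  Size = runif(n, 1000, 3000),
  New  = factor(sample(c("no", "yes"), n, replace = TRUE))
)
sim$Price <- 10000 + 100 * sim$Size +
  50 * sim$Size * (sim$New == "yes") + rnorm(n, sd = 20000)

m_add <- lm(Price ~ Size + New, data = sim)  # without interaction
m_int <- lm(Price ~ Size * New, data = sim)  # with interaction

ftest <- anova(m_add, m_int)                    # F-test for the one extra parameter
ttest <- coef(summary(m_int))["Size:Newyes", ]  # t-test from the summary

c(F = ftest[2, "F"], t_squared = ttest[["t value"]]^2)         # F = t^2
c(anova = ftest[2, "Pr(>F)"], summary = ttest[["Pr(>|t|)"]])  # same p-value
```

With a factor of three or more levels, the interaction adds several parameters, and only the `anova` comparison tests them jointly.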

## Agresti exercise 13.20

(a) The least permissive people seem to be:

- older (the slope for age is negative),
- white (the parameter for race is positive and white is the reference group, so whites are less permissive),
- female (the parameter for sex is negative and male is the reference group),
- with a low level of education (the slope for education is positive),
- from the south (the difference for the non-south is positive and the south is the reference group),
- fundamentalist Protestants (the religion category with the largest negative difference from the reference group),
- frequent churchgoers (the slope for church attendance is negative), and
- intolerant of freedom of speech (the slope is negative).

(b) Similarly, the most permissive people seem to be younger black males with a high level of education, coming from the non-south, who are Jewish, rarely attend church, and tolerate freedom of speech.