---
title: "Solutions to exercises"
output:
  pdf_document: default
  html_document: default
---

```{r, setup, include=FALSE}
library(mosaic)  # stops with an error if mosaic is missing, rather than returning FALSE
```

## Exercise 12.3

The response $y$ is "ideal number of kids". The explanatory variable is religion. Taking no religion to be the reference group, we introduce the dummy variables
$$
\begin{aligned}
z_1&=\begin{cases}
1& \text{if Christian,}\\
0& \text{otherwise.}
\end{cases}\\
z_2&=\begin{cases}
1& \text{if Muslim,}\\
0& \text{otherwise.}
\end{cases}\\
z_3&=\begin{cases}
1& \text{if Jewish,}\\
0& \text{otherwise.}
\end{cases}\\
z_4&=\begin{cases}
1& \text{if other religion,}\\
0& \text{otherwise.}
\end{cases}
\end{aligned}
$$
The prediction equation then becomes:
$$
E(y | z_1,\ldots,z_4) = \alpha + \beta_1z_1+ \beta_2z_2+ \beta_3z_3+ \beta_4z_4.
$$
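
As a minimal sketch of how this coding looks in R: the factor below is hypothetical (the level names and the five example observations are made up for illustration and are not the exercise data). With "none" as the first level, `model.matrix` produces an intercept plus one 0/1 column per remaining category, corresponding to $z_1,\ldots,z_4$.

```{r}
# Hypothetical example data: one observation per religion category.
religion <- factor(
  c("none", "christian", "muslim", "jewish", "other"),
  levels = c("none", "christian", "muslim", "jewish", "other")  # first level = reference
)

# Default treatment (dummy) coding: the reference row has zeros in all dummy columns.
dummies <- model.matrix(~ religion)
dummies
```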

## Exercise 12.5

(a) The response $y$ is "number of good friends" and the explanatory variable is "how often..." with three categories.

    i) The null hypothesis is that there is no association between the number of good friends and how often one goes to a bar or tavern.

    ii) We find the F-statistic in Table 12.23: $F=3.03$.

    iii) From Table 12.23, the p-value is 0.049.

    iv) Since the p-value is just below the significance level of 0.05, we reject the hypothesis and conclude that there is an association between number of friends and how often one goes to a bar.

(b) One of the assumptions of the one-way ANOVA model is that the standard deviation is the same in all three groups. Comparing the estimated standard deviations in Table 12.23 shows that this assumption may be violated. (Since number of friends is a count variable, there may also be problems with the normality assumption, but this is probably less important.)

(c) Taking "never" to be the reference group, we define the dummy variables
$$
\begin{aligned}
z_1&=\begin{cases}
1& \text{if very often,}\\
0& \text{otherwise.}
\end{cases}\\
z_2&=\begin{cases}
1& \text{if occasional,}\\
0& \text{otherwise.}
\end{cases}
\end{aligned}
$$
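
With this coding, and by analogy with the prediction equation in Exercise 12.3, the model can be sketched as follows. The symbols $\alpha$, $\beta_1$ and $\beta_2$ are notation assumed here; they are not estimates taken from Table 12.23.
$$
E(y | z_1,z_2) = \alpha + \beta_1z_1 + \beta_2z_2.
$$
Here $\alpha$ is the mean for "never", while $\beta_1$ and $\beta_2$ are the differences from that mean for "very often" and "occasional". No association corresponds to $\beta_1=\beta_2=0$.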

## Exercise 12.11

The response $y$ is "hours a day watching TV". The explanatory variables are sex and race. The observed means for each combination of sex and race are:
```{r,echo=FALSE}
data.frame(female=c(2.66,3.48),male=c(2.62,3.14),row.names=c("white","black"))
```


The tests in Table 12.27 show a significant effect of race but not of sex. This is consistent with the table of means: within each sex, the mean for black respondents is clearly higher than for white respondents (differences of 0.82 and 0.52 hours), whereas the differences between women and men within each race group are smaller (0.04 and 0.34 hours).
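
The within-group differences can be checked directly from the table of observed means. This is a minimal sketch; the object names `means`, `race_diff` and `sex_diff` are chosen here for illustration.

```{r}
# Observed means from the table above (rows: race, columns: sex).
means <- data.frame(female = c(2.66, 3.48), male = c(2.62, 3.14),
                    row.names = c("white", "black"))

# Race difference (black minus white) within each sex.
race_diff <- unlist(means["black", ] - means["white", ])

# Sex difference (female minus male) within each race group.
sex_diff <- setNames(means$female - means$male, rownames(means))

round(race_diff, 2)
round(sex_diff, 2)
```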


## The number of breaks in yarn during weaving

The following dataset contains the results of an experiment where the number of yarn breaks during weaving is measured for two different types of wool at three different levels of tension.

```{r}
data(warpbreaks)
head(warpbreaks)
```

Perform a one-way analysis of variance (ANOVA) of the `warpbreaks` data with `breaks` as response and `tension` as factor/explanatory variable. Remember to plot the data first.
```{r}
gf_boxplot(breaks~tension,data=warpbreaks)
```

Then fit a linear model in `R` using `lm`.
```{r}
model1<-lm(breaks~tension ,data=warpbreaks)
summary(model1)
```
Write down the prediction equation. 

- We take low tension to be the reference group and define dummy variables:
$$
\begin{aligned}
z_1&=\begin{cases}
1& \text{if medium tension,}\\
0& \text{otherwise.}
\end{cases}\\
z_2&=\begin{cases}
1& \text{if high tension,}\\
0& \text{otherwise.}
\end{cases}\\
\hat{y}&=\alpha + \beta_1z_1+\beta_2z_2\\
\hat{y}&= 36.39 - 10.00z_1 - 14.72z_2
\end{aligned}
$$

Use the prediction equation to calculate the predicted mean number of breaks for each tension level. 
$$
\hat{y}=\begin{cases} 
36.39,& \text{for low tension,}\\
26.39,& \text{for medium tension,}\\
21.67,& \text{for high tension.}
\end{cases}
$$
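
These values can be cross-checked in R. The sketch below refits the one-way model under the illustrative name `fit_tension` (so it does not depend on earlier chunks) and compares `predict` with the raw group means; in a one-way ANOVA the two should agree.

```{r}
# Refit the one-way model so this chunk is self-contained.
fit_tension <- lm(breaks ~ tension, data = warpbreaks)

# One row per tension level, using the same factor levels as the data.
new_tension <- data.frame(
  tension = factor(levels(warpbreaks$tension), levels = levels(warpbreaks$tension))
)
pred_tension <- predict(fit_tension, newdata = new_tension)
cbind(new_tension, predicted = round(pred_tension, 2))

# Sample mean of breaks within each tension level, for comparison.
group_means <- tapply(warpbreaks$breaks, warpbreaks$tension, mean)
round(group_means, 2)
```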

Is the mean number of breaks the same for all tensions?

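- The F-test at the bottom of the output tests the hypothesis of no effect of tension, i.e. that the mean number of breaks is the same at all three tension levels. This hypothesis is rejected with $p=0.001753$, so the mean number of breaks is not the same for all tensions.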
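
The p-value of the overall F-test can also be recomputed from the F-statistic stored in the model summary. This is a sketch with illustrative object names (`fit_tension`, `f`, `p_value`); the model is refitted so the chunk stands alone.

```{r}
fit_tension <- lm(breaks ~ tension, data = warpbreaks)

# fstatistic is a named vector: value, numdf (numerator df), dendf (denominator df).
f <- summary(fit_tension)$fstatistic
f

# Upper tail probability of the F distribution gives the p-value.
p_value <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))
p_value
```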

Next, perform a two-way analysis of variance with both wool type and tension as factors and no interaction. Remember to make plot(s).

- We plot the sample mean for all combinations of wool and tension: 
```{r}
with(warpbreaks, interaction.plot(tension, wool, breaks, col = 2:3))
```

- The lines are not parallel, suggesting an interaction. The model without interaction is fitted:
```{r}
model2<-lm(breaks~tension + wool ,data=warpbreaks)
summary(model2)
```


Write down the prediction equation. 

- Let
$$
w=\begin{cases}
1& \text{if wool has type B,}\\
0& \text{otherwise.}
\end{cases}
$$
Then
$$
\hat{y}= 39.3 - 10.00z_1 - 14.72z_2 - 5.778w
$$

Use the prediction equation to calculate the predicted mean number of breaks for each combination of wool and tension.

- wool A, low tension: 39.3

- wool A, med. tension: 29.3

- wool A, high tension: 24.6

- wool B, low tension: 33.5

- wool B, med. tension: 23.5

- wool B, high tension: 18.8
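
The same six predictions can be obtained in one step with `predict`. The sketch below refits the additive model under the illustrative name `fit_additive` and uses `expand.grid` to build all combinations of tension and wool.

```{r}
# Refit the model without interaction so this chunk is self-contained.
fit_additive <- lm(breaks ~ tension + wool, data = warpbreaks)

# All six combinations; tension varies fastest (L, M, H for wool A, then wool B).
grid <- expand.grid(tension = levels(warpbreaks$tension),
                    wool = levels(warpbreaks$wool))
grid$predicted <- round(unname(predict(fit_additive, newdata = grid)), 1)
grid
```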


Investigate if there is an interaction between wool and tension.

- We fit the model with interaction and make an F-test:
```{r}
model3<-lm(breaks~tension * wool ,data=warpbreaks)
anova(model2,model3)
```
We find that there is a significant interaction (at the 0.05 level) with $p=0.021$.
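
As a hedged sketch of how to pull the p-value out of the comparison, and of what the interaction model adds: the chunk below refits both models under illustrative names and checks that the interaction model reproduces the observed mean in each of the six cells, which the additive model is not able to do in general.

```{r}
fit_additive <- lm(breaks ~ tension + wool, data = warpbreaks)
fit_interaction <- lm(breaks ~ tension * wool, data = warpbreaks)

# Row 2 of the anova table holds the F-test of the interaction terms.
comparison <- anova(fit_additive, fit_interaction)
p_interaction <- comparison$`Pr(>F)`[2]
p_interaction

# Observed cell means next to the fitted values from the interaction model.
cell_means <- aggregate(breaks ~ tension + wool, data = warpbreaks, FUN = mean)
cell_means$fitted <- unname(predict(fit_interaction, newdata = cell_means))
cell_means
```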