---
title: "Solution to warpbreaks exercise"
output:
  pdf_document: default
  html_document: default
---

```{r, setup, include=FALSE}
library(mosaic)
```

## The number of breaks in yarn during weaving

The following dataset contains the results of an experiment in which the number of yarn breaks during weaving was recorded for two types of wool at three levels of tension.

```{r}
data(warpbreaks)
head(warpbreaks)
```

Perform a one-way analysis of variance (ANOVA) of the `warpbreaks` data with `breaks` as the response and `tension` as the explanatory factor. Remember to plot the data first.
```{r}
gf_boxplot(breaks ~ tension, data = warpbreaks)
```

Then fit the model in `R` using `lm`.
```{r}
model1 <- lm(breaks ~ tension, data = warpbreaks)
summary(model1)
```
Write down the prediction equation. 

- We take low tension to be the reference group and define dummy variables:
$$
\begin{aligned}
z_1&=\begin{cases}
1& \text{if medium tension,}\\
0& \text{otherwise.}
\end{cases}\\
z_2&=\begin{cases}
1& \text{if high tension,}\\
0& \text{otherwise.}
\end{cases}\\
\hat{y}&=\alpha + \beta_1z_1+\beta_2z_2\\
\hat{y}&= 36.39 - 10.00\,z_1 - 14.72\,z_2
\end{aligned}
$$
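
The numbers in the prediction equation can be read off the fitted model with `coef`. The following is a minimal sketch that refits the one-way model so the chunk runs on its own; the name `b1` is introduced here only for illustration.

```{r}
# Refit the one-way model (same formula as model1) and extract the estimates
b1 <- coef(lm(breaks ~ tension, data = warpbreaks))
# (Intercept) is the estimated mean for the reference level (low tension);
# tensionM and tensionH are the estimated differences from low tension
round(b1, 2)
```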

Use the prediction equation to calculate the predicted mean number of breaks for each tension level. 
$$
\hat{y}=\begin{cases} 
36.39,& \text{for low tension,}\\
26.39,& \text{for medium tension,}\\
21.67,& \text{for high tension.}
\end{cases}
$$
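
In a one-way ANOVA the fitted value for each group equals that group's sample mean, so the predicted values can be checked directly against the raw data. A small sketch using base R (the name `group_means` is an illustrative choice):

```{r}
# Sample mean of breaks within each tension level (L, M, H)
group_means <- tapply(warpbreaks$breaks, warpbreaks$tension, mean)
round(group_means, 2)
```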

Is the mean number of breaks the same for all tensions?

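- The F-test at the bottom of the output tests the null hypothesis that the mean number of breaks is the same at all three tension levels. The hypothesis is rejected with $p=0.001753$, so the mean number of breaks differs between tension levels.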
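
The p-value reported by `summary` can also be recomputed from the F statistic and its degrees of freedom. This is a sketch that refits the one-way model so the chunk is self-contained; `f_stat` and `p_value` are names chosen here for illustration.

```{r}
# F statistic with its numerator and denominator degrees of freedom
f_stat <- summary(lm(breaks ~ tension, data = warpbreaks))$fstatistic
# Upper-tail probability of the F distribution gives the p-value of the test
p_value <- pf(f_stat[["value"]], f_stat[["numdf"]], f_stat[["dendf"]],
              lower.tail = FALSE)
p_value
```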

Next, perform a two-way analysis of variance with both wool type and tension as factors and no interaction. Remember to make plot(s).

- We plot the observations and connect the sample means for each combination of wool and tension:
```{r}
gf_point(breaks ~ tension, color = ~ wool, data = warpbreaks) %>%
  gf_line(breaks ~ tension, group = ~ wool, stat = "summary")
```

- The lines are not parallel, which suggests an interaction between wool and tension. As the exercise asks, we first fit the model without interaction:
```{r}
model2 <- lm(breaks ~ tension + wool, data = warpbreaks)
summary(model2)
```


Write down the prediction equation. 

- Let
$$
w=\begin{cases}
1& \text{if wool has type B,}\\
0& \text{otherwise.}
\end{cases}
$$
Then
$$
\hat{y}= 39.28 - 10.00\,z_1 - 14.72\,z_2 - 5.78\,w
$$

Use the prediction equation to calculate the predicted mean number of breaks for each combination of wool and tension.

- wool A, low tension: 39.3
- wool A, medium tension: 29.3
- wool A, high tension: 24.6
- wool B, low tension: 33.5
- wool B, medium tension: 23.5
- wool B, high tension: 18.8
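
The six predictions can also be obtained with `predict` instead of by hand. The sketch below refits the additive model so the chunk runs on its own; `fit2` and `grid` are illustrative names that do not appear elsewhere in this solution.

```{r}
# Refit the additive model (same formula as model2)
fit2 <- lm(breaks ~ tension + wool, data = warpbreaks)
# All six combinations of tension and wool, with tension varying fastest
grid <- expand.grid(tension = levels(warpbreaks$tension),
                    wool = levels(warpbreaks$wool))
# Predicted mean number of breaks for each combination, rounded to one decimal
grid$predicted <- round(predict(fit2, newdata = grid), 1)
grid
```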


Investigate if there is an interaction between wool and tension.

- We fit the model with interaction and compare it with the additive model using an F-test:
```{r}
model3 <- lm(breaks ~ tension * wool, data = warpbreaks)
anova(model2, model3)
```
The F-test gives $p=0.021$, so the interaction is significant at the 0.05 level: the effect of tension on the number of breaks depends on the wool type.
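
To see what the interaction looks like in the data, one can tabulate the observed cell means and the difference between the wool types at each tension level. An additive model forces this difference to be the same at every tension level, so large variation in it points to an interaction. This is a sketch in base R; `cell_means` and `wool_diff` are names introduced here for illustration.

```{r}
# Observed mean number of breaks in each wool-by-tension cell
cell_means <- with(warpbreaks,
                   tapply(breaks, list(wool = wool, tension = tension), mean))
round(cell_means, 2)
# Difference in means (wool B minus wool A) at each tension level
wool_diff <- cell_means["B", ] - cell_means["A", ]
round(wool_diff, 2)
```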