---
output:
  pdf_document: default
  html_document: default
---

# Exam exercise: Logistic regression analysis of Berkeley admission data

You may use the combined lecture notes for this module available at
<https://asta.math.aau.dk> to guide you to the relevant methods and R commands
for this exam.

The following table shows the total number of admitted and rejected
applicants to the six largest departments at the University of California,
Berkeley, in 1973.

|       | Admitted| Rejected|
|:------|--------:|--------:|
|Male   |     1198|     1493|
|Female |      557|     1278|

Use a $\chi^2$-test to check whether the admission statistics for
Berkeley show any sign of gender discrimination. To enter the table
in R you can do:

```{r}
admit <- matrix(c(1198, 557, 1493, 1278), 2, 2)
rownames(admit) <- c("Male", "Female")
colnames(admit) <- c("Admitted", "Rejected")
admit <- as.table(admit)
```

As a minimum, your analysis should contain **arguments** that support
each of the following:

- Statement of hypotheses
- Calculation of expected frequencies
- Calculation of test statistic
- Calculation and interpretation of p-value.
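
The steps listed above can be carried out by hand in R. The chunk below is a minimal sketch, not a model answer: the names `tab`, `expected`, `X2`, `dof` and `p_value` are introduced here for illustration, and the table is rebuilt from the counts given above so that the chunk runs on its own.

```{r}
# Sketch only: rebuild the 2x2 table from the counts in the exercise
tab <- as.table(matrix(c(1198, 557, 1493, 1278), 2, 2,
                       dimnames = list(c("Male", "Female"),
                                       c("Admitted", "Rejected"))))
# Expected frequencies under independence: row total * column total / n
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
# Pearson chi-squared statistic (no continuity correction)
X2 <- sum((tab - expected)^2 / expected)
# Degrees of freedom: (rows - 1) * (columns - 1)
dof <- (nrow(tab) - 1) * (ncol(tab) - 1)
# Upper-tail probability of the chi-squared distribution
p_value <- pchisq(X2, df = dof, lower.tail = FALSE)
# Cross-check against the built-in test
chisq.test(tab, correct = FALSE)
```

By default `chisq.test()` applies Yates' continuity correction to 2x2 tables, so `correct = FALSE` is needed for its statistic to agree with the hand calculation.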

A more detailed data set with the admissions for each department is 
available on the course web page. The variables are:


- `Gender` (male/female)
- `Dept` (department A, B, C, D, E, F)
- `Admit` (number of admitted applicants for each combination of `Gender` and `Dept`)
- `Reject` (number of rejected applicants for each combination of `Gender` and `Dept`)

Load the data into RStudio:
```{r }
admission <-
    read.table("http://asta.math.aau.dk/dan/static/datasets?file=admission.dat",
               header=TRUE)
admission
```
To fit a logistic regression to grouped data of this kind, the response is given as the two columns `Admit` and `Reject`.
The first column counts the successes, so the model describes the probability of admission:

```{r }
m0 <- glm(cbind(Admit, Reject) ~ Gender + Dept, family = binomial, data = admission)
```
The glm-object `m0` is a logistic model with main effects of `Gender`
and `Dept`.
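
To see why a two-column response models the probability of admission, the following sketch fits the same kind of model to the overall totals from the first table. The data frame `tot` and the other object names are made up for this illustration. With `Gender` as the only predictor the model is saturated, so the fitted probabilities should reproduce the observed admission proportions.

```{r}
# Sketch only: overall totals per gender, taken from the first table
tot <- data.frame(Gender = c("Male", "Female"),
                  Admit  = c(1198, 557),
                  Reject = c(1493, 1278))
# First column of cbind() = successes, second column = failures
fit <- glm(cbind(Admit, Reject) ~ Gender, family = binomial, data = tot)
# Fitted probabilities of admission for each gender
p_hat <- predict(fit, type = "response")
# Observed proportion admitted for each gender
p_obs <- tot$Admit / (tot$Admit + tot$Reject)
# The two should agree up to numerical tolerance
all.equal(unname(p_hat), p_obs)
```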

- Investigate whether there is any effect of these predictors.

As a hint you might look at section 9.3 in the combined lecture notes.

```{r }
summary(m0)
```

Looking at the summary of `m0`:

- Is there a significant gender difference?
- What is the interpretation of the numbers in the `DeptB`-row?
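
Coefficients in a logistic model are on the log-odds scale, so exponentiating them gives odds ratios relative to the reference levels. The sketch below assumes that R's built-in `UCBAdmissions` table contains the same 1973 counts as the course file. The data frame `adm` and the model `fit0` are hypothetical stand-ins for `admission` and `m0`, and the coefficient names may differ from those in your own output.

```{r}
# Sketch only: build a Gender x Dept data frame from the built-in table
# (assumed to match admission.dat)
adm <- data.frame(
  expand.grid(Gender = dimnames(UCBAdmissions)$Gender,
              Dept   = dimnames(UCBAdmissions)$Dept),
  Admit  = as.vector(UCBAdmissions["Admitted", , ]),
  Reject = as.vector(UCBAdmissions["Rejected", , ])
)
fit0 <- glm(cbind(Admit, Reject) ~ Gender + Dept, family = binomial, data = adm)
# Odds ratios relative to the reference levels (first level of each factor)
odds_ratio <- exp(coef(fit0))
# Wald 95% confidence intervals, transformed to the odds-ratio scale
or_ci <- exp(confint.default(fit0))
cbind(odds_ratio, or_ci)
```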

We add the standardized residuals to `admission`:

```{r }
admission$stdRes <- round(rstandard(m0),2)
admission
```
- Looking at the standardized residuals, which department deviates
  most strongly from the model?
- Which gender is discriminated against in this department?
  
Next you should fit the model with the interaction `Gender*Dept` and
use `anova` to compare this to `m0`.

- Explain what interaction means in the current context.
- Is there a significant interaction?
- In the light of your analysis, explain the reason for your
  answer to the previous question.
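
Fitting and comparing the two models might look like the sketch below. As an assumption, it uses R's built-in `UCBAdmissions` table in place of the course file, and the names `adm`, `fit0` and `fit1` are placeholders for your own objects.

```{r}
# Sketch only: build a Gender x Dept data frame from the built-in table
# (assumed to match admission.dat)
adm <- data.frame(
  expand.grid(Gender = dimnames(UCBAdmissions)$Gender,
              Dept   = dimnames(UCBAdmissions)$Dept),
  Admit  = as.vector(UCBAdmissions["Admitted", , ]),
  Reject = as.vector(UCBAdmissions["Rejected", , ])
)
# Main-effects model, corresponding to m0 in the exercise
fit0 <- glm(cbind(Admit, Reject) ~ Gender + Dept, family = binomial, data = adm)
# Add the Gender:Dept interaction to the main effects
fit1 <- update(fit0, . ~ Gender * Dept)
# Likelihood-ratio (deviance) test of the interaction
anova(fit0, fit1, test = "Chisq")
```

With one row per combination of `Gender` and `Dept`, the interaction model has as many parameters as observations, so its residual deviance is zero and the test compares the main-effects model against a perfect fit.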
