# Rock Fries Your Brains

To read more about the dataset and see an exploratory analysis, have a look at the R Markdown file `rock-fries-your-brains.Rmd` in the exercises for the first lecture.

We load the needed packages for this exercise:

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

## Load data

In the data, missing values are coded as 999. The command below handles this
by replacing 999 with `NaN` while loading the data:

In [None]:
musik = pd.read_csv("https://asta.math.aau.dk/datasets?file=musik.txt", sep="\t").replace(999, np.nan)
musik.head()
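
As a small illustration of the recoding step, the following sketch applies the same `replace(999, np.nan)` call to a made-up data frame (the values in `toy` are invented for illustration and are not taken from `musik`) and counts the missing values per column afterwards.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical, not the real musik dataset): 999 marks a missing value.
toy = pd.DataFrame({"group": [1, 2, 3], "median1": [4.2, 999, 3.1]})

# Same recoding as used when loading the data: every 999 becomes NaN.
toy_clean = toy.replace(999, np.nan)

# Count missing values per column after recoding.
n_missing = toy_clean.isna().sum()
print(n_missing)
```

Note that `replace` acts on every column, so this approach assumes that 999 never occurs as a genuine measurement.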

## Mozart versus control group

We will compare the control group with the Mozart group, i.e. we leave
out the Anthrax group (group 3).

### Median time week 1

We will compare the groups on `median1`, the median time in week 1, so we extract only this column.

In [None]:
control = musik.loc[musik["group"] == 1, "median1"].dropna()
mozart  = musik.loc[musik["group"] == 2, "median1"].dropna()

Now compare the week 1 median times in the control and Mozart groups,
i.e. use a $t$-test to investigate whether the mean response differs between the two groups. One way of doing this is to use `stats.ttest_ind`, so modify the following code to do this:

In [None]:
# t_stat, p_val = stats.ttest_ind(..., ..., equal_var=False)
# print("t =", t_stat, ", p =", p_val)
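
To see what `equal_var=False` does, here is a minimal sketch on simulated data (the samples `a` and `b` are generated at random and have nothing to do with the exercise data). With this setting, `stats.ttest_ind` performs Welch's $t$-test, which does not assume equal variances in the two groups; the sketch checks the reported statistic against the formula $t = (\bar{x}_a - \bar{x}_b) / \sqrt{s_a^2/n_a + s_b^2/n_b}$ computed by hand.

```python
import numpy as np
from scipy import stats

# Simulated samples (hypothetical, for illustration only).
rng = np.random.default_rng(0)
a = rng.normal(loc=10.0, scale=2.0, size=20)
b = rng.normal(loc=12.0, scale=4.0, size=25)

# Welch's t-test: the two groups are allowed to have different variances.
t_stat, p_val = stats.ttest_ind(a, b, equal_var=False)

# The same statistic computed by hand from sample means and variances.
se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
t_manual = (a.mean() - b.mean()) / se

print("t =", t_stat, ", manual t =", t_manual, ", p =", p_val)
```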

Supplement this analysis with a boxplot, where you modify the following code:

In [None]:
#plt.boxplot([..., ...], labels=["...", "..."])
#plt.show()

### Median time week 4

Make a similar analysis for `median4`.

## Paired test for control group

We now consider only the control group and the response `median` for this group.
It is measured in week 1 (`median1`) and week 4 (`median4`), and we make a simple
dataset where we stack the median values from week 1 and week 4 on top of each other. The column `value` contains the values (median time) and the column `variable` indicates whether a value comes from week 1 or week 4 (it is always good to have a look at the first few lines of the dataset with the method `head`):

In [None]:
control14 = musik[musik["group"] == 1]
median14 = control14[["median1", "median4"]].dropna()
median14_stacked = median14.melt(var_name="variable", value_name="value")
print(median14_stacked.head())
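
The stacking done by `melt` may be easier to follow on a tiny example. The sketch below uses a made-up data frame `wide` with two rows (the numbers are invented) and shows how the two columns end up on top of each other in long format.

```python
import pandas as pd

# Toy wide-format data (hypothetical values): one row per subject.
wide = pd.DataFrame({"median1": [5.0, 6.0], "median4": [3.0, 4.0]})

# Stack the two columns: 'variable' holds the original column name,
# 'value' holds the measurement.
long = wide.melt(var_name="variable", value_name="value")
print(long)
```

The long format drops the row pairing, which is why the paired test is run on the wide data frame `median14` rather than on the stacked one.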

The following commands give a comparative analysis of the median from week 1 to week 4. Go through the
output and explain what you see in the descriptive comparison (summary statistics and boxplots) as well
as the inferential comparison (paired $t$-test).

In [None]:
summary_stats = median14_stacked.groupby("variable")["value"].describe()
print(summary_stats)

In [None]:
sns.boxplot(x="variable", y="value", data=median14_stacked)
plt.show()

In [None]:
t_stat, p_val = stats.ttest_rel(median14["median1"], median14["median4"])
print("t =", t_stat, ", p =", p_val)