{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "20a03edd",
   "metadata": {},
   "source": [
    "# Rock Fries Your Brains\n",
    "\n",
    "To read more about the dataset and see an exploratory analysis, have a look at the R Markdown file `rock-fries-your-brains.Rmd` in the exercises for the first lecture.\n",
    "\n",
    "We load the needed packages for this exercise:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e0ef0862",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "from scipy import stats"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "31213c02",
   "metadata": {},
   "source": [
    "## Load data"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "289144a6",
   "metadata": {},
   "source": [
    "In the data, missing values are coded as 999. The command below handles this\n",
    "by replacing every 999 with `NaN` after the file is read:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b5e47318",
   "metadata": {},
   "outputs": [],
   "source": [
    "musik = pd.read_csv(\"https://asta.math.aau.dk/datasets?file=musik.txt\", sep=\"\\t\").replace(999, np.nan)\n",
    "musik.head()"
   ]
  },
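  {
   "cell_type": "markdown",
   "id": "c41d7a90",
   "metadata": {},
   "source": [
    "A minimal sketch of an alternative, not part of the original exercise: `pd.read_csv` also accepts an `na_values` argument, so a sentinel such as 999 can be marked as missing while the file is parsed. The small in-memory table below is made up for illustration and only borrows the column names `group` and `median1`. Note that both approaches treat every 999 in the table as missing, which is only safe when 999 cannot occur as a real measurement."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9b3e52f1",
   "metadata": {},
   "outputs": [],
   "source": [
    "import io\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical tab-separated data; 999 marks a missing measurement.\n",
    "toy_text = \"group\\tmedian1\\n1\\t12.5\\n2\\t999\\n3\\t8.0\\n\"\n",
    "\n",
    "# Option 1: convert the sentinel while parsing.\n",
    "toy_a = pd.read_csv(io.StringIO(toy_text), sep=\"\\t\", na_values=[999])\n",
    "\n",
    "# Option 2: parse first, then replace (the approach used above).\n",
    "toy_b = pd.read_csv(io.StringIO(toy_text), sep=\"\\t\").replace(999, np.nan)\n",
    "\n",
    "print(toy_a)\n",
    "print(\"Missing values in median1:\", toy_a[\"median1\"].isna().sum())"
   ]
  },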
  {
   "cell_type": "markdown",
   "id": "5d070972",
   "metadata": {},
   "source": [
    "## Mozart versus control group\n",
    "\n",
    "We will compare the control group with the Mozart group, i.e. we leave\n",
    "out the Anthrax group (group 3)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e9429c40",
   "metadata": {},
   "source": [
    "### Median time week 1\n",
    "\n",
    "We start with the median time for week 1, so we extract only the `median1` column for each group."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "49c2b1d9",
   "metadata": {},
   "outputs": [],
   "source": [
    "control = musik.loc[musik[\"group\"] == 1, \"median1\"].dropna()\n",
    "mozart  = musik.loc[musik[\"group\"] == 2, \"median1\"].dropna()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bbd7544a",
   "metadata": {},
   "source": [
    "Now compare the week 1 median times in the control and Mozart groups,\n",
    "i.e. use a $t$-test to investigate whether the mean response differs between the two groups. One way of doing this is to use `stats.ttest_ind`, so modify the following code to do this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a76ecc7c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# t_stat, p_val = stats.ttest_ind(..., ..., equal_var=False)\n",
    "# print(\"t =\", t_stat, \", p =\", p_val)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "750a79c2",
   "metadata": {},
   "source": [
    "Supplement this analysis with a boxplot by modifying the following code:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f8b22302",
   "metadata": {},
   "outputs": [],
   "source": [
    "# plt.boxplot([..., ...], labels=[\"...\", \"...\"])\n",
    "# plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8e42e38f",
   "metadata": {},
   "source": [
    "### Median time week 4\n",
    "\n",
    "Make a similar analysis for `median4`."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bf9172cb",
   "metadata": {},
   "source": [
    "## Paired test for control group\n",
    "\n",
    "We now consider only the control group and its median times.\n",
    "They are measured in week 1 (`median1`) and week 4 (`median4`), and we make a simple\n",
    "dataset where we stack the median values from week 1 and week 4 on top of each other. The column `value` contains the values (median time) and the column `variable` indicates whether a value comes from week 1 (`median1`) or week 4 (`median4`). It is always good to look at the first few rows of a dataset with the method `head`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "704baa04",
   "metadata": {},
   "outputs": [],
   "source": [
    "control14 = musik[musik[\"group\"] == 1]\n",
    "median14 = control14[[\"median1\", \"median4\"]].dropna()\n",
    "median14_stacked = median14.melt(var_name=\"variable\", value_name=\"value\")\n",
    "print(median14_stacked.head())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "16265905",
   "metadata": {},
   "source": [
    "The following commands give a comparative analysis of the median time from week 1 to week 4. Go through the\n",
    "output and explain what you see in the descriptive comparison (summary statistics and boxplots) as well\n",
    "as the inferential comparison (paired $t$-test)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a3446b52",
   "metadata": {},
   "outputs": [],
   "source": [
    "summary_stats = median14_stacked.groupby(\"variable\")[\"value\"].describe()\n",
    "print(summary_stats)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1f34a684",
   "metadata": {},
   "outputs": [],
   "source": [
    "sns.boxplot(x=\"variable\", y=\"value\", data=median14_stacked)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e3a4f4b3",
   "metadata": {},
   "outputs": [],
   "source": [
    "t_stat, p_val = stats.ttest_rel(median14[\"median1\"], median14[\"median4\"])\n",
    "print(\"t =\", t_stat, \", p =\", p_val)"
   ]
  }
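  ,
  {
   "cell_type": "markdown",
   "id": "d7a2f6c3",
   "metadata": {},
   "source": [
    "As a sketch of what the paired test does (an illustration with made-up numbers, not the course data): `stats.ttest_rel` is equivalent to a one-sample $t$-test of the within-subject differences against zero. This is also why subjects with a missing value in either week were dropped as whole rows before testing. The arrays `toy_week1` and `toy_week4` are hypothetical."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5e8c0b74",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from scipy import stats\n",
    "\n",
    "# Made-up paired measurements for five subjects.\n",
    "toy_week1 = np.array([10.0, 12.0, 9.5, 11.0, 13.5])\n",
    "toy_week4 = np.array([9.0, 11.5, 9.0, 10.0, 12.0])\n",
    "\n",
    "# Paired t-test on the two weeks.\n",
    "paired = stats.ttest_rel(toy_week1, toy_week4)\n",
    "\n",
    "# One-sample t-test of the differences against a mean of zero.\n",
    "one_sample = stats.ttest_1samp(toy_week1 - toy_week4, popmean=0.0)\n",
    "\n",
    "# Both calls should report the same t statistic and p-value.\n",
    "print(\"paired:     t =\", paired.statistic, \", p =\", paired.pvalue)\n",
    "print(\"one-sample: t =\", one_sample.statistic, \", p =\", one_sample.pvalue)"
   ]
  }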
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
