{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "7824055a",
   "metadata": {},
   "source": [
    "# Exam exercise: Vital capacity\n",
    "\n",
    "First needed packages:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fcd5b49a",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "import statsmodels.formula.api as smf\n",
    "from statsmodels.stats.anova import anova_lm"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "446f92ae",
   "metadata": {},
   "source": [
    "In this exercise you will study a dataset concerning vital capacity,\n",
    "which is the maximal amount of air that can be exhaled after a maximal inhalation.\n",
    "\n",
    "Read in the data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d0dc936f",
   "metadata": {},
   "outputs": [],
   "source": [
    "vitcap = pd.read_csv('https://asta.math.aau.dk/datasets?file=vitcap.txt', sep='\\t')\n",
    "vitcap = vitcap.rename(columns={'vital.capacity': 'vital_capacity'})  # Renaming column since . is misinterpreted by Python\n",
    "vitcap.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3585ca36",
   "metadata": {},
   "source": [
    "In the dataset, the variable `vital_capacity` has been measured\n",
    "on 84 workers in the cadmium industry.\n",
    "\n",
    "The next variable is the factor `exposure` with 3\n",
    "levels, indicating the level of cadmium exposure:\n",
    "\n",
    " -  A: None\n",
    " -  B: Less than 10 years\n",
    " -  C: More than 10 years\n",
    "\n",
    "The data set also contains dummy variables for the factor\n",
    "`exposure`:\n",
    "\n",
    " -  `z1=1` if `exposure=B` and 0 otherwise.\n",
    " -  `z2=1` if `exposure=C` and 0 otherwise.\n",
    "\n",
    "You will use these two variables later on."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3dd3f58a",
   "metadata": {},
   "source": [
    "Make a model and carry out an analysis investigating the effect of the\n",
    "factor `exposure` on the response `vital_capacity`. In\n",
    "that connection you should calculate/interpret the F-test for no effect of `exposure`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "84dbcefe",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Write code here"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c9655a7c",
   "metadata": {},
   "source": [
    "We expand the analysis to include the workers age -\n",
    "the variable `age` - as a predictor.\n",
    "\n",
    "Make a model and carry out an analysis investigating the effect of the\n",
    "predictors `exposure` and `age` on the response\n",
    "`vital_capacity`. In that connection you should:\n",
    "\n",
    "- display the summary of each model you fit and be able to interpret parameter\n",
    "  estimates, test statistics, p-values etc. in this output.\n",
    "- investigate whether there is interaction between the effects of\n",
    "  `exposure` and `age`\n",
    "- give a graphical interpretation of such an interaction"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a0f58681",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Write code here"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b1988419",
   "metadata": {},
   "source": [
    "Consider the following two models, where we introduce the dummy variables."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a63986fc",
   "metadata": {},
   "outputs": [],
   "source": [
    "model1 = smf.ols('vital_capacity ~ age * z2', data=vitcap).fit()\n",
    "model2 = smf.ols('vital_capacity ~ age * z1 + age * z2', data=vitcap).fit()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3f5ad095",
   "metadata": {},
   "source": [
    "- Use an F-test to show that there is no significant difference\n",
    "  between `model1` and `model2`.\n",
    "- Give an interpretation of the theoretical difference between the two models."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "43f671df",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Write code here"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
