{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "554b48c5",
   "metadata": {},
   "source": [
    "# Exam exercise: Reading scores and sample size calculation\n",
    "\n",
    "It is highly recommended that you answer the exam in a Jupyter notebook\n",
    "(you can simply use this exam notebook as a starting point).\n",
    "\n",
    "Remember to load the needed packages:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c60e3c41",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "from scipy import stats"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1b3b601b",
   "metadata": {},
   "source": [
    "## Part I: Directed reading activities\n",
    "\n",
    "An educator conducted an experiment to test whether new directed reading\n",
    "activities in the classroom will help elementary school pupils improve\n",
    "some aspects of their reading ability.\n",
    "\n",
    "She arranged for a third-grade class of 21 students\n",
    "to follow these activities for an 8-week period. A control classroom of\n",
    "23 third graders followed the same curriculum without the activities. At\n",
    "the end of the 8 weeks, all students took a Degree of Reading Power\n",
    "(DRP) test, which measures the aspects of reading ability that the\n",
    "treatment is designed to improve.\n",
    "\n",
    "Read in the data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "05487d02",
   "metadata": {},
   "outputs": [],
   "source": [
    "reading = pd.read_csv(\"https://asta.math.aau.dk/datasets?file=reading.txt\", delimiter=\"\\t\")\n",
    "print(reading.head())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d50d7521",
   "metadata": {},
   "source": [
    "Use a boxplot to compare the measurements of `Score` (the student's DRP score)\n",
    "for the `Treated` group (directed reading activities) and the `Control` group visually."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c5ae15c8",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Delete this line and write a command using sns.boxplot(...) and plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f87b27ee",
   "metadata": {},
   "source": [
    "Make a numerical summary of the measurements for `Treated` and `Control` separately."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8b0773a1",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Delete this line and write a command using ... .describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "604da4c2",
   "metadata": {},
   "source": [
    "-   Write down a point estimate of the mean of the DRP score for students\n",
    "    following the new *directed reading activities* and explain how this\n",
    "    is calculated.\n",
    "\n",
    "-   Write down a point estimate of the standard deviation of the DRP score for\n",
    "    this group and explain how this is calculated.\n",
    "\n",
    "-   Write down a 95% confidence interval for the mean of the DRP score for this\n",
    "    group and explain how this is calculated."
   ]
  },
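  {
   "cell_type": "markdown",
   "id": "a7e3d1f0",
   "metadata": {},
   "source": [
    "A minimal sketch of these three calculations, assuming the scores within a group are roughly normally distributed so that the $t$-based interval applies. The mean is estimated by $\\bar{x} = \\frac{1}{n}\\sum_{i=1}^{n} x_i$, the standard deviation by $s = \\sqrt{\\frac{1}{n-1}\\sum_{i=1}^{n} (x_i - \\bar{x})^2}$, and the 95% confidence interval for the mean is $\\bar{x} \\pm t_{crit}\\, s/\\sqrt{n}$, where $t_{crit}$ is the 97.5% quantile of the $t$-distribution with $n-1$ degrees of freedom. The helper name `mean_sd_ci` is hypothetical, and the grouping column name `Group` in the commented usage lines is an assumption (check `reading.head()` for the actual column names)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b8f4e2a1",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from scipy import stats\n",
    "\n",
    "\n",
    "def mean_sd_ci(x, conf=0.95):\n",
    "    \"\"\"Hypothetical helper: sample mean, sample SD and a t-based CI for the population mean.\"\"\"\n",
    "    x = np.asarray(x, dtype=float)\n",
    "    n = len(x)\n",
    "    xbar = x.mean()  # point estimate of the mean: sum of the scores divided by n\n",
    "    s = x.std(ddof=1)  # sample standard deviation with divisor n - 1\n",
    "    se = s / np.sqrt(n)  # estimated standard error of the mean\n",
    "    t_crit = stats.t.ppf(1 - (1 - conf) / 2, df=n - 1)  # 97.5% t-quantile when conf=0.95\n",
    "    return xbar, s, (xbar - t_crit * se, xbar + t_crit * se)\n",
    "\n",
    "\n",
    "# Usage sketch; the column name \"Group\" is an assumption, check reading.head() first:\n",
    "# treated = reading.loc[reading[\"Group\"] == \"Treated\", \"Score\"]\n",
    "# print(mean_sd_ci(treated))"
   ]
  },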
  {
   "cell_type": "markdown",
   "id": "e0ceefd0",
   "metadata": {},
   "source": [
    "Use `stats.ttest_ind` (SciPy) or `pg.ttest` (the pingouin package, which must first be imported with `import pingouin as pg`) to compare the mean DRP scores of the two groups. The output of `pg.ttest` also includes a confidence interval for the difference in means."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "95588e72",
   "metadata": {},
   "outputs": [],
   "source": [
    "## Delete this line and write a command"
   ]
  },
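  {
   "cell_type": "markdown",
   "id": "c9a5f3b2",
   "metadata": {},
   "source": [
    "A minimal sketch of the SciPy call, run on made-up toy numbers rather than the DRP data, only to show the syntax. Passing `equal_var=False` requests Welch's test, which does not assume equal variances in the two groups; SciPy's default `equal_var=True` gives the pooled-variance test. The `alternative` argument (`\"two-sided\"`, `\"less\"` or `\"greater\"`, available in SciPy 1.6 and later) selects the alternative hypothesis."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d0b6a4c3",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from scipy import stats\n",
    "\n",
    "# Made-up toy numbers, used only to illustrate the call; replace them with the two groups' scores.\n",
    "x = np.array([5.0, 7.0, 6.0, 9.0, 8.0])\n",
    "y = np.array([4.0, 6.0, 5.0, 5.0, 3.0, 6.0])\n",
    "\n",
    "# Welch two-sample t-test (equal_var=False) with a two-sided alternative.\n",
    "res = stats.ttest_ind(x, y, equal_var=False, alternative=\"two-sided\")\n",
    "print(res.statistic, res.pvalue)"
   ]
  },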
  {
   "cell_type": "markdown",
   "id": "6b2254c4",
   "metadata": {},
   "source": [
    "Go through the details of the t-test. Your analysis must include an account of:\n",
    "\n",
    "-   The relevant null hypothesis and the corresponding alternative\n",
    "    hypothesis.\n",
    "\n",
    "-   The choice and calculation of the test statistic.\n",
    "\n",
    "-   The calculation of the $p$-value and its interpretation in reaching\n",
    "    a conclusion of the analysis.\n",
    "\n",
    "-   The calculation and interpretation of a relevant confidence interval."
   ]
  },
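  {
   "cell_type": "markdown",
   "id": "e1c7b5d4",
   "metadata": {},
   "source": [
    "The quantities listed above can also be computed by hand. A minimal sketch, assuming Welch's version of the two-sample $t$-test (no equal-variance assumption) and a two-sided alternative: with standard error $se = \\sqrt{s_x^2/n_x + s_y^2/n_y}$, the test statistic is $t_{obs} = (\\bar{x} - \\bar{y})/se$, the degrees of freedom follow the Welch-Satterthwaite approximation, the $p$-value is $2P(T > |t_{obs}|)$, and the confidence interval for $\\mu_x - \\mu_y$ is $(\\bar{x} - \\bar{y}) \\pm t_{crit}\\, se$. The helper name `welch_t_test` and the toy numbers are hypothetical; the last line checks the hand calculation against `stats.ttest_ind(x, y, equal_var=False)`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f2d8c6e5",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from scipy import stats\n",
    "\n",
    "\n",
    "def welch_t_test(x, y, conf=0.95):\n",
    "    \"\"\"Hypothetical helper: Welch two-sample t-test of H0: mu_x = mu_y (two-sided).\"\"\"\n",
    "    x = np.asarray(x, dtype=float)\n",
    "    y = np.asarray(y, dtype=float)\n",
    "    nx, ny = len(x), len(y)\n",
    "    vx = x.var(ddof=1) / nx  # squared standard error of mean(x)\n",
    "    vy = y.var(ddof=1) / ny  # squared standard error of mean(y)\n",
    "    diff = x.mean() - y.mean()  # estimate of mu_x - mu_y\n",
    "    se = np.sqrt(vx + vy)  # standard error of the difference\n",
    "    t_obs = diff / se  # observed test statistic\n",
    "    # Welch-Satterthwaite approximation to the degrees of freedom\n",
    "    dof = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))\n",
    "    p_value = 2 * stats.t.sf(abs(t_obs), dof)  # two-sided p-value\n",
    "    t_crit = stats.t.ppf(1 - (1 - conf) / 2, dof)  # critical value for the CI\n",
    "    ci = (diff - t_crit * se, diff + t_crit * se)  # CI for mu_x - mu_y\n",
    "    return t_obs, dof, p_value, ci\n",
    "\n",
    "\n",
    "# Made-up toy numbers, used only to check the hand calculation against SciPy.\n",
    "x = np.array([5.0, 7.0, 6.0, 9.0, 8.0])\n",
    "y = np.array([4.0, 6.0, 5.0, 5.0, 3.0, 6.0])\n",
    "t_obs, dof, p_value, ci = welch_t_test(x, y)\n",
    "res = stats.ttest_ind(x, y, equal_var=False)\n",
    "print(t_obs, dof, p_value, ci)\n",
    "print(np.isclose(t_obs, res.statistic), np.isclose(p_value, res.pvalue))"
   ]
  },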
  {
   "cell_type": "markdown",
   "id": "82462de1",
   "metadata": {},
   "source": [
    "## Part II: Determining sample size\n",
    "\n",
    "_In this part, there is no dataset to load into Python and analyze. You only need to use Python as a calculator to apply the relevant formulas (which you will find towards the end of the lecture notes for Module 1)._\n",
    "\n",
    "A study is being planned to estimate the proportion of the Danish population who smoke regularly. How large a sample is needed to obtain an estimate that is at most 0.05 away from the true proportion with 90% confidence? A similar study from 2015 estimated the proportion of smokers to be 22.5%."
   ]
  }
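  ,
  {
   "cell_type": "markdown",
   "id": "0a3e9d7f",
   "metadata": {},
   "source": [
    "A minimal sketch of the sample-size calculation for a proportion, assuming the formula in the Module 1 notes is the usual one, $n = \\hat{p}(1-\\hat{p})\\left(\\frac{z_{crit}}{M}\\right)^2$, where $M$ is the desired margin of error, $\\hat{p}$ is a prior guess of the proportion and $z_{crit}$ is the standard normal quantile matching the confidence level; $n$ is then rounded up to the next whole number. The function name is hypothetical, and the example deliberately uses other numbers than the exercise (a conservative guess $\\hat{p} = 0.5$, margin 0.03 and 95% confidence)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1b4fae80",
   "metadata": {},
   "outputs": [],
   "source": [
    "import math\n",
    "\n",
    "from scipy import stats\n",
    "\n",
    "\n",
    "def sample_size_proportion(p_guess, margin, conf):\n",
    "    \"\"\"Hypothetical helper: smallest n with z * sqrt(p(1-p)/n) <= margin.\"\"\"\n",
    "    z_crit = stats.norm.ppf(1 - (1 - conf) / 2)  # about 1.96 when conf = 0.95\n",
    "    n = p_guess * (1 - p_guess) * (z_crit / margin) ** 2\n",
    "    return math.ceil(n)  # round up to a whole number of people\n",
    "\n",
    "\n",
    "# Illustration with numbers different from the exercise.\n",
    "print(sample_size_proportion(0.5, 0.03, 0.95))  # prints 1068"
   ]
  }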
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
