{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "e5faab7d",
   "metadata": {},
   "source": [
    "# Exam exercise: House prices\n",
    "\n",
    "Remember to load the relevant packages first:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "2c477942",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "import statsmodels.api as sm"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c14f13dc",
   "metadata": {},
   "source": [
    "In this exercise you will study the data described in Agresti, Example 9.10.\n",
    "\n",
    "The data concern house sales in Gainesville, Florida, and include the selling price (`Price`), the property taxes (`Taxes`) and the house size (`Size`) of each house.\n",
    "\n",
    "Read in the data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "c82649a2",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Taxes</th>\n",
       "      <th>Price</th>\n",
       "      <th>Size</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3104</td>\n",
       "      <td>279900</td>\n",
       "      <td>2048</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1173</td>\n",
       "      <td>146500</td>\n",
       "      <td>912</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3076</td>\n",
       "      <td>237700</td>\n",
       "      <td>1654</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1608</td>\n",
       "      <td>200000</td>\n",
       "      <td>2068</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1454</td>\n",
       "      <td>159900</td>\n",
       "      <td>1477</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Taxes   Price  Size\n",
       "0   3104  279900  2048\n",
       "1   1173  146500   912\n",
       "2   3076  237700  1654\n",
       "3   1608  200000  2068\n",
       "4   1454  159900  1477"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "HousePrices = pd.read_csv(\"https://asta.math.aau.dk/datasets?file=HousePrice.txt\", sep=r'\\s+')\n",
    "HousePrices.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "910f56d1",
   "metadata": {},
   "source": [
    "- Make a relevant plot of the variables and discuss how they are related."
   ]
  },
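  {
   "cell_type": "markdown",
   "id": "b7d41e20",
   "metadata": {},
   "source": [
    "A minimal plotting sketch, assuming that a scatterplot matrix (every pair of variables plotted against each other) is a relevant plot here. To keep the cell self-contained, it rebuilds only the five rows shown by `HousePrices.head()` above in a hypothetical data frame `houses`. For the actual exercise, pass the full `HousePrices` data frame to `sns.pairplot` instead."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c3a95f61",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import seaborn as sns\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# Hypothetical stand-in: only the five rows displayed by HousePrices.head().\n",
    "# Replace `houses` with the full HousePrices data frame for the real analysis.\n",
    "houses = pd.DataFrame({\n",
    "    'Taxes': [3104, 1173, 3076, 1608, 1454],\n",
    "    'Price': [279900, 146500, 237700, 200000, 159900],\n",
    "    'Size': [2048, 912, 1654, 2068, 1477],\n",
    "})\n",
    "\n",
    "# Scatterplot for each pair of variables, with histograms on the diagonal\n",
    "sns.pairplot(houses)\n",
    "plt.show()"
   ]
  },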
  {
   "cell_type": "markdown",
   "id": "707fac93",
   "metadata": {},
   "source": [
    "- Explain the concept of correlation and determine whether there is a significant positive correlation between `Taxes` and `Size`."
   ]
  },
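  {
   "cell_type": "markdown",
   "id": "d2f08a47",
   "metadata": {},
   "source": [
    "A sketch of the usual t-test of $H_0: \\rho = 0$ against $H_a: \\rho > 0$, using $t = r\\sqrt{(n-2)/(1-r^2)}$ with $n-2$ degrees of freedom. It uses `scipy.stats`, which is not among the packages loaded above, and works on a hypothetical data frame `houses` holding only the five rows shown by `HousePrices.head()`. With so few observations the result says nothing about the full data set; replace `houses` with `HousePrices` for the exercise."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e91c6b35",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "from scipy import stats\n",
    "\n",
    "# Hypothetical stand-in: only the five rows displayed by HousePrices.head().\n",
    "houses = pd.DataFrame({\n",
    "    'Taxes': [3104, 1173, 3076, 1608, 1454],\n",
    "    'Price': [279900, 146500, 237700, 200000, 159900],\n",
    "    'Size': [2048, 912, 1654, 2068, 1477],\n",
    "})\n",
    "\n",
    "n = len(houses)\n",
    "# Sample (Pearson) correlation between Taxes and Size\n",
    "r = houses['Taxes'].corr(houses['Size'])\n",
    "# Test statistic for H0: rho = 0, with n - 2 degrees of freedom\n",
    "t_stat = r * np.sqrt((n - 2) / (1 - r**2))\n",
    "# One-sided p-value for the alternative Ha: rho > 0\n",
    "p_value = stats.t.sf(t_stat, df=n - 2)\n",
    "print(f'r = {r:.3f}, t = {t_stat:.3f}, one-sided p-value = {p_value:.4f}')"
   ]
  },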
  {
   "cell_type": "markdown",
   "id": "27b56fd0",
   "metadata": {},
   "source": [
    "- Consider the multiple linear regression model with `Price` as the response and `Taxes` and `Size` as predictors. What are the parameters of this model, and how are they interpreted?"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e6085989",
   "metadata": {},
   "source": [
    "- What is the prediction equation?\n",
    "$$\n",
    "\\widehat y = \n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "73a80e81",
   "metadata": {},
   "source": [
    "Fit the multiple linear regression model of `Price` on `Taxes` and `Size` and explain the results.\n",
    "As a minimum, the explanation should include:\n",
    "\n",
    "- How the t-statistic (column `t` in the `statsmodels` summary, `t value` in R) is calculated, and how the corresponding p-value is determined and interpreted.\n",
    "\n",
    "- The interpretation of `R-squared` (the multiple R-squared).\n",
    "\n",
    "- How the output table can be used to construct confidence intervals for the parameters, supplemented by an actual calculation for the current data."
   ]
  },
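  {
   "cell_type": "markdown",
   "id": "b5c19e74",
   "metadata": {},
   "source": [
    "A sketch of the calculations behind the output table, on a hypothetical data frame `houses` holding only the five rows shown by `HousePrices.head()` (replace it with `HousePrices` for the exercise). Each t value is the estimate divided by its standard error, the two-sided p-value is $2P(T > |t|)$ with $T$ t-distributed on the residual degrees of freedom, and a 95% confidence interval is the estimate $\\pm\\, t_{0.025}$ times the standard error. The manual intervals should agree with `model.conf_int()`. With only five observations, `statsmodels` may warn that some diagnostic tests in the summary are not valid."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c6d80f23",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import statsmodels.api as sm\n",
    "from scipy import stats\n",
    "\n",
    "# Hypothetical stand-in: only the five rows displayed by HousePrices.head().\n",
    "houses = pd.DataFrame({\n",
    "    'Taxes': [3104, 1173, 3076, 1608, 1454],\n",
    "    'Price': [279900, 146500, 237700, 200000, 159900],\n",
    "    'Size': [2048, 912, 1654, 2068, 1477],\n",
    "})\n",
    "\n",
    "# Multiple linear regression of Price on Taxes and Size (with intercept)\n",
    "X = sm.add_constant(houses[['Taxes', 'Size']])\n",
    "model = sm.OLS(houses['Price'], X).fit()\n",
    "print(model.summary())\n",
    "\n",
    "# t value = estimate / standard error; two-sided p-value from the t-distribution\n",
    "t_values = model.params / model.bse\n",
    "p_values = 2 * stats.t.sf(t_values.abs(), df=model.df_resid)\n",
    "\n",
    "# 95% confidence interval: estimate +/- t-critical value * standard error\n",
    "t_crit = stats.t.ppf(0.975, df=model.df_resid)\n",
    "table = pd.DataFrame({\n",
    "    't': t_values,\n",
    "    'p': p_values,\n",
    "    'lower': model.params - t_crit * model.bse,\n",
    "    'upper': model.params + t_crit * model.bse,\n",
    "})\n",
    "print(table)\n",
    "print(model.conf_int(alpha=0.05))  # statsmodels' own intervals, for comparison\n",
    "print('R-squared:', model.rsquared)"
   ]
  },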
  {
   "cell_type": "markdown",
   "id": "01bf26c3",
   "metadata": {},
   "source": [
    "Finally, investigate whether there is an interaction between `Taxes` and `Size` as predictors of `Price`, that is, whether the effect of `Size` on `Price` depends on the level of `Taxes`."
   ]
  }
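  ,
  {
   "cell_type": "markdown",
   "id": "d7e31a58",
   "metadata": {},
   "source": [
    "A sketch of the interaction check with the formula interface `statsmodels.formula.api` (not loaded above). In the formula `Price ~ Taxes * Size`, the `*` expands to `Taxes + Size + Taxes:Size`, where `Taxes:Size` is the product term. The t-test of that coefficient and the F-test comparing the models with and without it (`compare_f_test`) address the same hypothesis. As before, `houses` is a hypothetical stand-in holding only the five rows shown by `HousePrices.head()`; replace it with `HousePrices` for the exercise."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e2f64b09",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import statsmodels.formula.api as smf\n",
    "\n",
    "# Hypothetical stand-in: only the five rows displayed by HousePrices.head().\n",
    "houses = pd.DataFrame({\n",
    "    'Taxes': [3104, 1173, 3076, 1608, 1454],\n",
    "    'Price': [279900, 146500, 237700, 200000, 159900],\n",
    "    'Size': [2048, 912, 1654, 2068, 1477],\n",
    "})\n",
    "\n",
    "# Models without and with the interaction term Taxes:Size\n",
    "additive = smf.ols('Price ~ Taxes + Size', data=houses).fit()\n",
    "interaction = smf.ols('Price ~ Taxes * Size', data=houses).fit()\n",
    "\n",
    "# t-test of H0: the interaction coefficient is zero\n",
    "print(interaction.params['Taxes:Size'], interaction.pvalues['Taxes:Size'])\n",
    "\n",
    "# F-test comparing the nested models: (F statistic, p-value, df difference)\n",
    "f_value, p_value, df_diff = interaction.compare_f_test(additive)\n",
    "print(f_value, p_value, df_diff)"
   ]
  }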
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
