---
title: "Hypothesis test"
author: "The ASTA team"
output:
  slidy_presentation:
    fig_caption: no
    highlight: tango
    theme: cerulean
  pdf_document:
    fig_caption: no
    highlight: tango
    number_sections: yes
    toc: yes
---

```{r, include = FALSE}
## Remember to add all packages used in the code below!
missing_pkgs <- setdiff(c("mosaic", "jpeg"), rownames(installed.packages()))
if(length(missing_pkgs)>0) install.packages(missing_pkgs)
```

# Statistical inference: Hypothesis and test
## Concept of hypothesis

* A **hypothesis** is a statement about a given
population. Usually it is stated as a population parameter having a given
value or being in a certain interval.
* Examples:
    * Quality control of products: The hypothesis is that the
        products have, e.g., a certain weight, a given power consumption or a minimal durability.
    * Scientific hypothesis: There is no dependence
        between a company's age and its level of return.


## Significance test

* A significance test is used to investigate whether data contradicts
  the hypothesis.
* If the hypothesis says that a parameter has a certain value, then
  the test should tell whether the sample estimate is "far"
  away from this value.
* For example:
    * Waiting times in a queue. We sample $n$ customers and count
        how many have been waiting more than 5 minutes. The company policy
        is that at most $10\%$ of the customers should wait more than 5
        minutes. In a sample of size $n=32$ we observe 4 with waiting time
        above 5 minutes, i.e. the estimated proportion is
        $\hat{\pi} = \frac{4}{32} = 12.5\%$. Is this "much more" than 
        (i.e. significantly different from) $10\%$?
    * The blood alcohol level of a student is measured 4 times with
        the values $0.504,0.500,0.512,0.524$, i.e. the estimated mean
        value is $\bar{y}=0.51$. Is this "much different" from a limit of $0.5$?
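
The two point estimates above are easily computed in R; a minimal sketch using the numbers from the examples:

```{r}
# Queue example: estimated proportion waiting more than 5 minutes
pi_hat <- 4 / 32
pi_hat   # 0.125, i.e. 12.5%
# Blood alcohol example: estimated mean of the four measurements
y <- c(0.504, 0.500, 0.512, 0.524)
ybar <- mean(y)
ybar     # 0.51
```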



## Null and alternative hypothesis

* **The null hypothesis** - denoted $H_0$ - usually specifies
  that a population parameter has some given value. E.g. if $\mu$ is the mean 
  blood alcohol level we can state the null hypothesis 
    * $H_0 : \mu = 0.5$.
* The **alternative hypothesis** - denoted $H_a$ - usually
  specifies that the population parameter is contained in a given
  set of values different from the null value. E.g. if $\mu$ again is the 
  population mean of a blood alcohol level measurement, then
    * the null hypothesis is $H_0 : \mu = 0.5$
    * the alternative hypothesis is $H_a : \mu \neq 0.5$.



## Test statistic

* We consider a population parameter $\mu$ and write the null hypothesis
  $$
  H_0:\mu = \mu_0,
  $$
  where $\mu_0$ is a known number, e.g.\ $\mu_0 = 0.5$.
* Based on a sample we have an estimate $\hat{\mu}$.
* A **test statistic** $T$ will typically depend on $\hat{\mu}$ and
  $\mu_0$ (we may write this as $T(\hat{\mu}, \mu_0)$) and measures "how far from $\mu_0$ is $\hat{\mu}$?"
* Often we use $T(\hat{\mu},\mu_0)$ = "the number of standard deviations from $\hat{\mu}$
          to $\mu_0$".
* For example, it would be very unlikely for $\hat{\mu}$ to be more than 3 standard
  deviations from $\mu_0$; in that case $\mu_0$ is probably not the
  correct value of the population parameter.
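
As a small sketch of this idea, the number of (estimated) standard errors from $\hat{\mu}$ to $\mu_0$ can be computed directly; here using the blood alcohol measurements from the earlier example with $\mu_0 = 0.5$:

```{r}
y <- c(0.504, 0.500, 0.512, 0.524)
mu0 <- 0.5                      # hypothesized value
se <- sd(y) / sqrt(length(y))   # estimated standard error of the mean
t_obs <- (mean(y) - mu0) / se   # number of standard errors from mu0
t_obs                           # about 1.89
```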


## $P$-value

* We consider
    * $H_0$:\ a null hypothesis.
    * $H_a$:\ an alternative hypothesis.
    * $T$:\ a test statistic, where the value calculated based on
          the current sample is denoted $t_{obs}$.
* To investigate the plausibility of $H_0$, we measure the evidence against $H_0$ 
  by the so-called $p$-value:
    * The $p$-value is the probability of observing a more extreme value of $T$ 
      (if we were to repeat the experiment) than $t_{obs}$ *under the assumption that
      $H_0$ is true*. 
    * "Extremity" is measured relative to the alternative hypothesis; a value is 
      considered extreme if it is "far from" $H_0$ and "closer to" $H_a$. 
    * If the $p$-value is small, then there is only a small probability of observing 
      a test statistic as extreme as $t_{obs}$ if $H_0$ is true; the data are then not very 
      compatible with $H_0$ and we put more support in $H_a$, so:
    
      **The smaller the $p$-value, the less we trust $H_0$.**
* What is a small $p$-value? If it is below $5\%$ we say it is
      **significant** at the $5\%$ level.




## Significance level

* We consider
    * $H_0$: a null hypothesis.
    * $H_a$: an alternative hypothesis.
    * $T$: a test statistic, where the value calculated based on
          the current sample is denoted $t_{obs}$ and the corresponding
          $p$-value is $p_{obs}$.
* Small values of $p_{obs}$ are critical for $H_0$.
* In practice it can be necessary to decide whether or not we are
  going to reject $H_0$.
* The decision can be made if we previously have decided on a
  so-called **$\alpha$-level**, where 
    * $\alpha$ is a given percentage 
    * we reject $H_0$, if $p_\text{obs}$ is less than or equal to $\alpha$
    * $\alpha$ is called the **significance level** of the test
    * typical choices of $\alpha$ are $5\%$ or $1\%$.
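
The decision rule can be sketched in a few lines of R; the value of `p_obs` below is hypothetical:

```{r}
alpha <- 0.05              # chosen significance level
p_obs <- 0.155             # hypothetical observed p-value
reject <- p_obs <= alpha   # reject H0 exactly when p_obs <= alpha
reject                     # FALSE: H0 is not rejected at the 5% level
```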

## Significance test for mean

### Two-sided $t$-test for mean:

* We assume that data is a sample from $\texttt{norm}(\mu,\sigma)$.
* The estimates of the population parameters are $\hat{\mu}=\bar{y}$ and 
$\hat{\sigma}=s$ based on $n$ observations.
* Null hypothesis:\ $H_0:\ \mu = \mu_0$, where $\mu_0$ is a known value.
* **Two-sided alternative hypothesis**:\  $H_a:\ \mu \neq \mu_0$.
* Observed test statistic:\ $t_{obs} = \frac{\bar{y} - \mu_0}{se}$, where
  $se = \frac{s}{\sqrt{n}}$.
*  I.e.\ $t_{obs}$ measures how many standard deviations (with $\pm$
  sign) the empirical mean lies from $\mu_0$.
* If $H_0$ is true, then $t_{obs}$ is an observation from the
  $t$-distribution with $df = n - 1$.
* $P$-value: Values bigger than $|t_{obs}|$ or less than $-|t_{obs}|$ put
  more support in $H_a$ than $H_0$.
* The $p$-value = 2 x "upper tail probability of
  $|t_{obs}|$". The probability is calculated in the $t$-distribution with $df$
  degrees of freedom.

----

### Example: Two-sided $t$-test
* Blood alcohol level measurements: $0.504, 0.500, 0.512, 0.524$.
* These are assumed to be a sample from a normal distribution.
* We calculate
    * $\bar{y} = 0.51$ and $s = 0.0106$
    * $se = \frac{s}{\sqrt{n}} = \frac{0.0106}{\sqrt{4}} = 0.0053$.
    * $H_0: \mu = 0.5$,\ i.e.\ $\mu_0 = 0.5$.
    * $t_{obs} = \frac{\bar{y}-\mu_0}{se} = \frac{0.51-0.5}{0.0053} = 1.89$.
* So we are almost 2 standard deviations from $0.5$.\ Is this extreme
  in a $t$-distribution with 3 degrees of freedom?

```{r message=FALSE}
library(mosaic)
1 - pdist("t", q = 1.89, df = 3)
```

* The $p$-value is 2$\cdot$ `r round(abs(pt(-1.89, df = 3)),3)`,\  i.e. more 
  than 15\%. On the basis of this we do not reject $H_0$.
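
The whole calculation can also be done in one call with R's built-in `t.test`, which reports $t_{obs}$, the degrees of freedom and the two-sided $p$-value:

```{r}
y <- c(0.504, 0.500, 0.512, 0.524)
t.test(y, mu = 0.5)   # two-sided t-test of H0: mu = 0.5
```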



## One-sided $t$-test for mean

The book also discusses one-sided $t$-tests for the mean, but we will not use those in the course.

## Agresti: Overview of $t$-test
```{r, fig.width = 10, echo = FALSE, fig.align = 'center'}
# was ![](https://asta.math.aau.dk/static-files/asta/img/t-testOversigt.jpg)
url <- "https://asta.math.aau.dk/static-files/asta/img/t-testOversigt.jpg"
z <- tempfile()
download.file(url, z, mode = "wb")
grid::grid.raster(jpeg::readJPEG(z))
invisible(file.remove(z))
```

## Significance test for proportion

* Consider a sample of size $n$, where we observe whether a given property is 
present or not.
* The relative frequency of the property in the population is $\pi$, which 
is estimated by $\hat{\pi}$.
* Null hypothesis:\ $H_0: \pi = \pi_0$, where $\pi_0$ is a known number.
* **Two-sided alternative** hypothesis:\ $H_a: \pi\neq\pi_0$.
* *If $H_0$ is true* the standard error for $\hat{\pi}$ is
  given by $se_0 = \sqrt{\frac{\pi_0(1-\pi_0)}{n}}$.
* Observed test statistic: $z_{obs} = \frac{\hat{\pi}-\pi_0}{se_0}$
* I.e. $z_{obs}$ measures how many standard deviations (with $\pm$ sign)
  there are from $\hat{\pi}$ to $\pi_0$.

----


### Approximate test

* If both $n\hat{\pi}$ and $n(1 - \hat{\pi})$ are larger than 15, we know from
  earlier that $\hat{\pi}$ follows a normal distribution (approximately), i.e.
    * If $H_0$ is true, then $z_{obs}$ is an observation from the
    standard normal distribution.
* $P$-value for **two-sided** test: Values greater than $|z_{obs}|$
  or less than $-|z_{obs}|$ point more towards $H_a$ than $H_0$.
* The $p$-value=2 x "upper tail probability for
  $|z_{obs}|$". The probability is calculated in the standard normal distribution.

----

### Example: Approximate test
* We consider a study from Florida Poll 2006:
    * In connection with problems financing public services, a random
      sample of 1200 individuals was asked whether they preferred less
      service or tax increases.
    * 52% preferred tax increases. Is this enough to say that the proportion is significantly different from fifty-fifty?
* Sample with $n = 1200$ observations and estimated proportion
  $\hat{\pi} = 0.52$. 
* Null hypothesis $H_0: \pi = 0.5$.
* Alternative hypothesis $H_a: \pi\neq 0.5$.
* Standard error
  $se_0 = \sqrt{\frac{\pi_0(1-\pi_0)}{n}} = \sqrt{\frac{0.5\times0.5}{1200}} = 0.0144$
* Observed test statistic
  $z_{obs} = \frac{\hat{\pi}-\pi_0}{se_0}=\frac{0.52-0.5}{0.0144}=1.39$
* "upper tail probability for 1.39" in the
  standard normal distribution is 0.0823, i.e. we have a
  $p$-value of 2$\cdot$ 0.0823$\approx$ 16%.
* Conclusion: There is not sufficient evidence to reject
  $H_0$, i.e. we do not reject that the preference in the population
  is fifty-fifty.
* Note that the above calculations can also be performed automatically in **R** 
  (the results differ slightly due to rounding in the manual calculation):
  
```{r}
count <- 1200 * 0.52 # number of individuals preferring tax increase
prop.test(x = count, n = 1200, correct = FALSE)
```
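
For comparison, the manual calculation can be reproduced step by step:

```{r}
n <- 1200; pi_hat <- 0.52; pi0 <- 0.5
se0 <- sqrt(pi0 * (1 - pi0) / n)        # standard error under H0
z_obs <- (pi_hat - pi0) / se0           # observed test statistic
p_value <- 2 * (1 - pnorm(abs(z_obs)))  # two-sided p-value
c(z_obs = z_obs, p_value = p_value)     # about 1.39 and 0.166
```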

----

### Binomial (exact) test

* Consider again a sample of size $n$, where we observe whether a given property is present or not.
* The relative frequency of the property in the population is $\pi$, which is 
estimated by $\hat{\pi}$.
* Let $y_+=n\hat{\pi}$ be the frequency (total count) of the property in the sample.
* It can be shown that $y_+$ follows the **binomial distribution**
  with size parameter $n$ and success probability $\pi$.
  We use $Bin(n,\pi)$ to denote this distribution.
* Null hypothesis:\ $H_0: \pi=\pi_0$, where $\pi_0$ is a known number.
* Alternative hypothesis:\ $H_a: \pi \neq \pi_0$.
* $P$-value for **two-sided** binomial test:
    * If $y_+\geq n\pi_0$:\ 2 x "upper tail probability for $y_+$"
        in the $Bin(n,\pi_0)$ distribution.
    * If $y_+< n\pi_0$:\ 2 x "lower tail probability for $y_+$" in
        the $Bin(n,\pi_0)$ distribution.

----

### Example: Binomial test

* Experiment with $n=30$, where we have $y_+=14$ successes.
* We want to test $H_0:\pi=0.3$ vs.\ $H_a:\pi\not=0.3$.
* Since $y_+>n\pi_0=9$ we use the upper tail probability
  corresponding to the sum of the heights of the red lines to the right of 14 in
  the graph below. (Notice, the graph continues on the right hand side
  to $n=30$, but it has been cut off for illustrative purposes.)
* The upper tail probability from 14 and up (i.e. greater than 13) is:
```{r}
lower_tail <- pdist("binom", q = 13, size = 30, prob = 0.3)
1 - lower_tail
```
* The two-sided $p$-value is then 2 x `r round(1-lower_tail, 2)` = `r 2 * round(1-lower_tail, 2)`.
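
The built-in `binom.test` gives the exact test directly. Note that R defines the two-sided $p$-value slightly differently (it sums the probabilities of all outcomes no more likely than the observed one), so its $p$-value need not equal exactly 2 x the tail probability:

```{r}
binom.test(x = 14, n = 30, p = 0.3)   # exact binomial test of H0: pi = 0.3
```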

----

### Binomial test in **R**
* We return to the Chile data, where we again look at the variable `sex`. 
* Let us test whether the proportion of females is different from 50%, i.e., we
look at  $H_0:\ \pi=0.5$ and $H_a:\ \pi \neq 0.5$, where $\pi$ is the unknown population proportion of females.
```{r}
Chile <- read.delim("https://asta.math.aau.dk/datasets?file=Chile.txt")
binom.test( ~ sex, data = Chile, p = 0.5, conf.level = 0.95)
```
* The $p$-value for the binomial exact test is $27\%$, so the proportion of
females is not significantly different from $50\%$. 
* The approximate test has a $p$-value of $26\%$, which can be calculated by the command
```{r eval = F}
prop.test( ~ sex, data = Chile, p = 0.5, conf.level = 0.95, correct = FALSE)
```
(note the additional argument `correct = FALSE`).

## Agresti: Overview of tests for mean and proportion
```{r, fig.width = 10, echo = FALSE, fig.align = 'center'}
# was ![](https://asta.math.aau.dk/static-files/asta/img/AGRoversigt.jpg)
url <- "https://asta.math.aau.dk/static-files/asta/img/AGRoversigt.jpg"
z <- tempfile()
download.file(url, z, mode = "wb")
grid::grid.raster(jpeg::readJPEG(z))
invisible(file.remove(z))
```
