---
title: "Likelihood and maximum likelihood estimation"
author: "The ASTA team"
output:
  slidy_presentation:
    fig_caption: no
    highlight: tango
    theme: cerulean
  pdf_document:
    fig_caption: no
    keep_tex: yes
    highlight: tango
    number_section: yes
    toc: yes

---

```{r, include = FALSE}
options(digits = 2)
## Remember to add all packages used in the code below!
missing_pkgs <- setdiff(c("mosaic"), rownames(installed.packages()))
if(length(missing_pkgs)>0) install.packages(missing_pkgs)
```

```{r, include=FALSE}
library(mosaic)
```

<!---
likelihood-begrebet og maksimum likelihood estimation

Definere funktioner i R
Plotte dem: plotFun / makeFun
curve
optim()
plotFun( x * sin(1/x) ~ x, xlim=c(-1,1) )
--->


# The probability density/mass function

$$
X \sim N(\mu, \sigma)
$$

Assumes $\mu$ and $\sigma$ are some fixed values:

$$
f(x) = f(X = x; \mu, \sigma) = {\frac {1}{\sigma {\sqrt {2\pi }}}} \exp \left [ {-{\frac {1}{2}}\left({\frac {x-\mu }{\sigma }}\right)^{2}} \right ]
$$

E.g. for $\mu = 0$ and $\sigma = 1$:

```{r}
plotFun(dnorm(x, mean = 0, sd = 1) ~ x, xlim = c(-3, 3))
```

# The likelihood function for a single observation

What if instead the data is fixed and we let $\mu$ and $\sigma$ vary? 

$$
L(\mu, \sigma) = L(\mu, \sigma; X = x) = f(X = x; \mu, \sigma)
$$

E.g. assume that $X=2$ was observed:

```{r}
plotFun(dnorm(2, mean = x, sd = 1) ~ x, xlim = c(-3, 3))
```

The value of $\mu$ that gives the highest probability (density) 
of observing $X=2$ is $\mu = 2$.

# The likelihood function for $n$ observations

Assume independence for $n$ observations, then

\begin{align}
  L(\mu, \sigma) 
  = L(\mu, \sigma; X_1, X_2, \ldots, X_n)
  = \prod_{i=1}^n L(\mu, \sigma; X_i) 
  = \prod_{i=1}^n L(\mu, \sigma; X_i) .
\end{align}

* Maximum likelihood estimation (MLE): 
    - Find the values of $\mu$ and $\sigma$ that gives the largest value of $L$ for fixed $X$'s.

* Would you rather differentiate products or sums?

----

Do you know a function that turns products into sums?

----

$$
\log (a \cdot b) = \log(a) + \log(b)
$$

----


Assume independence for $n$ observations, then

\begin{align}
L(\mu, \sigma) = L(\mu, \sigma; X_1, X_2, \ldots, X_n) &= \prod_{i=1}^n L(\mu, \sigma; X_i) 
                                       = \prod_{i=1}^n L(\mu, \sigma; X_i)  \\
l(\mu, \sigma) = \log L(\mu, \sigma)  &= \log L(\mu, \sigma; X_1, X_2, \ldots, X_n) \\
                                      &= \log \left ( \prod_{i=1}^n L(\mu, \sigma; X_i) \right ) \\
                                      &= \sum_{i=1}^n \log L(\mu, \sigma; X_i) \\
                                      &= \sum_{i=1}^n l(\mu, \sigma; X_i) .
\end{align}

**Note that $\log$ is a monotonic (increasing) function, so maximum of $l = \log L$ is the 
same as that of $L$.**

----

\begin{align}
L(\mu, \sigma) &= L(\mu, \sigma; X = x) = f(X = x; \mu, \sigma) \\
l(\mu, \sigma; X) &= \log L(\mu, \sigma; X) = \log f(X; \mu, \sigma)
\end{align}

```{r}
plotFun(dnorm(2, mean = x, sd = 1) ~ x, xlim = c(-3, 3), ylim = c(-2, 1));
plotFun(log(dnorm(2, mean = x, sd = 1)) ~ x, xlim = c(-3, 3), add = TRUE)
```


# Example

```{r}
trees <- read.delim("https://asta.math.aau.dk/datasets?file=trees.txt")
head(trees)
```

```{r}
histogram(trees$Height)
```

----

If Height is normally distributed, 
what are the parameters (mean and standard deviation)?

---- 

```{r}
sd(trees$Height)
```

```{r}
single_loglik <- function(mu) {
  sum(log(dnorm(trees$Height, mean = mu, sd = 6)))
}
single_loglik(65)
single_loglik(c(60, 65, 70))
loglik <- Vectorize(single_loglik)
loglik(c(60, 65, 70))
plotFun(loglik(x) ~ x, xlim = c(70, 80))
```

Maximum around `r round(mean(trees$Height))`.

----

```{r}
optimise(single_loglik, interval = c(0, 100))
optimise(single_loglik, interval = c(0, 100), maximum = TRUE)
```

----

Maximum of $l$ is minimum of $-l$.

```{r}
single_loglik <- function(mu) {
  -sum(log(dnorm(trees$Height, mean = mu, sd = 6)))
}
optimise(single_loglik, interval = c(0, 100))
```

----

Both parameters:

```{r, warning=TRUE}
single_loglik <- function(pars) {
  mu <- pars[1]
  sigma <- pars[2]
  -sum(log(dnorm(trees$Height, mean = mu, sd = sigma)))
}
optim(c(1, 50), single_loglik)
```


----

```{r, warning=TRUE}
single_loglik <- function(pars) {
  mu <- pars[1]
  sigma <- exp(pars[2])
  -sum(log(dnorm(trees$Height, mean = mu, sd = sigma)))
}
mle <- optim(c(1, 50), single_loglik)
mle
exp(mle$par[2])
```

```{r}
mean(trees$Height)
sd(trees$Height)
n <- length(trees$Height)
sd(trees$Height)
```