Likelihood and maximum likelihood estimation

The ASTA team

Maximum likelihood estimation of a probability

Estimating a probability

The likelihood function

Likelihood function - example

The log-likelihood function

With \(x\) successes in \(n\) independent trials, the likelihood is \(L(p) = p^x(1-p)^{n-x}\), and taking logarithms gives the log-likelihood

\[l(p) = \ln(L(p)) = x\ln(p) + (n-x)\ln(1-p).\]
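
A minimal numerical sketch (the values below are hypothetical, not from the slides): with, say, \(x = 7\) successes in \(n = 10\) trials, maximizing \(l(p)\) numerically should reproduce the analytic estimate \(\hat{p} = x/n\).

# Hypothetical data: x_obs successes in n_obs trials
x_obs <- 7
n_obs <- 10
logLp <- function(p) x_obs*log(p) + (n_obs - x_obs)*log(1 - p)
optimize(logLp, interval = c(0, 1), maximum = TRUE)$maximum # numerical maximizer of l(p)
x_obs/n_obs # analytic maximum likelihood estimate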

Maximum likelihood estimation

Maximum likelihood for logistic regression

The logistic regression model

Maximum likelihood estimation for logistic regression

Logistic regression - example

library(ISLR)
x <- Default$income/10000 # Annual income in 10000 dollars
y <- as.numeric(Default$default == "Yes") # Loan default, 1 means "Yes"
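
As a quick sanity check (not part of the original code), the response can be tabulated to see how many of the customers defaulted:

table(Default$default) # counts of "No" and "Yes" in the Default data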

Logistic regression - example continued

loglik <- function(theta) {
  alpha <- theta[1] # intercept
  beta <- theta[2]  # slope for income
  px <- 1/(1 + exp(-alpha - beta*x))       # P(Y = 1 | x) under the logistic model
  logpy <- y*log(px) + (1 - y)*log(1 - px) # log-likelihood contribution of each observation
  sum(logpy)                               # total log-likelihood
}
loglik(c(2,2))
## [1] -84250
optim(c(2, 2), loglik, control = list(fnscale = -1)) # fnscale = -1 makes optim maximize
## $par
## [1] -3.099 -0.081
## 
## $value
## [1] -1458
## 
## $counts
## function gradient 
##       69       NA 
## 
## $convergence
## [1] 0
## 
## $message
## NULL

Logistic regression - example continued

model <- glm(y ~ x, family = "binomial")
summary(model)
## 
## Call:
## glm(formula = y ~ x, family = "binomial")
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -3.0941     0.1463  -21.16   <2e-16 ***
## x            -0.0835     0.0421   -1.99    0.047 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2920.6  on 9999  degrees of freedom
## Residual deviance: 2916.7  on 9998  degrees of freedom
## AIC: 2921
## 
## Number of Fisher Scoring iterations: 6

Maximum likelihood estimation with continuous variables

The probability density function

\[ Y_i \sim N(\mu, \sigma) \]

\[ f(y ) = {\frac {1}{\sigma {\sqrt {2\pi }}}} \exp \left [ {-{\frac {1}{2}}\left({\frac {y-\mu }{\sigma }}\right)^{2}} \right ] \]
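
As a small check (the values of \(y\), \(\mu\), \(\sigma\) below are arbitrary, chosen for illustration), the formula can be evaluated directly and compared with R's built-in density function dnorm():

y0 <- 1.3; mu0 <- 0.5; sigma0 <- 2 # arbitrary illustrative values
1/(sigma0*sqrt(2*pi)) * exp(-0.5*((y0 - mu0)/sigma0)^2) # density formula by hand
dnorm(y0, mean = mu0, sd = sigma0) # should return the same value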

The likelihood function for \(n\) observations

Assuming the observations \(Y_1,\ldots,Y_n\) are independent, the joint density factorizes into a product of the individual densities:

\[\begin{align} L(\mu, \sigma) = f_{(Y_1,\ldots,Y_n)}(y_1,\ldots, y_n) = \prod_i f_{Y_i}(y_i) = \prod_i \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(y_i-\mu)^2}{2\sigma^2}}. \end{align}\]

Log-likelihood function in the normal case

\[\begin{align} L(\mu, \sigma) = \prod_i \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(y_i-\mu)^2}{2\sigma^2}}. \end{align}\]

\[ l(\mu,\sigma) = \ln( L(\mu, \sigma) ) = \sum_i \ln\Big(\frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(y_i-\mu)^2}{2\sigma^2}} \Big) \] \[= \sum_i\Big( -\ln(\sigma \sqrt{2\pi}) -\frac{(y_i-\mu)^2}{2\sigma^2}\Big) = -n\ln(\sigma \sqrt{2\pi}) -\sum_i\frac{(y_i-\mu)^2}{2\sigma^2} \]
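
A minimal sketch (using a small simulated sample; y0, mu0, sigma0 are illustrative, not from the slides) confirming that the closed-form expression equals the sum of log-densities from dnorm():

set.seed(1)
y0 <- rnorm(5, mean = 10, sd = 2) # small simulated sample
mu0 <- 10; sigma0 <- 2
n0 <- length(y0)
-n0*log(sigma0*sqrt(2*pi)) - sum((y0 - mu0)^2/(2*sigma0^2)) # closed-form log-likelihood
sum(dnorm(y0, mean = mu0, sd = sigma0, log = TRUE)) # same value via log-densities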

To find the maximum likelihood estimates we set the partial derivatives of \(l(\mu,\sigma)\) equal to zero. Differentiating with respect to \(\mu\):

\[\frac{\partial}{\partial \mu } l(\mu,\sigma) = \sum_i\frac{2(y_i-\mu)}{2\sigma^2}= \frac{1}{\sigma^2}\sum_i (y_i-\mu) = \frac{1}{\sigma^2}\Big(\sum_i y_i-n\mu \Big)= 0 \]

Differentiating with respect to \(\sigma\):

\[\frac{\partial}{\partial \sigma } l(\mu,\sigma) = -\frac{n}{\sigma} + \sum_i\frac{(y_i-\mu)^2}{\sigma^3} = 0 \]

Multiplying the second equation by \(\sigma\) and inserting the solution \(\hat{\mu}=\bar{y}\) from the first equation gives

\[ -{n} + \sum_i\frac{(y_i-\bar{y})^2}{\sigma^2} = 0 \] \[n=\frac{1}{\sigma^2} \sum_i {(y_i-\bar{y})^2} \] \[ \sigma^2 = \frac{1}{n}\sum_i(y_i-\bar{y})^2\]

The maximum likelihood estimates are therefore

\[\hat{\mu} = \frac{1}{n} \sum_i y_i = \bar{y}\] \[ \hat{\sigma}^2 = \frac{1}{n}\sum_i(y_i-\bar{y})^2\]

Numerical solution - normal distribution

trees <- read.delim("https://asta.math.aau.dk/datasets?file=trees.txt")
head(trees)
##   Girth Height Volume
## 1   8.3     70     10
## 2   8.6     65     10
## 3   8.8     63     10
## 4  10.5     72     16
## 5  10.7     81     19
## 6  10.8     83     20
qqnorm(trees$Height) # normal Q-Q plot of the tree heights
qqline(trees$Height) # reference line through the first and third quartiles

Numerical solution - normal distribution

loglik_normal <- function(theta) {
  mu <- theta[1]    # mean
  sigma <- theta[2] # standard deviation
  y <- trees$Height
  fy <- dnorm(y, mean = mu, sd = sigma) # density of each observation
  sum(log(fy))                          # log-likelihood
}
loglik_normal(c(1,5))
## [1] -3590
optim(c(1, 5), loglik_normal, control = list(fnscale = -1)) # maximize the log-likelihood
## $par
## [1] 76.0  6.3
## 
## $value
## [1] -101
## 
## $counts
## function gradient 
##      103       NA 
## 
## $convergence
## [1] 0
## 
## $message
## NULL
mean(trees$Height) # analytic MLE of mu (the sample mean)
## [1] 76
sd(trees$Height) # sample standard deviation (divisor n - 1)
## [1] 6.4
n <- length(trees$Height)
sd(trees$Height)*sqrt((n-1)/n) # rescaled to the MLE of sigma (divisor n)
## [1] 6.3

Properties of maximum likelihood estimators