---
title: "Stochastic processes II"
author: "The ASTA team"
output:
  slidy_presentation:
    highlight: tango
    fig_caption: false
  pdf_document:
    keep_tex: yes
    toc: yes
    highlight: tango
    number_sections: true
    fig_caption: false
---

# Basic stochastic process models

----

## Stochastic processes

* Last time we introduced stochastic processes

```{r, echo=FALSE,fig.height=5, fig.width=7}
w = ts(rnorm(500))
x1 = filter(w, 0, method = "recursive")    # alpha = 0: white noise
x2 = filter(w, 0.9, method = "recursive")  # alpha = 0.9: stationary AR(1)
x3 = filter(w, 1, method = "recursive")    # alpha = 1: random walk
ts.plot(x1,x2,x3,col=1:3)
```


* We saw three basic models:
  - **White noise:** independent random variables - not very interesting by itself, but an essential building block in more complicated/realistic models.
  - **Random walk:** cumulatively adding white noise - non-stationary, tendency to wander off.
  - **Autoregressive process AR($1$):** weaker dependence on the previous value than a random walk; stationary or not depending on the choice of parameter.

* Today we will consider more stochastic process models. 

* First we take a recap of the three models above, and add some details.

# White noise

----

## White noise

* A time series $W_t$, $t=1,\dots,n$, is **white noise** if the variables $W_1, W_2, \dots, W_n$ are
  - independent
  - identically distributed
  - with mean zero and variance $\sigma^2$

* From the definition it follows that white noise is a second-order stationary
process, since
  - The mean is constant ($=0$)
  - The variance function $\sigma^2(t) = \sigma^2$ is constant
  - The autocorrelation $Cor(W_t,W_{t+k})=0$ for all $k\neq0$, which
does not depend on $t$. 

* Hence, we have a well-defined mean,
$$\mu = 0,$$
autocovariance function
$$\gamma(k) = Cov(W_t, W_{t+k}) = \begin{cases} \sigma^2 \text{ for } k=0,\\ 0 \text{ for } k\neq0,\end{cases}$$
and autocorrelation function
$$\rho(k) = \begin{cases} 1 \text{ for } k=0,\\ 0 \text{ for } k\neq0.\end{cases}$$

----

## Correlogram for white noise

* To understand how white noise behaves we can simulate it with R and plot both
the series and the autocorrelation:
```{r}
w <- rnorm(100, mean = 0, sd = 1)
par(mfrow = c(2,1), mar = c(4,4,0,0))
ts.plot(w)
acf(w)
```

* For white noise, approximately 95\% of the estimated autocorrelations at lags $k>0$ should lie between the blue lines (drawn at approximately $\pm 2/\sqrt{n}$).

* It is a good idea to repeat this simulation and plot a few times to appreciate the
variability of the results.
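
* This check can also be automated (a small sketch, not from the original slides): simulate many white noise series and record how often the estimated autocorrelations fall outside the blue lines:
```{r}
# Fraction of estimated autocorrelations (lags k > 0) outside +/- 2/sqrt(n),
# averaged over 100 simulated white noise series of length n = 100;
# the result should be close to 0.05:
out <- replicate(100, {
  a <- acf(rnorm(100), plot = FALSE)$acf[-1]  # drop lag 0
  mean(abs(a) > 2 / sqrt(100))
})
mean(out)
```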

# Random walk

----

## Random walk

* A time series $X_t$ is called a **random walk** if
$$X_t = X_{t-1} + W_t$$
where $W_t$ is a white noise series. 

* Using $X_{t-1} = X_{t-2} + W_{t-1}$ we get
$$X_t = X_{t-2} + W_{t-1} + W_{t}$$
* Substituting for $X_{t-2}$ we get
$$X_t = X_{t-3} + W_{t-2} + W_{t-1} + W_{t}$$
* Continuing this way, assuming we start at $X_0=0$, 
$$X_t = W_{1} + W_{2} + \dots + W_{t}$$
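
* This telescoped sum means a random walk can be generated in R directly as a cumulative sum of white noise terms. A minimal sketch (the `set.seed` call is our addition, for reproducibility):
```{r}
set.seed(1)         # for reproducibility
w <- rnorm(200)     # white noise W_1, ..., W_n
x <- ts(cumsum(w))  # X_t = W_1 + ... + W_t, assuming X_0 = 0
ts.plot(x)
```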



----

## Properties of random walk

* A random walk $X_t$ has a constant mean function
$$\mu(t) =  0.$$


* The variance function is given by
$$\sigma^2(t) = Var(X_t) = Var(W_1 + \dotsm + W_t) = Var(W_1) + \dotsm + Var(W_t) = t\cdot\sigma^2,$$
which clearly depends on the time $t$, so the process is not stationary. 

* Similarly, one can compute the autocovariance and autocorrelation functions (see Cowpertwait and Metcalfe for details); for $k\geq0$,
$$Cov(X_t, X_{t+k}) = t\sigma^2.$$

$$Cor(X_t, X_{t+k}) = \frac{1}{\sqrt{1+k/t}}$$

* The autocorrelation depends on $t$, so the process is non-stationary.
* When $t$ is large compared to $k$, the correlation is very high (close to $1$).

* We therefore expect the correlogram of a
reasonably long random walk to show very slow decay.
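
* To see this slow decay directly from the formula, we can plot the theoretical autocorrelation $1/\sqrt{1+k/t}$ for a fixed $t$ (a small sketch, our addition):
```{r}
# Theoretical autocorrelation of a random walk at t = 100 for lags k = 0,...,40;
# all values stay close to 1:
k <- 0:40
plot(k, 1 / sqrt(1 + k / 100), type = "h", ylim = c(0, 1), ylab = "Cor")
```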

----

## Simulation and correlogram of random walk

* We already know how to simulate Gaussian white noise $W_t$ (with `rnorm`), and the
random walk is then generated by repeatedly adding noise terms:
```{r}
w <- rnorm(1000)                            # white noise
rw <- w                                     # X_1 = W_1
for (t in 2:1000) rw[t] <- rw[t-1] + w[t]   # X_t = X_{t-1} + W_t
par(mfrow = c(2,1), mar = c(4,4,0.5,0.5))
ts.plot(rw)
acf(rw, lag.max = 100)
```


* The slowly decaying acf of a random walk is a classical sign of non-stationarity,
indicating there may be some kind of trend. In this case there is no real trend,
since the theoretical mean is constant zero, but we refer to the apparent trend,
which seems to change direction unpredictably, as a **stochastic trend**.

----

## Differencing

* If a non-stationary time series shows a stochastic trend we can try to study the
**time series of differences** and see if that is stationary and easier to understand:
$$\nabla X_t = X_{t} - X_{t-1}.$$
* Since we assume $X_0=0$ we get
$$\nabla X_1 = X_1.$$
 
* Specifically when we difference a random walk $X_t = X_{t-1} + W_t$ we recover the white noise series
$$\nabla X_t = W_t$$

```{r}
diffrw <- diff(rw)
par(mfrow = c(2,1), mar = c(4,4,0.5,0.5))
ts.plot(diffrw)
acf(diffrw, lag.max = 30)
```

* If a time series must be differenced once to become stationary, we say
that the series is **integrated** (of order $1$).

----

## Example: Exchange rate

* We consider a data set for the exchange rate from GBP to NZD. We observe what
looks like an unpredictable stochastic trend and we would like to see if it
could reasonably be described as a random walk.
```{r}
www <- "https://asta.math.aau.dk/eng/static/datasets?file=pounds_nz.dat"
exchange_data <- read.table(www, header = TRUE)[[1]]
exchange <- ts(exchange_data, start = 1991, freq = 4)
plot(exchange)
```

* We difference the series and see if the differences look like white
noise:
```{r}
diffexchange <- diff(exchange)
par(mfrow = c(2,1), mar = c(4,4,0.5,0.5))
plot(diffexchange)
acf(diffexchange)
```

* The first order difference looks reasonably stationary, so the original exchange
rate series could be considered integrated of order $1$. However, there is an
indication of significant autocorrelation at lag $1$, so a random
walk might not be a completely satisfactory model for this dataset.

# First order auto-regressive models AR($1$)

----

## Auto-regressive model of order $1$: AR($1$)

* If the correlogram shows a significant auto-correlation at lag $1$, it means that $X_t$ and $X_{t-1}$ are correlated.

* The simplest way to model this, is the auto-regressive model of order one, AR($1$):
$$X_t = \alpha_1 X_{t-1} + W_t$$
where $W_t$ is white noise and the auto-regressive coefficient $\alpha_1$ is a
parameter to be estimated from data.



----

## Properties of AR($1$) models

* From last time: the model can only be stationary if $-1<\alpha_1<1$, so that the dependence on the
past decreases with time.

* For a stationary AR($1$) model with $-1<\alpha_1<1$ we found
  - $\mu(t) = 0$
  - $Var(X_t) = \sigma^2_t = \sigma^2/(1-\alpha_1^2)$
  - $\gamma(k) = \alpha_1^k\sigma^2/(1-\alpha_1^2)$
  - $\rho(k) = \alpha_1^k$
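
* As a quick numerical cross-check (our addition), the built-in function `ARMAacf` computes the theoretical autocorrelation function of an ARMA model; for AR($1$) with $\alpha_1=0.7$ it should reproduce $\rho(k)=0.7^k$:
```{r}
ARMAacf(ar = 0.7, lag.max = 5)  # theoretical acf from R
0.7^(0:5)                       # closed form rho(k) = alpha_1^k
```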

----

## Simulation of AR($1$) models

* R has a built-in function `arima.sim` to simulate AR($1$) models (and other more complicated
models called ARMA and ARIMA). 

* It needs the model (i.e. the autoregressive coefficient $\alpha_1$) and the
desired number of time steps `n`. 

* To simulate 200 time steps of AR($1$) with $\alpha_1=0.7$
we do:
```{r}
x <- arima.sim(model = list(ar = 0.7), n = 200)
plot(x)
acf(x)
k <- 0:30
points(k, 0.7^k, col = "red")  # theoretical acf: rho(k) = 0.7^k
```

* Here we have compared the empirical correlogram with the theoretical values of
the model.

----

## Fitted AR($1$) models

* There are several ways to estimate the parameters in an AR($1$) process. We use maximum likelihood estimation (MLE).

* We skip the details of this, and simply use the function `ar`:
```{r}
fit <- ar(x, order.max = 1, method="mle")
```
* The resulting object contains the value of the estimated parameter $\hat\alpha_1$ and a bunch of
other information. 

* In this example, the input data are artificial so we know that  $\hat\alpha_1$
should ideally be close to 0.7:
```{r}
fit$ar
```
* An estimate of the variance (the squared standard error) of $\hat\alpha_1$ is given in `fit$asy.var.coef`:

```{r}
fit$asy.var.coef
```
* The estimated std. error is the square root of this:
```{r}
se <- sqrt(fit$asy.var.coef)
se
ci <- c(fit$ar - 2*se, fit$ar + 2*se)  # approximate 95% confidence interval
ci
```

----

## Residuals for AR($1$) models

* The AR($1$) model defined earlier has mean $0$. However, we cannot expect real data to have mean zero.

* This is fixed by subtracting the average $\bar{X}$ of the
data before doing anything else, so the model that is fitted is actually:
$$X_t-\bar{X} = \alpha_1 \cdot (X_{t-1} - \bar{X}) + W_t$$
* The fitted model can be used to make predictions. To predict $X_t$ from $X_{t-1}$, we use that $W_t$ is white noise,
so we expect it to be zero on average:
$$\hat{x}_t-\bar{x} = \hat{\alpha}_1 \cdot (x_{t-1} - \bar{x}),$$
so
$$\hat{x}_t = \bar{x} + \hat{\alpha}_1 \cdot (x_{t-1} - \bar{x}), \quad t\ge2.$$
* We can estimate the white noise terms as usual by the model residuals:
$$\hat{w}_t = x_t - \hat{x}_t, \quad t\ge2.$$
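
* These predictions and residuals can also be computed by hand from the fitted object (a sketch, our addition; `fit$x.mean` is the mean subtracted by `ar`):
```{r}
xbar <- fit$x.mean                               # estimated mean of the series
xhat <- xbar + fit$ar * (x[-length(x)] - xbar)   # predictions for t = 2,...,n
w_hat <- x[-1] - xhat                            # residuals w_t = x_t - xhat_t
head(cbind(w_hat, fit$resid[-1]))                # should agree with ar()'s residuals
```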

* If the model is correct, the residuals should look like a sample of white noise. We check this by plotting the acf:
```{r}
res <- na.omit(fit$resid)
par(mfrow = c(2,1), mar = c(4,4,1,1))
plot(res)
acf(res)
```

* It naturally looks good for this artificial dataset.

----

## AR($1$) model fitted to exchange rate

* A random walk is an example of an AR($1$) model with $\alpha_1=1$.
It didn't provide an ideal fit for the exchange rate dataset, so we might suggest
a stationary AR($1$) model with $\alpha_1$ as a parameter to be estimated from data:
```{r}
fitexchange <- ar(exchange, order.max = 1, method="mle")
fitexchange$ar
resexchange <- na.omit(fitexchange$resid)
par(mfrow = c(2,1), mar = c(4,4,1,1))
plot(resexchange)
acf(resexchange)
```

* This does not really appear to provide a better fit than the random walk model proposed earlier.

* An alternative would be to propose an AR($1$) model for the differenced time series
$\nabla X_t = X_t - X_{t-1}$:
```{r}
dexchange <- diff(exchange)
fitdiff <- ar(dexchange, order.max = 1,method="mle")
fitdiff$ar
resdiff <- na.omit(fitdiff$resid)
par(mfrow = c(2,1), mar = c(4,4,1,1))
plot(resdiff)
acf(resdiff)
```

----

## Prediction/forecasting from AR($1$) model

* We can use a fitted AR($1$) model to predict the next value of an observed time series $x_1,\ldots,x_n$. 

$$\hat{x}_{n+1} = \bar{x} + \hat{\alpha}_1 \cdot (x_{n} - \bar{x}).$$

* This can be iterated  to predict $x_{n+2}$
$$\hat{x}_{n+2} = \bar{x} + \hat{\alpha}_1 \cdot (\hat{x}_{n+1} - \bar{x}),$$
and we can continue this way. 
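
* The iteration is easy to write out by hand for the fitted exchange rate model (a sketch, our addition, using the `fitexchange` object from earlier; it should reproduce what `predict` returns below):
```{r}
xbar <- fitexchange$x.mean                        # estimated mean
a1 <- fitexchange$ar                              # estimated alpha_1
xh <- numeric(4)
xh[1] <- xbar + a1 * (tail(exchange, 1) - xbar)   # one step ahead
for (h in 2:4) xh[h] <- xbar + a1 * (xh[h - 1] - xbar)
xh
```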

* Prediction is performed by `predict` in R. 

* E.g. for the AR($1$) model fitted to the exchange rate data, the last observation is in the
third quarter of 2000. If we want to predict $1$ year ahead, to the third quarter of 2001:
```{r}
pred1 <- predict(fitexchange, n.ahead = 4)
pred1
```

* Note how the prediction returns both the predicted value and a standard error
for this value. 

* We can use this to say that we are 95\% confident that the exchange rate in the third quarter of 2001 will lie
within $3.42 \pm 2\cdot 0.25$.

* We can plot a prediction and approximate 95% pointwise prediction intervals with
`ts.plot` (where we use a 10 year prediction -- which is a very bad idea -- to see how it behaves in the long run):
```{r}
pred10 <- predict(fitexchange, n.ahead = 40)
lower10 <- pred10$pred-2*pred10$se
upper10 <- pred10$pred+2*pred10$se
ts.plot(exchange, pred10$pred, lower10, upper10, lty = c(1,2,3,3))
```


# Auto-regressive models of higher order (AR(p)-models)

----

## Auto-regressive models of higher order

* The first order auto-regressive model can be generalised to higher order by adding
more lagged terms to explain the current value $X_t$. 

* An **AR(p) process** is
\[
X_t = \alpha_1 X_{t-1} + \alpha_2 X_{t-2} + \dots + \alpha_p X_{t-p} + W_t
\]
where $W_t$ is white noise and $\alpha_1, \alpha_2, \dots, \alpha_p$ are parameters
to be estimated from data.

* To simplify the notation, we introduce the **backshift operator** $B$, that takes the time series one step back, i.e.
\[
B X_t = X_{t-1}
\]
* This can be used repeatedly, for example $B^2 X_t = B B X_t = B X_{t-1} = X_{t-2}$. We can then make a "polynomial" of backshift operators
\[
\alpha(B) = 1-\alpha_1 B - \alpha_2 B^2 - \dots - \alpha_p B^p
\]
and write the AR(p) process as 
\[
W_t = \alpha(B) X_t.
\]

----

## Stationarity for AR(p) models   

* Not all AR(p) models are stationary. To check that a given AR(p) model is stationary we consider the **characteristic equation**
\[
1 - \alpha_1 z - \alpha_2 z^2 - \dots - \alpha_p z^p = 0.
\]
* The characteristic equation can be written with the polynomial $\alpha$ from the previous slide, but with $z$ inserted instead of $B$, i.e.
\[
\alpha(z) = 0.
\]
* The process is stationary if and only if all roots of the characteristic equation have an absolute value that is greater than $1$. 

* Solving a degree-$p$ polynomial equation is hard for large $p$, so we will let R do this for us.
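
* For instance (our example, not from the slides), for the AR($2$) model $X_t = 0.5X_{t-1} + 0.3X_{t-2} + W_t$ the characteristic polynomial is $1 - 0.5z - 0.3z^2$, and `polyroot` finds its roots:
```{r}
# polyroot takes the coefficients in increasing order of powers;
# stationarity requires all root moduli to be greater than 1:
Mod(polyroot(c(1, -0.5, -0.3)))
```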

----

## Estimation of AR(p) models

* For an AR(p) model there are typically two things we need to estimate:

1. The maximal non-zero lag $p$ in the model.
2. The autoregressive coefficients/parameters $\alpha_1,\dots,\alpha_p$.

* To select $p$ we can use **AIC (Akaike's Information Criterion)**. 
  * More complex models can always fit a dataset at least as well, so goodness of fit alone cannot be used to select $p$.
  * AIC is essentially a balance between model simplicity and good fit. 
  * The AIC results in a single real number, where smaller is better. 
  * The `ar` function in R uses AIC to automatically select the value for $p$ by calculating the AIC for all models with $p$ between $1$ and some chosen maximal value, and picking the one with the smallest AIC (a small sketch follows after this list).

* Once the order $p$ is chosen, the estimates $\hat\alpha_1, \dots, \hat\alpha_p$ can be computed by the `ar` function. 
  * The corresponding standard errors can be found as the square root of the diagonal of the matrix stored as `asy.var.coef` in the fitted model object.
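
* A small sketch of this on the simulated AR($1$) series `x` from earlier (our addition):
```{r}
fitp <- ar(x, order.max = 10, method = "mle")  # fitp is our name
fitp$order                     # order selected by AIC
fitp$aic                       # AIC differences; 0 for the selected order
sqrt(diag(fitp$asy.var.coef))  # standard errors of the coefficients
```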

----

## Example of AR(p) model for electricity data

* We try to fit an AR(p) model to the detrended data for the log of the electricity production in Australia:

```{r, echo = FALSE}
CBEdata <- read.table("https://asta.math.aau.dk/eng/static/datasets?file=cbe.dat", header = TRUE)
CBE <- ts(log(CBEdata[, 3]), frequency = 12)  # column 3: electricity production
random <- na.omit(decompose(CBE)$random)      # random term after removing trend and season
plot(random, ylab = "Random term", main = "Electricity production")
```

```{r}
fit <- ar(random, order.max = 5,method="mle")
fit
resid <- na.omit(fit$resid)
par(mfrow = c(2,1), mar = c(4,4,1,1))
plot(resid)
acf(resid, lag.max = 30)
```

* There are too many significant autocorrelations, which suggests the model is not a good fit.

* The estimation procedure does not guarantee that the fitted model is stationary. To check stationarity we solve $\alpha(z)=0$:

```{r}
abs(polyroot(c(1,-fit$ar)))
```

* All absolute values of the roots are greater than $1$, indicating stationarity.

# Moving average models (MA(q)-models)

----

## The moving average model

 * A **moving average process** of order q, MA(q), is defined by
$$X_t = W_t + \beta_1W_{t-1} + \beta_2W_{t-2} + \dots + \beta_qW_{t-q}$$
where $W_t$ is a white noise process with mean 0 and variance $\sigma_w^2$
and $\beta_1, \beta_2, \dots, \beta_q$ are parameters to be estimated.

* The moving average process can also be written using the backshift operator: 
\[
X_t = \beta(B) W_t
\]
where the polynomial $\beta(B)$ is given by
\[
\beta(B) = 1 + \beta_1 B + \beta_2 B^2 + \dots + \beta_q B^q.
\]

* Since a moving average process is a finite sum of stationary white noise terms, it
is itself stationary, with mean and variance:
  - Mean $\mu(t) = 0$.
  - Variance $\sigma^2(t) = \sigma_w^2(1 + \beta_1^2 + \beta_2^2 + \dots + \beta_q^2)$.

* The autocorrelation function is
$$\rho(k) = \begin{cases} 1 & k=0\\ \sum_{i=0}^{q-k} \beta_i\beta_{i+k}/\sum_{i=0}^q \beta_i^2 & k=1,2,\dots,q\\ 0 & k>q\end{cases}$$
where $\beta_0=1$.

----

## Simulation of MA(q) processes

* An MA(q) process can be simulated by first generating a white noise process $W_t$ and then
transforming it using the MA coefficients. 

* To simulate a model with
$\beta_1 = -0.7$, $\beta_2 = 0.5$, and $\beta_3 = -0.2$ we can use `arima.sim`:
```{r}
xsim <- arima.sim(list(ma = c(-.7, .5, -.2)), n = 200)
plot(xsim)
```

* The theoretical autocorrelations are in this case:
$$\rho(1) = \frac{1 \cdot (-0.7) + (-0.7) \cdot 0.5 + 0.5 \cdot (-0.2)}{1+(-0.7)^2+0.5^2+(-0.2)^2} = -0.65$$
$$\rho(2) = \frac{1 \cdot 0.5 + (-0.7) \cdot (-0.2)}{1+(-0.7)^2+0.5^2+(-0.2)^2} = 0.36$$
$$\rho(3) = \frac{1 \cdot (-0.2)}{1+(-0.7)^2+0.5^2+(-0.2)^2} = -0.11$$
and $\rho(k)=0$ for $k>3$.
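
* As a cross-check (our addition), `ARMAacf` reproduces these values:
```{r}
round(ARMAacf(ma = c(-0.7, 0.5, -0.2), lag.max = 5), 2)
```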

* We plot the acf of the simulated series together with the theoretical one.

```{r}
acf(xsim,ylim=c(-1,1))
points(0:25, c(1,-.65, .36, -.11, rep(0,22)), col = "red")
```

----

## Estimation of MA(q) models

* To estimate the parameters of an MA(3) model we use `arima`:
```{r}
xfit <- arima(xsim, order = c(0,0,3))
xfit
```

* The function `arima` does not include automatic selection of the order of the
model, so the order has to be chosen beforehand or selected by comparing several
proposed models and choosing the one with the minimal AIC, as sketched below.

```{r}
AIC(xfit)
```
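
* To choose the order by AIC we could, e.g., fit MA($q$) models for a range of $q$ and compare (a sketch, our addition):
```{r}
# AIC for MA(q), q = 1,...,5; the MA(3) fit should be at or near the minimum:
sapply(1:5, function(q) AIC(arima(xsim, order = c(0, 0, q))))
```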

# Auto-regressive moving average models (ARMA) 

----

## Mixed models: Auto-regressive moving average models

* A time series $X_t$ follows an auto-regressive moving average (ARMA) process of
order $(p,q)$, denoted $ARMA(p,q)$, if
$$X_t = \alpha_1 X_{t-1} + \alpha_2 X_{t-2} + \dots + \alpha_p X_{t-p} + W_t + \beta_1W_{t-1} + \beta_2W_{t-2} + \dots + \beta_qW_{t-q}$$
where $W_t$ is a white noise process and
$\alpha_1, \alpha_2, \dots, \alpha_p, \beta_1, \beta_2, \dots, \beta_q$
are parameters to be estimated.

* We can simulate an ARMA model with `arima.sim`. E.g. an ARMA($1,1$) model with $\alpha_1=-0.6$ and $\beta_1=0.5$:
```{r}
xarma <- arima.sim(model = list(ar = -0.6, ma = 0.5), n = 200)
plot(xarma)
```

* Estimation is done with `arima` as before.
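
* For example (our addition), fitting an ARMA($1,1$) to the simulated series:
```{r}
# order = c(p, d, q) with d = 0 gives an ARMA(p,q) fit:
arima(xarma, order = c(1, 0, 1))
```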

----

## Example with exchange rate data

For the exchange rate data we may e.g. suggest either an AR($1$), an MA($1$), or an ARMA($1,1$) model.
We can compare fitted models using `AIC` (smaller is better):
```{r}
exchange_ar <- arima(exchange, order = c(1,0,0))
AIC(exchange_ar)
exchange_ma <- arima(exchange, order = c(0,0,1))
AIC(exchange_ma)
exchange_arma <- arima(exchange, order = c(1,0,1))
AIC(exchange_arma)
exchange_arma
```

```{r}
par(mfrow = c(2,1), mar = c(4,4,1,1))
resid_arma <- na.omit(exchange_arma$residuals)
plot(resid_arma)
acf(resid_arma)
```
