---
title: "Stochastic processes II"
author: "The ASTA team"
output:
  slidy_presentation:
    highlight: tango
    fig_caption: false
  pdf_document:
    keep_tex: yes
    toc: yes
    highlight: tango
    number_section: true
    fig_caption: false
---

# Basic stochastic models

Today we will consider various stochastic process models for modeling data evolving over time. 

Last time we saw three basic models:

- White noise: independent random variables - not very interesting by itself, but an essential building block in more complicated/realistic models.
- Random walk: cumulatively adding random variables - moves around randomly, and resulting in non-stationary behaviour.
- Autoregressive process: AR(1) is a simple model for something evolving over time, and is stationary or not depending on choice of parameters. The main model class considered today is ARMA (autogressive moving average), which is a generalisation of AR(1).

First we take a recap of the three models above, and add some details.

----

# White noise

A time series $w_t$, $t=1,\dots,n$ is *white noise* if the variables $w_1, w_2, \dots, w_n$
are *independent* and *identically* distributed with a mean of zero.

From the definition it follows that white noise is a second order stationary
process since the variance function $\sigma^2(t) = \sigma^2$ is the same constant
for all $t$ and the autocovariance is $Cov(w_t,w_{t+k})=0$ for all $k\neq0$ which
does not depend on $t$. We summarize this as:
$$\mu = 0$$
$$\gamma(k) = Cov(w_t, w_{t+k}) = \begin{cases} \sigma^2 \text{ for } k=0\\ 0 \text{ for } k\neq0\end{cases}$$
$$\rho(k) = \begin{cases} 1 \text{ for } k=0\\ 0 \text{ for } k\neq0\end{cases}$$


Often we will also assume the distribution of each $w_t$ is Gaussian (i.e. 
$w_t \sim \texttt{norm}(0,\sigma)$) and then we call it *Gaussian white noise*.

----

# Simulation of white noise

To understand how white noise behaves we can simulate it with R and plot both
the series and the autocorrelation:
```{r}
w <- rnorm(100, mean = 0, sd = 1)
par(mfrow = c(2,1), mar = c(4,4,0,0))
ts.plot(w)
acf(w)
```

It is a good idea to repeat this simulation and plot a few times to appreciate the
variability of the results.

----

# Random walk

A time series $x_t$ is called a random walk if
$$x_t = x_{t-1} + w_t$$
where $w_t$ is a white noise series. Using $x_{t-1} = x_{t-2} + w_{t-1}$ we get
$$x_t = x_{t-2} + w_{t-1} + w_{t}$$
Substituting for $x_{t-2}$ we get
$$x_t = x_{t-3} + w_{t-2} + w_{t-1} + w_{t}$$
Continuing this way we would get an infinite sum of white noise
$$x_t = w_{t} + w_{t-1} + w_{t-2} + w_{t-3} + \dots$$
However, we will assume we have a fixed starting point $x_0=0$ such that
$$x_t = w_{1} + w_{2} + \dots + w_{t}$$

----

# Properties of random walk

A random walk $x_t$ has a constant mean function
$$\mu(t) = 0$$
since the random walk at time $t$ is a sum of $t$ white noise terms that all have
mean zero.

However, the variance function
$$\sigma^2(t) = t\cdot\sigma^2$$
clearly depends on the time $t$, so the process is not stationary. The variance
function is derived from the general fact that for **independent random variables**,
$y_1$ and $y_2$, the variance of the sum is
$$Var(y_1+y_2) = Var(y_1) + Var(y_2).$$
Thus,
$$Var(x_t) = Var(w_1 + w_2 + \dots + w_t) = \sigma^2 + \sigma^2 + \dots + \sigma^2 = t\sigma^2$$

The non-stationary autocovariance function is
$$Cov(x_t, x_{t+k}) = t\sigma^2, \quad k = 0, 1, \dots$$
which only depends on how many white noise terms $x_t$ and $x_{t+k}$ have in common ($t$)
and not how far they are separated ($k$).

By combining the two results we obtain the non-stationary autocorrelation function
$$Cor(x_t, x_{t+k}) = \frac{Cov(x_t, x_{t+k})}{\sqrt{Var(x_t)Var(x_{t+k})}} = \frac{t\sigma^2}{\sqrt{t\sigma^2(t+k)\sigma^2}} = \frac{1}{\sqrt{1+k/t}}$$

When $t$ is large compared to $k$ we have very high correlation (close to one)
and even though the process is not stationary we expect the correlogram of a
reasonably long random walk to show very slow decay.

----

# Simulation of random walk

We already know how to simulate Gaussian white noise (with `rnorm`) and the
random walk is just a cumulative sum of white noise:
```{r}
w <- rnorm(1000)
rw <- cumsum(w)
par(mfrow = c(2,1), mar = c(4,4,0.5,0.5))
ts.plot(rw)
acf(rw, lag.max = 100)
```

----

# Differencing

The slowly decaying acf for random walk is a classical sign of non-stationarity,
indicating there may be some kind of trend. In this case there is no real trend,
since the theoretical mean is constant zero, but we refer to the apparent trend
which seems to change directions unpredictiably as a stochastic trend.

If a time series shows these signs of non-stationarity we can try to study the
time series of differences and see if that is stationary and easier to understand:
$$\nabla x_t = x_{t} - x_{t-1}.$$
Since we assume/define $x_0=0$ we get
$$\nabla x_1 = x_1$$
$$\nabla x_2 = x_2 - x_1$$
$$\nabla x_3 = x_3 - x_2$$
etc.  

Specifically when we difference a random walk $x_t = x_{t-1} + w_t$ we recover the white noise series
$\nabla x_t = w_t$:

```{r}
diffrw <- diff(rw)
par(mfrow = c(2,1), mar = c(4,4,0.5,0.5))
ts.plot(diffrw)
acf(diffrw, lag.max = 30)
```

In general if a time series needs to be differenced to become stationary we say
that the series is integrated (of order 1).

----

# Example: Exchange rate

If we look at the exchange rate from GBP to NZD, we observe what
looks like an unpredictable stochastic trend and we would like to see if it
could reasonably be described as a random walk.
```{r}
www <- "https://asta.math.aau.dk/eng/static/datasets?file=pounds_nz.dat"
exchange_data <- read.table(www, header = TRUE)
exchange <- ts(exchange_data, start = 1991, freq = 4)
plot(exchange)
```

To this end we difference the series and see if the difference looks like white
noise:
```{r}
diffexchange <- diff(exchange)
par(mfrow = c(2,1), mar = c(4,4,0.5,0.5))
plot(diffexchange)
acf(diffexchange)
```

The first order difference looks reasonably stationary, so the original exchange
rate series could be considered integrated of order 1. However, there is an
indication of significant autocorrelation at lag 1, so a random
walk might not be a completely satisfactory model for this dataset.

# Auto-regressive (AR) models

## Auto-regressive model of order 1: AR(1)

A significant auto-correlation at lag 1 means that $x_t$ and $x_{t-1}$ are correlated
so the previous value $x_{t-1}$ can be used to predict the current value $x_t$.
This is the idea behind an auto-regressive model of order one AR(1):
$$x_t = \alpha_1 x_{t-1} + w_t$$
where $w_t$ is white noise and the auto-regressive coefficient $\alpha_1$ is a
parameter to be estimated from data.

The model is only stationary if $-1<\alpha_1<1$ such that the dependence of the
past decreases with time.

----

### Properties of AR(1) models

For a stationary AR(1) model with $-1<\alpha_1<1$ we have previously shown that

- $\mu(t) = 0$
- $Var(x_t) = \sigma^2(t) = \sigma^2/(1-\alpha_1^2)$
- $\gamma(k) = \alpha_1^k\sigma^2/(1-\alpha_1^2)$
- $\rho(k) = \alpha_1^k$

Below are the theoretical autocorrelation functions for the following AR(1) models:

- Model 1: $x_t = 0.7 x_{t-1} + w_t$
- Model 2: $x_t = -0.7 x_{t-1} + w_t$

```{r}
k <- 0:10
par(mfrow = c(2,1), mar = c(4,4,0.5,0.5))
plot(k, 0.7^k, type = "h", xlab = "")
plot(k, (-0.7)^k, type = "h", ylim = c(-1,1))
abline(h=0)
```

----

### Simulation of AR(1) models

R has a built-in function `arima.sim` to simulate AR(1) and other more complicated
models called ARMA and ARIMA. It needs the model (i.e. the autoregressive coefficient $\alpha_1$) and the
desired number of time steps `n`. To simulate 200 time steps of AR(1) with $\alpha_1=0.7$
we do:
```{r}
x <- arima.sim(model = list(ar = 0.7), n = 200)
plot(x)
acf(x)
k = 0:30
points(k, 0.7^k, col = "red")
```

Here we have compared the empirical correlogram with the theoretical values of
the model.

----

### Fitted AR(1) models

To estimate the parameters in an AR(1) process, we use the so-called Yule-Walker equations. This essentially means that we pick the parameter $\alpha_1$ such that the theoretical autocorrelation function fits the correlogram of the data as close as possible. We skip the details of this, and simple use the function `ar`:
```{r}
fit <- ar(x, order.max = 1)
```
The resulting object contains the value of the estimated parameter and a bunch of
other information. In this case the input data are artificial so we know we
should ideally get a value close to 0.7:
```{r}
fit$ar
```
An estimate of the variance of the estimate $\hat\alpha_1$ is given in `fit$asy.var.coef`
(the estimated std. error is the square root of this):
```{r}
fit$asy.var.coef
se <- sqrt(fit$asy.var.coef)
se
ci <- c(fit$ar - 2*se, fit$ar + 2*se)
ci
```

The AR(1) model defined earlier has mean 0. However, we cannot expect data to fulfill this. This is fixed by subtracting the average $\bar{x}$ of the
data before doing anything else, so the model that is fitted is actually:
$$x_t-\bar{x} = \alpha_1 \cdot (x_{t-1} - \bar{x}) + w_t$$
To predict the value of $x_t$ given $x_{t-1}$ we use that $w_t$ is white noise
so we expect it to be zero on average:
$$\hat{x}_t-\bar{x} = \hat{\alpha}_1 \cdot (x_{t-1} - \bar{x})$$
So the predictions are given by
$$\hat{x}_t = \bar{x} + \hat{\alpha}_1 \cdot (x_{t-1} - \bar{x}), \quad t\ge2.$$
Given the predictions we can estimate the model errors as usual by the model residuals:
$$\hat{w}_t = x_t - \hat{x}_t, \quad t\ge2.$$

If we believe the model describes the dataset well the residuals should look like
a sample of white noise:
```{r}
res <- na.omit(fit$resid)
par(mfrow = c(2,1), mar = c(4,4,1,1))
plot(res)
acf(res)
```

This naturally looks good for this artificial dataset.

----

### AR(1) model fitted to exchange rate

A random walk is an example of a AR(1) model with $\alpha_1=1$, and it is non-stationary.
This didn't provide an ideal fit for the exchange rate dataset, so we might suggest
a stationary AR(1) model with $\alpha_1$ as a parameter to be estimated from data:
```{r}
fitexchange <- ar(exchange, order.max = 1)
fitexchange$ar
resexchange <- na.omit(fitexchange$resid)
par(mfrow = c(2,1), mar = c(4,4,1,1))
plot(resexchange)
acf(resexchange)
```

This does not appear to really provide a better fit than the random walk model proposed earlier.

An alternative would be to propose a AR(1) model for the differenced time series
$\nabla x_t = x_t - x_{t-1}$:
```{r}
dexchange <- diff(exchange)
fitdiff <- ar(dexchange, order.max = 1)
fitdiff$ar
resdiff <- na.omit(fitdiff$resid)
par(mfrow = c(2,1), mar = c(4,4,1,1))
plot(resdiff)
acf(resdiff)
```

----

### Prediction from AR(1) model

We can use a fitted AR(1) model to predict future values of a time series. If
the last observed time point is $t$ then we predict $x_{t+1}$ using the equation
given previously:
$$\hat{x}_{t+1} = \bar{x} + \hat{\alpha}_1 \cdot (x_{t} - \bar{x}).$$
If we want to predict $x_{t+2}$ we use
$$\hat{x}_{t+2} = \bar{x} + \hat{\alpha}_1 \cdot (\hat{x}_{t+1} - \bar{x}).$$
And we can continue this way. Prediction is performed by `predict` in R. E.g.
for the AR(1) model fitted to the exchange rate data the last observation is in
third quarter of 2000. If we want to predict 1 year ahead to third quarter of 2001
(probably a bad idea due to the stochastic trend):
```{r}
pred1 <- predict(fitexchange, n.ahead = 4)
pred1
```
Note how the prediction returns both the predicted value and a standard error
for this value. So we predict that the exchange rate in third quarter of 2001 would be
within $3.27 \pm 0.6$ with approximately 95% probability.

We can plot a prediction and approximate 95% pointwise prediction intervals with
`ts.plot` (where we use a 10 year prediction -- which is a very bad idea -- to see how it behaves in the long run):
```{r}
pred10 <- predict(fitexchange, n.ahead = 40)
lower10 <- pred10$pred-2*pred10$se
upper10 <- pred10$pred+2*pred10$se
ts.plot(exchange, pred10$pred, lower10, upper10, lty = c(1,2,3,3))
```

----

## Auto-regressive models of higher order

The first order auto-regressive model can be generalised to higher order by adding
more lagged terms to explain the current value $x_t$. An AR(p) process is
\[
x_t = \alpha_1 x_{t-1} + \alpha_2 x_{t-2} + \dots + \alpha_p x_{t-p} + w_t
\]
where $w_t$ is white noise and $\alpha_1, \alpha_2, \dots, \alpha_p$ are parameters
to be estimated from data.

The notation can be a little cumbersome when we have many terms, so we introduce the backshift operator $B$, that takes the time series one step back, i.e.
\[
B x_t = x_{t-1}
\]
This can be used repeatedly, for example $B^2 x_t = B B x_t = B x_{t-1} = x_{t-2}$. We can then make a polynomial of backshift operators
\[
\alpha(B) = 1-\alpha_1 B - \alpha_2 B^2 - \dots - \alpha_p B^p
\]
and write the AR(p) process as 
\[
\alpha(B) x_t = w_t.
\]

The parameters cannot be chosen arbitrarily if we want the model to be stationary.
To check that a given AR(p) model is stationary we must find all the roots of the
characteristic equation
\[
1 - \alpha_1 z - \alpha_2 z^2 - \dots - \alpha_p z^p = 0
\]
and check that the absolute value of each root is greater than 1. This can be written with the polynomial from above, but with $z$ inserted instead of $B$, i.e.
\[
\alpha(z) = 0.
\]
Solving a $p$-order polynomial is hard for high values of $p$, so we will let R do this for us.

----

### Estimation of AR(p) models

For an AR(p) model there are typically two things we need to estimate:

1. The maximal non-zero lag $p$ in the model.
2. The autoregressive coefficients/parameters $\alpha_1,\dots,\alpha_p$.

For the first point we can use AIC (Akaike's Information Criterion). This is essentially a balance between simplicity and good fit of a model. The AIC results in a single real number, where smaller is better. The ar function in R uses AIC to automatically select the value for $p$ by calculating the AIC for all models with $p$ between 1 and some chosen maximal value, and picking the one with the smallest AIC.

Once the order is chosen and the estimates $\hat\alpha_1, \dots, \hat\alpha_p$ are found
the corresponding standard errors can be found as the square root of the diagonal of the matrix stored as `asy.var.coef` in the fitted model object.

----

### Example of AR(p) model

We use an example of monthly global temperatures expressed as anomalies from the
monthly average in 1961-1990. We reduce the dataset to the yearly mean temperature,
and fit an AR(p) model to this. A good fit would indicate that the higher temperatures
over the last decade could be explained by a purely stochastic process which
just has dependence on the temperature anomalies from previous year and eventually might
as well start decreasing again. (However, this does not mean that there is no climate crisis!
There is lots of scientific evidence of this based on much more complicated models
and more detailed data.)

```{r}
global_data <- scan("https://asta.math.aau.dk/eng/static/datasets?file=global.dat")
global_monthly <- ts(global_data, st = c(1856,1), end = c(2005,12), freq = 12)
global <- aggregate(global_monthly, FUN = mean)
plot(global)
```

```{r}
globalfit <- ar(global, order.max = 10)
globalfit
globalresid <- na.omit(globalfit$resid)
par(mfrow = c(2,1), mar = c(4,4,1,1))
plot(globalresid)
acf(globalresid, lag.max = 30)
```

We are not assured that the estimated model will be stationary, but we can solve $\alpha(z)=0$ and check if the solutions all have absolute values larger than 1, in which case the estimated model is stationary:

```{r}
abs(polyroot(c(1,-globalfit$ar)))
```

# Moving average models

Another class of models are moving average (MA) models. An moving average process
of order q, MA(q), is defined by
$$x_t = w_t + \beta_1w_{t-1} + \beta_2w_{t-2} + \dots + \beta_qw_{t-q}$$
where $w_t$ is a white noise process with mean zero and variance $\sigma_w^2$
and $\beta_1, \beta_2, \dots, \beta_q$ are parameters to be estimated.

The moving average process also has a short notation using the backshift operator, given by
\[
x_t = \beta(B) w_t
\]
where the polynomial $\beta(B)$ is given by
\[
\beta(B) = 1 + \beta_1 B + \beta_2 B^2 + \dots + \beta_q B^q.
\]

Since a moving average process is a finite sum of stationary white noise terms it
is itself stationary and therefore the mean and variance is time-invariant
(same constant mean and variance for all $t$):

- Mean $\mu(t) = 0$
- Variance $\sigma^2(t) = \sigma_w^2(1 + \beta_1^2 + \beta_2^2 + \dots + \beta_q^2)$

The autocorrelation function, for $k\ge0$, is
$$\rho(k) = \begin{cases} 1 & k=0\\ \sum_{i=0}^{q-k} \beta_i\beta_{i+k}/\sum_{i=0}^q \beta_i^2 & k=1,2,\dots,q\\ 0 & k>q\end{cases}$$
where $\beta_0=1$.

----

## Simulation of MA(q) processes

To simulate a MA(q) process we just need the white noise process $w_t$ and then
transform it using the MA coefficients. If we e.g. want to simulate a model with
$\beta_1 = -0.7$, $\beta_2 = 0.5$, and $\beta_3 = -0.2$ we can use `arima.sim`:
```{r}
xsim <- arima.sim(list(ma = c(-.7, .5, -.2)), n = 200)
plot(xsim)
```

The theoretical autocorrelations are in this case:
$$\rho(1) = \frac{1 \cdot (-0.7) + (-0.7) \cdot 0.5 + 0.5 \cdot (-0.2)}{1+(-0.7)^2+0.5^2+(-0.2)^2} = -0.65$$
$$\rho(2) = \frac{1 \cdot 0.5 + (-0.7) \cdot (-0.2)}{1+(-0.7)^2+0.5^2+(-0.2)^2} = 0.36$$
$$\rho(3) = \frac{1 \cdot (-0.2)}{1+(-0.7)^2+0.5^2+(-0.2)^2} = -0.11$$

```{r}
acf(xsim)
points(0:25, c(1,-.65, .36, -.11, rep(0,22)), col = "red")
```

----

### Estimation of MA(q) models

To estimate the parameters of a MA(q) model we use `arima`:
```{r}
xfit <- arima(xsim, order = c(0,0,3))
xfit
```

The function `arima` does not include automatic selection of the order of the
model so this has to be chosen beforehand or selected by comparing several
proposed models and choosen the model with the minimal AIC.

# Mixed models: Auto-regressive moving average models

A time series $x_t$ follows an auto-regressive moving average (ARMA) process of
order $(p,q)$, denoted $ARMA(p,q)$, if
$$x_t = \alpha_1 x_{t-1} + \alpha_2 x_{t-2} + \dots + \alpha_p x_{t-p} + w_t + \beta_1w_{t-1} + \beta_2w_{t-2} + \dots + \beta_qw_{t-q}$$
where $w_t$ is a white noise process and
$\alpha_1, \alpha_2, \dots, \alpha_p, \beta_1, \beta_2, \dots, \beta_q$
are parameters to be estimated.

We can simulate an ARMA model with `arima.sim`. E.g. an ARMA(1,1) model:
```{r}
xarma <- arima.sim(model = list(ar = -0.6, ma = 0.5), n = 200)
plot(xarma)
```

Estimation is done with `arima` as before.

----

### Example with exchange rate data

For the exchange rate data we may e.g. suggest either a AR(1), MA(1) or ARMA(1,1) model.
We can compare fitted model using `AIC` (smaller is better):
```{r}
exchange_ar <- arima(exchange, order = c(1,0,0))
AIC(exchange_ar)
exchange_ma <- arima(exchange, order = c(0,0,1))
AIC(exchange_ma)
exchange_arma <- arima(exchange, order = c(1,0,1))
AIC(exchange_arma)
exchange_arma
```

```{r}
par(mfrow = c(2,1), mar = c(4,4,1,1))
resid_arma <- na.omit(exchange_arma$residuals)
plot(resid_arma)
acf(resid_arma)
```