The ASTA team
We saw three basic models: white noise, the random walk, and the auto-regressive (AR) model.
Today we will consider more stochastic process models.
First we recap the three models above and add some details.
For a white noise series, roughly 95% of the estimated autocorrelations at lag \(k>0\) should lie between the blue lines in the correlogram.
It is a good idea to repeat this simulation and plot a few times to appreciate the variability of the results.
A time series \(X_t\) is called a random walk if \[X_t = X_{t-1} + W_t\] where \(W_t\) is a white noise series.
Using \(X_{t-1} = X_{t-2} + W_{t-1}\) we get \[X_t = X_{t-2} + W_{t-1} + W_{t}\]
Substituting for \(X_{t-2}\) we get \[X_t = X_{t-3} + W_{t-2} + W_{t-1} + W_{t}\]
Continuing this way, assuming we start at \(X_0=0\), \[X_t = W_{1} + W_{2} + \dots + W_{t}\]
A random walk \(X_t\) has a constant mean function \[\mu(t) = 0.\]
The variance function is given by \[\sigma^2(t) = Var(X_t) = Var(W_1 + \dotsm + W_t) = Var(W_1) + \dotsm + Var(W_t) = t\cdot\sigma^2,\] which clearly depends on the time \(t\), so the process is not stationary.
Similarly, one can compute the autocovariance and autocorrelation function (see Cowpertwait and Metcalfe for details) \[Cov(X_t, X_{t+k}) = t\sigma^2.\]
\[Cor(X_t, X_{t+k}) = \frac{1}{\sqrt{1+k/t}}\]
The autocorrelation is non-stationary.
When \(t\) is large compared to \(k\) the correlation is very high (close to \(1\)), so we expect the correlogram of a reasonably long random walk to decay very slowly.
In R the white noise is simulated with rnorm, and the random walk is then generated by repeatedly adding the noise terms:
w <- rnorm(1000)
rw <- w
for(t in 2:1000) rw[t] <- rw[t-1] + w[t]
par(mfrow = c(2,1), mar = c(4,4,0.5,0.5))
ts.plot(rw)
acf(rw, lag.max = 100)
If a non-stationary time series shows a stochastic trend we can try to study the time series of differences and see if that is stationary and easier to understand: \[\nabla X_t = X_{t} - X_{t-1}.\]
Since we assume \(X_0=0\) we get \[\nabla X_1 = X_1.\]
Specifically when we difference a random walk \(X_t = X_{t-1} + W_t\) we recover the white noise series \[\nabla X_t = W_t\]
diffrw <- diff(rw)
par(mfrow = c(2,1), mar = c(4,4,0.5,0.5))
ts.plot(diffrw)
acf(diffrw, lag.max = 30)
www <- "https://asta.math.aau.dk/eng/static/datasets?file=pounds_nz.dat"
exchange_data <- read.table(www, header = TRUE)[[1]]
exchange <- ts(exchange_data, start = 1991, freq = 4)
plot(exchange)
diffexchange <- diff(exchange)
par(mfrow = c(2,1), mar = c(4,4,0.5,0.5))
plot(diffexchange)
acf(diffexchange)
If the correlogram shows a significant auto-correlation at lag \(1\), it means that \(X_t\) and \(X_{t-1}\) are correlated.
The simplest way to model this is the auto-regressive model of order one, AR(\(1\)): \[X_t = \alpha_1 X_{t-1} + W_t\] where \(W_t\) is white noise and the auto-regressive coefficient \(\alpha_1\) is a parameter to be estimated from data.
From last time: The model can only be stationary if \(-1<\alpha_1<1\) such that the dependence on the past decreases with time.
For a stationary AR(\(1\)) model with \(-1<\alpha_1<1\) we found the mean function, variance function, and autocorrelation function \[\mu(t) = 0, \qquad \sigma^2(t) = \frac{\sigma_w^2}{1-\alpha_1^2}, \qquad \rho(k) = \alpha_1^k,\] where \(\sigma_w^2\) is the variance of the white noise terms.
R has a built-in function arima.sim to simulate AR(\(1\)) models (and other more complicated models called ARMA and ARIMA). It needs the model (i.e. the auto-regressive coefficient \(\alpha_1\)) and the desired number of time steps n.
To simulate 200 time steps of AR(\(1\)) with \(\alpha_1=0.7\) we do:
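A minimal call could look as follows (the object name xsim is our choice; it is reused in the estimation example below):
xsim <- arima.sim(model = list(ar = 0.7), n = 200)  # simulate 200 steps of AR(1) with alpha_1 = 0.7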
There are several ways to estimate the parameters in an AR(\(1\)) process. We use the so-called maximum likelihood estimation method (MLE).
We skip the details of this and simply use the function ar:
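A sketch of the fitting call (the object name fit matches the references to fit$asy.var.coef below; the exact original call is not shown):
fit <- ar(xsim, order.max = 1, method = "mle")  # fit an AR(1) model by maximum likelihood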
The resulting object contains the value of the estimated parameter \(\hat\alpha_1\) and a bunch of other information.
In this example, the input data are artificial so we know that \(\hat\alpha_1\) should ideally be close to 0.7:
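Assuming the fit was stored as above, the estimate is extracted by:
fit$ar  # estimated auto-regressive coefficient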
## [1] 0.7267829
The estimated variance of \(\hat\alpha_1\) is also stored in the fitted object:
fit$asy.var.coef
##             [,1]
## [1,] 0.002340669
The standard error of \(\hat\alpha_1\) is the square root of this variance:
##            [,1]
## [1,] 0.04838046
An approximate 95% confidence interval for \(\alpha_1\) is then \(\hat\alpha_1 \pm 2\cdot se\):
## [1] 0.6300219 0.8235438
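A sketch of how the last two quantities can be computed (reusing the hypothetical object fit):
se <- sqrt(fit$asy.var.coef)          # standard error of the estimate
c(fit$ar - 2 * se, fit$ar + 2 * se)   # approximate 95% confidence interval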
The AR(\(1\)) model defined earlier has mean 0. However, we cannot expect data to fulfill this.
This is fixed by subtracting the average \(\bar{X}\) of the data before doing anything else, so the model that is fitted is actually: \[X_t-\bar{X} = \alpha_1 \cdot (X_{t-1} - \bar{X}) + W_t\]
The fitted model can be used to make predictions. To predict \(X_t\) from \(X_{t-1}\), we use that \(W_t\) is white noise, so we expect it to be zero on average: \[\hat{x}_t-\bar{x} = \hat{\alpha}_1 \cdot (x_{t-1} - \bar{x}),\] so \[\hat{x}_t = \bar{x} + \hat{\alpha}_1 \cdot (x_{t-1} - \bar{x}), \quad t\ge2.\]
We can estimate the white noise terms as usual by the model residuals: \[\hat{w}_t = x_t - \hat{x}_t, \quad t\ge2.\]
If the model is correct, the residuals should look like a sample of white noise. We check this by plotting the residuals and their acf.
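As an example we fit an AR(\(1\)) model to the exchange rate series. The exact call is not shown in these notes; a sketch (using the object name fitexchange, which matches the residual code below) is:
fitexchange <- ar(exchange, order.max = 1)  # fit AR(1) to the exchange rate series
fitexchange$ar                              # estimated coefficient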
## [1] 0.9437125
resexchange <- na.omit(fitexchange$resid)
par(mfrow = c(2,1), mar = c(4,4,1,1))
plot(resexchange)
acf(resexchange)
This does not really appear to provide a better fit than the random walk model proposed earlier.
An alternative would be to propose an AR(\(1\)) model for the differenced time series \(\nabla X_t = X_t - X_{t-1}\):
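A sketch of such a fit (the object name is ours; the original call is not shown):
fitdiff <- ar(diffexchange, order.max = 1)  # AR(1) for the differenced series
fitdiff$ar                                  # estimated coefficient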
## [1] 0.3495624
Given data \(x_1, \dots, x_n\), we can also predict the first future value: \[\hat{x}_{n+1} = \bar{x} + \hat{\alpha}_1 \cdot (x_{n} - \bar{x}).\]
This can be iterated to predict \(x_{n+2}\) \[\hat{x}_{n+2} = \bar{x} + \hat{\alpha}_1 \cdot (\hat{x}_{n+1} - \bar{x}),\] and we can continue this way.
Prediction is performed by predict in R. E.g. for the AR(\(1\)) model fitted to the exchange rate data, the last observation is in the third quarter of 2000. If we want to predict \(1\) year ahead, to the third quarter of 2001:
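A sketch of the call (reusing the hypothetical fitexchange object from above):
predict(fitexchange, n.ahead = 4)  # predict the next 4 quarters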
## $pred
## Qtr1 Qtr2 Qtr3 Qtr4
## 2000 3.501528
## 2001 3.473716 3.447469 3.422699
##
## $se
## Qtr1 Qtr2 Qtr3 Qtr4
## 2000 0.1348262
## 2001 0.1853845 0.2208744 0.2482461
Note how the prediction returns both the predicted value and a standard error for this value.
We can use this to say that we are 95% confident that the exchange rate in third quarter of 2001 would be within \(3.42 \pm 2\cdot 0.25\).
We can plot a prediction and approximate 95% pointwise prediction intervals with ts.plot (where we use a 10 year prediction – which is a very bad idea – to see how it behaves in the long run):
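A sketch of how such a plot can be produced (again using the hypothetical fitexchange object; the line types are our choice):
pred <- predict(fitexchange, n.ahead = 40)   # 10 years = 40 quarters ahead
ts.plot(exchange, pred$pred,
        pred$pred - 2 * pred$se, pred$pred + 2 * pred$se,
        gpars = list(lty = c(1, 2, 3, 3)))   # data, prediction, approximate 95% limits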
The first order auto-regressive model can be generalised to higher order by adding more lagged terms to explain the current value \(X_t\).
An AR(p) process is \[ X_t = \alpha_1 X_{t-1} + \alpha_2 X_{t-2} + \dots + \alpha_p X_{t-p} + W_t \] where \(W_t\) is white noise and \(\alpha_1, \alpha_2, \dots, \alpha_p\) are parameters to be estimated from data.
To simplify the notation, we introduce the backshift operator \(B\), that takes the time series one step back, i.e. \[ B X_t = X_{t-1} \]
This can be used repeatedly, for example \(B^2 X_t = B B X_t = B X_{t-1} = X_{t-2}\). We can then make a “polynomial” of backshift operators \[ \alpha(B) = 1-\alpha_1 B - \alpha_2 B^2 - \dots - \alpha_p B^p \] and write the AR(p) process as \[ W_t = \alpha(B) X_t. \]
Not all AR(p) models are stationary. To check that a given AR(p) model is stationary we consider the characteristic equation \[ 1 - \alpha_1 z - \alpha_2 z^2 - \dots - \alpha_p z^p = 0. \]
The characteristic equation can be written with the polynomial \(\alpha\) from previous slide, but with \(z\) inserted instead of \(B\), i.e. \[ \alpha(z) = 0. \]
The process is stationary if and only if all roots of the characteristic equation have an absolute value that is greater than \(1\).
Solving a \(p\)-order polynomial is hard for high values of \(p\), so we will let R do this for us.
When an AR(\(p\)) model is fitted with ar, the order \(p\) is selected automatically (by AIC, up to order.max), and the estimated variance of the parameter estimates can again be found in asy.var.coef in the fitted model object. Here an AR model has been fitted to a time series called random:
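The fitting call is echoed in the Call field of the output below; a sketch (stored as fit so that the residual check further down can refer to fit$resid):
fit <- ar(random, order.max = 5, method = "mle")  # fit AR model of order at most 5 by ML
fit                                               # print the fitted model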
## Call:
## ar(x = random, order.max = 5, method = "mle")
##
## Coefficients:
## 1 2 3 4 5
## 0.2047 0.0714 -0.0280 -0.1975 -0.2440
##
## Order selected 5 sigma^2 estimated as 0.0003103
resid <- na.omit(fit$resid)
par(mfrow = c(2,1), mar = c(4,4,1,1))
plot(resid)
acf(resid, lag.max = 30)
There are too many significant autocorrelations, which suggests the model is not a good fit.
We are not assured that the estimated model will be stationary. To check stationarity we solve \(\alpha(z)=0\):
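One way to obtain the absolute values of the roots, using the estimated coefficients from the fit above, is:
Mod(polyroot(c(1, -fit$ar)))  # moduli of the roots of the characteristic polynomial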
## [1] 1.148425 1.415987 1.415987 1.148425 1.549525
All of the roots have absolute value greater than \(1\), so the fitted model is stationary.
A moving average process of order q, MA(q), is defined by \[X_t = W_t + \beta_1W_{t-1} + \beta_2W_{t-2} + \dots + \beta_qW_{t-q}\] where \(W_t\) is a white noise process with mean 0 and variance \(\sigma_w^2\) and \(\beta_1, \beta_2, \dots, \beta_q\) are parameters to be estimated.
The moving average process can also be written using the backshift operator: \[ X_t = \beta(B) W_t \] where the polynomial \(\beta(B)\) is given by \[ \beta(B) = 1 + \beta_1 B + \beta_2 B^2 + \dots + \beta_q B^q. \]
Since a moving average process is a finite sum of stationary white noise terms, it is itself stationary, with mean and variance \[\mu(t) = 0, \qquad \sigma^2(t) = \sigma_w^2\,(1+\beta_1^2+\dots+\beta_q^2).\]
The autocorrelation function is \[\rho(k) = \begin{cases} 1 & k=0\\ \sum_{i=0}^{q-k} \beta_i\beta_{i+k}/\sum_{i=0}^q \beta_i^2 & k=1,2,\dots,q\\ 0 & k>q\end{cases}\] where \(\beta_0=1\).
An MA(q) process can be simulated by first generating a white noise process \(W_t\) and then transforming it using the MA coefficients.
To simulate a model with \(\beta_1 = -0.7\), \(\beta_2 = 0.5\), and \(\beta_3 = -0.2\) we can use arima.sim:
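A sketch of the call (the series length 200 and the name xsim are assumptions; the arima output further below refers to a series with this name):
xsim <- arima.sim(model = list(ma = c(-0.7, 0.5, -0.2)), n = 200)  # simulate MA(3)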
The theoretical autocorrelations are in this case: \[\rho(1) = \frac{1 \cdot (-0.7) + (-0.7) \cdot 0.5 + 0.5 \cdot (-0.2)}{1+(-0.7)^2+0.5^2+(-0.2)^2} = -0.65\] \[\rho(2) = \frac{1 \cdot 0.5 + (-0.7) \cdot (-0.2)}{1+(-0.7)^2+0.5^2+(-0.2)^2} = 0.36\] \[\rho(3) = \frac{1 \cdot (-0.2)}{1+(-0.7)^2+0.5^2+(-0.2)^2} = -0.11\] and \(\rho(k)=0\) for \(k>3\).
We plot the acf of the simulated series together with the theoretical one.
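One way to do this (the theoretical values are those computed above):
acf(xsim)                                            # sample correlogram
points(0:3, c(1, -0.65, 0.36, -0.11), col = "red")   # theoretical autocorrelations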
The parameters of an MA(\(q\)) model can be estimated with the function arima:
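The call is echoed in the Call field of the output below:
arima(xsim, order = c(0, 0, 3))  # fit MA(3): order = c(p, d, q) with p = d = 0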
##
## Call:
## arima(x = xsim, order = c(0, 0, 3))
##
## Coefficients:
## ma1 ma2 ma3 intercept
## -0.8026 0.3898 -0.1843 0.0217
## s.e. 0.0737 0.0918 0.0790 0.0301
##
## sigma^2 estimated as 1.097: log likelihood = -293.42, aic = 596.84
Note that arima does not include automatic selection of the model order, so the order has to be chosen beforehand or selected by comparing several proposed models and choosing the one with the minimal AIC. For the fit above the AIC is:
## [1] 596.8406
A time series \(X_t\) follows an auto-regressive moving average (ARMA) process of order \((p,q)\), denoted \(ARMA(p,q)\), if \[X_t = \alpha_1 X_{t-1} + \alpha_2 X_{t-2} + \dots + \alpha_p X_{t-p} + W_t + \beta_1W_{t-1} + \beta_2W_{t-2} + \dots + \beta_qW_{t-q}\] where \(W_t\) is a white noise process and \(\alpha_1, \alpha_2, \dots, \alpha_p, \beta_1, \beta_2, \dots, \beta_q\) are parameters to be estimated.
We can simulate an ARMA model with arima.sim, e.g. an ARMA(\(1,1\)) model with \(\alpha_1=-0.6\) and \(\beta_1=0.5\):
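A sketch of the call (the series name and length are our choices):
xarma <- arima.sim(model = list(ar = -0.6, ma = 0.5), n = 200)  # simulate ARMA(1,1)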
Estimation of an ARMA model is done with arima as before. For the exchange rate data we may e.g. suggest either an AR(\(1\)), MA(\(1\)), or ARMA(\(1,1\)) model. We can compare the fitted models using AIC (smaller is better):
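A sketch of the three fits and their AIC values (the object names are ours; the three values below appear in the same order):
fitar <- arima(exchange, order = c(1, 0, 0))    # AR(1)
fitma <- arima(exchange, order = c(0, 0, 1))    # MA(1)
fitarma <- arima(exchange, order = c(1, 0, 1))  # ARMA(1,1)
fitar$aic
fitma$aic
fitarma$aic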
## [1] -37.40417
## [1] -3.526895
## [1] -42.27357
The ARMA(\(1,1\)) model has the smallest AIC; its fit is:
##
## Call:
## arima(x = exchange, order = c(1, 0, 1))
##
## Coefficients:
## ar1 ma1 intercept
## 0.8925 0.5319 2.9597
## s.e. 0.0759 0.2021 0.2435
##
## sigma^2 estimated as 0.01505: log likelihood = 25.14, aic = -42.27