The ASTA team
A special type of data arises when we measure the same variable at different points in time with equal steps between time points.
This type of data is called a (discrete-time) stochastic process or a time series.
One example is the time series of monthly electricity production (GWh) in Australia from Jan. 1958 to Dec. 1990:
CBEdata <- read.table("https://asta.math.aau.dk/eng/static/datasets?file=cbe.dat", header = TRUE)
CBE <- ts(CBEdata[, 3], start = 1958, frequency = 12)  # column 3: monthly electricity production, starting Jan. 1958
plot(CBE, ylab = "GWh", main = "Electricity production")
We denote by \(X_t\) the variable at time \(t\), where the time points are \(t=1,2,3,\ldots,n\).
Measurements that are close in time will typically be similar: observations are not statistically independent!
Measurements that are far apart in time will typically be less correlated.
A stochastic process is called a white noise process if the \(X_t\) are mutually independent and identically distributed with mean zero; we denote their common variance by \(\sigma^2\).
It is called Gaussian white noise if, furthermore, \(X_t \sim N(0, \sigma^2)\).
White noise processes are the simplest stochastic processes.
Real data typically do not show complete independence between measurements at different time points, so white noise is generally not a good model for real data. It is, however, a building block for more complicated stochastic processes.
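For instance, Gaussian white noise is easy to simulate in R (a minimal sketch; the length 200 is arbitrary):
w <- ts(rnorm(200))  # 200 independent N(0,1) variables as a time series
plot(w, ylab = "w", main = "Gaussian white noise")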
A random walk is defined by \(X_t = X_{t-1} + W_t\), where \(W_t\) is white noise.
Here are 5 simulations of a random walk:
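One way to produce such simulations (a minimal sketch; each random walk is the cumulative sum of a Gaussian white noise series, and the matrix is just a convenient way of handling five independent series):
x <- ts(apply(matrix(rnorm(5 * 1000), ncol = 5), 2, cumsum))  # 5 random walks of length 1000
ts.plot(x, col = 1:5, ylab = "x")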
We can also simulate the more general process \(X_t = \alpha X_{t-1} + W_t\) (the random walk is the case \(\alpha = 1\)); here with \(\alpha = 0.5\), \(0.9\) and \(0.99\):
w <- ts(rnorm(1000))                         # Gaussian white noise
x1 <- filter(w, 0.5, method = "recursive")   # X_t = 0.5 X_{t-1} + W_t
x2 <- filter(w, 0.9, method = "recursive")   # X_t = 0.9 X_{t-1} + W_t
x3 <- filter(w, 0.99, method = "recursive")  # X_t = 0.99 X_{t-1} + W_t
ts.plot(x1, x2, x3, col = 1:3)
The mean function of a stochastic process is given by \[ \mu_t = \mathbb{E}(X_t) \]
A process is called first-order stationary if the mean is constant over time: \(\mu_t = \mu\) for all \(t\).
Example: consider an AR(\(1\)) process \(X_t = \alpha X_{t-1} + W_t\), where \(W_t\) is white noise with variance \(\sigma^2\). We consider stationarity and autocorrelation for this process.
First consider the mean: \(\mu_t = \mathbb{E}(X_t) = \alpha \mathbb{E}(X_{t-1}) + \mathbb{E}(W_t) = \alpha \mu_{t-1}\). A constant mean must satisfy \(\mu = \alpha\mu\), so \(\mu = 0\) whenever \(\alpha \neq 1\).
Now consider the variance. Since \(X_t = \alpha X_{t-1} + W_t\) and \(W_t\) is independent of \(X_{t-1}\), \[ \sigma_t^2 = \text{Var}(X_t) = \text{Var}(\alpha X_{t-1} + W_t) = \text{Var}(\alpha X_{t-1}) + \text{Var}(W_t) = \alpha^2 \sigma_{t-1}^2 + \sigma^2 \]
If the variance is constant, then \(\sigma_t^2 = \sigma^2_{t-1}\) and \[ \sigma^2_t = \alpha^2 \sigma_t^2 + \sigma^2 \]
Solving \(\sigma_t^2 = \alpha^2 \sigma_t^2 + \sigma^2\) gives \(\sigma_t^2(1-\alpha^2) = \sigma^2\), so the variance can only be constant if \(-1<\alpha <1\). In this case \(\sigma_t^2 = \frac{\sigma^2}{1-\alpha^2}\).
For \(|\alpha| \geq 1\), the variance increases over time, and the process cannot be stationary. In particular the random walk (\(\alpha = 1\)) is not stationary: there \(\sigma_t^2 = \sigma_{t-1}^2 + \sigma^2\), so the variance grows linearly with time.
To find the autocorrelation, first observe \[ X_{t+h} = \alpha X_{t+h-1} + W_{t+h} = \cdots = \alpha^h X_t+ \sum_{i=0}^{h-1} \alpha^i W_{t+h-i} \]
Then we find the autocovariance: \[ \gamma(t,t+h) = \text{Cov}(X_t, X_{t+h}) = \text{Cov}(X_t, \alpha^h X_t + \sum_{i=0}^{h-1} \alpha^i W_{t+h-i})\\ = \text{Cov}(X_t, \alpha^h X_t) + \text{Cov}(X_t, \sum_{i=0}^{h-1} \alpha^i W_{t+h-i}) = \alpha^h\text{Cov}(X_t,X_t) + 0 = \alpha^h \text{Var}(X_t) \]
If the variance is constant, we can calculate the autocorrelation: \[ \frac{\text{Cov}(X_t, X_{t+h})}{\sigma_t\sigma_{t+h}} = \frac{\alpha^h\sigma^2/(1-\alpha^2)}{\sigma^2/(1-\alpha^2)} = \alpha^h. \]
So: the AR(\(1\)) model is stationary if and only if \(-1<\alpha <1\), in which case \(\sigma_t^2 = \sigma^2/(1-\alpha^2)\).
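As a quick numerical check (a sketch, with \(\sigma^2 = 1\) so the stationary variance should be \(1/(1-\alpha^2)\)):
alpha <- 0.5
x <- filter(ts(rnorm(100000)), alpha, method = "recursive")  # long simulated AR(1) series
var(x)             # empirical variance of the simulation
1 / (1 - alpha^2)  # theoretical stationary variance, here 1.333...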
The autocorrelation decays exponentially for a stationary AR(\(1\))-model. This is illustrated for 3 different \(\alpha\) values:
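One way to draw the theoretical autocorrelation \(\rho(h) = \alpha^h\) (a sketch, using the same three \(\alpha\) values as in the simulations above):
h <- 0:30
rho <- cbind(0.5^h, 0.9^h, 0.99^h)  # alpha^h for alpha = 0.5, 0.9, 0.99
matplot(h, rho, type = "l", lty = 1, col = 1:3, xlab = "lag h", ylab = "autocorrelation")
legend("topright", c("alpha = 0.5", "alpha = 0.9", "alpha = 0.99"), lty = 1, col = 1:3)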
In practice the autocorrelation must be estimated from data \(x_1,\ldots,x_n\). The correlogram is the plot of the estimated autocorrelation \[ \hat\rho(h) = \frac{\sum_{t=1}^{n-h}(x_t-\bar{x})(x_{t+h}-\bar{x})}{\sum_{t=1}^{n}(x_t-\bar{x})^2} \] against the lag \(h\); in R it is computed by acf.
White noise:
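A minimal sketch, simulating white noise and computing its correlogram with acf:
w <- rnorm(500)  # simulated Gaussian white noise
acf(w, main = "Correlogram of white noise")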
The correlogram is always \(1\) at lag \(0\).
For white noise, the true autocorrelation is zero at all lags \(h \neq 0\).
The estimated autocorrelation is never exactly zero - hence we get the small bars.
The blue lines form a 95% confidence band for the test that the true autocorrelation is zero.
Remember that there is a 5% chance of rejecting a true null hypothesis. Thus, about 5% of the bars can be expected to exceed the blue lines.
AR(\(1\)) process with \(\alpha=0.9\):
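This correlogram can be produced from the series x2 simulated earlier (a sketch):
acf(x2, main = "AR(1) with alpha = 0.9")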
Next time we will primarily look at stationary processes, but these will not always be good models for data. We therefore first need to check whether the assumption of stationarity is reasonable.
Note: even though \(\rho(h)\) is only well-defined for stationary models, we can plug any data (stationary or not) into the estimation formula. The estimate may help detect deviations from stationarity.
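For instance (a constructed example, not one of the lecture's datasets), a series whose mean varies periodically:
t <- 1:120
x <- sin(2 * pi * t / 12) + rnorm(120, sd = 0.5)  # mean with period 12 plus white noise
acf(x, lag.max = 48)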
The periodic mean of the process results in a periodic behavior of the correlogram.
A periodic behavior in the correlogram suggests seasonal behavior in the process.
Straight line with added white noise:
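A constructed example (a sketch):
t <- 1:100
x <- 0.1 * t + rnorm(100)  # straight line (slope 0.1) with added white noise
acf(x, lag.max = 40)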
The linear trend results in a slowly decaying, almost linear correlogram.
Such a correlogram suggests a trend in the data.
Data example: Electricity production.
There seems to be an increasing trend in the data.
There is a periodic behavior around the increasing trend.
It is reasonable to believe that the period is 12 months.
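Both features also show up in the correlogram of the series (a sketch; lag.max = 48 shows four years of lags):
acf(CBE, lag.max = 48)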
We have the model \[ X_t = m_t + s_t + Z_t \] where \(m_t\) is the trend, \(s_t\) is the seasonal term, and \(Z_t\) is a random term (noise).
The trend \(m_t\) in the data can be estimated by a moving average.
In the case of monthly variation, \[\hat{m}_t = \frac{\frac{1}{2}x_{t-6}+ x_{t-5} + \dots + x_t + \dots +x_{t+5} + \frac{1}{2}x_{t+6}}{12}\]
We remove the trend by considering \(x_t-\hat{m}_t\).
Next we estimate the seasonal term \(\hat{s}_t\) by averaging \(x_t-\hat{m}_t\) over all measurements in the given month.
We are left with the random part \(\hat{z}_t = x_t - \hat{m}_t - \hat{s}_t\).
For the Australian electricity data:
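R's decompose implements this classical decomposition (moving-average trend followed by monthly averages); a minimal sketch, assuming CBE is the monthly ts constructed above:
CBE_decomposed <- decompose(CBE)  # estimates trend, seasonal and random components
plot(CBE_decomposed)              # plots the observed series together with the three components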