Concepts, terminology and examples

Plotting, manipulation, trends and seasonality

A discrete time stochastic process is a collection of the same variable measured at different points in time. This is also know as a time series. We will always assume the data is observed at equidistant points in time (i.e. same time difference between consecutive observations).
To get appropriate summaries and plots of the data we must tell R that the data is a time series, which is called a ts object in R. A built-in example is the dataset AirPassengers which we will abbreviate to AP to save typing:

AP <- AirPassengers
plot(AP, ylab = "Bookings (1000s)")

The generic function plot automatically detected that AP was a time series (ts object) and plotted the correct time on the horizontal axis (by calling plot.ts automatically).

We can aggregate the time series over each year (sums the values by default, so we add FUN = mean to get monthly mean), which clearly shows an increasing trend in the yearly number of passenger bookings:

APyear <- aggregate(AP, FUN = mean)
plot(APyear)

Such a systematic change in a time series that is not periodic is known as the trend of the series.

The repeating (periodic) pattern within each year seen in the original data is known as seasonality or seasonal variation. The function cycle indicates which cycle (aka season) each observation belongs to (in this case 1-12). A boxplot of the number of bookings for each cycle (month) clearly shows the seasonality:

cyc <- cycle(AP)
cyc <- factor(cyc, labels = month.abb)
boxplot(AP ~ cyc)

If we only want to consider a subset of the data the function window is used:

APJun55Dec56 <- window(AP, start = c(1955, 6), end = c(1956, 12))
APJun55Dec56

##      Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1955                     315 364 347 312 274 237 278
## 1956 284 277 317 313 318 374 413 405 355 306 271 306

The window function can also sub-sample the time series according to the frequency:

APAug <- window(AP, start = c(1949, 8), end = end(AP), freq = TRUE)

To plot multiple series in one plot we can use ts.plot:

ts.plot(APyear, APAug, col = c("black", "red"))
legend("topleft", legend = c("Mean", "August"), col = c("black", "red"), lty = 1)

Stochastic trend

The trend for the passenger bookings data was very clear from the data (as well as the seasonal pattern), and a reasonable explanation may be increasing population, increasing prosperity and technological advances making tickets cheaper.
In other situations the behavior of the trend may be less clear and we will need to handle it differently. For the quarterly exchange rate between GBP and NZD the picture is less clear:

www <- "https://asta.math.aau.dk/eng/static/datasets?file=pounds_nz.dat"
exchange_data <- read.table(www, header = TRUE)
exchange <- ts(exchange_data, start = 1991, freq = 4)
plot(exchange)

In general it is dangerous to extrapolate the trend, and this is even more so the case for a series with a stochastic trend, which this example may have.
Especially for financial time series stochastic trends with abrupt changes to the trend are common.

Multiple series

Monthly time series from Jan. 1958 to Dec. 1990 of supply of three goods in Australia:
- Electricity (Giga Watt hours)
- Beer (Mega liters)
- Chocolate (tonnes)

CBEdata <- read.table("https://asta.math.aau.dk/eng/static/datasets?file=cbe.dat",
                      header = TRUE)
CBE <- ts(CBEdata, start = 1958, freq = 12)
plot(CBE)

For overlapping time series with the same frequency we can get the data for the overlapping period like this:

APCBE <- ts.intersect(AP, CBE)
plot(APCBE)

In the realm of time series people often find spurious correlations and mistake them for causal relationships. E.g. the number of passenger bookings in USA looks very similar to electricity consumption in Australia and there is a strong correlation:

cor(APCBE)

##                  AP   CBE.choc   CBE.beer   CBE.elec
## AP        1.0000000  0.6375808 -0.4434747  0.8841668
## CBE.choc  0.6375808  1.0000000 -0.6204467  0.7644756
## CBE.beer -0.4434747 -0.6204467  1.0000000 -0.3120179
## CBE.elec  0.8841668  0.7644756 -0.3120179  1.0000000

However, this does not imply that US air passengers really influence the electricity consumption in Australia (or vice versa)! Time series with similar trends and seasonality will typically show strong correlation even though they are unrelated. Better explanations may be due to similar population growth, seasonal patterns, etc.

Decomposition

Decomposition - trend term

It can be useful to decompose the time series $x_t$ into the trend $m_t$, the seasonal variation $s_t$, and an error term $z_t$. Here we will focus on an additive decomposition $x_t = m_t + s_t + z_t$, but multiplicative models $x_t = m_t \cdot s_t \cdot z_t$ are also available.
To estimate the trend a simple solution is to use a moving average by averaging the preceding and following $L$ values relative to the current point in time, where $L$ is chosen to average out the season effect. So if the season effect is weekly we choose $L=3$ such that \[\hat{m}_t = (x_{t-3} + x_{t-2} + x_{t-1} + x_{t} + x_{t+1} + x_{t+2} + x_{t+3})/7\] E.g. if the current time $t$ is a Tuesday the moving average averages today’s value with the ones from the preceding Sat., Sun. and Mon. and the following Wed., Thu. and Fri. For an even season length like 12 months we have to do something else if we want an equal amount of past and future in the moving average, and we use half of the value six months in the past/future: \[\hat{m}_t = \frac{\tfrac{1}{2}x_{t-6} + x_{t-5} + x_{t-4} + \dots + x_0 + \dots + x_{t+4} + x_{t+5} + \tfrac{1}{2}x_{t+6}}{12}\] E.g., the moving average for Jul. 1958 uses half the value in Jan. 1958 and half the value in Jan. 1959 and all the 11 values Feb.-Dec. 1958:

choc <- CBE[, "choc"]
i <- 7
(choc[i-6]/2 + sum(choc[(i-5):(i+5)]) + choc[i+6]/2)/12

## [1] 2413.625

This computation can be done for all time points (except the first six and last six) using decompose, which returns a list with a component trend (and other things to be discussed shortly):

choc_decomp <- decompose(choc)
choc_trend <- choc_decomp$trend
window(choc_trend, end = c(1958, 8))

##           Jan      Feb      Mar      Apr      May      Jun      Jul
## 1958       NA       NA       NA       NA       NA       NA 2413.625
##           Aug
## 1958 2409.500

Note that the procedure automatically chooses the appropriate size of the moving average based on the frequency of the time series, which was set to 12 in this case.

Decomposition - seasonal term

To estimate the seasonal effect we subtract the trend and average for each period in the season (e.g. each month):

choc_no_trend <- choc - choc_trend
plot(choc_no_trend)

choc_jan <- mean(window(choc_no_trend, st = c(1958, 1), freq = TRUE), na.rm = TRUE)
choc_jan

## [1] -2450.538

choc_may <- mean(window(choc_no_trend, st = c(1958, 5), freq = TRUE), na.rm = TRUE)
choc_may

## [1] 1171.215

It appears that chocolate consumption typically is much higher in May than in January. The season effects are also calculated by decompose and stored in the list as seasonal:

plot(choc_decomp$seasonal)

Decomposition - random term

The entire decomposition can also be plotted:

plot(choc_decomp)

Here the random term is simply given by $z_t = x_t - m_t - s_t$, and this can be accessed through the random entry of the decomposition output choc_decomp$random.

Stationarity and autocorrelation

Consider the excess chocolate consumption data after adjusting for trend and season (and omitting the first six and last six entries which are NA):

choc_rand <- na.omit(choc_decomp$random)
plot(choc_rand)

This is a single realization of something we may think of as a random experiment. If things had been different (different weather, different chocolate advertisements, different economic fluctuations, …) we could imagine another series.

This hypothetical scenario applies to any time series $x_t$ where we imagine our dataset is somehow randomly seleceted among an entire ensemble of possible time series.

Calculating the mean of all such hypothetical series would give us the mean function $\mu(t) = E(x_t)$ for all $t$. Since we already adjusted for trend and season we expect a mean value around zero at each time point, and we might assume $\mu(t) = \mu$ for all $t$, where $\mu$ is a fixed constant (zero in this case). If this is truly the case we say the series is stationary in the mean (first order stationary). In general we can estimate $\mu$ by the sample mean as we have been used to: \[\hat{\mu} = \bar{x} = \frac{1}{n} \sum_{t=1}^n x_t.\]
The variance (square of std. dev.) is in general also a function of time $\sigma^2(t) = E[(x_t-\mu(t))^2]$. If we assume we have stationarity of the variance also $\sigma^2(t) = \sigma^2$ we can estimate $\sigma^2$ by the sample variance \[\hat{\sigma^2} = \frac{1}{n-1} \sum_{t=1}^n (x_t-\bar{x})^2.\]
Now consider a process that is stationary in the mean and variance. The variables at different times may be correlated and the process is called second order stationary if the covariance only depends on the number of time steps between the variables, $Cov(x_t, x_{t+k}) = \gamma(k)$ for all $t$, and $\gamma$ is called the autocovariance function. The number of time steps $k$ is called the lag. The autocovariance can be estimated from data: \[\hat\gamma(k) = c_k = \frac{1}{n} \sum_{t=1}^{n-k} (x_t - \bar{x})(x_{t+k} - \bar{x})\]
The more useful autocorrelation function (acf) is the normalized version that takes values between -1 and +1: $\rho(k) = \gamma(k)/\sigma^2$, which is estimated based on the estimate of the autocovariance above: $\hat\rho(k) = r_k = \frac{c_k}{c_0}$.
The autocorrelation at lag 0 is always equal to 1, $\rho(0) = 1$ and $r_0 = 1$.
For lag 1 it is the correlation between today’s value $x_t$ and tomorrows value $x_{t+1}$. If we look directly at the chocolate data this correlation is very strong (simply because there is an increasing trend, so high value is followed by a high value):

i <- 1:(length(choc)-1)
plot(choc[i], choc[i+1])
abline(lsfit(choc[i], choc[i+1]))

cor(choc[i], choc[i+1])

## [1] 0.8025919

After removing the trend and seasonality there isn’t really any dependence on the preceding value:

i <- 1:(length(choc_rand)-1)
plot(choc_rand[i], choc_rand[i+1])
abline(lsfit(choc_rand[i], choc_rand[i+1]))

cor(choc_rand[i], choc_rand[i+1])

## [1] -0.02278652

Correlogram (empirical acf)

To calculate the empirical autocorrelation at many lags we use acf which produces a plot by default:

choc_acf <- acf(choc)

This is called the correlogram.

It is possible to calculate the correlogram for any time series (also non-stationary). However, the theoretical properties only make sense for a stationary process.
The correlogram value at lag $k$, $r_k$ is an estimate of the true autocorrelation $\rho(k)$.
In particular if $\rho(k)=0$ then $r_k$ will usually not be exactly zero. Rather it is approximately normally distributed with a mean close to zero and standard deviation $1/\sqrt{n}$, where $n$ is the length of the series. Based on this approximate significance bands are drawn at $\pm1.96/\sqrt{n}$.
The significance bands are only valid for a single lag! Even if $\rho(k)=0$ for all lags $k=1,2,\dots$ we expect 1/20 of the $r_k$’s to fall outside the line. Furthermore the estimates are heavily correlated so if one falls outside it is also more likely that the one next does.
It is typical to look at a few specific lags. E.g. lag 1 for the immediate past and lag 12 if a season of length 12 is expected. A spike at lag 12 indicates that there is a seasonal pattern that has not been adequately accounted for. It appears the seasonal adjustment for the chocolate data hasn’t been perfect since there is a somewhat large autocorrelation at lag 12 (1 year):

acf(choc_rand)

Typical patterns in correlograms

It can be illustrative to consider a few specific examples of correlograms:

Completely random:

x <- rnorm(100)
par(mfrow = c(2,1), mar = c(4,4,0.5,0.5))
ts.plot(x)
acf(x, main = "", lag.max = 30)

Trend only:

x <- 1:100
par(mfrow = c(2,1), mar = c(4,4,0.5,0.5))
ts.plot(x)
acf(x, main = "", lag.max = 30)

Sinusoidal wave:

x <- sin((1:100)*2*pi/10)
par(mfrow = c(2,1), mar = c(4,4,0.5,0.5))
ts.plot(x)
acf(x, main = "", lag.max = 100)

Repeated pattern:

x0 <- c(3,5,2,1,4)
x <- rep(x0, 20)
par(mfrow = c(2,1), mar = c(4,4,0.5,0.5))
ts.plot(x)
acf(x, main = "", lag.max = 30)

Partial autocorrelation function

If a time series has a strong correlation with the values lagged by one time step it most likely also has a strong correlation with the values lagged by two time steps: When today’s value is highly influenced by yesterdays, then yesterdays will be highly influenced by the day before, and thus the lag 2 value influences the current value through the lag 1 value.
The partial autocorrelation (pacf) at lag 2 is the correlation between the time series and the lag 2 values after controlling for the lag 1 values.
This is analogous to the multiple regression setup \[x_t = \beta_0 + \beta_1 x_{t-1} + \beta_2 x_{t-2} + \epsilon\] where $\beta_2$ is a partial effect of the corresponding predictor ($x_{t-2}$) when controlling for the other predictor ($x_{t-1}$).
In general the pacf at lag $k$ is the correlation of $x_t$ and $x_{t-k}$ after controlling for the intermediate variables $x_{t-1}, \dots, x_{t-k-1}$.
A more detailed description of partial correlation in the multiple regression setup is in Section 11.7 of Agresti (2013).
For a strong deterministic trend the pacf only shows strong correlation in the first lag and after controlling for this the following lags show no extra correlation:

x <- 1:100
par(mfrow = c(2,1), mar = c(4,4,0.5,0.5))
ts.plot(x)
pacf(x, main = "")

Stochastic processes I