---
title: "Stochastic processes I"
author: "The ASTA team"
output:
  slidy_presentation:
    highlight: tango
    fig_caption: false
  pdf_document:
    keep_tex: yes
    toc: yes
    highlight: tango
    number_section: true
    fig_caption: false
---

# Concepts, terminology and examples
## Plotting, manipulation, trends and seasonality

- A discrete time stochastic process is a collection of the same variable
measured at different points in time. This is also know as a time series.
We will always assume the data is observed at equidistant points in time (i.e.
same time difference between consecutive observations).

- To get appropriate summaries and plots of the data we must tell R that the
data is a time series, which is called a `ts` object in R. A built-in example
is the dataset `AirPassengers` which we will abbreviate to `AP` to save typing:

```{r}
AP <- AirPassengers
plot(AP, ylab = "Bookings (1000s)")
```

- The generic function `plot` automatically detected that `AP` was a time series
(`ts` object) and plotted the correct time on the horizontal axis (by calling
`plot.ts` automatically).

----

- We can aggregate the time series over each year (sums the values by default, so
we add `FUN = mean` to get monthly mean), which clearly shows an increasing trend
in the yearly number of passenger bookings:
```{r}
APyear <- aggregate(AP, FUN = mean)
plot(APyear)
```

Such a systematic change in a time series that is not periodic is known as the
**trend** of the series.

----

- The repeating (periodic) pattern within each year seen in the original data is
known as **seasonality** or **seasonal variation**. The function `cycle` indicates
which cycle (aka season) each observation belongs to (in this case 1-12). A boxplot of
the number of bookings for each cycle (month) clearly shows the seasonality:
```{r}
cyc <- cycle(AP)
cyc <- factor(cyc, labels = month.abb)
boxplot(AP ~ cyc)
```

----

- If we only want to consider a subset of the data the function `window` is used:
```{r}
APJun55Dec56 <- window(AP, start = c(1955, 6), end = c(1956, 12))
APJun55Dec56
```

- The window function can also sub-sample the time series according to the frequency:
```{r}
APAug <- window(AP, start = c(1949, 8), end = end(AP), freq = TRUE)
```

- To plot multiple series in one plot we can use `ts.plot`:
```{r}
ts.plot(APyear, APAug, col = c("black", "red"))
legend("topleft", legend = c("Mean", "August"), col = c("black", "red"), lty = 1)
```

----

## Stochastic trend

- The trend for the passenger bookings data was very clear from the data (as well
as the seasonal pattern), and a reasonable explanation may be increasing population,
increasing prosperity and technological advances making tickets cheaper.

- In other situations the behavior of the trend may be less clear and we will
need to handle it differently. For the quarterly exchange rate between GBP and NZD
the picture is less clear:
```{r}
www <- "https://asta.math.aau.dk/eng/static/datasets?file=pounds_nz.dat"
exchange_data <- read.table(www, header = TRUE)
exchange <- ts(exchange_data, start = 1991, freq = 4)
plot(exchange)
```

- In general it is dangerous to extrapolate the trend, and this is even more so the
case for a series with a stochastic trend, which this example may have.

- Especially for financial time series stochastic trends with abrupt changes to
the trend are common.

----

## Multiple series

- Monthly time series from Jan. 1958 to Dec. 1990 of supply of three goods in Australia:
    - Electricity (Giga Watt hours)
    - Beer (Mega liters)
    - Chocolate (tonnes)
    
```{r}
CBEdata <- read.table("https://asta.math.aau.dk/eng/static/datasets?file=cbe.dat",
                      header = TRUE)
CBE <- ts(CBEdata, start = 1958, freq = 12)
plot(CBE)
```

- For overlapping time series **with the same frequency** we can get the data for
the overlapping period like this:
```{r}
APCBE <- ts.intersect(AP, CBE)
plot(APCBE)
```

----

- In the realm of time series people often find spurious correlations and mistake
them for causal relationships. E.g. the number of passenger bookings in USA looks
very similar to electricity consumption in Australia and there is a strong correlation:
```{r}
cor(APCBE)
```
However, this does not imply that US air passengers really influence the electricity
consumption in Australia (or vice versa)! Time series with similar trends and seasonality will
typically show strong correlation even though they are unrelated. Better explanations
may be due to similar population growth, seasonal patterns, etc.

----

## Decomposition
### Decomposition - trend term

- It can be useful to decompose the time series $x_t$ into *the trend* $m_t$,
*the seasonal variation* $s_t$, and *an error term* $z_t$. Here we will focus on
an additive decomposition $x_t = m_t + s_t + z_t$, but multiplicative models
$x_t = m_t \cdot s_t \cdot z_t$ are also available.

- To estimate the trend a simple solution is to use a moving average by averaging
the preceding and following $L$ values relative to the current point in time,
where $L$ is chosen to average out the season effect. So if the season effect is
weekly we choose $L=3$ such that
$$\hat{m}_t = (x_{t-3} + x_{t-2} + x_{t-1} + x_{t} + x_{t+1} + x_{t+2} + x_{t+3})/7$$
E.g. if the current time $t$ is a Tuesday the moving average averages today's value
with the ones from the preceding Sat., Sun. and Mon. and the following Wed., Thu. and Fri.
For an even season length like 12 months we have to do something else if we want an
equal amount of past and future in the moving average, and we use half of the value
six months in the past/future:
$$\hat{m}_t = \frac{\tfrac{1}{2}x_{t-6} + x_{t-5} + x_{t-4} + \dots + x_0 + \dots + x_{t+4} + x_{t+5} + \tfrac{1}{2}x_{t+6}}{12}$$
E.g., the moving average for Jul. 1958 uses half the value in Jan. 1958 and half
the value in Jan. 1959 and all the 11 values Feb.-Dec. 1958:
```{r}
choc <- CBE[, "choc"]
i <- 7
(choc[i-6]/2 + sum(choc[(i-5):(i+5)]) + choc[i+6]/2)/12
```
This computation can be done for all time points (except the first six and last six)
using `decompose`, which returns a list with a component `trend` (and other things
to be discussed shortly):
```{r}
choc_decomp <- decompose(choc)
choc_trend <- choc_decomp$trend
window(choc_trend, end = c(1958, 8))
```

- Note that the procedure automatically chooses the appropriate size of the moving
average based on the frequency of the time series, which was set to 12 in this case.

----

### Decomposition - seasonal term

- To estimate the seasonal effect we subtract the trend and average for each period
in the season (e.g. each month):
```{r}
choc_no_trend <- choc - choc_trend
plot(choc_no_trend)
choc_jan <- mean(window(choc_no_trend, st = c(1958, 1), freq = TRUE), na.rm = TRUE)
choc_jan
choc_may <- mean(window(choc_no_trend, st = c(1958, 5), freq = TRUE), na.rm = TRUE)
choc_may
```
It appears that chocolate consumption typically is much higher in May than in January.
The season effects are also calculated by `decompose` and stored in the list as
`seasonal`:
```{r}
plot(choc_decomp$seasonal)
```

----

### Decomposition - random term

- The entire decomposition can also be plotted:
```{r}
plot(choc_decomp)
```

- Here the random term is simply given by $z_t = x_t - m_t - s_t$, and this can
be accessed through the `random` entry of the decomposition output `choc_decomp$random`.


# Stationarity and autocorrelation

- Consider the excess chocolate consumption data after adjusting for trend and season
(and omitting the first six and last six entries which are `NA`):
```{r}
choc_rand <- na.omit(choc_decomp$random)
plot(choc_rand)
```

This is a single realization of something we may think of as a random experiment.
If things had been different (different weather, different chocolate advertisements, different economic fluctuations, ...)
we could imagine another series.

This hypothetical scenario applies to any time series $x_t$ where we imagine our
dataset is somehow randomly seleceted among an entire ensemble of possible time
series.

- Calculating the mean of all such hypothetical series would
give us the mean function $\mu(t) = E(x_t)$ for all $t$. Since we already adjusted
for trend and season we expect a mean value around zero at each time point, and
we might assume $\mu(t) = \mu$ for all $t$, where $\mu$ is a fixed constant
(zero in this case). If this is truly the case we say the series is stationary
in the mean (first order stationary). In general we can estimate $\mu$ by the
sample mean as we have been used to:
$$\hat{\mu} = \bar{x} = \frac{1}{n} \sum_{t=1}^n x_t.$$

- The variance (square of std. dev.) is in general also a function of time
$\sigma^2(t) = E[(x_t-\mu(t))^2]$. If we assume we have stationarity of the variance
also $\sigma^2(t) = \sigma^2$ we can estimate $\sigma^2$ by the sample variance
$$\hat{\sigma^2} = \frac{1}{n-1} \sum_{t=1}^n (x_t-\bar{x})^2.$$

- Now consider a process that is stationary in the mean and variance. The variables
at different times may be correlated and the process is called second order stationary
if the covariance only depends on the number of time steps between the variables,
$Cov(x_t, x_{t+k}) = \gamma(k)$ for all $t$, and $\gamma$ is called the autocovariance function.
The number of time steps $k$ is called the lag. The autocovariance can be estimated from data:
$$\hat\gamma(k) = c_k = \frac{1}{n} \sum_{t=1}^{n-k} (x_t - \bar{x})(x_{t+k} - \bar{x})$$

- The more useful autocorrelation function (acf) is the normalized version that takes
values between -1 and +1: $\rho(k) = \gamma(k)/\sigma^2$, which is estimated based
on the estimate of the autocovariance above: $\hat\rho(k) = r_k = \frac{c_k}{c_0}$.

- The autocorrelation at lag 0 is always equal to 1, $\rho(0) = 1$ and $r_0 = 1$.

- For lag 1 it is the correlation between today's value $x_t$ and tomorrows value $x_{t+1}$.
If we look directly at the chocolate data this correlation is very strong (simply because there is an increasing trend, so high value is followed by a high value):
```{r}
i <- 1:(length(choc)-1)
plot(choc[i], choc[i+1])
abline(lsfit(choc[i], choc[i+1]))
cor(choc[i], choc[i+1])
```

- After removing the trend and seasonality there isn't really any dependence on
the preceding value:
```{r}
i <- 1:(length(choc_rand)-1)
plot(choc_rand[i], choc_rand[i+1])
abline(lsfit(choc_rand[i], choc_rand[i+1]))
cor(choc_rand[i], choc_rand[i+1])
```

----

## Correlogram (empirical acf)

- To calculate the empirical autocorrelation at many lags we use `acf` which
produces a plot by default:
```{r}
choc_acf <- acf(choc)
```

This is called the correlogram.

- It is possible to calculate the correlogram for any time series (also non-stationary).
However, the theoretical properties only make sense for a stationary process.

- The correlogram value at lag $k$, $r_k$ is an estimate of the true autocorrelation $\rho(k)$.

- In particular if $\rho(k)=0$ then $r_k$ will usually not be exactly zero. Rather it is
approximately normally distributed with a mean close to zero and standard deviation $1/\sqrt{n}$, where $n$ is the length of the series. Based on this approximate significance
bands are drawn at $\pm1.96/\sqrt{n}$.

- The significance bands are only valid for a single lag! Even if $\rho(k)=0$ for
all lags $k=1,2,\dots$ we expect 1/20 of the $r_k$'s to fall outside the line.
Furthermore the estimates are heavily correlated so if one falls outside it is
also more likely that the one next does.

- It is typical to look at a few specific lags. E.g. lag 1 for the immediate
past and lag 12 if a season of length 12 is expected. A spike at lag 12 indicates
that there is a seasonal pattern that has not been adequately accounted for.
It appears the seasonal adjustment for the chocolate data hasn't been perfect
since there is a somewhat large autocorrelation at lag 12 (1 year):
```{r}
acf(choc_rand)
```

----

## Typical patterns in correlograms

It can be illustrative to consider a few specific examples of correlograms:

- Completely random:
```{r}
x <- rnorm(100)
par(mfrow = c(2,1), mar = c(4,4,0.5,0.5))
ts.plot(x)
acf(x, main = "", lag.max = 30)
```

----

- Trend only:
```{r}
x <- 1:100
par(mfrow = c(2,1), mar = c(4,4,0.5,0.5))
ts.plot(x)
acf(x, main = "", lag.max = 30)
```

----

- Sinusoidal wave:
```{r}
x <- sin((1:100)*2*pi/10)
par(mfrow = c(2,1), mar = c(4,4,0.5,0.5))
ts.plot(x)
acf(x, main = "", lag.max = 100)
```

----

- Repeated pattern:
```{r}
x0 <- c(3,5,2,1,4)
x <- rep(x0, 20)
par(mfrow = c(2,1), mar = c(4,4,0.5,0.5))
ts.plot(x)
acf(x, main = "", lag.max = 30)
```

----

## Partial autocorrelation function

- If a time series has a strong correlation with the values lagged by one time step
it most likely also has a strong correlation with the values lagged by two time steps:
When today's value is highly influenced by yesterdays, then yesterdays will be highly
influenced by the day before, and thus the lag 2 value influences the current value
through the lag 1 value.

- The partial autocorrelation (pacf) at lag 2 is the correlation between the time
series and the lag 2 values **after controlling for the lag 1 values**.

- This is analogous to the multiple regression setup
$$x_t = \beta_0 + \beta_1 x_{t-1} + \beta_2 x_{t-2} + \epsilon$$
where $\beta_2$ is a partial effect of the corresponding predictor ($x_{t-2}$)
**when controlling for the other predictor** ($x_{t-1}$).

- In general the pacf at lag $k$ is the correlation of $x_t$ and $x_{t-k}$ after
controlling for the intermediate variables $x_{t-1}, \dots, x_{t-k-1}$.

- A more detailed description of partial correlation in the multiple regression
setup is in Section 11.7 of Agresti (2013).

- For a strong deterministic trend the pacf only shows strong correlation in the
first lag and after controlling for this the following lags show no extra correlation:
```{r}
x <- 1:100
par(mfrow = c(2,1), mar = c(4,4,0.5,0.5))
ts.plot(x)
pacf(x, main = "")
```