The standard simple linear regression model yi=α+βxi+ei,i=1,2,…,n assumes that the errors, the ei’s, are i.i.d. as ei∼N(0,σ2).
Here we discuss what happens if this assumption is violated. Namely, we consider what happens when there is correlation between the ei’s. This can happen in many situations, e.g. in time series.
Assume that the errors follow an auto regression ei=ϕei−1+ui,i=1,2,…,n where |ϕ|<1, e1=u1, and ui’s are i.i.d. N(0,σ2) for i=1,2,…,n.
Note that we here assume time dependence and ordering via the i subscript. Above, ei is an AR(1) process.
Take-home message: If we analyse data as a linear regression model, the parameter estimates are unbiased (they are ok, although we could do better). However, the standard errors of the estimates can be very misleading which can lead to overly confident conclusions.
Thus, instead of regressing yi on xi, we regress the yi−ϕyi−1 on xi−ϕxi−1, where we thus assume that we know ϕ. Note that we get one less observation.
Note also that the intercept has a factor of 1−ϕ, which must be taken into account.
If we do not know ϕ we can estimate it by first regressing yi on xi to obtain residuals r1,r2,…,rn, and the estimate ϕ by the correlation between (ri,ri−1) for i=2,3,…,n. With this estimate we can go through the steps above, so we end up with a two-stage procedure. (There are other alternatives to such a procedure, but we shall not go into detail here).
Assume that xi=i.
n <- 100
alpha <- 5
beta <- 1
sigma <- 2
phi <- 0.8
x <- 1:n
set.seed(1)
u <- rnorm(n, mean = 0, sd = sigma)
e <- numeric(n)
e[1] <- u[1]
for (i in 2:n) {
e[i] <- phi * e[i - 1] + u[i]
}
y <- alpha + beta*x + e
qplot(x, y) + geom_line()
m1 <- lm(y ~ x)
term | estimate | std.error |
---|---|---|
(Intercept) | 5.9 | 0.4880 |
x | 1.0 | 0.0084 |
Estimate ϕ and fit a model using the estimate (m2
):
r <- residuals(m1)
phi_est <- cor(tail(r, n - 1), head(r, n - 1))
phi_est
## [1] 0.6757454
z <- tail(y, n - 1) - phi_est*head(y, n - 1)
x_new <- tail(x, n - 1) - phi_est*head(x, n - 1)
m2 <- lm(z ~ x_new)
term | estimate | std.error |
---|---|---|
(Intercept) | 2 | 0.379 |
x_new | 1 | 0.019 |
Assume ϕ is known and fit a model using the estimate (m3
):
term | estimate | std.error |
---|---|---|
(Intercept) | 1.33 | 0.394 |
x_new | 0.99 | 0.032 |
We can do this many times (M=100) and inspect the estimate of ϕ and the coverage of the respective confidence intervals.
method | coverage |
---|---|
m1 (i.i.d.) | 0.41 |
m2 (est. phi) | 0.73 |
m3 (known phi) | 0.88 |
method | coverage |
---|---|
m1 (i.i.d.) | 0.44 |
m2 (est. phi) | 0.85 |
m3 (known phi) | 0.93 |