R in Time Series: Linear Regression With Harmonic Seasonality

This tutorial covers linear regression with harmonic seasonality in R.

1  Underlying mathematics

In regression modeling with seasonality, we can use one parameter for each season, for instance 12 parameters for the 12 months in a year. However, seasonal effects often vary smoothly over the seasons, so it may be more parameter-efficient to use a smooth function instead of separate indices. Sine and cosine functions can be used to build smooth variation into a seasonal model.

A sine wave can be expressed as

    \[A\sin(2\pi\cdot f\cdot t+\phi)=\alpha_s\sin(2\pi\cdot f\cdot t)+\alpha_c\cos(2\pi\cdot f\cdot t)\]

with f, A, and \phi\, being frequency (number of cycles per unit time), amplitude, and phase shift, respectively.

Sine functions of different frequencies

Note that the mixing effects of A and \phi\, are combined into the two parameters \alpha_s=A\cos(\phi)\, and \alpha_c=A\sin(\phi). In this sense, the right-hand side of the above equation is linear in the parameters \alpha_s and \alpha_c, while the left-hand side is non-linear because \phi\, is embedded inside the sine function. Therefore, the right-hand-side expression is preferred in the formulation of a seasonal regression model, so that OLS can be used to estimate the parameters.
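As a quick numerical check of this identity, the two forms can be compared in R (the values of A, \phi, and f below are arbitrary illustrative choices):

    # Arbitrary illustration values for amplitude, phase and frequency
    A   <- 1.5
    phi <- 0.7
    f   <- 1/12
    t   <- 1:24

    # Left-hand side: amplitude/phase form
    lhs <- A * sin(2 * pi * f * t + phi)

    # Right-hand side: linear-in-parameters form
    alpha.s <- A * cos(phi)
    alpha.c <- A * sin(phi)
    rhs <- alpha.s * sin(2 * pi * f * t) + alpha.c * cos(2 * pi * f * t)

    max(abs(lhs - rhs))   # effectively zero: the two forms coincide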

For a time series \{x_t\}\, with s seasons, there are [s/2] possibly different types of cycles combined together. Here [  ] represents the integer part of the expression within it. In most practical cases, s is even and [ ] can be omitted. For example, if the seasons are the months in a year, then s=12 and there are 6 cycles of different frequencies. However, for some ‘seasons’ s may be an odd number. For example, if the seasons are the days of the week, then s=7 and there are [7/2]=3 cycles.

The harmonic seasonal model can be defined by

    \[x_t=m_t+\sum_{i=1}^{[s/2]}\left\{S_i\sin(2\pi\cdot f_i\cdot t)+C_i\cos(2\pi\cdot f_i\cdot t)\right\}+z_t\]

where

    \[f_i=i\cdot f,\qquad i=1,2,\cdots,[s/2]\]

and f\, is the basic frequency.

For the case that the seasons are the 12 months in a year:

  • f=1\, if the time unit is year
  • f=1/12\, if the time unit is month.

This makes sense: one cycle per year corresponds to 1/12 of a cycle per month.

Notes:

  • m_t\, is the trend, which includes a parameter for the constant term.
  • S_i\, and C_i,\,i=1,\cdots,[s/2],\, are unknown parameters.
  • The trend m_t\, might take a polynomial form, fitted as part of the linear model (e.g. a linear or quadratic function of t).
  • At first sight it may seem strange that the harmonic model has cycles of a frequency higher than the seasonal frequency of 1/s. However, the addition of further harmonics has the effect of perturbing the underlying wave to make it less regular than a standard sine wave of period s. This usually still gives a dominant seasonal pattern of period s, but with a more realistic underlying shape.
    For example, the code sketched after these notes simulates data measured over one year, with the time unit being month.

    That code illustrates just one of many possible combinations of harmonics that could be used to model a wide range of possible underlying seasonal patterns. Note that the frequencies are i/12 here, since the time unit is month.
    Plots of single and stacked trigonometric functions
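A minimal sketch of such code (the variable names and the particular mix of harmonics are illustrative choices) is:

    # One year of time measured in months, so the frequencies are i/12
    TIME <- seq(1, 12, length = 1000)

    # A single harmonic at the basic frequency 1/12
    single <- sin(2 * pi * TIME / 12)

    # A stack of harmonics: the higher harmonics perturb the basic wave,
    # giving a less regular shape that still has a dominant period of 12
    stacked <- sin(2 * pi * TIME / 12) + 0.2 * sin(2 * pi * 2 * TIME / 12) +
               0.1 * sin(2 * pi * 4 * TIME / 12) + 0.1 * cos(2 * pi * 4 * TIME / 12)

    par(mfrow = c(2, 1))
    plot(TIME, single,  type = "l", ylab = "", main = "Single sine wave")
    plot(TIME, stacked, type = "l", ylab = "", main = "Stacked harmonics")
    par(mfrow = c(1, 1))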

 

2  An example of simulating seasonality using a harmonic model

Suppose the underlying model is

    \[x_t=0.1+0.005t+0.001t^2+\sin(2\pi t/12)+0.2\sin(4\pi t/12)+0.1\sin(8\pi t/12)+0.1\cos(8\pi t/12)+w_t\]

where \{w_t\}\, is Gaussian white noise with standard deviation 0.5. Using the code below, a series of 10 years is simulated, with the time unit being month:
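A minimal sketch of the simulation, following the model above (the seed and object names are arbitrary choices):

    set.seed(1)
    TIME <- 1:(10 * 12)                    # 120 monthly observations
    w <- rnorm(length(TIME), sd = 0.5)     # Gaussian white noise, sd = 0.5

    # Simulate from the underlying model defined above
    x <- 0.1 + 0.005 * TIME + 0.001 * TIME^2 +
         sin(2 * pi * TIME / 12) + 0.2 * sin(4 * pi * TIME / 12) +
         0.1 * sin(8 * pi * TIME / 12) + 0.1 * cos(8 * pi * TIME / 12) + w

    plot(ts(x, frequency = 12), ylab = "x", main = "Simulated series")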

A simulated 10-year series using harmonic functions

Note that the divisor of 12 is used since the time variable is in months, with 12 observations per yearly cycle.

Now we fit this simulated series with a linear model in which the seasonality is represented by harmonic functions.
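A sketch of the fit, with object names (x.lm1, SIN, COS) chosen to match the coefficient labels referred to in the notes below:

    # Matrices of harmonic predictors: [12/2] = 6 harmonics, time unit is month
    SIN <- COS <- matrix(nrow = length(TIME), ncol = 6)
    for (i in 1:6) {
      SIN[, i] <- sin(2 * pi * i * TIME / 12)
      COS[, i] <- cos(2 * pi * i * TIME / 12)
    }

    # Full model: quadratic trend plus all six harmonic pairs
    x.lm1 <- lm(x ~ TIME + I(TIME^2) +
                SIN[, 1] + COS[, 1] + SIN[, 2] + COS[, 2] +
                SIN[, 3] + COS[, 3] + SIN[, 4] + COS[, 4] +
                SIN[, 5] + COS[, 5] + SIN[, 6] + COS[, 6])

    # t-ratios: each estimate divided by its standard error
    coef(x.lm1) / sqrt(diag(vcov(x.lm1)))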

Notes:

  • The I() function is used to make sure TIME^2 is treated ‘as is’.
  • In most cases, the order of the harmonics and polynomial trend will be unknown. However, the harmonic coefficients are known to be independent, which means that all harmonic coefficients that are not statistically significant can be dropped.
  • It is largely a subjective decision on the part of the statistician to decide what constitutes a significant variable. An approximate t-ratio of magnitude 2 is a common choice and corresponds to an approximate 5% significance level.
  • The t-ratio can be obtained by dividing the estimated coefficient by the standard error of the estimate. So, in the above example, there are three significant coefficients: those of I(TIME^2), SIN[,1], and SIN[,2].

Through the above procedure, we determined three regression variables, which are used in the following model.
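A sketch of the reduced fit (the object name x.lm2 is an illustrative choice):

    # Reduced model keeping only the significant terms found above
    x.lm2 <- lm(x ~ I(TIME^2) + SIN[, 1] + SIN[, 2])
    summary(x.lm2)$coefficients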

As can be seen, the coefficients of the three selected predictor variables are all significant. From the output of the parameter estimates, the following model can be used for prediction at time t

    \[\hat{x}_t=0.28+0.0104t^2+0.9\sin(2\pi t/12)+0.199\sin(4\pi t/12)\]

The order of the regression can be chosen using the Akaike Information Criterion (AIC), and the model with the smallest AIC is taken as the best-fitting model.
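For example, assuming the two fitted objects above:

    AIC(x.lm1)   # full model
    AIC(x.lm2)   # reduced model: the smaller value indicates the better fit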

As can be seen, the second (reduced) model fits the data better.

Due to sampling variation, the best-fitting model is not identical to the model used to simulate the data, as can be verified by taking the AIC of the known underlying model:
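For example, refitting with exactly the terms used in the simulation and taking its AIC:

    # The simulated data used TIME, TIME^2, SIN[,1], SIN[,2], SIN[,4] and COS[,4]
    AIC(lm(x ~ TIME + I(TIME^2) + SIN[, 1] + SIN[, 2] + SIN[, 4] + COS[, 4]))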

In R, the function step() can be used to automate the selection of the best-fitting model by AIC:
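A sketch of the call (the object name best.lm is an illustrative choice):

    best.lm <- step(x.lm1)          # stepwise search over the terms in x.lm1, by AIC
    summary(best.lm)$coefficients   # the retained predictors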

Here the command  step(x.lm1) tests all predictor variables contained in the regression model x.lm1. As can be seen, the selection of predictor variables by minimum AIC agrees with what we obtained before.

Note that a best fit can equally well be based on choosing the model that leads to the smallest estimated standard deviation of the errors, provided that the degrees of freedom are taken into account.
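For instance, the residual standard errors reported by summary() are already adjusted for the degrees of freedom and can be compared directly for the models fitted above:

    summary(x.lm1)$sigma   # residual standard error of the full model
    summary(x.lm2)$sigma   # residual standard error of the reduced model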

 

3  Harmonic model fitted to a real global temperature series

In the code below, a harmonic model with a quadratic trend is fitted to the monthly global temperature series (1970~2005).

  • The unit for the time variable is year, so there is no need to use the divisor of 12 when creating the harmonic variables.
  • The TIME variable is standardized to reduce computational error in the OLS procedure caused by large numbers.
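A sketch of the fit, assuming the monthly global temperature series is already available as a ts object named temp (the object name is an assumption; substitute your own data source):

    # Assumed: `temp` is a monthly ts object covering 1970-2005
    TIME <- time(temp)                      # time in years (decimal)
    SIN <- COS <- matrix(nrow = length(TIME), ncol = 6)
    for (i in 1:6) {
      SIN[, i] <- sin(2 * pi * i * TIME)    # time unit is year: no divisor of 12
      COS[, i] <- cos(2 * pi * i * TIME)
    }
    TIME <- (TIME - mean(TIME)) / sd(TIME)  # standardized TIME for numerical stability

    temp.lm1 <- lm(temp ~ TIME + I(TIME^2) +
                   SIN[, 1] + COS[, 1] + SIN[, 2] + COS[, 2] +
                   SIN[, 3] + COS[, 3] + SIN[, 4] + COS[, 4] +
                   SIN[, 5] + COS[, 5] + SIN[, 6] + COS[, 6])
    coef(temp.lm1) / sqrt(diag(vcov(temp.lm1)))   # t-ratios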

Based on the above result, the statistically significant predictors TIME, SIN[,1], and SIN[,2] are chosen for a second regression, and the AIC is used to compare the two fitted models.
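A sketch of the second fit and the AIC comparison:

    temp.lm2 <- lm(temp ~ TIME + SIN[, 1] + SIN[, 2])
    coef(temp.lm2)
    AIC(temp.lm1)
    AIC(temp.lm2)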

To check the adequacy of the fitted model, it is appropriate to create a time plot and correlogram of the residual series:
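For example:

    res <- resid(temp.lm2)
    par(mfrow = c(3, 1))
    plot(res, type = "l", main = "(a) Residual series")     # time plot of residuals
    acf(res,  main = "(b) Correlogram of the residuals")
    pacf(res, main = "(c) Partial autocorrelations")
    par(mfrow = c(1, 1))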

Analysis of the residuals of the global temperature time series

In figure (a), there is no discernible curve in the series, which implies that a straight line is an adequate description of the trend. A tendency for the series to persist above or below the x-axis implies that the series is positively autocorrelated. This is verified in figure (b), the correlogram of the residuals, which shows a clear positive autocorrelation at lags 1-10.

The correlogram in figure (b) shows a pattern similar to that of an AR(p) process. This is supported by the plot of partial autocorrelations in figure (c), in which only the lag 1 and lag 2 partial autocorrelations are statistically significant. In the following code, an AR(2) model is fitted to the residual series:
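A sketch of the AR(2) fit (forcing order 2, as described above) and the correlogram of its residuals:

    # Fit an AR(2) model to the regression residuals
    res.ar <- ar(res, order.max = 2, aic = FALSE, method = "mle")
    res.ar$ar                  # AR coefficient estimates
    sqrt(res.ar$var.pred)      # standard deviation of the white noise term
    # Drop the first two residuals, which are NA, before plotting the ACF
    acf(res.ar$resid[-(1:2)], main = "ACF of the AR(2) residuals")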

ACF of the temperature residual after applying AR(2) model

The correlogram of the residuals of the fitted AR(2) model clearly shows that the residuals are approximately white noise. Hence, the final form of the model provides a good fit to the data. The fitted model for the temperature series, combining straight line trend, harmonic seasonality, and AR(2), can be written as

    \[x_t=0.175+\frac{0.184(t-1988)}{10.4}+0.0204\sin(2\pi t)+0.0162\sin(4\pi t)+z_t\]

where t is ‘time’ measured in the unit of ‘year’, and the residual series \{z_t\}\, follows an AR(2) process given by

    \[z_t=0.494z_{t-1}+0.307z_{t-2}+w_t\]

where \{w_t\}\, is white noise with mean zero and standard deviation 0.0837.

If we require an accurate assessment of the standard errors, we should re-fit the model using the gls() function from the nlme package, allowing for an AR(2) structure in the errors.
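A sketch of such a refit (the starting values supplied to corARMA are rough guesses based on the AR(2) estimates above):

    library(nlme)
    # Refit the reduced model by GLS, allowing AR(2) errors
    temp.gls <- gls(temp ~ TIME + SIN[, 1] + SIN[, 2],
                    correlation = corARMA(c(0.5, 0.3), p = 2))
    summary(temp.gls)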
