R in Time Series: Holt-Winters Smoothing and Forecast

This tutorial tells about how to do Holt-WInters smoothing and forecast in R.

1  Basics of Holt-Winters method

1.1  Additive model

\[\text{Level:    }a_t=\alpha(x_t-s_{t-p})+(1-\alpha)(a_{t-1}+b_{t-1})\]

\[\text{Trend (or slope):    }b_t=\beta(a_t-a_{t-1})+(1-\beta)b_{t-1}\]

\[\text{Seasonal effect:    }s_t=\gamma(x_t-a_t)+(1-\gamma)s_{t-p}\]

where \(a_t\), \(b_t\), and \(s_t\,\) are the estimated level, slope, and seasonal effect at time t, and \(\alpha\), \(\beta\), and \(\gamma\,\) are the smoothing parameters.

Notes:

  • The level updating equation takes a weighted average of our latest observation with our existing estimate of the appropriate seasonal effect subtracted, and our forecast of the level made one time step ago. Here the one-step-ahead forecast of the level is the sum of the estimates of the level and slope at the time of forecast. Note that the mean of the process is the sum of the level and the appropriate seasonal effect.
  • The trend (or slope) updating equation takes a weighted average of the latest estimate of slope, i.e., the difference in the estimated level at time t and the estimated level at time t-1, and our previous estimate of slope. Note that R’s convention refers to the slope as the trend.
  • The seasonal effect updating equation takes a weighted average of the difference between the current observation and current estimate of the level, and the last estimate of the seasonal effect for this season which was made at time \(t-p\).
  • The trend and seasonal effect updating equations can only be used after the level updating equation has been applied to get \(a_t\).
  • Typical choices of smoothing weights are \(\alpha=0.2,\,\beta=0.2,\,\text{and}\,\gamma=0.2\).
  • The updating equations can be started with \(a_1=x_1\), and the initial slop \(b_1\,\) and seasonal effects \(s_1,\cdots,s_p\) reckoned from experience, estimated from the data in some way, or set at zero. The default in R is to use values obtained from the decompose procedure.

Given observations up to time n, \(x_1,\cdots,x_n\), the forecasting equation for \(x_{n+k}\,\) is

\[\hat{x}_{n+k|n}=a_n+k\cdot b_n+s_{n+k-p}\qquad\qquad k\le p\]

where \(a_n\,\) is the estimate level and \(b_n\,\) is the estimated slope, so \(a_n+kb_n\,\) is the expected level at time n+k, and \(s_{n+k-p}\,\) is the exponentially weighted estimate of the seasonal effect made at time \(n+k-p\).

For example, for monthly data (p=12), if time n+1 occurs in January, then \(s_{n+1-12}\,\) is the exponentially weighted estimate of the seasonal effect for January made in the previous year. The forecasting equation can be used for lead times between (m-1)p+1 and mp, but then the most recent exponentially weighted estimate of the seasonal effect available will be \(s_{n+k-(m-1)p}\).

1.2  Multiplicative model

\[\text{Level:}\quad a_n=\alpha\left(\dfrac{x_n}{s_{n-p}}\right)+(1-\alpha)(a_{n-1}+b_{n-1})\]

\[\text{Trend (or slope):}\quad b_n=\beta(a_n-a_{n-1})+(1-\beta)b_{n-1}\]

\[\text{Seasonal effect:}\quad s_n=\gamma\left(\dfrac{x_n}{a_n}\right)+(1-\gamma)s_{n-p}\]

Given observations up to time n, \(x_1,\cdots,x_n\), the forecasting equation for \(x_{n+k}\,\) is

\[\hat{x}_{n+k|n}=(a_n+k\cdot b_n)\cdot s_{n+k-p}\qquad\qquad k\le p\]

In R, the function HoltWinters() can be used to estimate smoothing parameters for the Holt-Winters model by minimizing the one-step-ahead prediction errors (SS1PE).

 

2  Simple Exponential Smoothing

If you have a time series that can be described using an additive model with constant level and no seasonaility, you can use simple exponential smoothing to make short-term forecast.

SMoothing is controlled by the parameter \(\alpha\in[0,1]\), where a value close to zero means that little weight is placed on the most recent observations.

Time series of London rainfall

As can be seen, there is roughly constant (mean) level of 25 inches in rainfall, and the random fluctuation seems to be roughly constant in size over time. Therefore, it is probably appropriate to describe the data using an additive model, and we can make forcasts using simple exponential smoothing.

To use R’s HoltWinters() function for simple exponential smoothing, we need to set the parameters beta=FALSE and gamma=FALSE.

By default, HoltWInters() just make forecasts for the time period covered by the original time series

We can plot the original time series against the forcasts as following

Simple smoothing forecast for London rainfall As a measure of the forecast accuracy, we can calculate the sum of squared error (SSE) for the in-sample forecast errors

The higher the value of \(\alpha\), the less smoothing, as shown below

Control smoothing levels by  alpha values

We can use the ‘l.start’ parameter in the HoltWInters() function to set the initial value for the level. For example

Set initial value for the level in simple smoothing forecast

Note, if \(\alpha\,\) is not specified, the \(\alpha\,\) value calculated by R will be different for different specified initial values.

By default, HoltWInters() function makes forecasts for the time period covered by the original time series. However, we can use the forecast.HoltWinters() functio n in the ‘forecast‘ package to make forecasts for further time points in future.

Forecast beyond original period

Note, the 80% and 95% prediction intervals shown here are based on assumptions that there are no autocorrelations in the forecast errors, and the forecast errors are normally distributed with mean zero and constant variance.

‘Forecast error’ is defined as the observed value minus predicted value for each time point. We can calculate the forecast error only for the time period covered by the original data. The accuracy of prediction is represented by the sum of squared errors (SSE) of the in-sample forecast error. The in-sample errors can be obtained as following

If the predictive model cannot be improved upon, there should be no correlations between forecast errors for successive predictions. In other words, if there are correlations between forecast errors for successive predictions, it is likely that the simple exponential smoothing forecast could be improved upon by another forecasting techniques.

ACF of forecast errors in simple exponential smoothing

 

As can be seen, the correlation at lag 3 is just touching the significance bounds. To test whether there is significant evidence of non-zero correlation at lags 1~20, we can carry out a Ljung-Box test using R’s Box.test() function.

Here the Ljung-Box test statistic is 17.4, and the p-value is 0.6, so there is little evidence of non-zero autocorrelations in the in-sample forecasts errors at lags 1~20.

To be sure that the predictive model cannot be improved upon, it is also a good idea to check whether the forecast errors are normally distributed with mean zero and constant variance. We plot the in-sample forecast errors

Plot of in-sample forecast errors

The plot shows that the in-sample forecast errors seem to have roughly constant variance over time, althought the fluctuations in the start may be slightly than that at later dates.

We can also plot a histogram to check whether the forecast errors are normally distributed with mean zero. Here we first create a R function and save it to file ‘Plot_Forecast_Errors_Histogram.R’

Then we call this function and plot the histogram

Histogram of forecast errors

The histogram shows that the distribution of forecast errors is roughly centered on zero, and is more or leass normally distributed, although it seems to slightly skewed to the right compared to a normal curve. However, the right skew is relatively small, and so it is plausible that the forecast errors are normally distributed with mean zero.

In summary: The Ljung-Box test showed that there is little evidence of non-zero autocorrelations in the in-sample forecast errors, and the distribution of forecast errors seems to be normally distributed with mean zero. This suggests that the simple exponential smoothing method provides an adequate predictive model for the rainfall, which probably cannot be improved upon. Furthermore, the assumption that the 80% abd 95% prediction intervals were based on are probably valid.

 

3  Holt’s Exponential Smoothing

If you have a time series that can be described using an additive model with increasing or decreasing trend and no seasonality, you can use Holt’s exponential smoothing to make short-term forecasts.

Holt’s exponential smoothing is controlled by two parameters, \(\alpha\in[0,1]\,\) for estimating the level at the current time point, and \(\beta\in[0,1]\,\) for estimating the slope of the trend component at the current time point. Parameters close to zero means little weight is placed on the most recent observations when making forecasts of future values.

A time series with trend but no seasonalityWe can see obvious trending in the curve. To use the HoltWInters() function for Holt’s exponential smoothing, we need to set the parameter \(\gamma\)=FALSE.

Holt's exponential smoothing for a time series of skirts

Note that the estimated values of \(\alpha=0.84\,\) and \(\beta=1\,\) are both high, meaning that both the estimate of the current value of the level, and of the slope of the trend component, are based mostly on very recent observations in the time series.

You can use the ‘l.start’ and ‘b.start’ arguments to specify the initial values of the level and the slope of the trend component. It is common to set these initial values as following:

  • Use the first value in the time series as the initial value of the level;
  • Use the second value minus the first value in the time series as the initial value of the slope of the trend component.

The forecast.HoltWinters() function can be used to forecast for future times not covered by the original time series

Forecast on future time points beyond original time series

 

ACF of Holt's exponential smoothing forecast residuals

The correlogram shows that the sample autocorrelation for the in-sample forecast errors at lag 5 exceeds the significance bounds. However, we would expect one in 20 of the autocorrelations for the first 20 lags to exceed the 95% significance bounds by chance alone. The p-value from the Ljung-Box test is 0.47, indicating that there is little evidence of non-zero autocorrelation in the in-sample forecast errors at lags 1~20.

Plot residuals and histogram of residuals

As can be seen, the forecast errors have roughly constant variance over time. The histogram shows that it is plausible that the forecast errors are normally distributed with mean zero and constant variance.

In summary: The ljung-Box test shows that there is little evidence of autocorrelations in the forecast errors, while the time plot and histogram of forecast errors show that it is plausible that the forecast errors are normally distributed with zero mean and constant variance. Therefore, we can comclude that Holt’s exponential smoothing provides an adequate predictive model which probably cannot be improved upon. In addition, it means that the assumptions that the 80% and 95% prediction intervals were based on are probably valid.

 

4  Holt-Winters Exponential Smoothing

If you have a time series rhat can be described using an additive model with increasing or decreasing and seasonality, then you can use Holt-Winters exponential smoothing to make sjort-term forecast.

Holt-Winters exponential smoothing estimates the level, slope, and seasonal component at the current time point. Smoothing is controlled by \(\alpha\in[0,1],\,\beta\in[0,1],\,\text{and}\,\gamma\in[0,1]\) for the estimates of the level, slope of the trend, and seasonal component, respectively. Parameter value close to zero means that relatively little weight is places on the most recent observations when making forecasts of future values.

Plot of a time series with trend and seasonality

  • The \(\alpha=0.41\,\)  is relatively low, indicating that the estimate of the level at the current time point is based upon both recent observations and some observations in the more distant past.
  • The \(\beta=0\,\) indicates that the estimate of the slope of the trend component is not updated over the time series. Instead is set equal to its initial value. This makes good intuitive sense, as the level changes quite a bit over the time series, but the slope of the trend component remains roughly the same.
  • The \(\gamma=0.96\,\) is high, indicating that the estimate of the seasonal component at the current time point is just based on very recent observations.

Holt-Winters exponential smootihing  forecast

We can forecast for future time points not included in the original time series

Holt-Winters forecasts beyond original time series

We investigate whether the predictive model can be improved by checking the autocorrelation in and do Ljung-Box test on the forecast errors:

ACF of Holt-WInters exponential smoothing forecast

As can be seen, the autocorrelations for the in-sample forecasterrors do not exceed the significance bounds for lags 1~20. Furthurmore, the p-value for Ljung-Box test is 0.6, indicating that there is little evidence of non-zero autocorrelations at lags 1~20.

We also check whether the forecast errors have constant variance and are normally distributed with mean zero

 

Check the distribution of the forecast residuals

As can be seen, it is plausible that the forecast errors have constant variance over time, and they are normally distributed with mean zero.

In summary: There is little evidence of autocorrelation at lags 1~20 for the forecast errors, and the forecast errors appear to be normally distributed with mean zero and constant variance over time. This suggest that Holt-Winters exponential smoothing provides an adequate predictive model which probably cannot be improved uopn. FUrthermore, the assumptions upon which the prediction intervals were based are probably valid.

Another example:

Holt-Winters exponential smoothing on a time series of wine sale

Here \(\alpha=0.41,\,\beta=0,\,\gamma=0.47\), meaning that the level and seasonal variation adapt rapidly whereas the trend is slow to do so.

The coefficients are the estimated values of the level, slope, and the 12 multiplicative seasonals from January to December available at the latest time point.

To see the predictive performance of the model, we can compare the mean square one-step-ahead prediction error with the standard deviation of the original time series.

As can be seen, we see a substantial decrease from 50.5 to 121.39.

By plotting the fotted component of the HW smoothing result, we can get the decomposed plots

Decomposed plot of Holt-Winters exponential smoothing result

FOrecast of wine sales for future time points beyond original time series

Last example:

Holt-Winters exponential smoothing on the AirPassenger time series

Decomposition of the Air Passenger time series dataHolt-Winters forecast of air passengers for futures

 

 

 

 

 

Leave a Reply

Your email address will not be published. Required fields are marked *