Exponential Smoothing

Time Series Forecasting

Smooth the time series out!

Jasper Lok https://jasperlok.netlify.app/
11-05-2022

Previously, I attended a time series workshop conducted by my professor.

It sparked my interest in revisiting my notes on time series.

It was one of the topics I learnt while studying the different analytics techniques used in operations analytics.

Hence, in this post, I will summarize what I have learnt about exponential smoothing and, more importantly, how to implement this method in R.


What is exponential smoothing?

(Hyndman 2021a) explained that exponential smoothing produces forecasts that are weighted averages of past observations, with the weights decaying exponentially as the observations get older.

For example, simple exponential smoothing has the following formula:

\[\hat{y}_{t+1} = \alpha y_{t} + (1 - \alpha)\hat{y}_{t}\]

where

\(\hat{y}_{t}\) refers to the forecasted value at time t

\(y_{t}\) refers to the actual value at time t

\(\alpha\) is the smoothing parameter, with \(0 \le \alpha \le 1\)

From the formula, we can note that the forecasted value at time t+1 is a weighted average of the actual value and the forecasted value at time t.

Now you might be asking why this method is called ‘exponential’?

If we go back to the formula above, we could replace the forecasted value with the same formula as follows:

\[\hat{y}_{t+1} = \alpha y_{t} + \alpha (1 - \alpha) y_{t-1} + \alpha (1 - \alpha)^{2} y_{t-2} + \dots\]

As shown above, the current forecasted value is a weighted sum of past actual values, where the weights decrease exponentially as we go back in time.
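The recursion above can be sketched in a few lines of base R. This is a minimal illustration, not the implementation used later in this post; the toy series `y`, the value of `alpha`, and the choice to initialize the first forecast with the first observation are all assumptions made for demonstration.

```r
# Simple exponential smoothing by hand (illustrative sketch).
# Initializing the first forecast with the first observation is one common
# convention; other choices (e.g. the series mean) also exist.
ses <- function(y, alpha) {
  fc <- numeric(length(y) + 1)
  fc[1] <- y[1]
  for (t in seq_along(y)) {
    # forecast for t+1: weighted average of the actual value and the previous forecast
    fc[t + 1] <- alpha * y[t] + (1 - alpha) * fc[t]
  }
  fc
}

y <- c(10, 12, 11, 13, 12)
ses(y, alpha = 0.5)
#> [1] 10 10 11 11 12 12
```

Unfolding the loop reproduces the exponentially decaying weights shown in the expanded formula above.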

The authors also mentioned that this method generates reliable forecasts quickly for a wide range of time series, which is a great advantage and of major importance to applications in industry.

In general, below are the three different types of exponential smoothing models:

| Smoothing Method | Description | Remarks |
|---|---|---|
| Simple exponential smoothing | Suitable when there is no clear trend or seasonal pattern within the data | |
| Double exponential smoothing | Includes a trend component | A.k.a. Holt-Winters' two-parameter method or Holt's linear trend method |
| Triple exponential smoothing | Includes both trend and seasonal components | A.k.a. Holt-Winters' three-parameter method |
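As a preview of the implementation, the three model types map onto the ETS function from the fable package (used later in this post) roughly as follows. This is a placeholder sketch: `df_ts` stands for a tsibble and `y` for the column being forecast, neither of which exists in this demonstration yet.

```r
# Placeholder sketch mapping the three smoothing methods to ETS specifications
df_ts %>%
  model(
    ses  = ETS(y ~ error("A") + trend("N") + season("N")),  # simple
    holt = ETS(y ~ error("A") + trend("A") + season("N")),  # double (Holt)
    hw   = ETS(y ~ error("A") + trend("A") + season("A"))   # triple (Holt-Winters)
  )
```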

So as not to bore the readers, I shall exclude the formulas for these models from this post; otherwise, it would be full of formulas.


Trend

The trend component can be either damped or non-damped.

"Damped" means that the trend flattens out in the future instead of continuing indefinitely.
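The effect of damping can be seen in the h-step-ahead forecast of the damped trend method, where \(\ell_t\) is the level, \(b_t\) is the trend, and \(0 < \phi < 1\) is the damping parameter:

\[\hat{y}_{t+h|t} = \ell_{t} + (\phi + \phi^{2} + \dots + \phi^{h})b_{t}\]

As \(h\) grows, the geometric sum converges to \(\phi/(1-\phi)\), so the forecasts level off at a constant instead of trending indefinitely.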

Seasonal factor

For the seasonal factor, it can be either additive or multiplicative.

Note the following definitions of seasonal and cyclical:

| Types | Description |
|---|---|
| Seasonal | The component repeats its pattern with a fixed and known duration |
| Cyclical | The component shows regular, repeated fluctuations around the trend, but the duration of the fluctuations is not known |

Additive Components

This would be more appropriate when the magnitude of the seasonal fluctuations, or the variations around the trend-cycle, does not vary with the level of the time series (Hyndman 2021b).

The additive components can be written as follows:

\[y_t = S_t + T_t + R_t\]

Where

\(y_t\) is the data

\(S_t\) is the seasonal component

\(T_t\) is the trend-cycle component

\(R_t\) is the remainder component

Multiplicative Components

When the variation in the seasonal pattern, or the variation around the trend-cycle, appears to be proportional to the level of the time series, then a multiplicative decomposition is more appropriate.

The multiplicative components can be written as follows:

\[y_t = S_t \times T_t \times R_t\]

where the components \(y_t\), \(S_t\), \(T_t\), and \(R_t\) are as defined above.
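Note that the two forms are closely related: taking logs turns a multiplicative decomposition into an additive one,

\[\log y_t = \log S_t + \log T_t + \log R_t\]

which is why a log transformation is often applied before fitting an additive model when the seasonal swings grow with the level.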

That’s all for the discussion on exponential smoothing.

The figure below summarizes the discussion above:

Taken from Chapter 8.4 of Forecasting: Principles and Practice

Demonstration

In this demonstration, I will use the consumer price index (CPI) dataset from SingStat.


Set up the environment

First, I will load the necessary packages into the environment.

pacman::p_load(tidyverse, lubridate, timetk, tsibble, janitor, 
               feasts, fable, rsample)

Import Data

Next, I will import the dataset and perform some pre-processing to clean up the data.

Refer to my previous blog post for the pre-processing discussion.

df <- read_csv("https://raw.githubusercontent.com/jasperlok/my-blog/master/_posts/2022-10-23-timesseries/data/M212881.csv", 
               skip = 10) %>%
  slice(1:148) %>%
  filter(`Data Series` != "Hawker Centres" &
           `Data Series` != "Food Courts & Coffee Shops")

# convert columns into numeric
cols_num <- c(2:ncol(df))
df[cols_num] <- sapply(df[cols_num], as.numeric)
# pivot longer the columns
df_1 <- df %>%
  pivot_longer(!`Data Series`, 
               names_to = "Month Year", 
               values_to = "CPI") %>%
  clean_names() %>%
  mutate(month_year_recoded = ym(month_year)) %>%
  select(-month_year) %>%
  filter(month_year_recoded >= "2014-01-01" & month_year_recoded <= "2020-12-01") %>%
  filter(data_series == "All Items")
# convert into tsibble data frame
df_1_ts <- df_1 %>%
  mutate(month_year_recoded = yearmonth(month_year_recoded)) %>%
  as_tsibble(index = month_year_recoded,
             key = data_series)

Split into Training and Testing Dataset

The rsample package provides the initial_time_split function to split time series data into training and testing datasets. Unlike a random split, it takes the first portion of the rows as training data, preserving the time order.

ts_split <- df_1_ts %>%
  initial_time_split(prop = 0.8)

Similar to the usual train-test split in machine learning, I will use the training and testing functions to extract the two datasets.

ts_train <- training(ts_split)
ts_test <- testing(ts_split)

Model Building

Now that this is done, I will pass the training dataset into the model function. I will also specify that I would like to build an ETS model by using the ETS function.

ts_ets <- ts_train %>%
  select(-data_series) %>%
  model(ETS(cpi))

Note that I have not specified the error, trend, and season terms in the ETS function, so it automatically selects the combination that gives the best fit:

ts_ets
# A mable: 1 x 1
    `ETS(cpi)`
       <model>
1 <ETS(A,N,A)>

On the surface, it seems like there is no info about the fitted model. The info is actually stored within the model object.

So, we can "unpack" the info by using the augment function.

augment(ts_ets)
# A tsibble: 67 x 6 [1M]
# Key:       .model [1]
   .model   month_year_recoded   cpi .fitted    .resid    .innov
   <chr>                 <mth> <dbl>   <dbl>     <dbl>     <dbl>
 1 ETS(cpi)           2014 Jan  99.5    99.4  0.144     0.144   
 2 ETS(cpi)           2014 Feb  99.5    99.6 -0.138    -0.138   
 3 ETS(cpi)           2014 Mar  99.7    99.6  0.156     0.156   
 4 ETS(cpi)           2014 Apr  99.3    99.3 -0.000674 -0.000674
 5 ETS(cpi)           2014 May  99.7    99.6  0.0641    0.0641  
 6 ETS(cpi)           2014 Jun  99.5    99.7 -0.195    -0.195   
 7 ETS(cpi)           2014 Jul  99.2    99.3 -0.0478   -0.0478  
 8 ETS(cpi)           2014 Aug  99.7    99.6  0.160     0.160   
 9 ETS(cpi)           2014 Sep  99.6    99.6 -0.0699   -0.0699  
10 ETS(cpi)           2014 Oct  99.3    99.3  0.00315   0.00315 
# ... with 57 more rows

As shown, the fitted model contains the original values, the fitted values, and the residuals.

Alternatively, we could pass the fitted model into the report function to summarize the model info (e.g. model parameters, model performance, etc.).

ts_ets %>% 
  report()
Series: cpi 
Model: ETS(A,N,A) 
  Smoothing parameters:
    alpha = 0.6682514 
    gamma = 0.0001000068 

  Initial states:
     l[0]      s[0]      s[-1]      s[-2]      s[-3]     s[-4]
 99.44194 0.0604655 0.02728087 -0.2385938 0.08574436 0.1519245
      s[-5]     s[-6]      s[-7]     s[-8]     s[-9]     s[-10]
 -0.2016484 0.1103206 0.06183502 -0.207833 0.1466583 0.08352913
      s[-11]
 -0.07968302

  sigma^2:  0.0479

      AIC      AICc       BIC 
 92.48791 101.89967 125.55830 

According to the documentation page, the meanings of the different smoothing parameters are as follows:

• alpha: smoothing parameter for the level
• beta: smoothing parameter for the slope (trend)
• gamma: smoothing parameter for the seasonal component
• phi: damping parameter for the trend

To help us visually inspect the fitted model, we could use the gg_tsresiduals function, which produces the following:

• a time plot of the innovation residuals
• the ACF plot of the residuals
• a histogram of the residuals

gg_tsresiduals(ts_ets)

Note that innovation residuals are the residuals on the transformed scale if a transformation is used in the model (Hyndman 2021c).

I can only say the name "innovation residuals" is rather fancy.


Nevertheless, the graphs from the gg_tsresiduals function let us check whether the residuals behave like white noise, i.e. whether they are uncorrelated, centered around zero, and roughly normally distributed.

Earlier, I used the automatic method to find the combination of error, trend, and season that gives the best model, since I did not define them in the function.

If we want to define the model ourselves, we could pass additional arguments into the ETS function as follows:

ts_train %>%
  select(-data_series) %>%
  model(ETS(cpi ~ error("A") + trend("Ad") + season("M"))) %>%
  report()
Series: cpi 
Model: ETS(A,Ad,M) 
  Smoothing parameters:
    alpha = 0.5446765 
    beta  = 0.04208475 
    gamma = 0.0001036066 
    phi   = 0.9771571 

  Initial states:
     l[0]        b[0]     s[0]    s[-1]     s[-2]    s[-3]    s[-4]
 99.78493 -0.04522087 1.000853 1.000824 0.9981717 1.001364 1.001747
     s[-5]    s[-6]    s[-7]     s[-8]    s[-9]   s[-10]    s[-11]
 0.9977731 1.000635 1.000436 0.9977261 1.001104 1.000435 0.9989317

  sigma^2:  0.0492

      AIC      AICc       BIC 
 96.31516 110.56516 135.99962 

Refer to the documentation page for the forms allowed under the error, trend, and season arguments.

Forecast

To forecast future values, I will use the forecast function.

I will also indicate that the forecast period should be 17 months, so that the forecasts line up with the testing dataset (the training set holds 67 of the 84 months, leaving 17 for testing).

ts_ets_forecast <- ts_ets %>%
  forecast(h = "17 months")

After that, I will plot the actual CPI, fitted CPI and forecast CPI.

ts_train %>%
  ggplot(aes(x = month_year_recoded, 
             y = cpi,
             color = "Actual - Training")) +
  autolayer(ts_ets_forecast, alpha = 0.3) +
  geom_line(size = 1) +
  geom_line(aes(y = .fitted,
                color = "Fitted"),
            size = 1,
            data = augment(ts_ets)) +
  geom_line(aes(y = .mean,
                color = "Forecast"),
            size = 1,
            data = ts_ets_forecast) +
  geom_line(aes(y = cpi,
                color = "Actual - Testing"),
            size = 1,
            data = ts_test) +
  labs(title = "CPI") +
  theme_minimal() +
  theme(axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        legend.position = "bottom",
        legend.title = element_blank())

Over here, I have used the autolayer function to plot the forecast values as a ggplot layer. By default, this function plots the forecast values as a line and shades the 80% and 95% prediction intervals.

The forecasted values are rather flat compared to the actual training values. This is consistent with the model info: the selected ETS(A,N,A) model has no trend component.

Also, we can observe that the fitted values tend to lag behind the actual values; a spike or drop tends to show up in the fitted series one period after it appears in the actual series.
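To quantify how far the forecasts sit from the held-out data, one could compute the usual point-forecast accuracy metrics. Below is a minimal base-R sketch; the `actual` and `fc` vectors are toy values for illustration, not the CPI figures above. In practice, the accuracy function from fabletools computes these (and more) directly from a forecast object and the test data.

```r
# Hedged sketch: common point-forecast accuracy metrics in base R.
# `actual` and `fc` are illustrative toy vectors, not values from this post.
rmse <- function(actual, fc) sqrt(mean((actual - fc)^2))  # root mean squared error
mae  <- function(actual, fc) mean(abs(actual - fc))       # mean absolute error

actual <- c(100.1, 100.4, 100.2, 100.6)
fc     <- c(100.0, 100.2, 100.3, 100.5)

rmse(actual, fc)
mae(actual, fc)
```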

Conclusion

That’s all for the day!

Thanks for reading the post until the end.

Feel free to contact me through email or LinkedIn if you have any suggestions on future topics to share.

Refer to this link for the blog disclaimer.

Till next time, happy learning!


References

Hyndman, R. J., and G. Athanasopoulos. 2021a. Forecasting: Principles and Practice, 3rd Edition. https://otexts.com/fpp3/expsmooth.html.

———. 2021b. Forecasting: Principles and Practice, 3rd Edition. https://otexts.com/fpp3/expsmooth.html.

———. 2021c. Forecasting: Principles and Practice, 3rd Edition. https://otexts.com/fpp3/residuals.html.