I recently came across these time series packages and thought of trying them out to see how they work.
Hence, I will jump straight into the demonstration.

First, I will set up the environment by loading all the relevant packages.
pacman::p_load(tidyverse, tidymodels, statmod, modeltime, janitor, tsibble, timetk, workflowsets)
In this demonstration, I will be using the Seoul bike-sharing dataset.
The relevant data can be downloaded from the UCI Machine Learning Repository website.

df <-
read_csv("https://raw.githubusercontent.com/jasperlok/my-blog/master/_posts/2023-09-23-boruta/data/SeoulBikeData.csv"
,locale = locale(encoding = "latin1")) %>%
clean_names() %>%
mutate(date = dmy(date)) %>%
group_by(date) %>%
summarise(rented_bike_count_tot = sum(rented_bike_count)
,temperature_c_avg = mean(temperature_c)
,humidity_percent_avg = mean(humidity_percent)
,wind_speed_m_s_avg = mean(wind_speed_m_s)
,visibility_10m_avg = mean(visibility_10m)
,dew_point_temperature_c_avg = mean(dew_point_temperature_c)
,solar_radiation_mj_m2_avg = mean(solar_radiation_mj_m2)
,rainfall_mm_avg = mean(rainfall_mm)
,snowfall_cm_avg = mean(snowfall_cm)
,holiday_first = first(holiday))
We can visualize the time series with the plot_time_series() function, which produces an interactive chart.
df %>%
plot_time_series(date, rented_bike_count_tot)
First, I will split the dataset into training and testing datasets.
Note that as this is a time series dataset, we cannot use the usual random splitting; we split by time instead.
df_splits <-
initial_time_split(df, prop = 0.6)
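As a quick aside (a toy sketch, not part of the bike-sharing analysis), initial_time_split() keeps the row order: the first prop share of rows becomes the training set and the remainder becomes the testing set.

```r
library(rsample)

# A toy 10-row data frame ordered by day
toy <- data.frame(day = 1:10, y = 1:10)

toy_split <- initial_time_split(toy, prop = 0.6)

nrow(training(toy_split))  # 6 -- the first 60% of the rows
nrow(testing(toy_split))   # 4 -- the remaining rows

# Every training day comes before every testing day
max(training(toy_split)$day) < min(testing(toy_split)$day)  # TRUE
```

This is why no shuffling happens: the split respects the chronological order of the data.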
Next, I will build an ARIMA model using functions from the modeltime package.
arima_fit <-
arima_reg() %>%
set_engine(engine = "auto_arima") %>%
fit(rented_bike_count_tot ~
date
+ temperature_c_avg
+ humidity_percent_avg
+ wind_speed_m_s_avg
+ visibility_10m_avg
+ dew_point_temperature_c_avg
+ solar_radiation_mj_m2_avg
+ rainfall_mm_avg
+ snowfall_cm_avg
+ holiday_first
,data = training(df_splits))
The cool thing about this package is that its syntax follows the same pattern as the tidymodels package.
Then, I will pass the testing data into modeltime_calibrate() to evaluate how well this model performs.
calibration_df <-
arima_fit %>%
modeltime_calibrate(new_data = testing(df_splits))
calibration_df %>%
modeltime_accuracy()
# A tibble: 1 × 9
.model_id .model_desc .type mae mape mase smape rmse rsq
<int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 REGRESSION WITH… Test 6401. Inf 1.02 35.3 9191. 0.232
Oof, the model doesn’t seem to perform very well.
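The Inf MAPE in the table is a clue: MAPE divides each error by the actual value, so it blows up whenever the test period contains a day with zero rentals (the Seoul data includes non-functioning days with zero counts). A minimal illustration with made-up numbers:

```r
# MAPE = mean(|actual - predicted| / |actual|) * 100
mape <- function(actual, pred) mean(abs((actual - pred) / actual)) * 100

mape(c(100, 200), c(110, 190))  # 7.5
mape(c(0, 200), c(110, 190))    # Inf -- a single zero actual breaks the metric
```

This is why scale-free metrics such as MASE or SMAPE are more informative here.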
Instead of fitting models one by one, we can also fit several models at once.
First, I will create the general formula for all the models I will be building later.
gen_formula <-
as.formula("rented_bike_count_tot ~ date + temperature_c_avg + humidity_percent_avg + wind_speed_m_s_avg + visibility_10m_avg + dew_point_temperature_c_avg + solar_radiation_mj_m2_avg + rainfall_mm_avg + snowfall_cm_avg + holiday_first")
Next, I will define the models to be built.
# ARIMA
arima_fit <-
arima_reg() %>%
set_engine(engine = "auto_arima") %>%
fit(gen_formula
,data = training(df_splits))
# prophet
prophet_fit <-
prophet_reg() %>%
set_engine(engine = "prophet") %>%
fit(gen_formula
,data = training(df_splits))
# boosted tree (xgboost)
prophet_boost_fit <-
boost_tree("regression") %>%
set_engine(engine = "xgboost") %>%
fit(gen_formula
,data = training(df_splits))
# seasonal
seasonal_fit <-
seasonal_reg() %>%
set_engine(engine = "stlm_ets") %>%
fit(gen_formula
,data = training(df_splits))
Then, I will combine the models with the modeltime_table() function, which assigns an ID and a description to each model.
model_tbl <-
modeltime_table(
arima_fit
,prophet_fit
,prophet_boost_fit
,seasonal_fit
)
model_tbl
# Modeltime Table
# A tibble: 4 × 3
.model_id .model .model_desc
<int> <list> <chr>
1 1 <fit[+]> REGRESSION WITH ARIMA(1,1,2)(0,0,1)[7] ERRORS
2 2 <fit[+]> PROPHET W/ REGRESSORS
3 3 <fit[+]> XGBOOST
4 4 <fit[+]> SEASONAL DECOMP: ETS(A,N,N)
I will measure the performance of each model.
model_calibrate <-
model_tbl %>%
modeltime_calibrate(new_data = testing(df_splits))
model_calibrate %>%
modeltime_accuracy()
# A tibble: 4 × 9
.model_id .model_desc .type mae mape mase smape rmse rsq
<int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 REGRESSION … Test 6401. Inf 1.02 35.3 9191. 0.232
2 2 PROPHET W/ … Test 12064. Inf 1.92 52.1 14101. 0.235
3 3 XGBOOST Test 6595. Inf 1.05 40.7 8910. 0.245
4 4 SEASONAL DE… Test 7134. Inf 1.13 38.5 10149. 0.00214
We can even pass the accuracy results to the table_modeltime_accuracy function to display them in a table format.
model_calibrate %>%
modeltime_accuracy() %>%
table_modeltime_accuracy(
.interactive = TRUE
)
Next, I will extract the predictions from each model and compare them to the actual data.
model_calibrate %>%
modeltime_forecast(
new_data = testing(df_splits),
actual_data = df
)
# A tibble: 949 × 7
.model_id .model_desc .key .index .value .conf_lo .conf_hi
<int> <chr> <fct> <date> <dbl> <dbl> <dbl>
1 NA ACTUAL actual 2017-12-01 9539 NA NA
2 NA ACTUAL actual 2017-12-02 8523 NA NA
3 NA ACTUAL actual 2017-12-03 7222 NA NA
4 NA ACTUAL actual 2017-12-04 8729 NA NA
5 NA ACTUAL actual 2017-12-05 8307 NA NA
6 NA ACTUAL actual 2017-12-06 6669 NA NA
7 NA ACTUAL actual 2017-12-07 8549 NA NA
8 NA ACTUAL actual 2017-12-08 8032 NA NA
9 NA ACTUAL actual 2017-12-09 7233 NA NA
10 NA ACTUAL actual 2017-12-10 3453 NA NA
# ℹ 939 more rows
I will visualize the results as well.
model_calibrate %>%
modeltime_forecast(
new_data = testing(df_splits),
actual_data = df
) %>%
plot_modeltime_forecast()
As shown in the chart, the seasonal decomposition model performs worse than the other models.
workflowsets package
First, I will define the recipe to be used in fitting the models.
gen_recipe <-
recipe(rented_bike_count_tot ~ date + temperature_c_avg + humidity_percent_avg + wind_speed_m_s_avg + visibility_10m_avg + dew_point_temperature_c_avg + solar_radiation_mj_m2_avg + rainfall_mm_avg + snowfall_cm_avg + holiday_first
,data = training(df_splits))
Then, I will define the models to be fitted.
wf_arima <-
arima_reg() %>%
set_engine(engine = "auto_arima")
wf_prophet <-
prophet_reg() %>%
set_engine(engine = "prophet")
I will pass the recipe and the models into the workflow_set function to create wfsets.
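The workflow_set() call itself is not shown in the original post, so below is a sketch of how wfsets could be built; the list names (gen_recipe, arima_fit, prophet_fit) are my guesses, chosen to match the model descriptions printed further down.

```r
library(workflowsets)

# gen_recipe, wf_arima and wf_prophet are defined above
wfsets <-
  workflow_set(
    preproc = list(gen_recipe = gen_recipe)
    ,models = list(arima_fit = wf_arima
                   ,prophet_fit = wf_prophet)
  )
```

By default workflow_set() crosses every preprocessor with every model, which here yields the two workflows GEN_RECIPE_ARIMA_FIT and GEN_RECIPE_PROPHET_FIT.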
Lastly, I will use modeltime_fit_workflowset function to fit all the models within the workflow.
wfsets_fit <-
wfsets %>%
modeltime_fit_workflowset(training(df_splits))
wfsets_fit
# Modeltime Table
# A tibble: 2 × 3
.model_id .model .model_desc
<int> <list> <chr>
1 1 <workflow> GEN_RECIPE_ARIMA_FIT
2 2 <workflow> GEN_RECIPE_PROPHET_FIT
The steps to evaluate the fitted models are the same as in the previous sub-section.
wfsets_calibrate <-
wfsets_fit %>%
modeltime_calibrate(new_data = testing(df_splits))
wfsets_calibrate %>%
modeltime_accuracy()
# A tibble: 2 × 9
.model_id .model_desc .type mae mape mase smape rmse rsq
<int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 GEN_RECIPE_AR… Test 6401. Inf 1.02 35.3 9191. 0.232
2 2 GEN_RECIPE_PR… Test 12064. Inf 1.92 52.1 14101. 0.235
That’s all for the day!
Thanks for reading the post until the end.
Feel free to contact me through email or LinkedIn if you have any suggestions on future topics to share.
Refer to this link for the blog disclaimer.
Till next time, happy learning!
