Accumulated Local Effect (ALE)

Machine Learning Model Explainability
Jasper Lok https://jasperlok.netlify.app/
03-21-2023

In this post, I will explore another model explainability method: accumulated local effects (ALE).

Photo by Thomas Willmott on Unsplash

Accumulated local effect

The main reason ALE is preferred over the partial dependence plot (PDP) is that a PDP cannot be trusted when the features of the machine learning model are correlated (Molnar 2022).

The author further explains that such correlation can greatly bias the estimated feature effect.
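To make the bias concrete, here is a small base R sketch on simulated data. Everything in it is an assumption for illustration: the toy data, the hypothetical toy_predict "model" (a pure interaction between two almost identical features) and the pdp_x1 helper are not part of the demonstration later in this post.

```r
set.seed(42)
n <- 500
x1 <- as.numeric(scale(rnorm(n)))                         # centred, unit variance
toy <- data.frame(x1 = x1, x2 = x1 + rnorm(n, sd = 0.1))  # x2 is almost equal to x1

# Hypothetical "model": a pure interaction between the two features
toy_predict <- function(d) d$x1 * d$x2

# Partial dependence of x1: set x1 to each grid value for every row,
# keep the observed x2, and average the predictions
pdp_x1 <- function(grid) {
  sapply(grid, function(v) {
    d <- toy
    d$x1 <- v
    mean(toy_predict(d))
  })
}

grid <- c(-2, -1, 0, 1, 2)
pdp <- pdp_x1(grid)

# Average prediction for rows that really have x1 close to each grid value
observed <- sapply(grid, function(v) {
  mean(toy_predict(toy[abs(toy$x1 - v) < 0.25, ]))
})

round(data.frame(x1 = grid, pdp = pdp, observed = observed), 2)
```

In this toy setup the PDP stays close to zero across the whole grid, because it pairs each x1 value with every observed x2, including combinations that never occur in the data. The predictions for rows that actually sit near x1 = -2 or x1 = 2 are far from zero.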

In short, what ALE is trying to do:

Voilà! That is how we obtained the ALE curve.
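As a rough illustration of the idea, below is a minimal base R sketch of a first-order ALE for a single numeric feature. It follows the general recipe (split the feature into intervals, average the prediction differences within each interval, accumulate, then centre) and is not how the ingredients package implements it. The names ale_numeric and toy_predict, and the simulated data, are hypothetical.

```r
# Minimal first-order ALE for one numeric feature (illustrative sketch only)
ale_numeric <- function(predict_fun, data, feature, n_bins = 10) {
  x <- data[[feature]]
  # Interval edges from quantiles, so each interval holds a similar number of rows
  edges <- unique(quantile(x, probs = seq(0, 1, length.out = n_bins + 1),
                           names = FALSE))
  k <- length(edges) - 1
  # Assign every row to an interval (the minimum falls into the first one)
  bin <- cut(x, breaks = edges, include.lowest = TRUE, labels = FALSE)
  # Local effect: change in prediction when the feature moves from the lower
  # to the upper edge of its own interval, all other features left untouched
  lower <- data
  lower[[feature]] <- edges[bin]
  upper <- data
  upper[[feature]] <- edges[bin + 1]
  delta <- predict_fun(upper) - predict_fun(lower)
  # Average the local effects per interval, then accumulate them
  local_effect <- tapply(delta, factor(bin, levels = seq_len(k)), mean)
  local_effect[is.na(local_effect)] <- 0
  ale <- c(0, cumsum(as.numeric(local_effect)))
  # Centre the curve so the average effect over the data is zero
  counts <- tabulate(bin, nbins = k)
  mid <- (ale[-1] + ale[-length(ale)]) / 2
  ale <- ale - sum(mid * counts) / sum(counts)
  data.frame(x = edges, ale = ale)
}

# Toy check with two strongly correlated features and a known linear "model"
set.seed(1)
n <- 500
x1 <- rnorm(n)
toy <- data.frame(x1 = x1, x2 = x1 + rnorm(n, sd = 0.1))
toy_predict <- function(d) 2 * d$x1 + d$x2

res <- ale_numeric(toy_predict, toy, "x1")
slopes <- diff(res$ale) / diff(res$x)
```

Because the toy model is linear, the slope of the ALE curve for x1 is 2 in every interval, which is the coefficient of x1 alone, even though x2 moves almost in lockstep with x1 in the data.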

Pros and cons

Below are the pros and cons of using ALE curve:

Pros

Cons

Bonus

I happened to come across this comparison table on different model explainability methods.

Taken from this website

I feel the summary is quite well done and easy to understand.

Demonstration

In this demonstration, I will be using the accumulated_dependence function from the ingredients package to explain the model.

Set up the environment

First, I will call the relevant packages to set up the environment.

pacman::p_load(tidyverse, DALEX, DALEXtra, tidymodels, ingredients, themis, iml, ranger)

Import the data

I will re-use one of the Kaggle datasets I previously used for model explainability.

df <- read_csv("https://raw.githubusercontent.com/jasperlok/my-blog/master/_posts/2022-03-12-marketbasket/data/general_data.csv") %>%
  # drop the columns we don't need
  dplyr::select(-c(EmployeeCount, StandardHours, EmployeeID)) %>%
  # impute the missing values with the mean values
  mutate(
    NumCompaniesWorked = case_when(
      is.na(NumCompaniesWorked) ~ mean(NumCompaniesWorked, na.rm = TRUE),
      TRUE ~ NumCompaniesWorked),
    TotalWorkingYears = case_when(
      is.na(TotalWorkingYears) ~ mean(TotalWorkingYears, na.rm = TRUE),
      TRUE ~ TotalWorkingYears)
    ) %>%
  droplevels()
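The mean imputation above can be tried on a small toy data frame first. This is only a sketch: impute_mean and the toy columns are made up for illustration, and the anyNA check at the end is the kind of sanity check one could run on df before modelling.

```r
# Hypothetical helper: replace missing values with the column mean
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

toy <- data.frame(a = c(1, NA, 3), b = c(NA, 4, 6))
toy[] <- lapply(toy, impute_mean)

# TRUE if any missing value is left in the data
any_missing <- anyNA(toy)
```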

Build a model

For simplicity, I will reuse the random forest model building code from my previous post so that this post can focus on how to apply ALE to interpret the machine learning model results.

You can refer to my previous post for the explanations of the model building.

The only difference in this model building is that, instead of imputing the missing values during the recipe stage, I imputed them before building the model.

This is because the accumulated_dependence function I will be using later is unable to handle missing values.

ranger_recipe <- 
  recipe(formula = Attrition ~ ., 
         data = df) %>%
  step_nzv(all_predictors()) %>%
  step_dummy(all_nominal_predictors()) %>%
  # no prep() here: the workflow trains the recipe when it is fitted
  step_upsample(Attrition)

ranger_spec <- 
  rand_forest(trees = 1000) %>% 
  set_mode("classification") %>% 
  set_engine("ranger") 

ranger_workflow <- 
  workflow() %>% 
  add_recipe(ranger_recipe) %>% 
  add_model(ranger_spec) 

ranger_fit <- ranger_workflow %>%
  fit(data = df)

Accumulated Local Effects

Create explainer object

Similar to the last post, I will first create the explainer object using the explain_tidymodels function.

ranger_explainer <- explain_tidymodels(ranger_fit,
                   data = dplyr::select(df, -Attrition),
                   y = df$Attrition,
                   verbose = FALSE)

Use accumulated_dependence function

Next, I will use the accumulated_dependence function to derive the ALE.

ranger_ale_dept <- 
  accumulated_dependence(ranger_explainer,
                         N = 100,
                         variables = "Department")

Then, I will pass the object into the plot function to visualize the results.

plot(ranger_ale_dept)

The ALE will be a curve if the selected variable is numeric.

ranger_ale_yearCo <- 
  accumulated_dependence(ranger_explainer,
                         N = 1000,
                         variables = "YearsAtCompany")

plot(ranger_ale_yearCo)

Similarly, we could plot out the ALE curves for all the numeric variables.

ranger_ale_num <- 
  accumulated_dependence(ranger_explainer,
                         N = 1000,
                         variable_type = "numerical")

plot(ranger_ale_num)

To satisfy my curiosity, I will also generate the partial dependence plot for the numeric variables for this model.

This will allow me to compare the results side by side.

ranger_part_num <- 
  partial_dependence(ranger_explainer,
                     N = 1000,
                     variable_type = "numerical")

plot(ranger_part_num)

Comparing the ALE and PDP curves, the shapes do not seem to change drastically between the two methods.

However, the estimated effect of each variable does seem to differ between ALE and PDP.

Use model_profile function

Alternatively, we could use the model_profile function from the DALEX package to show the accumulated local effect plot.

I will pass "accumulated" to the type argument to derive the ALE result before plotting it.

ranger_company_ale <- model_profile(ranger_explainer, 
                                    variables = "YearsAtCompany", 
                                    type = "accumulated")

plot(ranger_company_ale)

Conclusion

That’s all for the day!

Thanks for reading the post until the end.

Feel free to contact me through email or LinkedIn if you have any suggestions on future topics to share.

Refer to this link for the blog disclaimer.

Till next time, happy learning!

Photo by John Bakator on Unsplash

References

Molnar, Christoph. 2022. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. 2nd ed. https://christophm.github.io/interpretable-ml-book.