Partial Dependence Plot (PDP)

Machine Learning Model Explainability

All else being equal, what is the effect of the selected variable?

Jasper Lok https://jasperlok.netlify.app/
05-07-2022

In this post, I will be exploring another popular model explainability method, i.e. partial dependence.

Photo by Oleg Magni

Partial dependence plots (PDP) show the dependence between the target response and a set of input features of interest, marginalizing over the values of all other input features (the ‘complement’ features). Intuitively, we can interpret the partial dependence as the expected target response as a function of the input features of interest (Scikit-learn).
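
The description above can be written compactly as a formula. This is a sketch in my own shorthand (the symbols are assumptions for illustration, not notation taken from the post's sources): x_S denotes the features of interest, X_C the complement features, \hat{f} the fitted model, and n the number of observations used for averaging.

```latex
\hat{f}_S(x_S) \;=\; \mathbb{E}_{X_C}\!\left[\hat{f}(x_S, X_C)\right]
\;\approx\; \frac{1}{n}\sum_{i=1}^{n} \hat{f}\!\left(x_S,\, x_C^{(i)}\right)
```

The sum on the right is what is computed in practice: the features of interest are fixed at x_S, the complement features keep their observed values x_C^{(i)}, and the predictions are averaged.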

Before we jump into the partial dependence plot, let’s take a look at what a global model-agnostic method is.

What is a global model-agnostic method?

In my previous post, I discussed how the LIME method can explain model predictions. LIME is a local model-agnostic method.

The local model-agnostic method explains individual predictions, i.e. how the different variables affect the individual predicted outcome.

Instead of focusing on individual predictions, the global model-agnostic method focuses on explaining the overall model predictions.

Unlike the LIME method, partial dependence is a global XAI method. The global method gives a comprehensive explanation of the entire data set, describing the impact of the feature(s) on the target variable in the context of the overall data (Kim 2021).

Partial Dependence

How does partial dependence work?

Below is how partial dependence works (Baeder, Brinkmann, and Xu 2021):
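
As a rough illustration of the usual procedure, here is a minimal base R sketch. It shows the general technique rather than the exact steps from the cited report, and manual_pdp is a hypothetical helper I wrote for this illustration, not a function from any package. The toy linear model on the built-in mtcars data is also just a stand-in for the model used later in the post.

```r
# Hand-rolled partial dependence for one feature (illustrative sketch only).
# `manual_pdp` is a hypothetical helper, not a package function.
manual_pdp <- function(model, data, feature, grid) {
  sapply(grid, function(value) {
    data_mod <- data
    data_mod[[feature]] <- value              # 1. set the feature to one grid value in every row
    mean(predict(model, newdata = data_mod))  # 2. predict, then average over all rows
  })
}

# Toy model on the built-in mtcars data (not the attrition data used later)
toy_fit <- lm(mpg ~ wt + hp, data = mtcars)

# 3. repeat for each value on a grid spanning the observed range of the feature
wt_grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 5)
pd_wt <- manual_pdp(toy_fit, mtcars, "wt", wt_grid)

data.frame(wt = wt_grid, partial_dependence = pd_wt)
```

Because the toy model is linear, the resulting curve is a straight line whose slope equals the wt coefficient. Applied to a random forest, the same procedure can trace a non-linear curve.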

Pros and cons

Some of the advantages of partial dependence are (Molnar 2022):

Unfortunately, partial dependence also comes with limitations. Below are some of the limitations discussed in Molnar (2022):

In the demonstration below, I will show some of the limitations of partial dependence.

Demonstration

In this demonstration, I will be using the employee attrition dataset from Kaggle.

Let’s begin the demonstration!

Set up the environment

First, I will set up the environment by calling all the packages I need for the analysis later.

# Packages used in this analysis
packages <- c('tidyverse', 'readr', 'tidymodels', 'DALEXtra', 'themis', 
              'ingredients', 'corrplot')

# Install any package that is missing, then attach it
for (p in packages) {
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}

For this demonstration, we will be using an R package called ingredients. This package lets us produce partial dependence plots with a few lines of code.

This package is also part of the model explainability tools developed by MI² DataLab, which lets us use the other tools without many changes to the code.

Import the data

First I will import the data into the environment.

df <- read_csv("https://raw.githubusercontent.com/jasperlok/my-blog/master/_posts/2022-03-12-marketbasket/data/general_data.csv") %>%
  select(-c(EmployeeCount, StandardHours, EmployeeID))

I will set the random seed for reproducibility.

set.seed(1234)

Build a model

For simplicity, I will reuse the random forest model building code I wrote in my previous post so that we can focus this post on how we apply PDP to interpret the machine learning model results.

You can refer to my previous post for an explanation of the model-building steps.

df_split <- initial_split(df, 
                          prop = 0.6, 
                          strata = Attrition)

df_train <- training(df_split)
df_test <- testing(df_split)


ranger_recipe <- 
  recipe(formula = Attrition ~ ., 
         data = df_train) %>%
  step_impute_mean(NumCompaniesWorked,
                   TotalWorkingYears) %>%
  step_nzv(all_predictors()) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_upsample(Attrition)

ranger_spec <- 
  rand_forest(trees = 1000) %>% 
  set_mode("classification") %>% 
  set_engine("ranger") 

ranger_workflow <- 
  workflow() %>% 
  add_recipe(ranger_recipe) %>% 
  add_model(ranger_spec) 

ranger_fit <- ranger_workflow %>%
  fit(data = df_train)

Partial Dependence Plot

Now, we will start using partial dependence to explain our model predictions!

Create explainer objects

As in the last post, I will first create the explainer object by using the explain_tidymodels function.

ranger_explainer <- explain_tidymodels(ranger_fit,
                   data = select(df_train, -Attrition),
                   y = df_train$Attrition,
                   verbose = FALSE)

Aside from that, we need the following code to ensure the right explainer methods are used (Lendway).

model_type.dalex_explainer <- DALEXtra::model_type.dalex_explainer
predict_model.dalex_explainer <- DALEXtra::predict_model.dalex_explainer

Otherwise, the subsequent code will not run.

Explaining how the different variables affect the predictions

Next, we will start explaining how the different variables affect the outcome.

To do so, we just need to pass the variable of interest to the partial_dependence function as shown below.

pdp_ranger <- partial_dependence(ranger_explainer,
                                 variables = c("YearsAtCompany"))
graph <- plot(pdp_ranger)
graph

From the graph above, we can observe the following:

Because the graph object is a ggplot object, we can modify it by using ggplot functions.

class(graph)
[1] "gg"     "ggplot"

Below I have made modifications to the graph by using ggplot functions:

# Apply the complete theme first: adding theme_light() last would reset
# the theme() tweak below
graph + 
  theme_light() +
  guides(color = "none") +
  theme(plot.tag = element_blank()) +
  labs(title = "Partial Dependence Profile", 
       subtitle = NULL)
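
The curve in the plot is backed by a plain data frame, so the numbers can also be inspected directly. Below is a minimal sketch on a toy linear model fitted to the built-in mtcars data, standing in for the random forest above. I am assuming here that the returned object stores the grid values in a column named `_x_` and the averaged predictions in `_yhat_`, which is how I understand the ingredients output to be laid out. Please check names(pdp_toy) on your installed version.

```r
library(DALEX)
library(ingredients)

# Toy model on built-in data, standing in for the random forest in this post
toy_fit <- lm(mpg ~ wt + hp, data = mtcars)
toy_explainer <- DALEX::explain(toy_fit,
                                data = mtcars[, c("wt", "hp")],
                                y = mtcars$mpg,
                                verbose = FALSE)

pdp_toy <- ingredients::partial_dependence(toy_explainer, variables = "wt")

# Assumption: `_x_` holds the grid values and `_yhat_` the averaged predictions
pdp_values <- as.data.frame(pdp_toy)[, c("_vname_", "_x_", "_yhat_")]
head(pdp_values)
```

The same idea should carry over to the pdp_ranger object created earlier, for example when the values need to be exported or plotted with custom code.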

Alternatively, we can produce partial dependence plots for all variables of a given type by setting the variable_type argument.

pdp_ranger_num <- partial_dependence(ranger_explainer,
                                 variable_type = "numerical")


plot(pdp_ranger_num)

From the graphs above, we could observe the following:

As mentioned in the earlier section, one of the assumptions of partial dependence is that the variables are not correlated with one another.

To check this assumption, I will plot the correlation matrix of the numeric variables by using the corrplot function.

# Keep only the numeric columns
df_num <- df %>%
  select(where(is.numeric))

corrplot(cor(df_num, use="pairwise.complete.obs"), 
         method = "number", 
         type = "upper", 
         tl.cex = 0.65, 
         number.cex = 0.65, 
         diag = FALSE)

From the correlation chart above, it is clear that the variables are not independent of one another. This is not surprising, as in practice the predictors are rarely independent of one another.
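
To complement the chart, the strongest pairs can also be listed programmatically. Below is a minimal base R sketch. high_cor_pairs is a hypothetical helper written for illustration, the 0.8 threshold is an arbitrary choice, and the example runs on the built-in mtcars data so that it is self-contained.

```r
# List variable pairs whose absolute correlation meets a threshold.
# `high_cor_pairs` is a hypothetical helper, not a package function.
high_cor_pairs <- function(data, threshold = 0.7) {
  cor_mat <- cor(data, use = "pairwise.complete.obs")
  # Look at the upper triangle only, so that each pair appears once
  idx <- which(upper.tri(cor_mat) & abs(cor_mat) >= threshold, arr.ind = TRUE)
  out <- data.frame(
    var1 = rownames(cor_mat)[idx[, "row"]],
    var2 = colnames(cor_mat)[idx[, "col"]],
    correlation = cor_mat[idx]
  )
  # Strongest relationships first
  out[order(-abs(out$correlation)), ]
}

pairs_mtcars <- high_cor_pairs(mtcars, threshold = 0.8)
pairs_mtcars
```

The same helper could be applied to df_num, for example high_cor_pairs(df_num), to flag the variables whose partial dependence plots deserve extra caution.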

Molnar (2022) discusses the issue of using PDPs when the variables are not independent of one another. When the variables are correlated, we create data points in areas of the feature distribution where the actual probability is very low.

For example, from the correlation matrix above, Age and TotalWorkingYears are positively correlated. This makes sense, as in general we would expect older employees to have more working experience.

But in the PDP calculation, each grid value of the variable of interest is combined with the observed values of all the other variables, so we could end up with data points that do not make sense. For example, the algorithm could generate a profile with age = 20 and total working years > 20.

As the partial dependence algorithm is unable to differentiate these data points from the rest, these unlikely data points will be used in the average feature effect curve in the partial dependence plot as well.
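
To make this concrete, here is a small simulation in base R. The data are simulated rather than taken from the attrition dataset, and the rule that nobody starts working before age 18 is an assumption I made purely for illustration.

```r
set.seed(1234)
n <- 500

# Simulated employees (assumption: nobody starts working before age 18)
toy <- data.frame(Age = sample(20:60, n, replace = TRUE))
toy$TotalWorkingYears <- pmin(toy$Age - 18, sample(0:40, n, replace = TRUE))

# Share of rows that break the toy rule "working years cannot exceed Age - 18"
share_implausible <- function(d) mean(d$TotalWorkingYears > d$Age - 18)

# PDP-style substitution: evaluate the curve at Age = 20 by overwriting Age in every row
toy_age20 <- toy
toy_age20$Age <- 20

share_implausible(toy)        # observed data: 0 by construction
share_implausible(toy_age20)  # after substitution: most rows are implausible
```

In this toy setup, the prediction averaged at Age = 20 is dominated by employee profiles that could not exist, which is the concern described above.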

In a future post, I will explore other methods that help us overcome this limitation.

Conclusion

That’s all for the day!

Thanks for reading the post until the end.

Feel free to contact me through email or LinkedIn if you have any suggestions on future topics to share.

Refer to this link for the blog disclaimer.

Till next time, happy learning!

Photo by Jenna Hamra

References

Baeder, Larry, Peggy Brinkmann, and Eric Xu. 2021. Interpretable Machine Learning for Insurance. https://www.soa.org/resources/research-reports/2021/interpretable-machine-learning/.
Kim, Seungjun (Josh). 2021. “Explainable AI (XAI) Methods Part 1 — Partial Dependence Plot (PDP).” https://towardsdatascience.com/explainable-ai-xai-methods-part-1-partial-dependence-plot-pdp-349441901a3d.
Lendway, Lisa. “Interpretable Machine Learning: This Tutorial Focuses on Local Interpretation.” https://advanced-ds-in-r.netlify.app/posts/2021-03-31-imllocal/.
Molnar, Christoph. 2022. Interpretable Machine Learning. https://christophm.github.io/interpretable-ml-book/pdp.html.
Scikit-learn. “4.1. Partial Dependence and Individual Conditional Expectation Plots.” https://scikit-learn.org/stable/modules/partial_dependence.html#mathematical-definition.