All else being equal, what is the effect of the selected variable?
In this post, I will be exploring another popular model explainability method: partial dependence.

Partial dependence plots (PDP) show the dependence between the target response and a set of input features of interest, marginalizing over the values of all other input features (the ‘complement’ features). Intuitively, we can interpret the partial dependence as the expected target response as a function of the input features of interest (Scikit-learn).
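Formally, for a set of features of interest S and the complement features C, the partial dependence function is typically estimated by averaging the model's predictions over the training observations (a standard formulation consistent with the definition above; the notation here is mine, not from the cited source):

\hat{f}_S(x_S) = \frac{1}{n} \sum_{i=1}^{n} \hat{f}\left(x_S, x_C^{(i)}\right)

where \hat{f} is the fitted model, x_S is a value of the features of interest, and x_C^{(i)} are the observed values of the complement features for observation i.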
Before we jump into the partial dependence plot, let’s take a look at what a global model-agnostic method is.
In my previous post, I discussed how the LIME method can explain model predictions; LIME is a local model-agnostic method.
The local model-agnostic method explains individual predictions, i.e. how the different variables affect the individual predicted outcome.
Instead of focusing on individual predictions, the global model-agnostic method focuses on explaining the overall model predictions.
Unlike the LIME method, partial dependence is a global XAI method. The global method gives a comprehensive explanation of the entire data set, describing the impact of the feature(s) on the target variable in the context of the overall data (Kim 2021).
Below is how partial dependence works (Baeder, Brinkmann, and Xu 2021); a minimal R sketch of these steps follows the list:
For each level i of the selected feature (continuous variables are binned):
For all observations, modify the value of the selected feature to i
Using the modified observations and the existing model, predict the response variable value for every observation
Calculate the average predicted values for all observations
Plot the average predicted values for each level (y-axis) against the feature levels (x-axis)
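To make the steps concrete, here is a minimal sketch of the procedure in base R. Everything here is hypothetical and not from this post: manual_pdp is a made-up helper, df is any data frame, and predict_fun is a wrapper around your fitted model that returns a numeric prediction per row.

manual_pdp <- function(predict_fun, df, feature, grid = NULL) {
  # Step 0: choose the grid of levels (continuous variables are binned)
  if (is.null(grid)) {
    grid <- seq(min(df[[feature]], na.rm = TRUE),
                max(df[[feature]], na.rm = TRUE),
                length.out = 20)
  }
  # Steps 1-3: for each level, overwrite the feature for ALL observations,
  # predict with the existing model, and average the predictions
  avg_pred <- sapply(grid, function(level) {
    df_mod <- df
    df_mod[[feature]] <- level
    mean(predict_fun(df_mod))
  })
  # Step 4: return level vs average prediction, ready to plot
  data.frame(level = grid, avg_prediction = avg_pred)
}

For the model in this post, predict_fun would be a wrapper returning the predicted probability of attrition; packages like ingredients handle this bookkeeping (and the plotting) for us.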
Some of the advantages of partial dependence are (Molnar 2022):
Computation of partial dependence plots is intuitive
Easy to interpret the graph
Easy to implement
Unfortunately, partial dependence also comes with limitations. Below are some of the limitations discussed in (Molnar 2022):
The author argued that omitting the feature distribution from the plot can be misleading, as we may over-interpret regions of the plot that contain almost no data
This method assumes the variables are not correlated with one another, which is unlikely to be true
Heterogeneous effects might be hidden in the PD plot as the plot is only showing marginal effects
In the demonstration below, I will show some of the limitations of partial dependence.
In this demonstration, I will be using the employee attrition dataset from Kaggle.
With that, let’s begin the demonstration!
First, I will set up the environment by calling all the packages I need for the analysis later.
packages <- c('tidyverse', 'readr', 'tidymodels', 'DALEXtra', 'themis',
              'ingredients', 'corrplot')

# Install any missing packages, then load them all
for(p in packages){
  if(!require(p, character.only = TRUE)){
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
For this demonstration, we will be using an R package called ingredients, which allows us to produce partial dependence plots with a few lines of code.
Also, this package is part of the model explainability tools developed by MI^2 DataLab, which allows us to use other model explainability tools without many changes to the code.
First I will import the data into the environment.
# Drop the constant and identifier columns, which carry no information
df <- read_csv("https://raw.githubusercontent.com/jasperlok/my-blog/master/_posts/2022-03-12-marketbasket/data/general_data.csv") %>%
  select(-c(EmployeeCount, StandardHours, EmployeeID))
I will set the random seed for reproducibility.
set.seed(1234)
For simplicity, I will reuse the random forest model building code I wrote in my previous post so that we can focus this post on how we apply PDP to interpret the machine learning model results.
You can refer to my previous post on the explanations of the model building.
# Split the data into training and testing sets, stratified by the target
df_split <- initial_split(df,
                          prop = 0.6,
                          strata = Attrition)

df_train <- training(df_split)
df_test <- testing(df_split)

# Pre-processing: impute missing values, drop near-zero-variance predictors,
# dummy-encode nominal predictors, and upsample the minority class
ranger_recipe <-
  recipe(formula = Attrition ~ .,
         data = df_train) %>%
  step_impute_mean(NumCompaniesWorked,
                   TotalWorkingYears) %>%
  step_nzv(all_predictors()) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_upsample(Attrition)

# Random forest specification using the ranger engine
ranger_spec <-
  rand_forest(trees = 1000) %>%
  set_mode("classification") %>%
  set_engine("ranger")

# Bundle the recipe and model into a workflow, then fit
ranger_workflow <-
  workflow() %>%
  add_recipe(ranger_recipe) %>%
  add_model(ranger_spec)

ranger_fit <- ranger_workflow %>%
  fit(data = df_train)
Now, we will start using partial dependence to explain our model predictions!
Similar to my last post, I will first create the explainer object by using the explain_tidymodels function.
ranger_explainer <- explain_tidymodels(ranger_fit,
                                       data = select(df_train, -Attrition),
                                       y = df_train$Attrition,
                                       verbose = FALSE)
Aside from that, we need the following code to ensure the right explainer methods are used (Lendway).
model_type.dalex_explainer <- DALEXtra::model_type.dalex_explainer
predict_model.dalex_explainer <- DALEXtra::predict_model.dalex_explainer
Otherwise, the subsequent code will not run.
Next, we will start explaining how the different variables affect the outcome.
To do so, we just need to specify the variable of interest in the partial_dependence function, as shown below.
pdp_ranger <- partial_dependence(ranger_explainer,
                                 variables = c("YearsAtCompany"))
graph <- plot(pdp_ranger)
graph

From the graph above, we can observe the following:
Employees who recently joined the company are more likely to resign
The likelihood of resignation drops significantly as the years of service increase, and stays flat from around year 8
The likelihood then increases again at around 20 years of service
As the graph object is a ggplot object, we can modify it by using ggplot functions.
class(graph)
[1] "gg" "ggplot"
Below I have made modifications to the graph by using
ggplot functions:
graph +
  guides(color = "none") +
  labs(title = "Partial Dependence Profile",
       subtitle = NULL) +
  theme_light() +
  # theme_light() resets all theme elements, so apply tweaks after it
  theme(plot.tag = element_blank())

Alternatively, we could produce partial dependence plots for all variables of a given type by specifying the variable_type argument.
pdp_ranger_num <- partial_dependence(ranger_explainer,
                                     variable_type = "numerical")
plot(pdp_ranger_num)

From the graphs above, we could observe the following:
The predictive power of some variables appears low, since their partial dependence plots are rather flat (we check whether these averages hide heterogeneous effects right after this list)
There is a spike in the likelihood of resignation for employees who recently received a much higher percentage salary hike, which is worth investigating further
Employees who have worked for more companies in the past also appear more likely to resign
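One way to check whether these flat average curves hide heterogeneous effects (a limitation noted earlier) is to look at individual ceteris paribus profiles, which the ingredients package also provides. Below is a minimal sketch for a random sample of employees and the YearsAtCompany variable; the sample size of 50 is an arbitrary choice for readability, not from the original post.

# Ceteris paribus (individual) profiles for a sample of 50 employees;
# heterogeneity across the individual curves would be averaged away in the PDP
cp_ranger <- ceteris_paribus(ranger_explainer,
                             new_observation = df_train %>%
                               select(-Attrition) %>%
                               slice_sample(n = 50),
                             variables = "YearsAtCompany")

plot(cp_ranger)

If the individual curves all follow roughly the same shape, the PDP is a faithful summary; if they diverge, the average is hiding subgroup behaviour.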
As mentioned in the earlier section, one of the assumptions of partial dependence is that the variables are not correlated with one another.
To check this assumption, I will plot the correlation matrix of the
numeric variables by using the corrplot function.
df_num <- df %>%
  select_if(is.numeric)

corrplot(cor(df_num, use = "pairwise.complete.obs"),
         method = "number",
         type = "upper",
         tl.cex = 0.65,
         number.cex = 0.65,
         diag = FALSE)

From the correlation chart above, it is clear that the variables are not independent of one another; in practice, it is quite unlikely that predictors would be.
Molnar (2022) discussed the issue of using PDPs when the variables are not independent of one another: when the variables are correlated, we create data points in areas of the feature distribution where the actual probability is very low.
For example, from the correlation matrix above, Age and TotalWorkingYears are positively correlated. This makes sense, as in general we would expect older employees to have more working experience.
But in the PDP calculation, as we permute the data points over different combinations, we could have data points that might not make sense. For example, the algorithm could generate a profile with age = 20 and total working years > 20.
As the partial dependence algorithm is unable to differentiate these data points from the rest, these unlikely data points will be used in the average feature effect curve in the partial dependence plot as well.
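As a quick sanity check, we can count how many real employees match the implausible profile from the example above. Age and TotalWorkingYears are columns in this dataset; the exact cut-offs below are illustrative choices, not from the original post.

# Count observations resembling the implausible profiles PDP can generate:
# very young employees paired with very long working histories
df %>%
  filter(Age <= 20, TotalWorkingYears > 20) %>%
  nrow()

This will almost certainly return zero rows, yet the PDP for TotalWorkingYears still averages predictions over exactly such synthetic profiles.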
In a future post, I will explore other methods that help us overcome this.
That’s all for the day!
Thanks for reading the post until the end.
Feel free to contact me through email or LinkedIn if you have any suggestions on future topics to share.
Refer to this link for the blog disclaimer.
Till next time, happy learning!
