Is the model judgement fair?

Recently I watched a YouTube video by CrashCourse on “Algorithmic Bias and Fairness”, which piqued my curiosity about fairness in the machine learning context.
This led me to read more about fairness.
So, what is fairness?

Fairness, Accountability, and Transparency in Machine Learning (FAT/ML) defines fairness as the need to “ensure that algorithmic decisions do not create discriminatory or unjust impacts when comparing across different demographics (e.g. race, sex, etc)”.
In short, fairness looks at how machine learning models have treated different groups. Typically the groups are defined by sensitive attributes, such as sex, race, nationality and so on.
According to the Monetary Authority of Singapore (MAS) FEAT principles, there are two aspects within the fairness principle:
Justifiability
Accuracy and bias
In short, individuals or groups should not be disadvantaged unless the decisions can be justified.
The algorithms and decisions should also be regularly reviewed to minimize unintentional bias.
You may refer to this link for more info.
Simply put, there can be real repercussions when we make decisions based on predictions from machine learning models that carry unintended bias or fairness issues.
One famous example is the algorithm used to set credit limits for the Apple Card.
There were reports that the algorithm Apple used in setting credit limits might have fairness issues. This resulted in complaints and regulators stepping in to investigate the matter (BBC 2019).
It would not be sufficient to just exclude sensitive attributes during the analysis.
The non-sensitive variables might have some correlations with the sensitive attributes.
For example, in some countries, zip codes can act as a proxy for race.
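As a quick illustration, one simple (though far from conclusive) way to screen for such proxies is to test for association between a candidate variable and the sensitive attribute. The sketch below uses made-up data and hypothetical variable names:
# toy data: does zip_code carry information about race?
set.seed(42)
toy <- data.frame(zip_code = sample(c("10001", "60601", "94105"), 500, replace = TRUE),
                  race = sample(c("A", "B", "C"), 500, replace = TRUE))
# a small p-value would suggest zip_code may act as a proxy for race
chisq.test(table(toy$zip_code, toy$race))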
In a paper by MIT D-Lab, the authors suggested implementing several checks (i.e. the green boxes in the figure below) along the project cycle to consider “fairness” in a machine learning project (MIT D-Lab, Comprehensive Initiative on Technology Evaluation, Massachusetts Institute of Technology 2020).

Here, I will focus on fairness checks when building machine learning models.
To do so, I will be using the fairness_check function from the fairmodels package to assist me in checking the fairness of the machine learning models built.
Below are the five fairness measurements produced by the fairness_check function:
| Fairness Measurement | Measures | Remarks |
|---|---|---|
| Accuracy equality ratio | Both protected and unprotected groups have equal prediction accuracy | |
| Equal opportunity ratio | Both protected and unprotected groups have the same false negative rate | A.k.a. ‘false negative error rate balance’ |
| Predictive equality ratio | Both protected and unprotected groups have the same false positive rate | A.k.a. ‘false positive error rate balance’ |
| Predictive parity ratio | Both protected and unprotected groups have the same precision | A.k.a. ‘outcome test’ |
| Statistical parity ratio | Both protected and unprotected groups have the same probability of being assigned to the positive predicted class | A.k.a. ‘demographic parity’, ‘acceptance rate parity’ and ‘benchmarking’ |
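To make these ratios concrete, here is a minimal sketch (using made-up confusion-matrix counts, not the package’s internals) of how each ratio compares a protected subgroup against the privileged one:
# hypothetical confusion-matrix counts for two subgroups
priv <- list(tp = 80, fp = 20, tn = 70, fn = 30)
prot <- list(tp = 60, fp = 35, tn = 55, fn = 50)
ratio_of <- function(metric) metric(prot) / metric(priv)
ratio_of(function(g) (g$tp + g$tn) / (g$tp + g$fp + g$tn + g$fn)) # accuracy equality
ratio_of(function(g) g$fn / (g$fn + g$tp))                        # equal opportunity (FNR)
ratio_of(function(g) g$fp / (g$fp + g$tn))                        # predictive equality (FPR)
ratio_of(function(g) g$tp / (g$tp + g$fp))                        # predictive parity (precision)
ratio_of(function(g) (g$tp + g$fp) / (g$tp + g$fp + g$tn + g$fn)) # statistical parity (positive rate)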
In this demonstration, I will be using the Spaceship Titanic dataset from Kaggle.
Without further ado, let’s begin the demonstration!
First, I will set up the environment by calling all the packages I need for the analysis later.
pacman::p_load(tidyverse, readr, tidymodels, janitor, DALEXtra, fairmodels)
I will be using the fairmodels package to calculate the different fairness metrics discussed earlier.
The beauty of this package is that it works directly with explainers from the DALEXtra package and tidymodels without requiring many transformations.
So, the following are the steps to check the fairness of the models:
First, build the model(s)
Second, create the explainer(s) by using the DALEX or DALEXtra package
Third, pass the explainer(s) into the fairness_check function from the fairmodels package
First, I will import the data into the environment.
df_org <- read_csv("data/train.csv")
I will set the random seed for reproducibility.
set.seed(1234)
Next, I will start building the models.
As the purpose of this post is to explore how to implement the fairness measurements, I won’t be focusing on making the models more accurate.
I will perform some basic cleaning before building the different machine learning models.
df <- df_org %>%
  clean_names() %>%
  # convert the logical columns to factors
  mutate(transported = as_factor(transported),
         cryo_sleep = as_factor(cryo_sleep),
         vip = as_factor(vip)) %>%
  drop_na() %>%
  # cabin follows the format Deck/Num/Side;
  # for simplicity, only the first character of Num is kept
  mutate(cabin_deck = str_sub(cabin, 1, 1),
         cabin_num = str_sub(cabin, 3, 3),
         cabin_side = str_sub(cabin, -1, -1)) %>%
  select(-c(passenger_id, name, cabin))
The following are the data wrangling and cleaning steps performed above:
Clean the column names by using clean_names from janitor so that all of the column names are in lowercase
Mutate all the logical columns to factor columns
Drop all the rows with missing values
Split the cabin column into cabin_deck, cabin_num & cabin_side based on the descriptions stated on the Kaggle website
Drop the passenger ID, name and cabin columns
Okay, now that the data is ready, let’s start building the models!
The first model I will be building is a random forest model.
# model recipe
ranger_recipe <- recipe(formula = transported ~ .,
data = df) %>%
step_dummy(all_nominal_predictors())
# model specification
ranger_spec <-
rand_forest(trees = 1000) %>%
set_mode("classification") %>%
set_engine("ranger")
# model workflow
ranger_workflow <-
workflow() %>%
add_recipe(ranger_recipe) %>%
add_model(ranger_spec)
# fitting the model
ranger_fit <- ranger_workflow %>%
fit(data = df)
Once the model is built, I will proceed to create the explainer object.
Before that, as the explainer requires the target variable to be in numeric form, I will first convert the target variable into a numeric vector.
y_numeric <- df %>%
  mutate(transported_numeric = case_when(transported == "TRUE" ~ 1,
                                         TRUE ~ 0)) %>%
  pull(transported_numeric)  # pull() returns a numeric vector, as the explainer expects y to be numeric
Once that is done, I will create the explainer object.
ranger_explainer <- explain_tidymodels(ranger_fit,
data = select(df, -transported),
y = y_numeric,
label = "randomForest",
verbose = FALSE)
Next, I will pass the explainer object into the fairness_check function.
I will first define the protected variable (or the sensitive attribute). Here, I will use home_planet as the sensitive attribute for this demonstration.
Earth within home_planet will be taken as the privileged group.
protected_var <- df$home_planet
privileged_subgrp <- "Earth"
According to the descriptions on the documentation page, the subgroup parity loss will be calculated with regard to the privileged subgroup.
ranger_fair <- fairness_check(ranger_explainer,
protected = protected_var,
privileged = privileged_subgrp,
colorize = TRUE)
Creating fairness classification object
-> Privileged subgroup : character ( Ok )
-> Protected variable : factor ( changed from character )
-> Cutoff values for explainers : 0.5 ( for all subgroups )
-> Fairness objects : 0 objects
-> Checking explainers : 1 in total ( compatible )
-> Metric calculation : 13/13 metrics calculated for all models
Fairness object created succesfully
Once the fairness_object is created, there will be a result log.
From the result log, we can see how many explainer objects were passed into the fairness_check function.
The result log also informs users whether all the different metrics were computed successfully.
To extract the values of the computed metrics, we can access parity_loss_metric_data from the created fairness_object, as shown below.
ranger_fair$parity_loss_metric_data
TPR TNR PPV NPV FNR FPR FDR
1 0.1454439 0.3417678 0.4406022 0.08489166 2.475674 3.914536 4.944065
FOR TS STP ACC F1 NEW_METRIC
1 1.407863 0.5386661 0.3350889 0.2540447 0.2961478 2.621118
According to this post, if the metric is zero, NaN will be shown for that metric so that false information will not be displayed.
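To understand where these numbers come from: based on the fairmodels documentation, the parity loss of a metric is the sum, across the unprivileged subgroups, of the absolute log of the ratio between each subgroup’s metric and the privileged subgroup’s metric. A rough hand calculation (with made-up FPR values) looks like this:
# hypothetical FPR values per subgroup, with Earth as the privileged one
fpr <- c(Earth = 0.10, Europa = 0.03, Mars = 0.04)
# parity loss = sum of |log(metric_subgroup / metric_privileged)|
sum(abs(log(fpr[c("Europa", "Mars")] / fpr["Earth"])))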
“A picture is worth a thousand words.”
To help us better understand the results, we can pass the
fairness_object into the plot function to
visualize it.
plot(ranger_fair)

According to the documentation page, the red areas show where the selected metrics have exceeded the fairness thresholds; a subgroup passes a metric when its ratio to the privileged subgroup falls between epsilon and 1/epsilon.
If a bar reaches the red area on the left, this implies there is a bias against the unprivileged subgroup.
On the other hand, if a bar reaches the red area on the right, this implies the unprivileged subgroup is favored over the privileged group.
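In other words, with the default epsilon of 0.8, a ratio passes when it lies between 0.8 and 1.25. A tiny sketch of that check (my own illustration, not the package’s code):
epsilon <- 0.8
passes <- function(ratio) ratio > epsilon & ratio < 1 / epsilon
passes(c(0.75, 0.90, 1.30))  # FALSE TRUE FALSE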
Hmmm, it seems Mars and Europa have a lower predictive equality ratio.
This implies they have a lower false positive rate (i.e. a lower chance that a passenger who was not transported is incorrectly classified as “will be transported”).
To check this further, we can also pass the fairness_object into the metric_scores function and extract the calculated false positive rate for each subgroup.
After that, I pass the results into the ggplot function as shown below.
metric_scores(ranger_fair, fairness_metrics = c("FPR"))$metric_scores_data %>%
ggplot(aes(x = subgroup, y = score)) +
geom_col() +
theme_minimal() +
labs(title = "False Positive Rate under Each Subgroup of Home Planet")

In this post, I won’t be exploring how we could potentially fix the observed fairness issues in the dataset.
Also, according to the documentation page, the default acceptable ratio of metrics between unprivileged and privileged subgroups is set at 0.8.
We can change this acceptable ratio by passing a value into the epsilon argument.
ranger_fair_0.6 <- fairness_check(ranger_explainer,
protected = protected_var,
privileged = privileged_subgrp,
epsilon = 0.6,
colorize = FALSE)
Creating fairness classification object
-> Privileged subgroup : character ( Ok )
-> Protected variable : factor ( changed from character )
-> Cutoff values for explainers : 0.5 ( for all subgroups )
-> Fairness objects : 0 objects
-> Checking explainers : 1 in total ( compatible )
-> Metric calculation : 13/13 metrics calculated for all models
Fairness object created succesfully
plot(ranger_fair_0.6)

As shown in the graph above, with the updated acceptable ratio, the statistical parity ratio for Europa is within the acceptable range now.
One cool thing about the fairness_check function is that it allows users to include multiple explainers in the function to compare the results.
With that in mind, I will build a second and a third model for the fairness comparison later.
Second model
# model recipe
xgboost_recipe <-
recipe(formula = transported ~ .,
data = df) %>%
step_dummy(all_nominal_predictors())
# model specification
xgboost_spec <-
boost_tree() %>%
set_mode("classification") %>%
set_engine("xgboost")
# model workflow
xgboost_workflow <-
workflow() %>%
add_recipe(xgboost_recipe) %>%
add_model(xgboost_spec)
# fitting the model
xgboost_fit <- xgboost_workflow %>%
fit(data = df)
[12:41:43] WARNING: amalgamation/../src/learner.cc:1115: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.
# create explainer
xgboost_explainer <- explain_tidymodels(xgboost_fit,
data = select(df, -transported),
y = y_numeric,
label = "xgboost",
verbose = FALSE)
Third model
# model recipe
logit_recipe <-
recipe(formula = transported ~ .,
data = df) %>%
step_dummy(all_nominal_predictors())
# model specification
logit_spec <-
logistic_reg(penalty = 0.1) %>%
set_mode("classification") %>%
set_engine("glmnet")
# model workflow
logit_workflow <-
workflow() %>%
add_recipe(logit_recipe) %>%
add_model(logit_spec)
# fitting the model
logit_fit <- logit_workflow %>%
fit(data = df)
# create explainer
logit_explainer <- explain_tidymodels(logit_fit,
data = select(df, -transported),
y = y_numeric,
label = "logistic",
verbose = FALSE)
Once the models are built, I will pass the different explainers into the fairness_check function as shown below.
all_fair <- fairness_check(ranger_explainer,
xgboost_explainer,
logit_explainer,
protected = protected_var,
privileged = privileged_subgrp,
colorize = FALSE)
Creating fairness classification object
-> Privileged subgroup : character ( Ok )
-> Protected variable : factor ( changed from character )
-> Cutoff values for explainers : 0.5 ( for all subgroups )
-> Fairness objects : 0 objects
-> Checking explainers : 3 in total ( compatible )
-> Metric calculation : 13/13 metrics calculated for all models
Fairness object created succesfully
As shown in the results, the value under checking explainers is now 3.
Then, I will pass the created fairness_object to the
plot function.
plot(all_fair)

The results of the different models are plotted together so that it’s easier to compare them.
Alternatively, we can plot the results in a radar graph format as shown below.
plot(fairness_radar(all_fair))

Another useful function in the fairmodels package is performance_and_fairness.
This function allows users to compare the different models on a selected performance metric and fairness metric.
The best model is located in the top right corner.
Taking accuracy and statistical parity ratio as an example:
plot(performance_and_fairness(all_fair, fairness_metric = "STP"))
Performace metric is NULL, setting deafult ( accuracy )
Creating object with:
Fairness metric: STP
Performance metric: accuracy

Nevertheless, these are just some of the functions supported by the fairmodels package. Do check out the documentation page for more details on the different functions.
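For instance, assuming the package version used here still exports these helpers, fairness_heatmap visualizes the metric scores across models and subgroups, and stack_metrics stacks the parity loss of each metric per model:
# heatmap of metric scores across models and subgroups
plot(fairness_heatmap(all_fair))
# stacked bar chart of the parity loss of each metric per model
plot(stack_metrics(all_fair))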
That’s all for the day!
Thanks for reading the post until the end.
While reading through the materials on fairness, I was reminded of how I debated with my colleagues and bosses on a piece of work I did in the past, i.e. equity in participating fund management.
I will leave what causes these biases, and how we could fix them, to a future post.
Feel free to contact me through email or LinkedIn if you have any suggestions on future topics to share.
Refer to this link for the blog disclaimer.
Till next time, happy learning!
