When the target is more than 2 classes

Photo by Joshua Ryder on Unsplash
In this post, I will explore what multinomial logistic regression is and how to build a multinomial logistic regression model.
As the name suggests, this model can predict more than 2 classes, unlike standard binary logistic regression.
This is very handy when the target variable has more than 2 categories and the categories are unordered.
The author of this post gave a summary of the differences between multinomial logistic regression and one-vs-all logistic regression (Cross Validated 2013).
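Formally, with K classes and the Kth class chosen as the reference level, multinomial logistic regression fits K − 1 log-odds equations (this is the standard formulation; the choice of reference level is arbitrary):

```latex
\ln\frac{P(y = k \mid x)}{P(y = K \mid x)} = \beta_{k0} + \beta_k^{\top} x, \qquad k = 1, \dots, K - 1

P(y = k \mid x) = \frac{e^{\beta_{k0} + \beta_k^{\top} x}}{1 + \sum_{j=1}^{K-1} e^{\beta_{j0} + \beta_j^{\top} x}}, \qquad
P(y = K \mid x) = \frac{1}{1 + \sum_{j=1}^{K-1} e^{\beta_{j0} + \beta_j^{\top} x}}
```

The second line inverts the link to recover the class probabilities, with the reference level receiving the leftover probability mass.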
Some of the assumptions of multinomial logistic regression include (National University Academic Success Centre 2024):
Independence of observations
Categories of the outcome variable must be mutually exclusive and exhaustive
No multicollinearity between independent variables
Linear relationship between continuous variables and the logit transformation of the outcome variable
No outliers or highly influential points
According to CRAN, different packages can help us model multi-state processes (e.g., mstate, msm).
In this demonstration, I will be using different functions to fit a multinomial logistic regression.
First, I will load the necessary packages into the environment.
pacman::p_load(tidyverse, tidymodels, palmerpenguins, brulee, VGAM, nnet)
If we were to call the dataset, we would notice that there are some missing values within the dataset.
penguins
# A tibble: 344 x 8
species island bill_length~1 bill_~2 flipp~3 body_~4 sex year
<fct> <fct> <dbl> <dbl> <int> <int> <fct> <int>
1 Adelie Torgersen 39.1 18.7 181 3750 male 2007
2 Adelie Torgersen 39.5 17.4 186 3800 fema~ 2007
3 Adelie Torgersen 40.3 18 195 3250 fema~ 2007
4 Adelie Torgersen NA NA NA NA <NA> 2007
5 Adelie Torgersen 36.7 19.3 193 3450 fema~ 2007
6 Adelie Torgersen 39.3 20.6 190 3650 male 2007
7 Adelie Torgersen 38.9 17.8 181 3625 fema~ 2007
8 Adelie Torgersen 39.2 19.6 195 4675 male 2007
9 Adelie Torgersen 34.1 18.1 193 3475 <NA> 2007
10 Adelie Torgersen 42 20.2 190 4250 <NA> 2007
# ... with 334 more rows, and abbreviated variable names
# 1: bill_length_mm, 2: bill_depth_mm, 3: flipper_length_mm,
# 4: body_mass_g
For simplicity, I will impute the missing values in the dataset using bagged trees.
gen_recipe <-
recipe(species ~ .
,data = penguins) %>%
step_impute_bag(all_predictors()) %>%
prep()
df_imputed <- bake(gen_recipe, penguins)
brulee package
For the first model, I will be using the brulee package to fit the model.
mulnorm_specs <-
multinom_reg() %>%
set_engine("brulee")
Then, I define the pre-processing steps as suggested by the authors of Tidy Modeling with R (Kuhn and Silge 2023).
mulnorm_recipe <-
recipe(species ~ flipper_length_mm + island
,data = df_imputed) %>%
step_nzv(all_predictors()) %>%
step_dummy(all_nominal_predictors()) %>%
step_normalize(all_numeric_predictors())
Once that is done, I will “chain” all the different parts into a workflow.
mulnorm_wk <-
workflow() %>%
add_model(mulnorm_specs) %>%
add_recipe(mulnorm_recipe)
mulnorm_wk
== Workflow ==========================================================
Preprocessor: Recipe
Model: multinom_reg()
-- Preprocessor ------------------------------------------------------
3 Recipe Steps
* step_nzv()
* step_dummy()
* step_normalize()
-- Model -------------------------------------------------------------
Multinomial Regression Model Specification (classification)
Computational engine: brulee
I will then fit the model.
mulnorm_fit <-
mulnorm_wk %>%
fit(data = df_imputed)
mulnorm_fit
== Workflow [trained] ================================================
Preprocessor: Recipe
Model: multinom_reg()
-- Preprocessor ------------------------------------------------------
3 Recipe Steps
* step_nzv()
* step_dummy()
* step_normalize()
-- Model -------------------------------------------------------------
Multinomial regression
344 samples, 3 features, 3 classes
class weights Adelie=1, Chinstrap=1, Gentoo=1
weight decay: 0.001
batch size: 310
validation loss after 2 epochs: 0.225
I will extract the predicted values by using the augment function.
mulnorm_predict <-
augment(mulnorm_fit, df_imputed)
mulnorm_predict %>%
slice_head(n = 10)
# A tibble: 10 x 12
island bill_~1 bill_~2 flipp~3 body_~4 sex year species .pred~5
<fct> <dbl> <dbl> <int> <int> <fct> <int> <fct> <fct>
1 Torger~ 39.1 18.7 181 3750 male 2007 Adelie Adelie
2 Torger~ 39.5 17.4 186 3800 fema~ 2007 Adelie Adelie
3 Torger~ 40.3 18 195 3250 fema~ 2007 Adelie Adelie
4 Torger~ 38.0 18.3 187 3547 fema~ 2007 Adelie Adelie
5 Torger~ 36.7 19.3 193 3450 fema~ 2007 Adelie Adelie
6 Torger~ 39.3 20.6 190 3650 male 2007 Adelie Adelie
7 Torger~ 38.9 17.8 181 3625 fema~ 2007 Adelie Adelie
8 Torger~ 39.2 19.6 195 4675 male 2007 Adelie Adelie
9 Torger~ 34.1 18.1 193 3475 fema~ 2007 Adelie Adelie
10 Torger~ 42 20.2 190 4250 male 2007 Adelie Adelie
# ... with 3 more variables: .pred_Adelie <dbl>,
# .pred_Chinstrap <dbl>, .pred_Gentoo <dbl>, and abbreviated
# variable names 1: bill_length_mm, 2: bill_depth_mm,
# 3: flipper_length_mm, 4: body_mass_g, 5: .pred_class
The yardstick package also offers the mn_log_loss function to compute the logarithmic loss of the multinomial logistic model.
mulnorm_predict %>%
mn_log_loss(species, 10:12)
# A tibble: 1 x 3
.metric .estimator .estimate
<chr> <chr> <dbl>
1 mn_log_loss multiclass 0.231
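Under the hood, mn_log_loss is the mean negative log of the probability the model assigned to the observed class. A base-R sketch with toy probabilities (not the model's actual output):

```r
# toy predicted probabilities: 3 observations over 3 classes
probs <- matrix(c(0.8, 0.1, 0.1,
                  0.2, 0.7, 0.1,
                  0.3, 0.3, 0.4),
                nrow = 3, byrow = TRUE)
truth <- c(1, 2, 3)  # index of the true class for each observation

# log loss: mean negative log-probability assigned to the true class
log_loss <- -mean(log(probs[cbind(1:3, truth)]))
log_loss  # ~0.499
```

The closer the model's probability for the true class is to 1, the smaller the loss.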
vglm function from VGAM package
Next, I will use the vglm function from the VGAM package to build the model.
vglm_fit <-
vglm(species ~ flipper_length_mm + island
,data = df_imputed
,family = multinomial)
vglm_fit
Call:
vglm(formula = species ~ flipper_length_mm + island, family = multinomial,
data = df_imputed)
Coefficients:
(Intercept):1 (Intercept):2 flipper_length_mm:1
266.738752 232.014117 -1.313793
flipper_length_mm:2 islandDream:1 islandDream:2
-1.182953 14.639756 24.346370
islandTorgersen:1 islandTorgersen:2
16.200276 13.866865
Degrees of Freedom: 688 Total; 680 Residual
Residual deviance: 151.3212
Log-likelihood: -75.66059
This is a multinomial logit model with 3 levels
We could pass the fitted object into the summary function to check the results.
summary(vglm_fit)
Call:
vglm(formula = species ~ flipper_length_mm + island, family = multinomial,
data = df_imputed)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept):1 266.739 228.834 1.166 0.244
(Intercept):2 232.014 229.712 1.010 0.312
flipper_length_mm:1 -1.314 1.127 NA NA
flipper_length_mm:2 -1.183 1.128 -1.049 0.294
islandDream:1 14.640 17.702 0.827 0.408
islandDream:2 24.346 25.933 NA NA
islandTorgersen:1 16.200 34.789 NA NA
islandTorgersen:2 13.867 62.409 0.222 0.824
Names of linear predictors: log(mu[,1]/mu[,3]), log(mu[,2]/mu[,3])
Residual deviance: 151.3212 on 680 degrees of freedom
Log-likelihood: -75.6606 on 680 degrees of freedom
Number of Fisher scoring iterations: 18
Warning: Hauck-Donner effect detected in the following estimate(s):
'flipper_length_mm:1', 'islandDream:2', 'islandTorgersen:1'
Reference group is level 3 of the response
Hmm, it seems there is an issue (i.e., the Hauck-Donner effect) with the model fit.
According to this paper, the Hauck-Donner effect (HDE) occurs when a Wald test statistic is not monotonically increasing as a function of the distance between the parameter estimate and the null value.
However, for simplicity, I won't explore what the Hauck-Donner effect is or how to fix it here.
Alternatively, we could extract the coefficients from the fitted object by using the coef.vlm function.
coef.vlm(vglm_fit)
(Intercept):1 (Intercept):2 flipper_length_mm:1
266.738752 232.014117 -1.313793
flipper_length_mm:2 islandDream:1 islandDream:2
-1.182953 14.639756 24.346370
islandTorgersen:1 islandTorgersen:2
16.200276 13.866865
The cool thing is that the anova function still works on the fitted object.
anova(vglm_fit)
Analysis of Deviance Table (Type II tests)
Model: 'multinomial', 'VGAMcategorical'
Link: 'multilogitlink'
Response: species
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
flipper_length_mm 2 212.69 682 363.95 < 2.2e-16 ***
island 4 136.78 684 288.04 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
To extract the predictions, we could use the predict.vlm function from the package.
predict.vlm(vglm_fit, newdata = df_imputed, type = "response") %>%
as_tibble()
# A tibble: 344 x 2
`log(mu[,1]/mu[,3])` `log(mu[,2]/mu[,3])`
<dbl> <dbl>
1 45.1 31.8
2 38.6 25.9
3 26.7 15.2
4 37.3 24.7
5 29.4 17.6
6 33.3 21.1
7 45.1 31.8
8 26.7 15.2
9 29.4 17.6
10 33.3 21.1
# ... with 334 more rows
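As the column names indicate, these predictions are on the link scale: log-odds of Adelie and Chinstrap against the reference level (Gentoo). A base-R sketch of converting one row of log-odds into class probabilities, using the illustrative values from row 1 above:

```r
# log-odds vs the reference level (values from row 1 of the output above)
eta <- c(Adelie = 45.1, Chinstrap = 31.8)

# invert the multinomial logit: P_k = exp(eta_k) / (1 + sum(exp(eta)))
# the reference level (Gentoo) gets the leftover 1 / (1 + sum(exp(eta)))
denom <- 1 + sum(exp(eta))
probs <- c(exp(eta) / denom, Gentoo = 1 / denom)
probs
```

With log-odds this extreme, the probabilities saturate: Adelie is essentially 1.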
multinom function from nnet package
Lastly, I will use the multinom function from the nnet package to fit the model.
nnet_fit <-
multinom(species ~ flipper_length_mm + island
,data = df_imputed)
# weights: 15 (8 variable)
initial value 377.922627
iter 10 value 90.006785
iter 20 value 77.751017
iter 30 value 76.042914
iter 40 value 75.653015
iter 50 value 75.643330
final value 75.643211
converged
nnet_fit
Call:
multinom(formula = species ~ flipper_length_mm + island, data = df_imputed)
Coefficients:
(Intercept) flipper_length_mm islandDream islandTorgersen
Chinstrap -39.09934 0.1308451 14.08030 -15.62399
Gentoo -277.66603 1.3677412 -19.38964 -107.33116
Residual Deviance: 151.2864
AIC: 167.2864
According to the documentation, the function uses a neural network to fit the multinomial logistic regression.
As such, some of the existing functions (e.g., anova) will not work on the fitted object.
Note that the formula uses the first factor level as the reference group.
We could change the reference level by using the relevel function.
df_imputed_relevel <-
df_imputed %>%
mutate(species = relevel(species, ref = "Gentoo"))
nnet_fit_relevel <-
multinom(species ~ flipper_length_mm + island
,data = df_imputed_relevel)
# weights: 15 (8 variable)
initial value 377.922627
iter 10 value 92.441650
iter 20 value 83.835909
iter 30 value 79.695362
iter 40 value 75.911918
iter 50 value 75.681170
final value 75.623715
converged
nnet_fit_relevel
Call:
multinom(formula = species ~ flipper_length_mm + island, data = df_imputed_relevel)
Coefficients:
(Intercept) flipper_length_mm islandDream islandTorgersen
Adelie 319.0516 -1.571624 82.64173 19.42912
Chinstrap 182.1415 -1.440774 194.53175 13.72710
Residual Deviance: 151.2474
AIC: 167.2474
From the model, we note the following:
A one-unit increase in flipper length multiplies the odds of Chinstrap vs Adelie and Gentoo vs Adelie by about 1.14 and 3.93 respectively
If a penguin is from Dream Island, it is very likely to be a Chinstrap
If a penguin is from Torgersen Island, it is likely to be an Adelie
# display odds ratios in transposed data frame
odds_ratios <- exp(summary(nnet_fit)$coefficients)
data.frame(t(odds_ratios))
Chinstrap Gentoo
(Intercept) 1.045614e-17 2.577365e-121
flipper_length_mm 1.139791e+00 3.926471e+00
islandDream 1.303154e+06 3.794789e-09
islandTorgersen 1.639034e-07 2.435947e-47
Note that the odds ratios are always calculated relative to the reference level. If we would like to compare Gentoo with Chinstrap directly, we just need to exponentiate the difference of their coefficients to obtain the odds ratio.
coefs_c_to_b <- summary(nnet_fit)$coefficients[2, ] -
summary(nnet_fit)$coefficients[1, ]
data.frame(exp(coefs_c_to_b))
exp.coefs_c_to_b.
(Intercept) 2.464929e-104
flipper_length_mm 3.444904e+00
islandDream 2.912002e-15
islandTorgersen 1.486209e-40
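As a sanity check, the odds ratio from a coefficient difference equals the ratio of the two individual odds ratios, since exp(b1 − b2) = exp(b1) / exp(b2). Illustrated with the flipper_length_mm values read off the outputs above:

```r
# odds ratios vs the reference level (Adelie), from the output above
or_chinstrap <- 1.139791  # flipper_length_mm, Chinstrap vs Adelie
or_gentoo    <- 3.926471  # flipper_length_mm, Gentoo vs Adelie

# exp(b_gentoo - b_chinstrap) = exp(b_gentoo) / exp(b_chinstrap)
or_gentoo / or_chinstrap  # ~3.44, matching exp.coefs_c_to_b above
```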
We could obtain the fitted values by extracting the fitted.values element from the fitted model.
nnet_fit$fitted.values %>%
as_tibble()
# A tibble: 344 x 3
Adelie Chinstrap Gentoo
<dbl> <dbl> <dbl>
1 1.00 3.31e-14 2.05e-60
2 1.00 6.36e-14 1.92e-57
3 1.00 2.06e-13 4.25e-52
4 1.00 7.25e-14 7.52e-57
5 1.00 1.59e-13 2.76e-53
6 1.00 1.07e-13 4.55e-55
7 1.00 3.31e-14 2.05e-60
8 1.00 2.06e-13 4.25e-52
9 1.00 1.59e-13 2.76e-53
10 1.00 1.07e-13 4.55e-55
# ... with 334 more rows
The result shows the predicted probability of each penguin belonging to each species.
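To turn such fitted probabilities into a predicted class, we can take the column with the highest probability in each row; a base-R sketch with a toy probability matrix:

```r
# toy probability matrix: rows are penguins, columns are species
probs <- matrix(c(0.90, 0.05, 0.05,
                  0.10, 0.60, 0.30),
                nrow = 2, byrow = TRUE,
                dimnames = list(NULL, c("Adelie", "Chinstrap", "Gentoo")))

# predicted class = column with the highest probability in each row
pred <- colnames(probs)[max.col(probs)]
pred  # "Adelie" "Chinstrap"
```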
That’s all for the day!
Thanks for reading the post until the end.
Feel free to contact me through email or LinkedIn if you have any suggestions on future topics to share.
Refer to this link for the blog disclaimer.
Till next time, happy learning!

Photo by Thomas Denton on Unsplash