Multinomial Logistic Regression

Machine Learning Supervised Learning

When the target is more than 2 classes

Jasper Lok https://jasperlok.netlify.app/
01-06-2024

Photo by Joshua Ryder on Unsplash

In this post, I will explore what multinomial logistic regression is and how to build a multinomial logistic regression model.

What is multinomial logistic regression?

As the name suggests, this model can predict more than two classes, unlike standard (binomial) logistic regression.

This is very handy when the target variable has more than two categories and the categories are unordered.
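Under the hood, a multinomial logistic model fits one linear predictor per non-reference class and converts them into class probabilities with the softmax function. A minimal base-R sketch, with made-up linear predictor values:

```r
# Softmax sketch: the reference class has an implicit linear predictor
# of 0; each remaining class gets its own linear predictor (eta).
softmax_probs <- function(eta) {
  ex <- exp(c(0, eta))
  ex / sum(ex)
}

# Made-up linear predictors for two non-reference classes
probs <- softmax_probs(c(1.2, -0.5))
round(probs, 3)  # 0.203 0.674 0.123
sum(probs)       # probabilities always sum to 1
```

Each fitted coefficient therefore describes the log-odds of its class relative to the reference class, which is why the models below report one set of coefficients per non-reference level.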

The author of this post summarised the differences between multinomial logistic regression and one-vs-rest logistic regression (Cross Validated 2013).

Assumptions of multinomial logistic regression

Some of the assumptions of multinomial logistic regression include (National University Academic Success Centre 2024):

- The dependent variable is nominal, with more than two categories.
- The observations are independent of one another.
- There is no severe multicollinearity among the predictors.
- The odds of each outcome do not depend on the other alternatives (independence of irrelevant alternatives).

Demonstration

According to CRAN, several packages can help us model multi-state data (e.g., mstate, msm).

In this demonstration, I will use several different functions to fit a multinomial logistic regression.

Set up the environment

First, I will load the necessary packages into the environment.

pacman::p_load(tidyverse, tidymodels, palmerpenguins, brulee, VGAM, nnet)

If we print the dataset, we will notice that it contains some missing values.

penguins
# A tibble: 344 x 8
   species island    bill_length~1 bill_~2 flipp~3 body_~4 sex    year
   <fct>   <fct>             <dbl>   <dbl>   <int>   <int> <fct> <int>
 1 Adelie  Torgersen          39.1    18.7     181    3750 male   2007
 2 Adelie  Torgersen          39.5    17.4     186    3800 fema~  2007
 3 Adelie  Torgersen          40.3    18       195    3250 fema~  2007
 4 Adelie  Torgersen          NA      NA        NA      NA <NA>   2007
 5 Adelie  Torgersen          36.7    19.3     193    3450 fema~  2007
 6 Adelie  Torgersen          39.3    20.6     190    3650 male   2007
 7 Adelie  Torgersen          38.9    17.8     181    3625 fema~  2007
 8 Adelie  Torgersen          39.2    19.6     195    4675 male   2007
 9 Adelie  Torgersen          34.1    18.1     193    3475 <NA>   2007
10 Adelie  Torgersen          42      20.2     190    4250 <NA>   2007
# ... with 334 more rows, and abbreviated variable names
#   1: bill_length_mm, 2: bill_depth_mm, 3: flipper_length_mm,
#   4: body_mass_g

For simplicity, I will impute the missing values in the dataset using bagged trees.

gen_recipe <- 
  recipe(species ~ .
         ,data = penguins) %>% 
  step_impute_bag(all_predictors()) %>% 
  prep()

df_imputed <- bake(gen_recipe, penguins)

brulee package

For the first model, I will use the brulee package to fit the model.

mulnorm_specs <-
  multinom_reg() %>% 
  set_engine("brulee")

Then, I define the pre-processing steps as suggested by the authors of Tidy Modeling with R (Kuhn and Silge 2023).

mulnorm_recipe <-
  recipe(species ~ flipper_length_mm + island
         ,data = df_imputed) %>%
  step_nzv(all_predictors()) %>% 
  step_dummy(all_nominal_predictors()) %>% 
  step_normalize(all_numeric_predictors())

Once that is done, I will “chain” all the different parts into a workflow.

mulnorm_wk <-
  workflow() %>% 
  add_model(mulnorm_specs) %>% 
  add_recipe(mulnorm_recipe)

mulnorm_wk
== Workflow ==========================================================
Preprocessor: Recipe
Model: multinom_reg()

-- Preprocessor ------------------------------------------------------
3 Recipe Steps

* step_nzv()
* step_dummy()
* step_normalize()

-- Model -------------------------------------------------------------
Multinomial Regression Model Specification (classification)

Computational engine: brulee 

I will then fit the model.

mulnorm_fit <-
  mulnorm_wk %>% 
  fit(data = df_imputed)

mulnorm_fit
== Workflow [trained] ================================================
Preprocessor: Recipe
Model: multinom_reg()

-- Preprocessor ------------------------------------------------------
3 Recipe Steps

* step_nzv()
* step_dummy()
* step_normalize()

-- Model -------------------------------------------------------------
Multinomial regression

344 samples, 3 features, 3 classes 
class weights Adelie=1, Chinstrap=1, Gentoo=1 
weight decay: 0.001 
batch size: 310 
validation loss after 2 epochs: 0.225 

I will then extract the predicted values by using the augment function.

mulnorm_predict <-
  augment(mulnorm_fit, df_imputed)

mulnorm_predict %>% 
  slice_head(n = 10)
# A tibble: 10 x 12
   island  bill_~1 bill_~2 flipp~3 body_~4 sex    year species .pred~5
   <fct>     <dbl>   <dbl>   <int>   <int> <fct> <int> <fct>   <fct>  
 1 Torger~    39.1    18.7     181    3750 male   2007 Adelie  Adelie 
 2 Torger~    39.5    17.4     186    3800 fema~  2007 Adelie  Adelie 
 3 Torger~    40.3    18       195    3250 fema~  2007 Adelie  Adelie 
 4 Torger~    38.0    18.3     187    3547 fema~  2007 Adelie  Adelie 
 5 Torger~    36.7    19.3     193    3450 fema~  2007 Adelie  Adelie 
 6 Torger~    39.3    20.6     190    3650 male   2007 Adelie  Adelie 
 7 Torger~    38.9    17.8     181    3625 fema~  2007 Adelie  Adelie 
 8 Torger~    39.2    19.6     195    4675 male   2007 Adelie  Adelie 
 9 Torger~    34.1    18.1     193    3475 fema~  2007 Adelie  Adelie 
10 Torger~    42      20.2     190    4250 male   2007 Adelie  Adelie 
# ... with 3 more variables: .pred_Adelie <dbl>,
#   .pred_Chinstrap <dbl>, .pred_Gentoo <dbl>, and abbreviated
#   variable names 1: bill_length_mm, 2: bill_depth_mm,
#   3: flipper_length_mm, 4: body_mass_g, 5: .pred_class

The yardstick package also offers the mn_log_loss function to compute the logarithmic loss of a multinomial logistic model.

mulnorm_predict %>% 
  mn_log_loss(species, 10:12)
# A tibble: 1 x 3
  .metric     .estimator .estimate
  <chr>       <chr>          <dbl>
1 mn_log_loss multiclass     0.231
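For intuition, the multiclass log loss is simply the average negative log of the probability the model assigned to the observed class. A small base-R sketch with toy values (not the penguins data):

```r
# Multiclass log loss: average of -log(probability of the true class)
mc_log_loss <- function(truth, probs) {
  # pick the predicted probability of the observed class in each row
  p_true <- probs[cbind(seq_along(truth), as.integer(truth))]
  -mean(log(p_true))
}

# Toy example: three observations, two classes
truth <- factor(c("a", "b", "a"))
probs <- rbind(c(0.9, 0.1),
               c(0.2, 0.8),
               c(0.6, 0.4))
mc_log_loss(truth, probs)  # about 0.28
```

Lower values are better; a model that always assigned probability 1 to the correct class would score 0.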

vglm function from VGAM package

Next, I will use the vglm function from the VGAM package to build the model.

vglm_fit <- 
  vglm(species ~ flipper_length_mm + island
       ,data = df_imputed
       ,family = multinomial)

vglm_fit

Call:
vglm(formula = species ~ flipper_length_mm + island, family = multinomial, 
    data = df_imputed)


Coefficients:
      (Intercept):1       (Intercept):2 flipper_length_mm:1 
         266.738752          232.014117           -1.313793 
flipper_length_mm:2       islandDream:1       islandDream:2 
          -1.182953           14.639756           24.346370 
  islandTorgersen:1   islandTorgersen:2 
          16.200276           13.866865 

Degrees of Freedom: 688 Total; 680 Residual
Residual deviance: 151.3212 
Log-likelihood: -75.66059 

This is a multinomial logit model with 3 levels

We could pass the fitted object to the summary function to check the results.

summary(vglm_fit)

Call:
vglm(formula = species ~ flipper_length_mm + island, family = multinomial, 
    data = df_imputed)

Coefficients: 
                    Estimate Std. Error z value Pr(>|z|)
(Intercept):1        266.739    228.834   1.166    0.244
(Intercept):2        232.014    229.712   1.010    0.312
flipper_length_mm:1   -1.314      1.127      NA       NA
flipper_length_mm:2   -1.183      1.128  -1.049    0.294
islandDream:1         14.640     17.702   0.827    0.408
islandDream:2         24.346     25.933      NA       NA
islandTorgersen:1     16.200     34.789      NA       NA
islandTorgersen:2     13.867     62.409   0.222    0.824

Names of linear predictors: log(mu[,1]/mu[,3]), log(mu[,2]/mu[,3])

Residual deviance: 151.3212 on 680 degrees of freedom

Log-likelihood: -75.6606 on 680 degrees of freedom

Number of Fisher scoring iterations: 18 

Warning: Hauck-Donner effect detected in the following estimate(s):
'flipper_length_mm:1', 'islandDream:2', 'islandTorgersen:1'


Reference group is level  3  of the response

Hmm, it seems there is an issue (i.e., the Hauck-Donner effect) in the model estimates.

According to this paper, the Hauck-Donner effect (HDE) occurs when the Wald test statistic is not monotonically increasing as a function of the distance between the parameter estimate and the null value.

However, for simplicity, I won't explore what the Hauck-Donner effect is or how to address it in this post.

Alternatively, we could extract the coefficients from the fitted object by using the coef.vlm function.

coef.vlm(vglm_fit)
      (Intercept):1       (Intercept):2 flipper_length_mm:1 
         266.738752          232.014117           -1.313793 
flipper_length_mm:2       islandDream:1       islandDream:2 
          -1.182953           14.639756           24.346370 
  islandTorgersen:1   islandTorgersen:2 
          16.200276           13.866865 

The cool thing about this package is that the anova function still works on the fitted object.

anova(vglm_fit)
Analysis of Deviance Table (Type II tests)

Model: 'multinomial', 'VGAMcategorical'

Link: 'multilogitlink'

Response: species

                  Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
flipper_length_mm  2   212.69       682     363.95 < 2.2e-16 ***
island             4   136.78       684     288.04 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
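The deviance column above is a likelihood-ratio statistic: dropping a term increases the deviance, and the increase is compared against a chi-squared distribution with the corresponding degrees of freedom. Using the flipper_length_mm row as an example:

```r
# p-value for a deviance change of 212.69 on 2 degrees of freedom
pchisq(212.69, df = 2, lower.tail = FALSE)  # effectively zero, i.e. < 2.2e-16
```

This matches the Pr(>Chi) column reported in the deviance table.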

To extract the linear predictors (the log-odds relative to the reference level), we could use the predict.vlm function from the package.

predict.vlm(vglm_fit, newdata = df_imputed, type = "link") %>% 
  as_tibble()
# A tibble: 344 x 2
   `log(mu[,1]/mu[,3])` `log(mu[,2]/mu[,3])`
                  <dbl>                <dbl>
 1                 45.1                 31.8
 2                 38.6                 25.9
 3                 26.7                 15.2
 4                 37.3                 24.7
 5                 29.4                 17.6
 6                 33.3                 21.1
 7                 45.1                 31.8
 8                 26.7                 15.2
 9                 29.4                 17.6
10                 33.3                 21.1
# ... with 334 more rows

multinom function from nnet package

Lastly, I will use the multinom function from the nnet package to fit the model.

nnet_fit <-
  multinom(species ~ flipper_length_mm + island
          ,data = df_imputed)
# weights:  15 (8 variable)
initial  value 377.922627 
iter  10 value 90.006785
iter  20 value 77.751017
iter  30 value 76.042914
iter  40 value 75.653015
iter  50 value 75.643330
final  value 75.643211 
converged
nnet_fit
Call:
multinom(formula = species ~ flipper_length_mm + island, data = df_imputed)

Coefficients:
          (Intercept) flipper_length_mm islandDream islandTorgersen
Chinstrap   -39.09934         0.1308451    14.08030       -15.62399
Gentoo     -277.66603         1.3677412   -19.38964      -107.33116

Residual Deviance: 151.2864 
AIC: 167.2864 

According to the documentation, the function uses a neural network to fit the multinomial logistic regression.

As such, some of the existing functions (e.g., anova) will not work on the fitted object.

Note that the model uses the first factor level as the reference group.

We could change the reference level by using the relevel function.

df_imputed_relevel <- 
  df_imputed %>% 
  mutate(species = relevel(species, ref = "Gentoo"))

nnet_fit_relevel <-
  multinom(species ~ flipper_length_mm + island
          ,data = df_imputed_relevel)
# weights:  15 (8 variable)
initial  value 377.922627 
iter  10 value 92.441650
iter  20 value 83.835909
iter  30 value 79.695362
iter  40 value 75.911918
iter  50 value 75.681170
final  value 75.623715 
converged
nnet_fit_relevel
Call:
multinom(formula = species ~ flipper_length_mm + island, data = df_imputed_relevel)

Coefficients:
          (Intercept) flipper_length_mm islandDream islandTorgersen
Adelie       319.0516         -1.571624    82.64173        19.42912
Chinstrap    182.1415         -1.440774   194.53175        13.72710

Residual Deviance: 151.2474 
AIC: 167.2474 

From the model, we can obtain the odds ratios by exponentiating the coefficients:

# display odds ratios in transposed data frame
odds_ratios <- exp(summary(nnet_fit)$coefficients)
data.frame(t(odds_ratios))
                     Chinstrap        Gentoo
(Intercept)       1.045614e-17 2.577365e-121
flipper_length_mm 1.139791e+00  3.926471e+00
islandDream       1.303154e+06  3.794789e-09
islandTorgersen   1.639034e-07  2.435947e-47

Note that the odds ratios are always calculated relative to the reference level. If we would like to compare Chinstrap and Gentoo directly, we just need to exponentiate the difference between their coefficients to obtain the odds ratio between the two.

coefs_c_to_b <- summary(nnet_fit)$coefficients[2, ] - 
   summary(nnet_fit)$coefficients[1, ]

data.frame(exp(coefs_c_to_b))
                  exp.coefs_c_to_b.
(Intercept)           2.464929e-104
flipper_length_mm      3.444904e+00
islandDream            2.912002e-15
islandTorgersen        1.486209e-40

We could extract the fitted values from the fitted.values element of the fitted model.

nnet_fit$fitted.values %>% 
  as_tibble()
# A tibble: 344 x 3
   Adelie Chinstrap   Gentoo
    <dbl>     <dbl>    <dbl>
 1   1.00  3.31e-14 2.05e-60
 2   1.00  6.36e-14 1.92e-57
 3   1.00  2.06e-13 4.25e-52
 4   1.00  7.25e-14 7.52e-57
 5   1.00  1.59e-13 2.76e-53
 6   1.00  1.07e-13 4.55e-55
 7   1.00  3.31e-14 2.05e-60
 8   1.00  2.06e-13 4.25e-52
 9   1.00  1.59e-13 2.76e-53
10   1.00  1.07e-13 4.55e-55
# ... with 334 more rows

The result shows the predicted probability of each penguin belonging to each species.
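To turn these probabilities into a single predicted class, we can take the column with the highest probability in each row. A base-R sketch with toy probabilities:

```r
# Toy probability matrix: one row per penguin, one column per species
probs <- rbind(c(0.70, 0.20, 0.10),
               c(0.05, 0.30, 0.65))
classes <- c("Adelie", "Chinstrap", "Gentoo")

# max.col() returns the index of the largest value in each row
classes[max.col(probs)]  # "Adelie" "Gentoo"
```

This is essentially what the .pred_class column produced by augment represents.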

Conclusion

That’s all for the day!

Thanks for reading the post until the end.

Feel free to contact me through email or LinkedIn if you have any suggestions on future topics to share.

Refer to this link for the blog disclaimer.

Till next time, happy learning!

Photo by Thomas Denton on Unsplash

References

Cross Validated. 2013. “Multinomial Logistic Regression Vs One-Vs-Rest Binary Logistic Regression.” https://stats.stackexchange.com/questions/52104/multinomial-logistic-regression-vs-one-vs-rest-binary-logistic-regression.
Kuhn, Max, and Julia Silge. 2023. “A Recommended Preprocessing.” https://www.tmwr.org/pre-proc-table.
National University Academic Success Centre. 2024. “Multinomial Logistic Regression.” https://resources.nu.edu/statsresources/Multinomiallogistic#:~:text=A%20multinomial%20logistic%20regression%20(or,that%20are%20categorical%20or%20continuous.