Support Vector Machine

Machine Learning Supervised Learning

Don’t trust stairs, they’re always up to something, but at least they support you step by step.

Jasper Lok https://jasperlok.netlify.app/
05-07-2024

Photo by krakenimages on Unsplash

What is a support vector machine?

A support vector machine (SVM) is a supervised machine learning algorithm that classifies data by finding an optimal line or hyperplane that maximizes the distance between each class in an N-dimensional space (IBM 2023).

The author also explains that when the data is not linearly separable, kernel functions are used to transform the data into a higher-dimensional space where linear separation becomes possible. This application of kernel functions is known as the “kernel trick”, and the choice of kernel function, such as a linear kernel, polynomial kernel, radial basis function (RBF) kernel, or sigmoid kernel, depends on the data characteristics and the specific use case.
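
As a rough illustration of what a kernel function actually computes, kernlab exposes its kernels as standalone functions. The sketch below (with an arbitrarily chosen sigma of 0.5 and two made-up points) evaluates the Gaussian RBF kernel and checks it against kernlab's closed-form expression exp(-sigma * ||x - y||^2):

```r
library(kernlab)

# two arbitrary points in 2-dimensional space
x <- c(1, 2)
y <- c(3, 1)

# Gaussian RBF kernel with an arbitrary sigma
rbf <- rbfdot(sigma = 0.5)

# kernel value computed by kernlab
rbf(x, y)

# same value from the closed-form expression exp(-sigma * ||x - y||^2)
exp(-0.5 * sum((x - y)^2))
```

The two numbers agree; the kernel value is the inner product of the two points in the implicit higher-dimensional feature space, computed without ever constructing that space explicitly.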

Different types of kernels

From the kernlab package documentation, below are the different kernel types and what they are suitable for:

Linear (vanilladot): Useful especially when dealing with large sparse data (e.g., text categorisation)
Gaussian radial basis function (rbfdot): General-purpose kernel, typically used when no further prior knowledge is available about the data
Polynomial (polydot): Image classification
Hyperbolic tangent (tanhdot): Mainly used as a proxy for neural networks
Bessel function (besseldot): General-purpose kernel, typically used when no further prior knowledge is available about the data
Laplace radial basis (laplacedot): General-purpose kernel, typically used when no further prior knowledge is available about the data
ANOVA radial basis (anovadot): Performs well in multidimensional regression problems

The Scikit-learn website has some great illustrations of how different kernels work.

Demonstration

In this demonstration, I will be using several methods to fit models.

pacman::p_load(tidyverse, tidymodels, janitor, kernlab, doMC, themis, yardstick, probably)

Import Data

For this demonstration, I will be using attrition data.

df <- read_csv("https://raw.githubusercontent.com/jasperlok/my-blog/master/_posts/2022-03-12-marketbasket/data/general_data.csv") %>%
  # clean up the column naming
  clean_names() %>% 
  # convert the attrition column to the correct column types
  mutate(attrition = as.factor(attrition)) %>% 
  select(c(age
           ,attrition
           ,business_travel
           ,department
           ,job_role
           ,marital_status))

Model Building

Now, let’s start building the models!

I will be exploring different methods to fit a support vector machine.

Kernlab

kernlab_fit <-
  ksvm(attrition ~ .
       ,data = df
       ,prob.model = TRUE)

kernlab_fit
Support Vector Machine object of class "ksvm" 

SV type: C-svc  (classification) 
 parameter : cost C = 1 

Gaussian Radial Basis kernel function. 
 Hyperparameter : sigma =  0.226411839635851 

Number of Support Vectors : 1603 

Objective Function Value : -1360.046 
Training error : 0.151701 
Probability model included. 

We can change the kernel type and the parameters by passing the necessary information to the arguments.

kernlab_fit_otherParam <-
  ksvm(attrition ~ .
       ,data = df
       ,kernel = "anovadot"
       ,kpar = list(sigma = 1.1, degree = 2)
       ,prob.model = TRUE)
line search fails -0.006275372 0.6794668 1.711631e-05 -1.077301e-07 -1.670093e-12 -7.465143e-11 -2.054361e-17
kernlab_fit_otherParam
Support Vector Machine object of class "ksvm" 

SV type: C-svc  (classification) 
 parameter : cost C = 1 

Anova RBF kernel function. 
 Hyperparameter : sigma =  1.1 degree =  2 

Number of Support Vectors : 1421 

Objective Function Value : -8342.706 
Training error : 0.268027 
Probability model included. 

Below is how we obtain the predicted probabilities of each class from the fitted SVM model:

head(predict(kernlab_fit, newdata = df, type = "probabilities"))
            No       Yes
[1,] 0.8564525 0.1435475
[2,] 0.8260551 0.1739449
[3,] 0.8567463 0.1432537
[4,] 0.8657787 0.1342213
[5,] 0.7571686 0.2428314
[6,] 0.8584917 0.1415083

If type is not specified, the predict function returns the predicted class instead.

head(predict(kernlab_fit, newdata = df))
[1] No No No No No No
Levels: No Yes

To compute the confusion matrix, I will first generate the predictions before passing them into the conf_mat function.

# prediction
kernlab_pred <-
  tibble(pred = predict(kernlab_fit, newdata = df)) %>% 
  bind_cols(df)

# confusion matrix
conf_mat(kernlab_pred
         ,attrition
         ,pred)
          Truth
Prediction   No  Yes
       No  3684  654
       Yes   15   57
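
As a side note, yardstick can derive a whole battery of metrics directly from a conf_mat object via summary(). The sketch below uses a small made-up tibble for illustration; in the post, the same call would be applied to the conf_mat result built from kernlab_pred above:

```r
library(yardstick)
library(tibble)

# toy predictions to illustrate the idea; in the post this would be
# the kernlab_pred tibble built above
toy_pred <- tibble(
  truth = factor(c("No", "No", "Yes", "Yes"), levels = c("No", "Yes")),
  pred  = factor(c("No", "Yes", "Yes", "Yes"), levels = c("No", "Yes"))
)

cm <- conf_mat(toy_pred, truth, pred)

# summary() derives accuracy, sensitivity, specificity, etc. as a tibble
summary(cm)
```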

Tidymodels

Next, I will use tidymodels to build the SVM.

First, I will set the necessary parameters.

set.seed(1234)

prop_split <- 0.6
grid_num <- 5

registerDoMC(cores = 8)

I will split the data into training and testing datasets.

df_split <- initial_split(df, prop = prop_split, strata = attrition)
df_train <- training(df_split)
df_test <- testing(df_split)

df_folds <- vfold_cv(df_train, strata = attrition)

Then, I will define the data wrangling steps.

gen_recipe <- 
  recipe(attrition ~., data = df_train) %>% 
  step_dummy(all_nominal_predictors()) %>% 
  step_zv(all_predictors()) %>% 
  step_normalize(all_predictors()) %>% 
  step_corr(all_numeric_predictors()) %>% 
  step_smote(attrition)

I will also define the model I want to build.

svm_spec <-
  svm_linear(cost = tune()
             ,margin = tune()) %>% 
  set_engine("kernlab") %>% 
  set_mode("classification")
svm_workflow <-
  workflow() %>% 
  add_recipe(gen_recipe) %>% 
  add_model(svm_spec)

After that, I will tune the parameters of the model.

svm_tune <-
  tune_grid(svm_workflow
            ,resamples = df_folds
            ,grid = grid_num)

I will pick the best parameters based on ROC AUC.

svm_fit <-
  svm_workflow %>% 
  finalize_workflow(select_best(svm_tune, metric = "roc_auc")) %>% 
  last_fit(df_split)

We can extract the performance metrics from the fitted model.

svm_fit$.metrics
[[1]]
# A tibble: 3 × 4
  .metric     .estimator .estimate .config             
  <chr>       <chr>          <dbl> <chr>               
1 accuracy    binary         0.671 Preprocessor1_Model1
2 roc_auc     binary         0.685 Preprocessor1_Model1
3 brier_class binary         0.235 Preprocessor1_Model1

To calculate the confusion matrix, I will generate the predictions before passing the results to the conf_mat function.

svm_pred <-
  svm_fit %>% 
  collect_predictions()
conf_mat(svm_pred
         ,attrition
         ,`.pred_class`)
          Truth
Prediction   No  Yes
       No  1035  136
       Yes  445  149
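
Since collect_predictions() also returns the class probabilities, one way to go beyond the single roc_auc number is to inspect the full ROC curve. The sketch below uses a small made-up tibble of scored rows (the .pred_No column mirrors the probability of the first factor level, as in last_fit() output); in the post, the same calls would be applied to svm_pred:

```r
library(yardstick)
library(tibble)

# toy scored data; in the post this would be svm_pred from
# collect_predictions()
scored <- tibble(
  truth    = factor(c("No", "No", "No", "Yes", "Yes"), levels = c("No", "Yes")),
  .pred_No = c(0.9, 0.8, 0.3, 0.35, 0.1)
)

# ROC curve points; the first level ("No") is treated as the event
roc_curve(scored, truth, .pred_No)

# area under the curve
roc_auc(scored, truth, .pred_No)
```

Piping the roc_curve output into autoplot() gives the usual sensitivity versus 1 - specificity plot.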

Conclusion

That’s all for the day!

Thanks for reading the post until the end.

Feel free to contact me through email or LinkedIn if you have any suggestions on future topics to share.

Refer to this link for the blog disclaimer.

Till next time, happy learning!

Photo by Riccardo Annandale on Unsplash

References

IBM. 2023. “What Are Support Vector Machines (SVMs)?” https://www.ibm.com/topics/support-vector-machine.