Latent Profile Analysis


I have many hidden talents… Problem is I don’t know where they are

Jasper Lok https://jasperlok.netlify.app/
01-05-2025

Photo by Sebastian Palomino

In my previous post, I explored how to perform latent class analysis.

However, latent class analysis can only be used when the variables are categorical. What if the variables are numerical?

In this post, I will be looking at how to perform latent profile analysis (LPA).

What is latent profile analysis (LPA)?

Latent Profile Analysis is a statistical modeling approach for estimating distinct profiles of variables (Rosenberg 2021).

There are different model types under LPA.

Model Type Variance Covariance Also known as
Model 1 Equal 0 EEI
Model 2 Varying 0 VVI
Model 3 Equal Equal EEE
Model 4 Varying Equal
Model 5 Equal Varying
Model 6 Varying Varying VVV
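The model types above map directly onto the variances and covariances arguments of estimate_profiles. As a sketch (assuming df is a placeholder data frame of numeric variables):

```r
library(tidyLPA)

# Model 1 (EEI): equal variances, covariances fixed to zero
fit_m1 <- estimate_profiles(df, n_profiles = 3,
                            variances = "equal", covariances = "zero")

# Model 6 (VVV): varying variances, varying covariances
fit_m6 <- estimate_profiles(df, n_profiles = 3,
                            variances = "varying", covariances = "varying")
```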

Important notes when using latent profile analysis

Below are the points to note when using LPA:

- LPA assumes the variables are numerical; for categorical variables, latent class analysis should be used instead.
- The variables should be normalized first, as the results are sensitive to the scales of the columns.
- There is no single "correct" number of profiles, so it is common to fit several solutions and compare them with fit indices such as AIC and BIC.

Demonstration

In this demonstration, I will be using the tidyLPA package to perform latent profile analysis.

pacman::p_load(tidyverse, tidymodels, tidyLPA, GGally)

Import Data

df <- 
  read_csv("data/data.csv") %>% 
  # drop the empty trailing column created by read_csv and the id column
  select(-c(`...33`, id))

For simplicity, I will skip the check on the correlations of the variables.

First, I will normalize the numeric variables so that they have a standard deviation of one and a mean of zero; otherwise, the model results will be affected by the scales of the columns.

gen_norm <-
  recipe(diagnosis ~ ., data = df) %>% 
  step_normalize(all_numeric_predictors()) %>% 
  prep()

df_norm <-
  bake(gen_norm, new_data = df)
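As a quick sanity check on the normalized data, each numeric column should now have a mean of roughly zero and a standard deviation of one:

```r
df_norm %>% 
  summarise(across(where(is.numeric), list(mean = mean, sd = sd)))
```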

By default, the function will fit model 1 if none of the variances, covariances, or models arguments are provided.

lpa_fit_model1 <-
  df_norm %>%
  select(-diagnosis) %>% 
  estimate_profiles(n_profiles = 2:6)

lpa_fit_model1
tidyLPA analysis using mclust: 

 Model Classes AIC      BIC      Entropy prob_min prob_max n_min
 1     2       41655.40 42050.53 0.98    0.99     1.00     0.32 
 1     3       39182.54 39712.28 0.98    0.97     1.00     0.16 
 1     4       37785.01 38449.35 0.98    0.96     1.00     0.11 
 1     5       36923.06 37722.02 0.96    0.95     1.00     0.04 
 1     6       36005.39 36938.95 0.96    0.95     1.00     0.03 
 n_max BLRT_p
 0.68  0.01  
 0.64  0.01  
 0.54  0.01  
 0.40  0.01  
 0.31  0.01  

From the results, we can note the following:

- AIC and BIC keep decreasing as the number of profiles increases.
- The entropy values remain high (at least 0.96) across all solutions.
- The BLRT p-values are all significant at 0.01.

We can extract the model performance measures by using the get_fit function.

get_fit(lpa_fit_model1)
# A tibble: 5 × 18
  Model Classes  LogLik    AIC    AWE    BIC   CAIC    CLC    KIC
  <dbl>   <dbl>   <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1     1       2 -20737. 41655. 42899. 42051. 42142. 41475. 41749.
2     1       3 -19469. 39183. 40850. 39712. 39834. 38941. 39308.
3     1       4 -18740. 37785. 39877. 38449. 38602. 37481. 37941.
4     1       5 -18278. 36923. 39439. 37722. 37906. 36557. 37110.
5     1       6 -17788. 36005. 38946. 36939. 37154. 35577. 36223.
# ℹ 9 more variables: SABIC <dbl>, ICL <dbl>, Entropy <dbl>,
#   prob_min <dbl>, prob_max <dbl>, n_min <dbl>, n_max <dbl>,
#   BLRT_val <dbl>, BLRT_p <dbl>

The descriptions of the different measurements can be found under this link.
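For instance, the fit table returned by get_fit can be sorted to pick out the solution with the lowest BIC (a sketch, assuming the lpa_fit_model1 object above):

```r
get_fit(lpa_fit_model1) %>% 
  arrange(BIC) %>% 
  slice(1) %>% 
  select(Model, Classes, AIC, BIC, Entropy)
```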

We can specify the number of classes to run as follows:

lpa_fit_model1_skip <-
  df_norm %>%
  select(-diagnosis) %>% 
  estimate_profiles(n_profiles = c(2, 4, 6))

lpa_fit_model1_skip
tidyLPA analysis using mclust: 

 Model Classes AIC      BIC      Entropy prob_min prob_max n_min
 1     2       41655.40 42050.53 0.98    0.99     1.00     0.32 
 1     4       37785.01 38449.35 0.98    0.96     1.00     0.11 
 1     6       36005.39 36938.95 0.96    0.95     1.00     0.03 
 n_max BLRT_p
 0.68  0.01  
 0.54  0.01  
 0.31  0.01  

We can extract a selected model by using the pluck function.

lpa_fit_model1_skip %>% 
  pluck("model_1_class_4")
tidyProfile estimated using Mclust: 

    Model   Classes    LogLik       AIC       AWE       BIC      CAIC 
        1         4 -18739.51  37785.01  39876.74  38449.35  38602.35 
      CLC       KIC     SABIC       ICL   Entropy  prob_min  prob_max 
 37480.97  37941.01  37963.65 -38466.51      0.98      0.96      1.00 
    n_min     n_max  BLRT_val    BLRT_p 
     0.11      0.54   1459.53      0.01 
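Besides the fit statistics, we can also pull out the per-observation class assignments with the get_data function, which returns the original variables together with a Class column. A sketch, reusing the extracted model above:

```r
lpa_fit_model1_skip %>% 
  pluck("model_1_class_4") %>% 
  get_data() %>% 
  count(Class)
```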

We can also change the model we would like to build, either by passing the relevant values to the variances and covariances arguments or by specifying the models argument.

lpa_fit_model2 <-
  df_norm %>%
  select(-diagnosis) %>% 
  estimate_profiles(n_profiles = 2:6, models = 2)

lpa_fit_model2
tidyLPA analysis using mclust: 

 Model Classes AIC      BIC      Entropy prob_min prob_max n_min
 2     2       37243.45 37768.85 0.99    1.00     1.00     0.39 
 2     3       34887.19 35677.46 0.98    0.98     1.00     0.15 
 2     4       32506.72 33561.85 0.97    0.98     0.99     0.15 
 2     5       30625.20 31945.20 0.98    0.99     1.00     0.11 
 2     6       29393.02 30977.89 0.98    0.98     1.00     0.11 
 n_max BLRT_p
 0.61  0.01  
 0.55  0.01  
 0.29  0.01  
 0.26  0.01  
 0.22  0.01  
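As a sketch, the same model 2 fit can also be requested through the variances and covariances arguments instead of the models argument, since model 2 corresponds to varying variances and zero covariances:

```r
lpa_fit_model2_alt <-
  df_norm %>%
  select(-diagnosis) %>% 
  estimate_profiles(n_profiles = 2:6,
                    variances = "varying",
                    covariances = "zero")
```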

To visualize the results, we can use the plot_profiles function.

Note the following: the data is converted into a matrix before fitting, and coord_flip is added so that the variable names sit along the vertical axis and are easier to read.

df_norm %>%
  select(-diagnosis) %>%
  as.matrix() %>% 
  estimate_profiles(3) %>% 
  tidyLPA::plot_profiles() +
  coord_flip()

It seems like this chart will be quite hard to read when we have too many variables.

Alternatively, we could use a parallel coordinate plot to illustrate the estimated values under each group. This makes it easier to compare the differences between the groups.

This can be done by using the ggparcoord function from the GGally package.

df_norm %>%
  select(-diagnosis) %>%
  as.matrix() %>% 
  estimate_profiles(3) %>% 
  get_estimates() %>%
  filter(Category == "Means") %>% 
  select(c(Estimate, Parameter, Class)) %>% 
  pivot_wider(names_from = Parameter
              ,values_from = Estimate) %>%
  mutate(Class = as.factor(Class)) %>% 
  ggparcoord(columns = 2:14, groupColumn = 1) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

It seems like there are some differences in characteristics among the different profiles.

Conclusion

That’s all for the day!

Thanks for reading the post until the end.

Feel free to contact me through email or LinkedIn if you have any suggestions on future topics to share.

Refer to this link for the blog disclaimer.

Till next time, happy learning!

Photo by Pixabay

References

Rosenberg, Joshua M. 2021. “Introduction to tidyLPA.” https://cran.r-project.org/web/packages/tidyLPA/vignettes/Introduction_to_tidyLPA.html.