I have many hidden talents… The problem is, I don’t know where they are.

Photo by Sebastian Palomino
In my previous post, I explored how to perform latent class analysis.
However, latent class analysis can only be used when the variables are categorical. What if the variables are numerical?
In this post, I will be looking at how to perform latent profile analysis (LPA).
Latent Profile Analysis is a statistical modeling approach for estimating distinct profiles of variables (Rosenberg 2021).
There are six model types under LPA:
| Model Type | Variance | Covariance | Also known as |
|---|---|---|---|
| Model 1 | Equal | 0 | EEI |
| Model 2 | Varying | 0 | VVI |
| Model 3 | Equal | Equal | EEE |
| Model 4 | Varying | Equal | |
| Model 5 | Equal | Varying | |
| Model 6 | Varying | Varying | VVV |
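For reference, the table above can also be written down as the `variances`/`covariances` argument pairs that tidyLPA's `estimate_profiles` accepts. This is just a small sketch of the mapping; as far as I know, models 4 and 5 are only available through the Mplus backend rather than mclust.

```r
# Mapping of tidyLPA model numbers to the variances / covariances
# arguments of estimate_profiles(), following the table above
model_spec <- list(
  `1` = c(variances = "equal",   covariances = "zero"),
  `2` = c(variances = "varying", covariances = "zero"),
  `3` = c(variances = "equal",   covariances = "equal"),
  `4` = c(variances = "varying", covariances = "equal"),
  `5` = c(variances = "equal",   covariances = "varying"),
  `6` = c(variances = "varying", covariances = "varying")
)

# Model 2, for example, can equivalently be requested with:
# estimate_profiles(..., variances = "varying", covariances = "zero")
unname(model_spec[["2"]])   # "varying" "zero"
```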
Below are the points to note when using LPA:

- LPA only works on numeric variables.
- LPA is sensitive to the scale of the variables.
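Because of that scale sensitivity, it is worth standardizing every column to mean 0 and standard deviation 1 before fitting. A minimal base-R sketch with made-up numbers (the column names here are hypothetical):

```r
# Two toy variables on very different scales; without standardization,
# the larger-scale column would dominate the fitted profiles
x <- data.frame(
  radius_mm = c(10, 12, 25, 27),      # values in the tens
  area_mm2  = c(300, 350, 1200, 1350) # values in the hundreds/thousands
)

x_std <- as.data.frame(scale(x))  # center and scale every column

round(colMeans(x_std), 10)  # all zeros
apply(x_std, 2, sd)         # all ones
```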
In this demonstration, I will be using the tidyLPA package to perform latent profile analysis.
```r
pacman::p_load(tidyverse, tidymodels, tidyLPA, GGally)

df <-
  read_csv("data/data.csv") %>%
  select(-c(`...33`, id))
```
For simplicity, I will skip the check on the correlations of the variables.
First, I will normalize the numeric variables to a mean of zero and a standard deviation of one; otherwise, the model results will be affected by the scales of the columns.
```r
gen_norm <-
  recipe(diagnosis ~ ., data = df) %>%
  step_normalize(all_numeric_predictors()) %>%
  prep()

df_norm <-
  bake(gen_norm, new_data = df)
```
By default, estimate_profiles will fit model 1 if none of the variances, covariances, or models arguments are provided.
```r
lpa_fit_model1 <-
  df_norm %>%
  select(-diagnosis) %>%
  estimate_profiles(n_profiles = 2:6)

lpa_fit_model1
```
```
tidyLPA analysis using mclust:

 Model Classes      AIC      BIC Entropy prob_min prob_max n_min n_max BLRT_p
     1       2 41655.40 42050.53    0.98     0.99     1.00  0.32  0.68   0.01
     1       3 39182.54 39712.28    0.98     0.97     1.00  0.16  0.64   0.01
     1       4 37785.01 38449.35    0.98     0.96     1.00  0.11  0.54   0.01
     1       5 36923.06 37722.02    0.96     0.95     1.00  0.04  0.40   0.01
     1       6 36005.39 36938.95    0.96     0.95     1.00  0.03  0.31   0.01
```
From the results, we can note the following:

- Each time we add an additional profile, the BLRT p-value stays below 0.05, indicating that the extra profile still brings additional information.
- Among all the tested models, the 6-class solution looks best, as it has the lowest AIC and BIC.
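The "lowest BIC wins" logic above can be sketched directly, using the BIC values copied from the model 1 output:

```r
# BIC values from the model 1 fits, keyed by the number of classes
bic <- c(`2` = 42050.53, `3` = 39712.28, `4` = 38449.35,
         `5` = 37722.02, `6` = 36938.95)

names(which.min(bic))  # "6": the 6-class solution has the lowest BIC
```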
We can extract the model performance metrics by using the get_fit function.
```r
get_fit(lpa_fit_model1)
```

```
# A tibble: 5 × 18
  Model Classes  LogLik    AIC    AWE    BIC   CAIC    CLC    KIC
  <dbl>   <dbl>   <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1     1       2 -20737. 41655. 42899. 42051. 42142. 41475. 41749.
2     1       3 -19469. 39183. 40850. 39712. 39834. 38941. 39308.
3     1       4 -18740. 37785. 39877. 38449. 38602. 37481. 37941.
4     1       5 -18278. 36923. 39439. 37722. 37906. 36557. 37110.
5     1       6 -17788. 36005. 38946. 36939. 37154. 35577. 36223.
# ℹ 9 more variables: SABIC <dbl>, ICL <dbl>, Entropy <dbl>,
#   prob_min <dbl>, prob_max <dbl>, n_min <dbl>, n_max <dbl>,
#   BLRT_val <dbl>, BLRT_p <dbl>
```
The descriptions of the different measurements can be found under this link.
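As an example of one of these metrics: Entropy summarizes how cleanly observations are assigned to profiles (1 means perfectly separated classes, values near 0 mean heavy overlap). Below is a minimal sketch of the usual normalized-entropy formula, computed on a made-up posterior probability matrix; this is my own illustration, not tidyLPA's internal code.

```r
# Toy posterior class-membership probabilities: 3 observations, 2 classes
post <- matrix(c(0.98, 0.02,
                 0.05, 0.95,
                 0.90, 0.10), ncol = 2, byrow = TRUE)

# Normalized entropy: 1 minus average per-observation entropy,
# scaled by the maximum possible entropy log(K)
entropy <- function(p) {
  p <- pmax(p, 1e-12)  # guard against log(0)
  1 - sum(-p * log(p)) / (nrow(p) * log(ncol(p)))
}

round(entropy(post), 2)  # about 0.70: fairly clean separation
```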
We can specify the number of classes to run as follows:
```r
lpa_fit_model1_skip <-
  df_norm %>%
  select(-diagnosis) %>%
  estimate_profiles(n_profiles = c(2, 4, 6))

lpa_fit_model1_skip
```
```
tidyLPA analysis using mclust:

 Model Classes      AIC      BIC Entropy prob_min prob_max n_min n_max BLRT_p
     1       2 41655.40 42050.53    0.98     0.99     1.00  0.32  0.68   0.01
     1       4 37785.01 38449.35    0.98     0.96     1.00  0.11  0.54   0.01
     1       6 36005.39 36938.95    0.96     0.95     1.00  0.03  0.31   0.01
```
We can extract a selected model by using the pluck function.
```r
lpa_fit_model1_skip %>%
  pluck("model_1_class_4")
```

```
tidyProfile estimated using Mclust:

 Model Classes    LogLik      AIC      AWE      BIC     CAIC
     1       4 -18739.51 37785.01 39876.74 38449.35 38602.35
      CLC      KIC    SABIC       ICL Entropy prob_min prob_max
 37480.97 37941.01 37963.65 -38466.51    0.98     0.96     1.00
 n_min n_max BLRT_val BLRT_p
  0.11  0.54  1459.53   0.01
```
We can also change the model we would like to build, either by passing the relevant values to the variances and covariances arguments or by specifying the models argument.
```r
lpa_fit_model2 <-
  df_norm %>%
  select(-diagnosis) %>%
  estimate_profiles(n_profiles = 2:6, models = 2)

lpa_fit_model2
```
```
tidyLPA analysis using mclust:

 Model Classes      AIC      BIC Entropy prob_min prob_max n_min n_max BLRT_p
     2       2 37243.45 37768.85    0.99     1.00     1.00  0.39  0.61   0.01
     2       3 34887.19 35677.46    0.98     0.98     1.00  0.15  0.55   0.01
     2       4 32506.72 33561.85    0.97     0.98     0.99  0.15  0.29   0.01
     2       5 30625.20 31945.20    0.98     0.99     1.00  0.11  0.26   0.01
     2       6 29393.02 30977.89    0.98     0.98     1.00  0.11  0.22   0.01
```
To visualize the results, we can use the plot_profiles function. Note the following:

- To use plot_profiles, we need to convert the data into matrix format first.
- We can use ggplot2 functions to further modify the chart.
```r
df_norm %>%
  select(-diagnosis) %>%
  as.matrix() %>%
  estimate_profiles(3) %>%
  tidyLPA::plot_profiles() +
  coord_flip()
```


This chart becomes quite hard to read when we have too many variables.
Alternatively, we could use a parallel coordinate plot to illustrate the estimated means under each profile. This makes it easier to compare the differences between the groups.
This can be done by using the ggparcoord function from the GGally package.
```r
df_norm %>%
  select(-diagnosis) %>%
  as.matrix() %>%
  estimate_profiles(3) %>%
  get_estimates() %>%
  filter(Category == "Means") %>%
  select(c(Estimate, Parameter, Class)) %>%
  pivot_wider(names_from = Parameter,
              values_from = Estimate) %>%
  mutate(Class = as.factor(Class)) %>%
  ggparcoord(columns = 2:14, groupColumn = 1) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```

It seems like there are some differences in characteristics among the different profiles.
That’s all for the day!
Thanks for reading the post until the end.
Feel free to contact me through email or LinkedIn if you have any suggestions on future topics to share.
Refer to this link for the blog disclaimer.
Till next time, happy learning!

Photo by Pixabay