Recurrent Events

Machine Learning Survival Model

When its no longer a one-off event

Jasper Lok https://jasperlok.netlify.app/
01-19-2024

In this post, I will be exploring the survival techniques that can be used to analyze recurrent events.

Photo by Chris Curry on Unsplash

Why do we need separate techniques for recurrent events?

The traditional survival assumes the event of interest can only happen once.

However, some of the event of interests could happen more than once. For example, the cancer survivors could have relapsed.

Therefore, in this post, I will be exploring how to analyze recurrent events.

Different types of recurrent techniques

Below are the comparisons of the different models:

Techniques Assumptions Remarks
Counting Process Assume each event is independent from one another and the u=subject continue contributing to the risk set as long as the subject is under observation at the time the event occurs
  • Also known as ‘Anderson-Gill model’
    - Extension of Cox model
Conditional Model A Assume its not possible to be at risk for a subsequent event without having experiened the previous event
  • Also known as ‘Prentice-Williams-Peterson counting process’ or ‘PWP-CP’
    - Similar to counting process, but stratfied by event
    - Use this if the order of the event is important
Conditional Model B Very useful for modelling the time between each recurrent event rather than the full time course of the recurrent event process
  • Also known as ‘Prentice-Williams-Peterson gap process’ or ‘PWP-GT’
    - Similar to conditional model A, but start time is the study entry time
    - Use this if the interest is time to next recurrent event
Marginal Model Consider each event separately and models all the available data for the specific event
  • Strata by event + cluster by id
    - Use this if we want to know the events occuring at different order
    - Strata by event + cluster by id - Each observation must have equal number of rows

With the different types of recurrent models, it can be quite confusing which models we should be using.

This author summarizes a few key considerations in guiding us in picking the appropriate models (Ghilotti and Bellocco 2018):

The authors of this article have provided how the codes would look like under the different models (Guo et al. 2009).

What is the difference between recurrent survival analysis and Poisson regression?

(Yang et al. 2017) explained following are the differences:

The author also highlighted one should be careful when using Poisson regression to analyze recurrent events because of the strong parametric assumptions.

Demonstration

In this demonstration, I will be using the bladder dataset in survival package for the exploration.

Setup the environment

First, I will load the necessary packages into the environment.

pacman::p_load(tidyverse, janitor, survival, survminer)

Import Data

First I will import the dataset into the environment.

I will also clean the column names, drop the columns I don’t need and transform the columns to be the right format.

data(cancer, package = "survival")

df <- bladder1
df_1 <- df %>% 
  mutate(event = if_else(recur == 0, 0, 1))

Counting Process

First, I will build a counting process.

Under this model, each event is assumed to be independent.

recur_fit <-
  coxph(Surv(start, stop, event) ~ treatment + status + cluster(id)
        ,data = df_1)
summary(recur_fit)
Call:
coxph(formula = Surv(start, stop, event) ~ treatment + status, 
    data = df_1, cluster = id)

  n= 292, number of events= 238 
   (2 observations deleted due to missingness)

                        coef exp(coef) se(coef) robust se      z
treatmentpyridoxine -0.04517   0.95584  0.15581   0.27931 -0.162
treatmentthiotepa   -0.37919   0.68441  0.16230   0.27602 -1.374
status               0.28198   1.32576  0.06202   0.10256  2.750
                    Pr(>|z|)   
treatmentpyridoxine  0.87154   
treatmentthiotepa    0.16950   
status               0.00597 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

                    exp(coef) exp(-coef) lower .95 upper .95
treatmentpyridoxine    0.9558     1.0462    0.5529     1.652
treatmentthiotepa      0.6844     1.4611    0.3985     1.176
status                 1.3258     0.7543    1.0843     1.621

Concordance= 0.6  (se = 0.032 )
Likelihood ratio test= 22.92  on 3 df,   p=4e-05
Wald test            = 9.77  on 3 df,   p=0.02
Score (logrank) test = 24.44  on 3 df,   p=2e-05,   Robust = 10.14  p=0.02

  (Note: the likelihood ratio and score tests assume independence of
     observations within a cluster, the Wald and robust score tests do not).
ggsurvplot(survfit(recur_fit, data = df_1))

Conditional Model A

Next, I will be building a conditional model A.

Under this model, we assume that it is not possible to be at risk for subsequent event without experiencing the prior event.

recur_fit_conditional_a <-
  coxph(Surv(start, stop, event) ~ treatment + status + cluster(id) + strata(enum)
        ,data = df_1)
summary(recur_fit_conditional_a)
Call:
coxph(formula = Surv(start, stop, event) ~ treatment + status + 
    strata(enum), data = df_1, cluster = id)

  n= 292, number of events= 238 
   (2 observations deleted due to missingness)

                        coef exp(coef) se(coef) robust se      z
treatmentpyridoxine  0.17486   1.19108  0.17527   0.17351  1.008
treatmentthiotepa   -0.07643   0.92642  0.17643   0.15982 -0.478
status               0.33499   1.39793  0.08287   0.08425  3.976
                    Pr(>|z|)    
treatmentpyridoxine    0.314    
treatmentthiotepa      0.632    
status                 7e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

                    exp(coef) exp(-coef) lower .95 upper .95
treatmentpyridoxine    1.1911     0.8396    0.8477     1.674
treatmentthiotepa      0.9264     1.0794    0.6773     1.267
status                 1.3979     0.7153    1.1851     1.649

Concordance= 0.659  (se = 0.033 )
Likelihood ratio test= 16.16  on 3 df,   p=0.001
Wald test            = 17.6  on 3 df,   p=5e-04
Score (logrank) test = 17.66  on 3 df,   p=5e-04,   Robust = 18.38  p=4e-04

  (Note: the likelihood ratio and score tests assume independence of
     observations within a cluster, the Wald and robust score tests do not).
ggsurvplot(survfit(recur_fit_conditional_a, data = df_1))

Conditional Model B

I will build a conditional model B.

recur_fit_conditional_b <-
  coxph(Surv(stop - start, event) ~ treatment + status + cluster(id) + strata(enum)
        ,data = df_1)
summary(recur_fit_conditional_b)
Call:
coxph(formula = Surv(stop - start, event) ~ treatment + status + 
    strata(enum), data = df_1, cluster = id)

  n= 294, number of events= 238 

                        coef exp(coef) se(coef) robust se      z
treatmentpyridoxine  0.23066   1.25943  0.16623   0.15649  1.474
treatmentthiotepa   -0.00393   0.99608  0.16832   0.16334 -0.024
status               0.32238   1.38041  0.07826   0.07818  4.124
                    Pr(>|z|)    
treatmentpyridoxine    0.140    
treatmentthiotepa      0.981    
status              3.73e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

                    exp(coef) exp(-coef) lower .95 upper .95
treatmentpyridoxine    1.2594     0.7940    0.9268     1.712
treatmentthiotepa      0.9961     1.0039    0.7232     1.372
status                 1.3804     0.7244    1.1843     1.609

Concordance= 0.641  (se = 0.029 )
Likelihood ratio test= 17.51  on 3 df,   p=6e-04
Wald test            = 20.29  on 3 df,   p=1e-04
Score (logrank) test = 19.01  on 3 df,   p=3e-04,   Robust = 18.95  p=3e-04

  (Note: the likelihood ratio and score tests assume independence of
     observations within a cluster, the Wald and robust score tests do not).
ggsurvplot(survfit(recur_fit_conditional_b, data = df_1))

Marginal model

Lastly, I will build a marginal model.

Under this approach, each event is assumed to be a separate process.

recur_fit_marginal <-
  coxph(Surv(stop, event) ~ treatment + status + cluster(id) + strata(enum)
        ,data = df_1)
summary(recur_fit_marginal)
Call:
coxph(formula = Surv(stop, event) ~ treatment + status + strata(enum), 
    data = df_1, cluster = id)

  n= 294, number of events= 238 

                        coef exp(coef) se(coef) robust se      z
treatmentpyridoxine  0.27926   1.32215  0.16723   0.24330  1.148
treatmentthiotepa   -0.19472   0.82306  0.16745   0.19892 -0.979
status               0.36416   1.43930  0.07790   0.07703  4.727
                    Pr(>|z|)    
treatmentpyridoxine    0.251    
treatmentthiotepa      0.328    
status              2.27e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

                    exp(coef) exp(-coef) lower .95 upper .95
treatmentpyridoxine    1.3222     0.7563    0.8207     2.130
treatmentthiotepa      0.8231     1.2150    0.5573     1.215
status                 1.4393     0.6948    1.2376     1.674

Concordance= 0.643  (se = 0.03 )
Likelihood ratio test= 25.18  on 3 df,   p=1e-05
Wald test            = 25.71  on 3 df,   p=1e-05
Score (logrank) test = 27.32  on 3 df,   p=5e-06,   Robust = 23.44  p=3e-05

  (Note: the likelihood ratio and score tests assume independence of
     observations within a cluster, the Wald and robust score tests do not).
ggsurvplot(survfit(recur_fit_marginal, data = df_1))

Hmmm, it seems like the coefficients of the different models are different. It made me curious how different they are.

To compare the different coefficients, I will first compile the results.

recurrent_model_coef_df <- 
  bind_rows(
    # counting process
    data.frame(summary(recur_fit)$coefficients) %>% 
      rownames_to_column("variable") %>% 
      clean_names() %>% 
      mutate(model = "counting process")
    # conditional model A
    ,data.frame(summary(recur_fit_conditional_a)$coefficients) %>% 
      rownames_to_column("variable") %>% 
      clean_names() %>% 
      mutate(model = "conditional model a")
    # conditional model B
    ,data.frame(summary(recur_fit_conditional_b)$coefficients) %>% 
      rownames_to_column("variable") %>% 
      clean_names() %>% 
      mutate(model = "conditional model b")
    # marginal model
    ,data.frame(summary(recur_fit_marginal)$coefficients) %>% 
      rownames_to_column("variable") %>% 
      clean_names() %>% 
      mutate(model = "marginal model")
  )

Then, I will compare the results by using ggplot functions.

recurrent_model_coef_df %>% 
  ggplot(aes(variable, coef, color = model)) +
  geom_point() +
  geom_errorbar(aes(ymin = coef - robust_se
                    ,ymax = coef + robust_se)) +
  xlab("Variables") +
  ylab("Coefficients") +
  labs(title = "Coefficients from Different Recurring Models")

Over here, I hasve used the robust errors in calculating the confidence interval as robust error terms would have account for the correlation.

This is because an individual could have multiple events and this measurement will account for the correlation.

Based on the graph, we could see that the confidence interval range for the coefficients overlap with one another.

Conclusion

That’s all for the day!

Thanks for reading the post until the end.

Feel free to contact me through email or LinkedIn if you have any suggestions on future topics to share.

Refer to this link for the blog disclaimer.

Till next time, happy learning!

Photo by Brett Jordan on Unsplash

Ghilotti, Francesca, and Rino Bellocco. 2018. “Recurrent-Event Analysis with Stata: Methods and Applications.” https://www.stata.com/meeting/italy18/slides/italy18_Ghilott.pdf.
Guo, Zhenchao, Thomas M. Gill, Heather G., and Allore. 2009. “Modeling Repeated Time-to-Event Health Conditions with Discontinuous Risk Intervals: An Example of a Longitudinal Study of Functional Disability Among Older Persons.” https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2735569/.
Yang, Wei, Christopher Jepson, Dawei Xie, Jason A. Roy, Haochang Shou, Jesse Yenchih Hsu, Amanda Hyre Anderson, et al. 2017. “Statistical Methods for Recurrent Event Analysis in Cohort Studies of CKD.” https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5718286/.

References