When its no longer a one-off event
In this post, I will be exploring the survival techniques that can be used to analyze recurrent events.

Photo by Chris Curry on Unsplash
The traditional survival assumes the event of interest can only happen once.
However, some of the event of interests could happen more than once. For example, the cancer survivors could have relapsed.
Therefore, in this post, I will be exploring how to analyze recurrent events.
Below are the comparisons of the different models:
| Techniques | Assumptions | Remarks |
|---|---|---|
| Counting Process | Assume each event is independent from one another and the u=subject continue contributing to the risk set as long as the subject is under observation at the time the event occurs |
|
| Conditional Model A | Assume its not possible to be at risk for a subsequent event without having experiened the previous event |
|
| Conditional Model B | Very useful for modelling the time between each recurrent event rather than the full time course of the recurrent event process |
|
| Marginal Model | Consider each event separately and models all the available data for the specific event |
|
With the different types of recurrent models, it can be quite confusing which models we should be using.
This author summarizes a few key considerations in guiding us in picking the appropriate models (Ghilotti and Bellocco 2018):
Is the order of the events important?
Does the risk of recurrent event change as the number of previous events increases?
Are we interested in the overall effect or in the effect for the \(k^{th}\) event?
Are there many recurrences per subject?
The authors of this article have provided how the codes would look like under the different models (Guo et al. 2009).
(Yang et al. 2017) explained following are the differences:
Poisson regression is a fully parametric model, where Cox model is semi-parametric
If the survival time follows an exponential distribution, then Poisson regression is equivalent to a parametric survival model assuming exponential distribution
If survival time is not exponentially distributed and the event count does not follow a Poisson distribution, the inference made from Poisson regression may be biased
The author also highlighted one should be careful when using Poisson regression to analyze recurrent events because of the strong parametric assumptions.
In this demonstration, I will be using the bladder dataset in survival package for the exploration.
First, I will load the necessary packages into the environment.
pacman::p_load(tidyverse, janitor, survival, survminer)
First I will import the dataset into the environment.
I will also clean the column names, drop the columns I don’t need and transform the columns to be the right format.
data(cancer, package = "survival")
df <- bladder1
df_1 <- df %>%
mutate(event = if_else(recur == 0, 0, 1))
First, I will build a counting process.
Under this model, each event is assumed to be independent.
recur_fit <-
coxph(Surv(start, stop, event) ~ treatment + status + cluster(id)
,data = df_1)
summary(recur_fit)
Call:
coxph(formula = Surv(start, stop, event) ~ treatment + status,
data = df_1, cluster = id)
n= 292, number of events= 238
(2 observations deleted due to missingness)
coef exp(coef) se(coef) robust se z
treatmentpyridoxine -0.04517 0.95584 0.15581 0.27931 -0.162
treatmentthiotepa -0.37919 0.68441 0.16230 0.27602 -1.374
status 0.28198 1.32576 0.06202 0.10256 2.750
Pr(>|z|)
treatmentpyridoxine 0.87154
treatmentthiotepa 0.16950
status 0.00597 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
treatmentpyridoxine 0.9558 1.0462 0.5529 1.652
treatmentthiotepa 0.6844 1.4611 0.3985 1.176
status 1.3258 0.7543 1.0843 1.621
Concordance= 0.6 (se = 0.032 )
Likelihood ratio test= 22.92 on 3 df, p=4e-05
Wald test = 9.77 on 3 df, p=0.02
Score (logrank) test = 24.44 on 3 df, p=2e-05, Robust = 10.14 p=0.02
(Note: the likelihood ratio and score tests assume independence of
observations within a cluster, the Wald and robust score tests do not).
ggsurvplot(survfit(recur_fit, data = df_1))

Next, I will be building a conditional model A.
Under this model, we assume that it is not possible to be at risk for subsequent event without experiencing the prior event.
recur_fit_conditional_a <-
coxph(Surv(start, stop, event) ~ treatment + status + cluster(id) + strata(enum)
,data = df_1)
summary(recur_fit_conditional_a)
Call:
coxph(formula = Surv(start, stop, event) ~ treatment + status +
strata(enum), data = df_1, cluster = id)
n= 292, number of events= 238
(2 observations deleted due to missingness)
coef exp(coef) se(coef) robust se z
treatmentpyridoxine 0.17486 1.19108 0.17527 0.17351 1.008
treatmentthiotepa -0.07643 0.92642 0.17643 0.15982 -0.478
status 0.33499 1.39793 0.08287 0.08425 3.976
Pr(>|z|)
treatmentpyridoxine 0.314
treatmentthiotepa 0.632
status 7e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
treatmentpyridoxine 1.1911 0.8396 0.8477 1.674
treatmentthiotepa 0.9264 1.0794 0.6773 1.267
status 1.3979 0.7153 1.1851 1.649
Concordance= 0.659 (se = 0.033 )
Likelihood ratio test= 16.16 on 3 df, p=0.001
Wald test = 17.6 on 3 df, p=5e-04
Score (logrank) test = 17.66 on 3 df, p=5e-04, Robust = 18.38 p=4e-04
(Note: the likelihood ratio and score tests assume independence of
observations within a cluster, the Wald and robust score tests do not).
ggsurvplot(survfit(recur_fit_conditional_a, data = df_1))

I will build a conditional model B.
recur_fit_conditional_b <-
coxph(Surv(stop - start, event) ~ treatment + status + cluster(id) + strata(enum)
,data = df_1)
summary(recur_fit_conditional_b)
Call:
coxph(formula = Surv(stop - start, event) ~ treatment + status +
strata(enum), data = df_1, cluster = id)
n= 294, number of events= 238
coef exp(coef) se(coef) robust se z
treatmentpyridoxine 0.23066 1.25943 0.16623 0.15649 1.474
treatmentthiotepa -0.00393 0.99608 0.16832 0.16334 -0.024
status 0.32238 1.38041 0.07826 0.07818 4.124
Pr(>|z|)
treatmentpyridoxine 0.140
treatmentthiotepa 0.981
status 3.73e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
treatmentpyridoxine 1.2594 0.7940 0.9268 1.712
treatmentthiotepa 0.9961 1.0039 0.7232 1.372
status 1.3804 0.7244 1.1843 1.609
Concordance= 0.641 (se = 0.029 )
Likelihood ratio test= 17.51 on 3 df, p=6e-04
Wald test = 20.29 on 3 df, p=1e-04
Score (logrank) test = 19.01 on 3 df, p=3e-04, Robust = 18.95 p=3e-04
(Note: the likelihood ratio and score tests assume independence of
observations within a cluster, the Wald and robust score tests do not).
ggsurvplot(survfit(recur_fit_conditional_b, data = df_1))

Lastly, I will build a marginal model.
Under this approach, each event is assumed to be a separate process.
recur_fit_marginal <-
coxph(Surv(stop, event) ~ treatment + status + cluster(id) + strata(enum)
,data = df_1)
summary(recur_fit_marginal)
Call:
coxph(formula = Surv(stop, event) ~ treatment + status + strata(enum),
data = df_1, cluster = id)
n= 294, number of events= 238
coef exp(coef) se(coef) robust se z
treatmentpyridoxine 0.27926 1.32215 0.16723 0.24330 1.148
treatmentthiotepa -0.19472 0.82306 0.16745 0.19892 -0.979
status 0.36416 1.43930 0.07790 0.07703 4.727
Pr(>|z|)
treatmentpyridoxine 0.251
treatmentthiotepa 0.328
status 2.27e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
treatmentpyridoxine 1.3222 0.7563 0.8207 2.130
treatmentthiotepa 0.8231 1.2150 0.5573 1.215
status 1.4393 0.6948 1.2376 1.674
Concordance= 0.643 (se = 0.03 )
Likelihood ratio test= 25.18 on 3 df, p=1e-05
Wald test = 25.71 on 3 df, p=1e-05
Score (logrank) test = 27.32 on 3 df, p=5e-06, Robust = 23.44 p=3e-05
(Note: the likelihood ratio and score tests assume independence of
observations within a cluster, the Wald and robust score tests do not).
ggsurvplot(survfit(recur_fit_marginal, data = df_1))

Hmmm, it seems like the coefficients of the different models are different. It made me curious how different they are.
To compare the different coefficients, I will first compile the results.
recurrent_model_coef_df <-
bind_rows(
# counting process
data.frame(summary(recur_fit)$coefficients) %>%
rownames_to_column("variable") %>%
clean_names() %>%
mutate(model = "counting process")
# conditional model A
,data.frame(summary(recur_fit_conditional_a)$coefficients) %>%
rownames_to_column("variable") %>%
clean_names() %>%
mutate(model = "conditional model a")
# conditional model B
,data.frame(summary(recur_fit_conditional_b)$coefficients) %>%
rownames_to_column("variable") %>%
clean_names() %>%
mutate(model = "conditional model b")
# marginal model
,data.frame(summary(recur_fit_marginal)$coefficients) %>%
rownames_to_column("variable") %>%
clean_names() %>%
mutate(model = "marginal model")
)
Then, I will compare the results by using ggplot functions.
recurrent_model_coef_df %>%
ggplot(aes(variable, coef, color = model)) +
geom_point() +
geom_errorbar(aes(ymin = coef - robust_se
,ymax = coef + robust_se)) +
xlab("Variables") +
ylab("Coefficients") +
labs(title = "Coefficients from Different Recurring Models")

Over here, I hasve used the robust errors in calculating the confidence interval as robust error terms would have account for the correlation.
This is because an individual could have multiple events and this measurement will account for the correlation.
Based on the graph, we could see that the confidence interval range for the coefficients overlap with one another.
That’s all for the day!
Thanks for reading the post until the end.
Feel free to contact me through email or LinkedIn if you have any suggestions on future topics to share.
Refer to this link for the blog disclaimer.
Till next time, happy learning!

Photo by Brett Jordan on Unsplash