Are the survival curves the same? Yes or no?

Photo by Daniel Reche
In my previous post, I showed how to build a survival curve.
Often, one of the questions raised while building the survival curve is whether the survival curves observed under the different groups are statistically different from one another.
The log-rank test is a chi-square test.
It compares the observed and expected event counts in each group to see whether the survival curves are statistically different.
Below are the null and alternative hypotheses of the log-rank test:
| Hypothesis | Remarks |
|---|---|
| Null | All survival curves are the same |
| Alternative | At least one of the survival curves is different from the rest |

Taken from DATAtab
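To make the "observed versus expected" idea concrete, below is a minimal sketch that computes the two-group log-rank statistic by hand and compares it with the result from survdiff. It uses the aml dataset that ships with the survival package rather than the bank data, and the variable names (n1, m1 and so on) are my own choices for illustration.

```r
library(survival)

# Built-in example data (not the bank dataset): columns time, status, x
d <- survival::aml
grp1 <- d$x == levels(factor(d$x))[1]

# Distinct event times across both groups
event_times <- sort(unique(d$time[d$status == 1]))

obs1 <- 0  # observed events in group 1
exp1 <- 0  # expected events in group 1 under the null hypothesis
var1 <- 0  # hypergeometric variance of (observed - expected)

for (tf in event_times) {
  n  <- sum(d$time >= tf)                         # at risk, both groups
  n1 <- sum(d$time >= tf & grp1)                  # at risk, group 1
  m  <- sum(d$time == tf & d$status == 1)         # events, both groups
  m1 <- sum(d$time == tf & d$status == 1 & grp1)  # events, group 1

  obs1 <- obs1 + m1
  exp1 <- exp1 + n1 * m / n
  if (n > 1) {
    var1 <- var1 + n1 * (n - n1) * m * (n - m) / (n^2 * (n - 1))
  }
}

chisq_manual   <- (obs1 - exp1)^2 / var1
chisq_survdiff <- survdiff(Surv(time, status) ~ x, data = d)$chisq

c(manual = chisq_manual, survdiff = chisq_survdiff)
```

The two numbers should agree, which shows that the test statistic is built from the gap between observed and expected events at each failure time.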
There are several variations of the log-rank test (David G. Kleinbaum 2012).
They allow users to apply different weights at the f-th failure time.

Taken from Survival Analysis: A Self-Learning Text
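As a rough sketch of what the weights do, the weighted test statistic for two groups is commonly written as below, where $m_{if}$ and $e_{if}$ are the observed and expected numbers of events in group $i$ at the f-th failure time and $w(t_f)$ is the weight. The notation is my own reconstruction from memory of the textbook, so please treat it as an approximation rather than a quotation.

$$
\frac{\left(\sum_{f} w(t_f)\,(m_{if} - e_{if})\right)^{2}}{\operatorname{Var}\left(\sum_{f} w(t_f)\,(m_{if} - e_{if})\right)}
$$

The commonly listed choices of weight are:

$$
\begin{aligned}
\text{Log-rank:} &\quad w(t_f) = 1 \\
\text{Wilcoxon:} &\quad w(t_f) = n_f \\
\text{Tarone-Ware:} &\quad w(t_f) = \sqrt{n_f} \\
\text{Peto:} &\quad w(t_f) = \tilde{s}(t_f) \\
\text{Fleming-Harrington:} &\quad w(t_f) = \hat{s}(t_{f-1})^{p}\,\left[1 - \hat{s}(t_{f-1})\right]^{q}
\end{aligned}
$$

Here $n_f$ is the number at risk at the f-th failure time, $\hat{s}$ is the Kaplan-Meier estimate and $\tilde{s}$ is a modified survival estimate. Setting every weight to 1 gives back the ordinary log-rank test, while the other choices put more weight on earlier or later failure times.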
One of the arguments of the test function (survdiff, used later in this post) is rho, which controls the weights. Below is how the test changes with rho (Sestelo 2017):
The default for rho is 0, which gives the log-rank test.
When rho = 1, it gives the Peto & Peto modification of the Gehan-Wilcoxon test.
According to Kleinbaum, the weighting method used in the test should be an a priori decision, rather than chosen by trial and error to get the desired results.
This is to avoid bias in the results (David G. Kleinbaum 2012).
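To see the effect of rho, here is a small sketch that runs both versions of the test side by side. It uses the lung dataset bundled with the survival package as a stand-in, and the helper p_value is a hypothetical function I wrote for this illustration, not something provided by the package.

```r
library(survival)

# Built-in `lung` data stands in for the bank dataset here
fit_logrank <- survdiff(Surv(time, status) ~ sex, data = lung, rho = 0)
fit_peto    <- survdiff(Surv(time, status) ~ sex, data = lung, rho = 1)

# Hypothetical helper: p-value from the chi-square statistic,
# with (number of groups - 1) degrees of freedom
p_value <- function(fit) {
  pchisq(fit$chisq, df = length(fit$n) - 1, lower.tail = FALSE)
}

c(log_rank = p_value(fit_logrank), peto_peto = p_value(fit_peto))
```

The two p-values will generally differ, which is exactly why the choice of rho should be fixed before looking at the results.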
With that, let's start the demonstration!
In this demonstration, I will be using this bank dataset from Kaggle.
First, I will load the necessary packages into the environment.
pacman::p_load(tidyverse, survival, janitor, survminer)
I will be using the survival package to perform the log-rank test.
Next, I will import the data into the environment.
df <- read_csv("https://raw.githubusercontent.com/jasperlok/my-blog/master/_posts/2022-09-10-kaplan-meier/data/Churn_Modelling.csv")
Then, I will perform the same data wrangling as in my previous post.
Refer to that post for the details.
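For readers who do not have the previous post at hand, the sketch below shows the kind of wrangling that produces the lower-case column names used in the rest of this post. It assumes the raw file uses CamelCase column names such as NumOfProducts, and it runs on a hypothetical two-row data frame rather than the real data, so the actual wrangling in the previous post may include more steps.

```r
library(janitor)

# Hypothetical two-row stand-in that mimics the raw column names
raw <- data.frame(
  Tenure = c(2, 8),
  Exited = c(1, 0),
  Gender = c("Female", "Male"),
  NumOfProducts = c(1, 2)
)

# clean_names() converts the column names to snake_case
cleaned <- clean_names(raw)
names(cleaned)
```

On the real data, the equivalent step would be along the lines of df <- clean_names(df).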
In this demonstration, I will compare the survival curves of the different genders.
Recall that to visualize the survival curves, I first create the survfit object and then pass it into the ggsurvplot function.
surv_fit <- survfit(Surv(tenure, exited) ~ gender, data = df)
ggsurvplot(surv_fit)

From the graph, the survival curves look visually different between genders.
To confirm this, I will perform a log-rank test to check whether the survival curves are indeed different.
To do so, I will use the survdiff function.
survdiff(Surv(tenure, exited) ~ gender, data = df)
Call:
survdiff(formula = Surv(tenure, exited) ~ gender, data = df)

| | N | Observed | Expected | (O-E)^2/E | (O-E)^2/V |
|---|---|---|---|---|---|
| gender=Female | 4543 | 1139 | 920 | 52.2 | 101 |
| gender=Male | 5457 | 898 | 1117 | 43.0 | 101 |

Chisq= 101 on 1 degrees of freedom, p= <2e-16
As the p-value is less than 0.05, we reject the null hypothesis. There is statistical evidence that the two survival curves are different from one another.
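As a quick sanity check, the figures in the survdiff output for gender can be reproduced with a few lines of base R. The numbers below are copied from that output. Since the expected counts are printed after rounding, the recomputed values will only be close to the printed ones, not identical.

```r
# Figures copied from the survdiff output for gender (expected counts are rounded)
observed <- c(Female = 1139, Male = 898)
expected <- c(Female = 920, Male = 1117)

# The (O-E)^2/E column: close to the 52.2 and 43.0 printed in the output
contribution <- (observed - expected)^2 / expected
round(contribution, 1)

# p-value for Chisq = 101 on 1 degree of freedom
p <- pchisq(101, df = 1, lower.tail = FALSE)
p < 0.05
```

The p-value is far below 0.05, which is why the output only reports it as <2e-16.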
Similarly, the survdiff function can also be used when there are more than two survival curves.
For example, I would like to find out whether the survival curves differ by the number of products held by the customers.
survdiff(Surv(tenure, exited) ~ num_of_products, data = df)
Call:
survdiff(formula = Surv(tenure, exited) ~ num_of_products, data = df)

| | N | Observed | Expected | (O-E)^2/E | (O-E)^2/V |
|---|---|---|---|---|---|
| num_of_products=1 | 5084 | 1409 | 1027.2 | 142 | 304 |
| num_of_products=2 | 4590 | 348 | 941.4 | 374 | 737 |
| num_of_products=3 | 266 | 220 | 54.7 | 499 | 545 |
| num_of_products=4 | 60 | 60 | 13.7 | 157 | 169 |

Chisq= 1246 on 3 degrees of freedom, p= <2e-16
As shown in the result above, we reject the null hypothesis and conclude that at least one of the survival curves is different from the rest.
However, this test does not tell us which groups differ, or whether the survival curves are similar for some of the groups.
For example, if we plot the survival curves for customers holding different numbers of products, the curves for customers who held 3 products and customers who held 4 products look rather similar.
surv_fit <- survfit(Surv(tenure, exited) ~ num_of_products, data = df)
ggsurvplot(surv_fit)

To check this, I will use the pairwise_survdiff function from the survminer package to generate the pairwise results.
pairwise_survdiff(Surv(tenure, exited) ~ num_of_products, data = df)
Pairwise comparisons using Log-Rank test
data: df and num_of_products

| | 1 | 2 | 3 |
|---|---|---|---|
| 2 | <2e-16 | - | - |
| 3 | <2e-16 | <2e-16 | - |
| 4 | <2e-16 | <2e-16 | 0.43 |

P value adjustment method: BH
From the results above, we fail to reject the null hypothesis when comparing the survival curves for customers who held 3 and 4 products (adjusted p-value of 0.43). There is insufficient statistical evidence that these two survival curves are different.
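The output also mentions that the p-values are adjusted with the BH (Benjamini-Hochberg) method, since several pairwise tests are run at once. The sketch below uses base R's p.adjust on a set of hypothetical raw p-values, made up purely for illustration, to show how the adjustment changes them and how BH compares with the stricter Bonferroni method.

```r
# Hypothetical raw p-values, for illustration only (not from the bank dataset)
p_raw <- c(0.01, 0.02, 0.03, 0.43)

# Benjamini-Hochberg adjustment (the default used by pairwise_survdiff)
p_bh <- p.adjust(p_raw, method = "BH")

# Bonferroni adjustment, which is more conservative
p_bonf <- p.adjust(p_raw, method = "bonferroni")

rbind(raw = p_raw, BH = p_bh, bonferroni = p_bonf)
```

In pairwise_survdiff, the adjustment can be changed through the p.adjust.method argument, for example p.adjust.method = "bonferroni".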
By default, the function performs the log-rank test (i.e. rho = 0).
To perform the Peto & Peto test instead, we just need to set rho to 1 as shown below.
pairwise_survdiff(Surv(tenure, exited) ~ num_of_products, data = df, rho = 1)
Pairwise comparisons using Peto & Peto test
data: df and num_of_products

| | 1 | 2 | 3 |
|---|---|---|---|
| 2 | <2e-16 | - | - |
| 3 | <2e-16 | <2e-16 | - |
| 4 | <2e-16 | <2e-16 | 0.52 |

P value adjustment method: BH
That’s all for the day!
Thanks for reading the post until the end.
Feel free to contact me through email or LinkedIn if you have any suggestions on future topics to share.
Refer to this link for the blog disclaimer.
Till next time, happy learning!

Photo by George Milton