Causal Inference - Sensitivity Testing (Sensemakr)

Machine Learning Causal Inference
Jasper Lok https://jasperlok.netlify.app/
03-28-2024

Photo by Mojtaba Khosravi

Sensitivity Testing

Once we have measured the causal effect, we often perform sensitivity testing to check how sensitive the results would be if we had omitted an important confounder from the analysis.

In this post, I will be using sensemakr to perform sensitivity testing.

I will be also referring to this post on how to perform sensitivity testing.

sensemakr package

This method measures how much of the variation in the treatment and the outcome an unobserved confounder would need to explain in order to change our conclusions.

Note that this method only works for regression models.

Partial R square

Another important concept to introduce in this post is the partial R square (a.k.a. the coefficient of partial determination).

It measures the percentage of the variation left unexplained by the reduced model (i.e., the model without the selected variable) that can be explained by adding that variable back in.

\[R^{2}_{partial} = \frac{SSE(reduced) - SSE(full)}{SSE(reduced)}\]

I found that this article explains the concept quite clearly.
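To make the formula concrete, here is a minimal sketch (using R's built-in mtcars dataset rather than the barrel data) that computes the partial \(R^2\) of one variable directly from the SSE of the full and reduced models:

```r
# Fit the full model and a reduced model that drops the variable of interest
full    <- lm(mpg ~ wt + hp, data = mtcars)
reduced <- lm(mpg ~ hp, data = mtcars)

sse_full    <- sum(residuals(full)^2)
sse_reduced <- sum(residuals(reduced)^2)

# Share of the variation left unexplained by the reduced model
# that is explained once `wt` is added back
partial_r2_wt <- (sse_reduced - sse_full) / sse_reduced
partial_r2_wt
```

As a sanity check, this matches the identity \(R^{2}_{partial} = t^2 / (t^2 + df_{residual})\), where \(t\) is the t-statistic of wt in the full model.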

Demonstration

In this post, I will be using both sensemakr and rsq packages to perform sensitivity testing.

pacman::p_load(tidyverse, tidymodels, broom, sensemakr, rsq)

The downside of this approach is that it only works on models with an \(R^2\). Nevertheless, it is easy to understand, so let’s look at how it works!

Import Data

First, I will import the data into the environment.

barrels_obs <- read_csv("data/barrels_observational.csv") %>%
  # This makes it so "No barrel" is the reference category
  mutate(barrel = fct_relevel(barrel, "No barrel"))

Partial R squared

First, I will be using the rsq.partial function to check how much variation each variable explains.

model_barrel <-
  lm(water_bill ~ barrel_num + yard_size + home_garden + attitude_env + temperature
     ,data = barrels_obs)

glance(model_barrel)
# A tibble: 1 x 12
  r.squared adj.r.s~1 sigma stati~2 p.value    df logLik    AIC    BIC
      <dbl>     <dbl> <dbl>   <dbl>   <dbl> <dbl>  <dbl>  <dbl>  <dbl>
1     0.794     0.793  14.8    952.       0     5 -5099. 10212. 10247.
# ... with 3 more variables: deviance <dbl>, df.residual <int>,
#   nobs <int>, and abbreviated variable names 1: adj.r.squared,
#   2: statistic
rsq.partial(model_barrel)
$adjustment
[1] FALSE

$variable
[1] "barrel_num"   "yard_size"    "home_garden"  "attitude_env"
[5] "temperature" 

$partial.rsq
[1] 0.599822151 0.546428566 0.001112098 0.141528247 0.424416764

Note that the partial \(R^2\) calculated for each component would not add up to the \(R^2\) of the fitted model, as explained under this StackExchange post.

From the results, we note that barrel_num has the highest partial \(R^2\). Interestingly, the partial \(R^2\) of yard_size is very close to that of barrel_num as well.

Sensitivity Testing

Next, I will use the sensemakr function to perform the sensitivity testing.

To do so, I will pass in the fitted model, the treatment variable (which has to be in numeric form), a benchmark covariate, and kd (i.e., the hypothetical strength of the unknown confounder relative to the benchmark covariate).

For the benchmark covariate, we should choose a covariate that already has a substantial impact on both the treatment and the outcome.

barrel_sensitivity_yard <-
  sensemakr(model = model_barrel
            ,treatment = "barrel_num"
            ,benchmark_covariates = "yard_size"
            ,kd = c(1, 2, 3))

barrel_sensitivity_yard
Sensitivity Analysis to Unobserved Confounding

Model Formula: water_bill ~ barrel_num + yard_size + home_garden + attitude_env + 
    temperature

Null hypothesis: q = 1 and reduce = TRUE 

Unadjusted Estimates of ' barrel_num ':
  Coef. estimate: -38.85029 
  Standard Error: 0.90298 
  t-value: -43.02474 

Sensitivity Statistics:
  Partial R2 of treatment with outcome: 0.59982 
  Robustness Value, q = 1 : 0.68602 
  Robustness Value, q = 1 alpha = 0.05 : 0.6706 

For more information, check summary.

Alternatively, we could pass the object into the summary function as shown below. The generated result contains a more detailed explanation of the figures under the Sensitivity Statistics section.

summary(barrel_sensitivity_yard)
Sensitivity Analysis to Unobserved Confounding

Model Formula: water_bill ~ barrel_num + yard_size + home_garden + 
    attitude_env + temperature

Null hypothesis: q = 1 and reduce = TRUE
-- This means we are considering biases that reduce the absolute
value of the current estimate.
-- The null hypothesis deemed problematic is H0:tau = 0

Unadjusted Estimates of 'barrel_num':
Coef. estimate: -38.8503
Standard Error: 0.903
t-value (H0:tau = 0): -43.0247

Sensitivity Statistics:
Partial R2 of treatment with outcome: 0.5998
Robustness Value, q = 1: 0.686
Robustness Value, q = 1, alpha = 0.05: 0.6706

Verbal interpretation of sensitivity statistics:

-- Partial R2 of the treatment with the outcome: an extreme
confounder (orthogonal to the covariates) that explains 100% of the
residual variance of the outcome, would need to explain at least
59.98% of the residual variance of the treatment to fully account for
the observed estimated effect.

-- Robustness Value, q = 1: unobserved confounders (orthogonal to the
covariates) that explain more than 68.6% of the residual variance of
both the treatment and the outcome are strong enough to bring the
point estimate to 0 (a bias of 100% of the original estimate).
Conversely, unobserved confounders that do not explain more than
68.6% of the residual variance of both the treatment and the outcome
are not strong enough to bring the point estimate to 0.

-- Robustness Value, q = 1, alpha = 0.05: unobserved confounders
(orthogonal to the covariates) that explain more than 67.06% of the
residual variance of both the treatment and the outcome are strong
enough to bring the estimate to a range where it is no longer
'statistically different' from 0 (a bias of 100% of the original
estimate), at the significance level of alpha = 0.05. Conversely,
unobserved confounders that do not explain more than 67.06% of the
residual variance of both the treatment and the outcome are not
strong enough to bring the estimate to a range where it is no longer
'statistically different' from 0, at the significance level of alpha
= 0.05.

Bounds on omitted variable bias:

--The table below shows the maximum strength of unobserved
confounders with association with the treatment and the outcome
bounded by a multiple of the observed explanatory power of the chosen
benchmark covariate(s).

Bound Label   R2dz.x  R2yz.dx  Treatment   Adjusted Estimate  Adjusted Se  Adjusted T  Adjusted Lower CI  Adjusted Upper CI
1x yard_size  0.0022        1  barrel_num           -37.3735            0        -Inf           -37.3735           -37.3735
2x yard_size  0.0043        1  barrel_num           -36.7595            0        -Inf           -36.7595           -36.7595
3x yard_size  0.0065        1  barrel_num           -36.2869            0        -Inf           -36.2869           -36.2869
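The sensitivity analysis can also be visualised. The package provides a plot method for sensemakr objects (a sketch, reusing the barrel_sensitivity_yard object fitted above; it draws to a graphics device rather than returning a value):

```r
# Contour plot of the adjusted estimate over the space of hypothetical
# confounder strengths (partial R2 with the treatment on the x-axis,
# partial R2 with the outcome on the y-axis), with the kd benchmark
# bounds marked on the plot
plot(barrel_sensitivity_yard)

# "Extreme" scenario: adjusted estimate assuming the confounder explains
# up to 100% of the residual variance of the outcome
plot(barrel_sensitivity_yard, type = "extreme")
```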

We could also use the ovb_minimal_reporting function to summarise the results into a nice table.
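For example (a sketch, reusing the sensitivity object fitted above; the format argument also accepts "html" and "pure_html"):

```r
# Produce a compact, publication-style sensitivity table
ovb_minimal_reporting(barrel_sensitivity_yard, format = "latex")
```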

If we want to use a categorical variable as the benchmark, we will need to pass the corresponding dummy variable as the benchmark, since the sensemakr function can only use covariates that appear in the fitted model to run the sensitivity testing.

barrel_sensitivity_home_garden <-
  sensemakr(model = model_barrel
            ,treatment = "barrel_num"
            ,benchmark_covariates = "home_gardenNo home garden"
            ,kd = c(1, 2, 3))

barrel_sensitivity_home_garden
Sensitivity Analysis to Unobserved Confounding

Model Formula: water_bill ~ barrel_num + yard_size + home_garden + attitude_env + 
    temperature

Null hypothesis: q = 1 and reduce = TRUE 

Unadjusted Estimates of ' barrel_num ':
  Coef. estimate: -38.85029 
  Standard Error: 0.90298 
  t-value: -43.02474 

Sensitivity Statistics:
  Partial R2 of treatment with outcome: 0.59982 
  Robustness Value, q = 1 : 0.68602 
  Robustness Value, q = 1 alpha = 0.05 : 0.6706 

For more information, check summary.
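If we only need the individual statistics rather than the full printout, sensemakr also exposes helper functions that compute them directly from the fitted model (a sketch, reusing model_barrel from above):

```r
# Partial R2 of the treatment with the outcome (0.59982 above)
partial_r2(model_barrel, covariates = "barrel_num")

# Robustness value at q = 1 (0.68602 above)
robustness_value(model_barrel, covariates = "barrel_num")

# Robustness value at q = 1, accounting for significance at alpha = 0.05
robustness_value(model_barrel, covariates = "barrel_num", q = 1, alpha = 0.05)
```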

Now let’s try to remove one significant variable

If we were to rerun the inverse probability weighting to estimate the causal effect of barrel after removing one of the important covariates (i.e., temperature in this example), we would see that the estimated effect falls outside the confidence interval from our previous post.

# fit the propensity model
model_logit <-
  glm(barrel ~ yard_size + home_garden + attitude_env
      ,data = barrels_obs
      ,family = binomial(link = "logit"))

# calculate the inverse probability weighting
barrels_ipw <- 
  augment_columns(model_logit
                  ,barrels_obs
                  ,type.predict = "response") %>% 
  rename(propensity = .fitted) %>% 
  mutate(ipw = (barrel_num / propensity) + ((1 - barrel_num) / (1 - propensity)))

# estimate the effect
model_ipw <-
  lm(water_bill ~ barrel
     ,data = barrels_ipw
     ,weights = ipw)

model_ipw

Call:
lm(formula = water_bill ~ barrel, data = barrels_ipw, weights = ipw)

Coefficients:
 (Intercept)  barrelBarrel  
      225.03        -30.51  

If we were to rerun the sensitivity testing, we would see that the partial \(R^2\) of the treatment with the outcome drops to 0.369, i.e., the minimum strength an extreme confounder would need to fully account for the estimated effect, whereas the partial \(R^2\) for temperature is estimated to be around 0.424. In other words, temperature is strong enough to explain away the observed estimate.

Hence, the estimated effect of barrel from the revised model falls outside the confidence interval estimated in the previous post.

barrel_sensitivity_yard_updated <-
  sensemakr(model = lm(water_bill ~ barrel_num + yard_size + home_garden + attitude_env, data = barrels_obs)
            ,treatment = "barrel_num"
            ,benchmark_covariates = "yard_size"
            ,kd = c(1, 2, 3))

barrel_sensitivity_yard_updated
Sensitivity Analysis to Unobserved Confounding

Model Formula: water_bill ~ barrel_num + yard_size + home_garden + attitude_env

Null hypothesis: q = 1 and reduce = TRUE 

Unadjusted Estimates of ' barrel_num ':
  Coef. estimate: -30.46267 
  Standard Error: 1.13196 
  t-value: -26.91146 

Sensitivity Statistics:
  Partial R2 of treatment with outcome: 0.36946 
  Robustness Value, q = 1 : 0.52665 
  Robustness Value, q = 1 alpha = 0.05 : 0.50119 

For more information, check summary.

Conclusion

That’s all for the day!

Thanks for reading the post until the end.

Feel free to contact me through email or LinkedIn if you have any suggestions on future topics to share.

Refer to this link for the blog disclaimer.

Till next time, happy learning!


References