As discussed in the previous post, RCTs might not be practical in some occasions.

Hence, this is where other techniques can be used to study the causal effect.

In this post, we will be looking at how does difference-in-differences work.

What is difference-in-differences?

(DS4PS) This method allows us to analyze the effect by taking into consideration:

how the group mean changes before and after a policy intervention and
compare this change with the mean over time of a similar group which did not undergo the treatment (i.e., control group)

The formula of difference-in-differences can be written as follows: \[Y = \beta_0 + \beta_1 * treatment + \beta_2 * post + \beta_3 * treatment * post + \epsilon\]

Where Y is the outcome variable

The key is to check whether the interaction term is statistically significant. If the term is significant, it would indicate that the before and after treatment effect is indeed not zero.

Why use this method?

This method is rather easy to understand.

Also, we need not to care about the effect of confounding variables while using this method.

Important considerations

To use such a method, we need to check the following assumptions:

Parallel trends

We can check this by using visual inspection.

This can also be checked by running a difference-in-difference analysis on the earlier period.

The interaction term should not be statistically significant, otherwise it would indicate that the parallel trend assumption is being violated.

Treatment timing

One issue to take note of is when the groups receive treatment as this could distort the estimate.

To resolve this issue, we could use Goodman-Bacon decomposition to study, which I won’t be exploring this method in this analysis.

Best practices for performing differences-in-differences analysis

Below are some best practices when using difference-in-difference approach (Fernandez 2024):

Be sure outcome trends did not influence the allocation of the treatment/intervention
Acquire more data points before and after to test parallel trend assumption
Use linear probability model to help with interpretability
Be sure to examine composition of population in treatment/intervention and control groups before and after intervention
Use robust standard errors to account for autocorrelation between pre/post in same individual
Perform sub-analysis to see if intervention had similar/different effect on components of the outcome

Demonstration

In this post, I will be using the following packages to perform the analysis.

pacman::p_load(tidyverse, haven, broom)

Import Data

I will be using the dataset shared by Professor Andrew Heiss on his teaching website. I have also referenced the materials and problem set he posted on the course website.

Refer to this link for the dataset.

eitc <- read_stata("data/eitc.dta") %>% 
  mutate(children_cat = case_when(
    children == 0 ~ "0",
    children == 1 ~ "1",
    children >= 2 ~ "2+"
  ))

This dataset comes from US Census’s Current Population Survey. The objective of the analysis is to find out whether the number of employment for women increased after the implementation of Earned Income Tax Credit in 1993.

Below are the definitions of the columns in the dataset:

state: The woman’s state of residence. The numbers are Census/CPS state numbers: http://unionstats.gsu.edu/State_Code.htm
year: The tax year
urate: The unemployment rate in the woman’s state of residence
children: The number of children the woman has
nonwhite: Binary variable indicating if the woman is not white (1 = Hispanic/Black)
finc: The woman’s family income in 1997 dollars
earn: The woman’s personal income in 1997 dollars
age: The woman’s age
ed: The number of years of education the woman has
unearn: The woman’s family income minus her personal income, in thousands of 1997 dollars

eitc <- 
  eitc %>% 
  mutate(ind_any_kids = if_else(children_cat == "0", "no", "yes")
         ,ind_after_1993 = if_else(year <= 1993, "no", "yes"))

Difference-in-differences analysis

Check assumptions

Similar

eitc %>% 
  ggplot(aes(children_cat, work)) +
  stat_summary(geom = "pointrange", fun.data = "mean_se", fun.args = list(mult = 1.96))

Parallel Assumptions

First, I will calculate the average employment by year and whether the women have any kids.

eitc_by_year_kids <-
  eitc %>% 
  group_by(year, ind_any_kids) %>% 
  summarise(count_total = n()
            ,work_total = sum(work)) %>% 
  ungroup() %>% 
  mutate(work_avg = work_total/count_total)

Visual Inspect

Then, I will visualize the result by using ggplot function.

ggplot(eitc_by_year_kids, aes(year, work_avg, color = ind_any_kids)) +
  geom_line() +
  theme_minimal() +
  geom_vline(xintercept = 1994) +
  annotate("text",x = 1992.9,y = 0.45,label = "<---- Before Treatment") +
  annotate("text",x = 1995,y = 0.45,label = "After Treatment ---->") +
  xlab("Year") +
  ylab("Average Employment") +
  labs(title = "Average Employment by Year")

It seems like the pre-treatment trends between the two groups appear to be the same.

This is important as this would allow us to isolate the effect of the tax credit in the later step.

Check by using earlier data

Another method to check the parallel trend pattern is to build a model based on earlier period data.

If the parallel trend assumption holds, then the interaction term would not be statistically significant.

With that, I will build a model for a period between 1991 and 1993.

eitc_fake_treatment <- eitc %>% 
  filter(year < 1994) %>% 
  mutate(after_1991 = year >= 1992
         ,ind_year_1991 = if_else(year <= 1991, "yes", "no"))

model_fit_fake_treat <-
  lm(work ~ ind_any_kids + ind_year_1991 + ind_any_kids * ind_year_1991
     ,data = eitc_fake_treatment)

tidy(model_fit_fake_treat)

# A tibble: 4 x 5
  term                             estimate std.error stati~1  p.value
  <chr>                               <dbl>     <dbl>   <dbl>    <dbl>
1 (Intercept)                        0.571     0.0110  52.1   0       
2 ind_any_kidsyes                   -0.133     0.0145  -9.19  5.05e-20
3 ind_year_1991yes                   0.0117    0.0185   0.631 5.28e- 1
4 ind_any_kidsyes:ind_year_1991yes   0.0101    0.0244   0.415 6.78e- 1
# ... with abbreviated variable name 1: statistic

As shown in the result above, the interaction term is not statistical significant, suggesting that the parallel term assumption is appropriate.

Nevertheless, let’s start by estimating the causal effect.

Manual Calculation on the Treatment Effect

Next, I will compute the average employment under each group.

eitc_manul_calc <-
  eitc %>% 
  group_by(ind_any_kids, ind_after_1993) %>% 
  summarise(work_avg = mean(work))

eitc_manul_calc

# A tibble: 4 x 3
# Groups:   ind_any_kids [2]
  ind_any_kids ind_after_1993 work_avg
  <chr>        <chr>             <dbl>
1 no           no                0.575
2 no           yes               0.573
3 yes          no                0.446
4 yes          yes               0.491

Then, I will estimate the causal effect.

eitc_manul_calc[1,3] - eitc_manul_calc[3,3] - (eitc_manul_calc[2,3] - eitc_manul_calc[4,3])

    work_avg
1 0.04687313

Estimated Effect by using Regression

Instead of calculating manually, we could also fit a regression line to compute the causal effect.

model_fit_simple <-
  lm(work ~ ind_after_1993 + ind_any_kids + ind_after_1993 * ind_any_kids
     ,data = eitc)

tidy(model_fit_simple)

# A tibble: 4 x 5
  term                              estimate std.er~1 stati~2  p.value
  <chr>                                <dbl>    <dbl>   <dbl>    <dbl>
1 (Intercept)                        0.575    0.00885  65.1   0       
2 ind_after_1993yes                 -0.00207  0.0129   -0.160 8.73e- 1
3 ind_any_kidsyes                   -0.129    0.0117  -11.1   1.84e-28
4 ind_after_1993yes:ind_any_kidsyes  0.0469   0.0172    2.73  6.31e- 3
# ... with abbreviated variable names 1: std.error, 2: statistic

Ta-da! Now we have computed the causal effect.

Note that the estimated causal effect is the same as the computed figure in the earlier step.

There is statistical evidence that the tax credits increases the average employment for women with kids.

Control the effects of other variables

We could take one step further to include other variables in the formula so that we could control their respective effects.

model_fit_complex <-
  lm(work ~ ind_after_1993 + ind_any_kids + ind_after_1993 * ind_any_kids + unearn + children + nonwhite + poly(age, 2) + poly(ed, 2)
     ,data = eitc)

tidy(model_fit_complex)

# A tibble: 11 x 5
   term                             estimate std.e~1 stati~2   p.value
   <chr>                               <dbl>   <dbl>   <dbl>     <dbl>
 1 (Intercept)                       0.699   1.00e-2  69.6   0        
 2 ind_after_1993yes                -0.00868 1.23e-2  -0.704 4.82e-  1
 3 ind_any_kidsyes                  -0.0210  1.47e-2  -1.43  1.52e-  1
 4 unearn                           -0.0178  5.76e-4 -30.8   5.26e-202
 5 children                         -0.0518  4.52e-3 -11.5   2.93e- 30
 6 nonwhite                         -0.0628  8.48e-3  -7.40  1.40e- 13
 7 poly(age, 2)1                     3.22    5.07e-1   6.36  2.13e- 10
 8 poly(age, 2)2                    -4.11    4.90e-1  -8.39  5.23e- 17
 9 poly(ed, 2)1                      4.58    4.85e-1   9.43  4.74e- 21
10 poly(ed, 2)2                      1.56    4.79e-1   3.26  1.11e-  3
11 ind_after_1993yes:ind_any_kidsy~  0.0581  1.64e-2   3.55  3.81e-  4
# ... with abbreviated variable names 1: std.error, 2: statistic

After controlling for other effects, the treatment effect is about 0.058 now. The causal effect of tax credits remains statistically significant.

Varying Treatment Effect

As shown in the data, the subject could have more than 1 kid.

Therefore, we could split the “having kids indicator” into two so that we could estimate the causal effect of having one kid and more than one kid separately.

eitc <-
  eitc %>% 
  mutate(ind_one_kid = if_else(children_cat == "1", "yes", "no")
         ,ind_two_plus_kids = if_else(children_cat == "2+", "yes", "no"))


model_fit_two_simple <-
  lm(work ~ ind_after_1993 + ind_one_kid + ind_two_plus_kids + ind_after_1993 * ind_one_kid + ind_after_1993 * ind_two_plus_kids
     ,data = eitc)

tidy(model_fit_two_simple)

# A tibble: 6 x 5
  term                               estimate std.e~1 stati~2  p.value
  <chr>                                 <dbl>   <dbl>   <dbl>    <dbl>
1 (Intercept)                         0.575   0.00881  65.3   0       
2 ind_after_1993yes                  -0.00207 0.0129   -0.161 8.72e- 1
3 ind_one_kidyes                     -0.0519  0.0150   -3.45  5.56e- 4
4 ind_two_plus_kidsyes               -0.179   0.0131  -13.6   4.08e-42
5 ind_after_1993yes:ind_one_kidyes    0.0326  0.0221    1.48  1.40e- 1
6 ind_after_1993yes:ind_two_plus_ki~  0.0553  0.0193    2.86  4.19e- 3
# ... with abbreviated variable names 1: std.error, 2: statistic

It seems like the treatment effect is greater for women with two or more kids.

Also, the causal effect is only significant for the women with more than 1 kid.

Conclusion

That’s all for the day!

Thanks for reading the post until the end.

Feel free to contact me through email or LinkedIn if you have any suggestions on future topics to share.

Refer to this link for the blog disclaimer.

Till next time, happy learning!

Photo by Karolina Grabowska

DS4PS. “Difference in Difference.” https://ds4ps.org/PROG-EVAL-III/DiffInDiff.html.

Fernandez, Jose. 2024. https://bookdown.org/cuborican/RE_STAT/difference-in-differences.html.

Causal Inference - Difference-in-differences