Synthetic Control with Difference in Differences?

A deep dive into a simple way to do synthetic control with a difference in differences setup.
Author

Jeff Milliman

Published

December 21, 2025

Introduction

In this tutorial I am going to run through how to do synthetic control with the new difference in differences method introduced by economists Soo Jeong Lee and Jeffrey Wooldridge, of Introductory Econometrics fame, in R. For a full introduction to their method, check out their newly released working paper (Lee and Wooldridge 2025). In addition, the DID Digest substack has a great summary of the paper and the benefits of this method over other options. The authors also apply their method to the more typical event study setting (i.e. staggered diff in diff), with multiple treated units treated at different time periods, which I may return to in a later tutorial. Any errors made during this tutorial are entirely my own.

What is Synthetic Control?

For those unfamiliar, synthetic control is a quantitative method for case comparison used in situations in which only one unit, or a small number of units, experiences an intervention/treatment. In these situations, more traditional methods, such as difference in differences, tend to work poorly.

In short, synthetic control uses control/untreated units to construct weights that attempt to closely approximate the treated unit prior to the treatment/intervention. These weights are used to predict the counterfactual outcome for the treated unit in the post period, approximating what the outcome would have been in the absence of the treatment/intervention. Effects are calculated as the difference between the observed outcome and the synthetic control (the predicted counterfactual) in the post treatment period. For a brief introduction to synthetic control methods, I recommend the chapter on synthetic control in Scott Cunningham’s Causal Inference: The Mixtape, the chapter in Causal Inference for the Brave and True, and the 2021 Abadie overview paper (Abadie 2021).

In contrast to synthetic control, the authors provide an extremely simple way to do policy analysis/causal inference that relies on the more ubiquitous difference in differences method - i.e. (post treated outcome - pre treated outcome) - (post control outcome - pre control outcome). For resources on difference in differences methods, I recommend this general overview (de Chaisemartin and D’Haultfœuille 2023) and additional resources on recent developments for staggered difference in differences (Roth et al. 2023; Wing et al. 2024).
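The two-group, two-period arithmetic above can be written out directly. Here is a minimal sketch with made-up numbers (illustrative only, not results from the paper):

```r
#minimal 2x2 difference in differences with made-up numbers
#(illustrative only - not data from the paper)
pre_treated   <- 100  #treated unit outcome, pre period
post_treated  <- 80   #treated unit outcome, post period
pre_control   <- 90   #average control outcome, pre period
post_control  <- 85   #average control outcome, post period

#DID = (post treated - pre treated) - (post control - pre control)
did <- (post_treated - pre_treated) - (post_control - pre_control)
did
#> -15
```

The control group's change (-5) serves as the counterfactual trend for the treated unit, so the remaining -15 is attributed to the treatment.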

Preliminary Requirements

For the remainder of this tutorial I will assume that you have:

1) At least an introductory knowledge of synthetic control.

2) At least some familiarity with difference in differences (DID) for causal inference/policy analysis.

Why I am excited about this method

Anyone who has ever run a synthetic control analysis knows that synthetic control is like the wild west. While synthetic control was extremely novel when it was introduced in 2003 (Abadie and Gardeazabal 2003), the method has now expanded to dozens of packages and estimators, many of them utilizing machine learning methods, such as LASSO or ridge regression, to estimate the synthetic control weights, and conformal inference or parametric bootstrapping to construct prediction intervals and conduct inference (Cattaneo et al. 2025). Many of these methods are outside the knowledge base of most policy analysts and social scientists, whose training tends to be in applied statistics or econometrics.

There is also no guarantee that any synthetic control method will generate a good fit to the treated unit in the pre-treatment period, post-treatment inference can be challenging, and there appear to be few clear guidelines as to how a synthetic control analysis should be done with many of these estimators.

Soo Jeong Lee and Jeffrey Wooldridge have come to the rescue with an incredibly simple method that appears to perform remarkably well even in extremely small samples - i.e. 1 treated unit and 4 control units - and compares favorably to synthetic control in their applied examples. Of course, all the usual assumptions required to do difference in differences will apply, but their paper represents a promising step in a simpler direction.

Steps in the process

  1. Transform the outcome variable. First, the authors log the outcome variable to make the outcome variable more amenable to the demeaning or de-trending process. Next, the authors recommend either unit specific de-trending or unit specific demeaning to make the parallel trends assumption between the treated unit and the control units more plausible.
  • Unit Specific Demeaning. To demean your outcome variable, calculate the average of your outcome variable for each unit in the pre-treatment period and then subtract that average from the outcome variable for each unit across your entire time series (both the pre and post periods). The authors do this by regressing the pre-treatment outcome variable on an intercept and a unit-level dummy variable (in R: lm(outcome ~ 1 + unit, data = data)) and subtracting the fitted unit means from the entire time series, but you can also do the demeaning manually. For inference, the authors only appear to run a regression that includes the post-period demeaned or de-trended outcome (more on this below). However, demeaning or de-trending the entire time series is useful for producing residual plots, which give us a sense of how good a fit the demeaning or de-trending process produces between the treated and control units.

  • Unit Specific De-trending. To de-trend your outcome variable, first split your data set into unit-level data sets. Then fit a regression to estimate the time trend in the pre-treatment period for each individual unit, regressing the pre-treatment outcome on an intercept and your time variable. In R this would take the form of lm(outcome ~ 1 + time, data = data). You then use the regression to predict the time trend for the entire time series (pre and post treatment periods) for each unit. The predictions from your models are then subtracted from the outcome variable for each unit for the entire time series to complete the de-trending process.

  2. Check the Pre and Post Period Fit between the Treated Unit and your Control Units. To check how closely your control units “parallel” the treated unit in the pre-treatment period, you can compare the average transformed outcome of the control units to the transformed outcome for the treated unit in the pre-treatment period or for the entire time series. If the average outcome for the control units closely parallels the treated unit - i.e. increases or decreases by approximately the same amount from period to period in the pre-treatment period - you can probably consider the pre-treatment fit good enough to continue on to estimating the model.

  3. Calculate the Post-Period Unit-Level Averages of the Outcome Variable. For most event studies you usually conduct your analysis on your entire time series. However, the authors collapse the entire time series into a cross-sectional dataset by simply averaging the post-treatment outcomes for each unit. To conduct the analysis for a specific post-treatment period - i.e. a single year - the authors keep the outcome for that specific year as a separate outcome variable.

  4. Estimate the Average (ATT) and Year-Specific Effects. Finally, to estimate the average effect for the treated units (ATT), you simply regress the post-period transformed outcome variable on an intercept and your treatment variable. In R this would look like lm(post ~ 1 + treated, data = data). For year-specific effects, you would swap out the post average outcome with a year average outcome and run the same regression. This is similar to conducting a simple ANOVA post score analysis.
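The steps above can be sketched end to end on simulated data. Everything below (unit labels, the treatment cutoff, the trend) is made up for illustration; the actual analysis on the smoking data follows in the next sections.

```r
#sketch of the Lee and Wooldridge steps on simulated data
#(all names and numbers here are illustrative)
set.seed(1)
df <- expand.grid(unit = 1:5, time = 1:10)
df$y <- log(100 + 2 * df$unit + df$time + rnorm(nrow(df)))
df$treated <- as.integer(df$unit == 1)
post_start <- 7

#unit-specific de-trending: fit the trend on the pre period only,
#then subtract the predicted trend from the entire time series
detrend_unit <- function(d) {
  fit <- lm(y ~ 1 + time, data = subset(d, time < post_start))
  d$y_dt <- d$y - predict(fit, newdata = d)
  d
}
df <- do.call(rbind, lapply(split(df, df$unit), detrend_unit))

#collapse to one post-period average per unit (a cross-sectional dataset)
post_df <- aggregate(y_dt ~ unit + treated,
                     data = subset(df, time >= post_start), FUN = mean)

#estimate the ATT with a simple cross-sectional regression
att_fit <- lm(y_dt ~ 1 + treated, data = post_df)
coef(att_fit)["treated"]
```

Note that the de-trended outcome averages to zero for each unit in the pre period by construction, which is what makes the post-period cross-sectional regression interpretable as a difference in differences.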

Applied Example with Analysis of the California Tobacco Control Program

To test their method against synthetic control, the authors reanalyze the impact of the 1988 California tobacco control program (Proposition 99) on cigarette sales per capita (Abadie, Diamond, and Hainmueller 2010). Following the authors, I will not include covariates in the analysis, although they indicate that time-invariant controls or averages of pre-treatment time-varying controls could be added to the outcome regressions. The data set comes pre-loaded with the tidysynth package (Dunford 2025) and the following list describes the key variables and research design:

Unit of Analysis: State Level (U.S. States).

Treated Unit: California

Control Units: 38 states (excludes 11 other states that also implemented a similar tobacco control program)

Treatment Year: 1988 (1989 is the first post-treatment year)

Time Series: 1970-2000 (19 years pre-treatment, 12 years post-treatment)

Outcome Variable: the log of cigarette sales per capita (cigsale)

Loading in and Cleaning the Data

Code
#Uses the smoking synthetic control dataset from tidysynth

#load in packages
library(tidyverse)
#tidysynth for the smoking dataset
library(tidysynth)
#for extracting coefficients/marginal effects
library(marginaleffects)
#for tables
library(gt)

#load in the smoking dataset from tidysynth
data(smoking)

#arrange by state and year
smoking <- smoking |> 
           arrange(state, year) 

#create our treatment indicator
#treated = 1 when state is california
#treated post = 1 when state is california and post treatment - >= 1989
#Create the outcome variable as the log of cigarette sales
smoking <- smoking |> 
  #create the treated unit
  mutate(treated = case_when(state == "California" ~ 1,
                             .default = 0),
         #set the post and treated variable
         treated_post = case_when(state == "California" 
                                  & year >= 1989 ~1,
                                  .default = 0),
         #Log the outcome variables
         log_cigsale = log(cigsale))

head(smoking)
# A tibble: 6 × 10
  state    year cigsale lnincome  beer age15to24 retprice treated treated_post
  <chr>   <dbl>   <dbl>    <dbl> <dbl>     <dbl>    <dbl>   <dbl>        <dbl>
1 Alabama  1970    89.8    NA       NA     0.179     39.6       0            0
2 Alabama  1971    95.4    NA       NA     0.180     42.7       0            0
3 Alabama  1972   101.      9.50    NA     0.181     42.3       0            0
4 Alabama  1973   103.      9.55    NA     0.182     42.1       0            0
5 Alabama  1974   108.      9.54    NA     0.183     43.1       0            0
6 Alabama  1975   112.      9.54    NA     0.184     46.6       0            0
# ℹ 1 more variable: log_cigsale <dbl>

Unit Specific Demeaning

Following the steps in the paper, I will conduct the demeaning process first, then move on to the de-trending process, which produces a better pre-treatment fit in their applied examples.

Code
##Unit Specific Demeaning
#filter dataset by the pre period
pre_period <- smoking |> filter(year < 1989)

#fit regression for the pre-period with the outcome predicted by 
#an intercept and state dummies
pre_reg <- lm(log_cigsale ~ 1 + state, data = pre_period)

#grab fitted values(predictions) and residuals
pre_period$pred_pre <- fitted(pre_reg)
pre_period$resid_pre <- residuals(pre_reg)

#calculate the predictions by group for the pre treatment period
pre_pred_avg <- pre_period |>
  group_by(state) |>
  summarise(pre_pred_avg = mean(pred_pre)) |>
  ungroup()

#join pre-period averages to the smoking dataset 
#and subtract from the log of cigsales to demean
smoking <- left_join(smoking, pre_pred_avg, by  = "state") |>
  mutate(cig_sale_demeaned = (log_cigsale - pre_pred_avg))

#grab the demeaned outcomes for the regression by 
#averaging by state and period
demeaned_outcomes <- smoking |> select(state, year, 
                                       cig_sale_demeaned) |>
  #group by state
  group_by(state) |>
  #create our aggregate means in the pre and post periods
  summarise(#post period mean
            post_demeaned = mean(cig_sale_demeaned[year >= 1989]),
            #in year 1989
            post_demeaned_1989 = mean(cig_sale_demeaned[year == 1989]),
            #in year 1995
            post_demeaned_1995 = mean(cig_sale_demeaned[year == 1995]),
            #in year 2000
            post_demeaned_2000 = mean(cig_sale_demeaned[year == 2000])) |>
  #ungroup our columns
  ungroup() |>
  #create our treatment indicator for California
  mutate(treated = case_when(state == "California" ~1,
                             .default = 0))  |>
  #relocate the treated column after state
  relocate(treated, .after = state)

#look at our outcomes
  head(demeaned_outcomes)
# A tibble: 6 × 6
  state       treated post_demeaned post_demeaned_1989 post_demeaned_1995
  <chr>         <dbl>         <dbl>              <dbl>              <dbl>
1 Alabama           0       -0.0660           -0.0587             -0.0875
2 Arkansas          0       -0.0642           -0.00700            -0.0528
3 California        1       -0.668            -0.339              -0.718 
4 Colorado          0       -0.386            -0.336              -0.375 
5 Connecticut       0       -0.343            -0.119              -0.357 
6 Delaware          0       -0.158            -0.131              -0.165 
# ℹ 1 more variable: post_demeaned_2000 <dbl>

Calculating the Effects for the Demeaned Outcomes

To get the ATT, we can run a simple regression, regressing the average post-period outcome on an intercept and the treatment dummy variable.

#Demeaned ATT
demeaned_att <- lm(post_demeaned ~ 1 + treated, 
                   data = demeaned_outcomes)

summary(demeaned_att)

Call:
lm(formula = post_demeaned ~ 1 + treated, data = demeaned_outcomes)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.23014 -0.08548  0.00000  0.09464  0.22825 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.24616    0.01934 -12.726 4.39e-15 ***
treated     -0.42217    0.12080  -3.495  0.00125 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1192 on 37 degrees of freedom
Multiple R-squared:  0.2482,    Adjusted R-squared:  0.2279 
F-statistic: 12.21 on 1 and 37 DF,  p-value: 0.001249

Relative to the control states, cigarette sales per capita (in packs) in California decreased by an average of approximately 42% across the entire post-treatment period. To get the effect for a single time period, we can run the same regression as above but swap out the post average outcome with the outcome for a single post-treatment period - i.e. the year 2000.

#Effects for year 2000
effects_2000 <- lm(post_demeaned_2000 ~ 1 + treated, 
                   data = demeaned_outcomes)

summary(effects_2000)

Call:
lm(formula = post_demeaned_2000 ~ 1 + treated, data = demeaned_outcomes)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.2712 -0.1535  0.0000  0.1384  0.2903 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.35485    0.02632  -13.48 7.48e-16 ***
treated     -0.66732    0.16435   -4.06 0.000244 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1622 on 37 degrees of freedom
Multiple R-squared:  0.3082,    Adjusted R-squared:  0.2895 
F-statistic: 16.49 on 1 and 37 DF,  p-value: 0.0002441

Relative to the control units, California had approximately 67% fewer cigarette sales per capita (in packs) in the year 2000.

To recreate the results from Table 3 of the paper, which includes the overall ATT and effects for years 1989, 1995, and 2000, I created a couple of functions to fit the models separately and aggregate the effects into a single table with the marginaleffects package (Arel-Bundock, Greifer, and Heiss 2024). I will use these functions to reproduce the tables from the paper.

To make the estimates easier to interpret as percent changes, I multiplied the estimate, standard error, and confidence interval by 100. To keep the results consistent with the paper, I will forgo the usual approach to interpreting coefficients in linear models with a logged dependent variable - exponentiating the coefficient to construct the percent change - and will instead use the simpler approximation of multiplying the treatment coefficient by 100.
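For reference, here is how the two readings differ, using the treated coefficient from the demeaned ATT regression above:

```r
#approximate vs. exact percent change from a log-outcome coefficient
b <- -0.42217                     #treated coefficient from the demeaned ATT regression

approx_pct <- b * 100             #the simpler "multiply by 100" reading
exact_pct  <- (exp(b) - 1) * 100  #the exact percent change

round(approx_pct, 1)
#> -42.2
round(exact_pct, 1)
#> -34.4
```

For coefficients this large the two diverge noticeably; I stick with the simpler reading only to stay consistent with the paper's tables.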

Code
#calculate the effects for the demeaned outcomes
#set the list of demeaned outcome variables
outcomes_demeaned <- c("post_demeaned",
                       "post_demeaned_1989",
                       "post_demeaned_1995",
                       "post_demeaned_2000")

#create our regression formula to apply to the list of demeaned outcomes
lm_function <- function(outcome, iv, df) {
  mod_formula <- paste(outcome, "~ 1 +", iv)
  reg <- lm(mod_formula, data = df) }


#create a list of models
demeaned_mods_list <- lapply(outcomes_demeaned, lm_function, 
                             iv = "treated", 
                             df = demeaned_outcomes)

#set the names of the models in the list
names(demeaned_mods_list) <- c("Average Effect",
                               "Effect in 1989",
                               "Effect in 1995",
                               "Effect in 2000")


#Create function to extract marginal effects from a list of models, 
#given your coefficient of interest
marginal_effects_table <- function(model_list, key_term){
  
  #create function to grab our marginal effects from each model
  extract_effects <- function(model, term) {
    marg_eff <- marginaleffects::avg_comparisons(model = model, 
                                                 variables = term) |>  
      as.data.frame()
    
    return(marg_eff)
  }
  #apply the function (extract_effects) to model list, extract 
  #the key coefficient (term)
  effects_table <- purrr::map2_dfr(model_list, key_term, 
                                   extract_effects)
  
  #extract the model names from your list of models 
  #and rename the column
  names_variables = names(model_list) |> 
    #convert into a dataframe
    as.data.frame() |> 
    #rename the names column to " Effects Period"
    rename("Effects Period" = 1)
  
  #cbind names and table of effects
  effects_table <- cbind(names_variables, effects_table) |> 
    #drop the predictions - keep only columns 1:10
    select(1:10) |> 
    #round all numeric variables to 3 decimal places
    mutate(across(where(is.numeric), \(x) round(x, digits = 3)),
           #multiply estimate, std.error, and confidence intervals 
           #by 100 to get our percent change
           estimate = (estimate*100),
           std.error = (std.error*100),
           conf.low = (conf.low*100),
           conf.high = (conf.high*100)) |> 
    #rename estimate column to "estimate (percent change)"
    rename("estimate (percent change)" = estimate)
  
  
  return(effects_table)
  
}

#Apply the functions to recreate the table
table_3_demeaned <- marginal_effects_table(
  model_list = demeaned_mods_list, key_term = "treated") 

#Make the table pretty and set title with gt
table_3_demeaned <- table_3_demeaned |> 
  gt()  |> 
  #Add in the title
  tab_header(
    title = "Table 3 (Estimated ATTs): Demeaned") |> 
  gt::tab_options(row.striping.include_table_body = FALSE,
                  quarto.disable_processing = TRUE)

table_3_demeaned
Table 3 (Estimated ATTs): Demeaned
Effects Period term contrast estimate (percent change) std.error statistic p.value s.value conf.low conf.high
Average Effect treated 1 - 0 -42.2 12.1 -3.495 0.000 11.042 -65.9 -18.5
Effect in 1989 treated 1 - 0 -16.8 9.6 -1.756 0.079 3.660 -35.6 2.0
Effect in 1995 treated 1 - 0 -48.4 13.7 -3.518 0.000 11.166 -75.3 -21.4
Effect in 2000 treated 1 - 0 -66.7 16.4 -4.060 0.000 14.316 -98.9 -34.5

A Quick Note on Standard Errors and Uncertainty Calculations

The authors recommend the use of heteroskedasticity-consistent standard errors - “HC3” - if there are a large number of control and/or treated units. To add heteroskedasticity-consistent standard errors, you would simply add the vcov = "HC3" argument to the avg_comparisons function - i.e. avg_comparisons(model = model, vcov = "HC3") - in the code above. Alternatively, you could correct the standard errors on a single regression fit with the lm() function using the sandwich package (Zeileis, Koll, and Graham 2020).
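For readers without the sandwich package handy, the HC3 correction is simple enough to compute in base R. The sketch below builds the HC3 covariance matrix by hand for a toy regression - the data are made up, and in practice you would just use sandwich::vcovHC(fit, type = "HC3") or the vcov argument mentioned above:

```r
#manual HC3 covariance matrix in base R (a sketch - in practice use sandwich)
hc3_vcov <- function(fit) {
  X <- model.matrix(fit)
  e <- residuals(fit)
  h <- hatvalues(fit)
  meat  <- crossprod(X, diag(e^2 / (1 - h)^2) %*% X)
  bread <- solve(crossprod(X))
  bread %*% meat %*% bread
}

#toy cross-sectional data with several treated units
#(HC3 needs more than one observation per dummy level)
toy <- data.frame(post    = c(-0.50, -0.45, -0.10, -0.20, -0.15, -0.25),
                  treated = c(1, 1, 0, 0, 0, 0))
fit <- lm(post ~ 1 + treated, data = toy)

#HC3 standard errors for the intercept and treatment coefficient
sqrt(diag(hc3_vcov(fit)))
```

One caveat worth flagging: with a single treated unit, the treated observation has leverage one and the HC3 correction breaks down, which is why the authors tie this recommendation to settings with many units.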

The authors also check the robustness of some of their ATT estimates using randomization inference with 1,000 replications, which can be implemented using the ritest package in R (McDermott 2026). According to the authors, randomization inference is beneficial because it does not rely on the “normality assumption” (Lee and Wooldridge 2025, 8) and is similar to the permutation tests that provide inference in the standard synthetic control design (Abadie 2021). For an overview of the pros and cons of randomization inference in observational studies, I recommend the section “Which uncertainty matters? Randomization Inference (RI)” from Claire Palandri’s guide, Causal Inference in Observational Studies, which is posted on her GitHub.
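As a rough illustration of the idea (not the ritest implementation), a permutation test just reshuffles the treated label and re-estimates the effect each time. All numbers below are simulated:

```r
#bare-bones randomization inference sketch on simulated data
set.seed(42)
n_units <- 20
sim <- data.frame(post    = rnorm(n_units, mean = -0.25, sd = 0.1),
                  treated = c(1, rep(0, n_units - 1)))
sim$post[1] <- sim$post[1] - 0.6  #plant a large treatment effect

obs_att <- coef(lm(post ~ 1 + treated, data = sim))["treated"]

#reassign the treated label at random and re-estimate each time
perm_att <- replicate(1000, {
  shuffled <- sample(sim$treated)
  coef(lm(sim$post ~ shuffled))["shuffled"]
})

#two-sided permutation p-value: share of placebo effects at least as large
p_val <- mean(abs(perm_att) >= abs(obs_att))
```

Note that with a single treated unit among 20 units, the smallest attainable p-value under exact enumeration is 1/20 = 0.05 - one more reason inference is hard in very small samples.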

Calculating the Effects for the De-trended Outcomes

To calculate the average and period specific effects for the de-trended outcomes I will simply reuse the functions that I used previously.

Code
#set our vector of dependent variables for the regression
outcomes_detrended <- c("post_detrend",
                        "post_detrend_1989",
                        "post_detrend_1995",
                        "post_detrend_2000")


#make the detrended models, applying the lm function to the all-controls detrended dataset 
detrended_models <- lapply(outcomes_detrended, lm_function, 
                           iv = "treated", 
                           df = all_cons_detrended) 

#set the names for the models
names(detrended_models) <- c("Average Effect","Effect in 1989",
                             "Effect in 1995","Effect in 2000")

#use the function to recreate the estimates from the paper
table_3_detrended <- marginal_effects_table(model_list = detrended_models,
                                            key_term = "treated") 

#create table output with gt
table_3_detrended <- table_3_detrended |> 
  gt()  |> 
  #Add in the title
  tab_header(
    title = "Table 3 (Estimated ATTs): De-Trended") |> 
  gt::tab_options(row.striping.include_table_body = FALSE,
                  quarto.disable_processing = TRUE)


table_3_detrended 
Table 3 (Estimated ATTs): De-Trended
Effects Period term contrast estimate (percent change) std.error statistic p.value s.value conf.low conf.high
Average Effect treated 1 - 0 -22.7 9.4 -2.413 0.016 5.982 -41.1 -4.3
Effect in 1989 treated 1 - 0 -4.2 5.9 -0.713 0.476 1.071 -15.8 7.4
Effect in 1995 treated 1 - 0 -28.2 11.2 -2.515 0.012 6.393 -50.2 -6.2
Effect in 2000 treated 1 - 0 -40.3 15.2 -2.643 0.008 6.926 -70.2 -10.4

With a better pre-treatment fit, the average effect (ATT) closely approximates the ATT from the synthetic control estimate and the synthdid estimate from the Lee and Wooldridge working paper. The average effect (ATT) is approximately a 22.7% average decrease in cigarette sales per capita for California relative to the control states across the entire post-treatment period. The period-specific effects (ATTs) grow in magnitude from a -4.2% change in 1989 to a -40.3% change in the year 2000.

Figure 3: Gap Plot (De-trended Outcome)

Code
#Figure 3 - gap plot
figure3 <- detrended_plot_df |> 
  #pivot detrended outcome wider
  pivot_wider(names_from = group, values_from = cig_sale_detrended) |> 
  #create difference column between California and the controls
  mutate(diff = (California - Control)) |> 
  #plot the difference
  ggplot() + geom_line(aes(x = year, y = diff, group = 1), 
                       color = "blue") +
  labs(title = "Figure 3: Gap Between California and Average of All Controls, 
       State-Specific Linear Trend", x = "year", y = "gap")  + 
  geom_vline(xintercept = 1989, linetype = 2, color = "black") +
  theme_minimal()

figure3

Figure 3 above reproduces the “gap plot”, which plots the difference in log cigarette sales per capita between California and the control states across the entire time series. As we can see, the gap between California and the control states is roughly zero prior to 1985, with the gap starting to widen in the mid 1980s as California declines at a faster rate relative to the control group. The gap continues to widen in the post-treatment period (1989 onward).

DID Synthetic Control with Only 4 Control Units

Up until now I have run the analysis analogously to a typical synthetic control design, where we use all the available control units (38 states) to compare with our treated unit (California). However, as a robustness check, the authors rerun their analysis with two groups of only four control states (one group of 4 from the South and one of 4 from the Midwest). This is, to put it mildly, a little bit crazy.

As the authors point out, 4 units is an extremely low number for a control group, and states from the South and the Midwest would appear to be “bad controls,” as they are dissimilar from California culturally, economically, and demographically. I’ll reproduce their results with the southern and midwest control groups, but I will focus on the analysis with the de-trended outcome (log cigarette sales), as this method generates a better pre-treatment fit and results that are closer to those from other estimators than the demeaning method.

DID Synthetic Control with 4 Southern States as the Control Group

Code to clean the data and create the California plus south states control group

Code
#try southern states as controls + California
#Set the controls plus California
south_controls <- c("Alabama", 
                    "Arkansas",
                    "California", 
                    "Louisiana", 
                    "Mississippi")


#make our columns for each period with the detrended dataset
south_cons_detrended <-  smoking |> select(state, year, 
                                           cig_sale_detrended ) |> 
  #filter for our southern controls plus california
  filter(state %in% south_controls) |> 
  #groupby state
  group_by(state) |> 
  #create our aggregate means in the pre period
  summarise(#post period mean
            post_detrend = mean(cig_sale_detrended[year >= 1989]),
            #in year 1989
            post_detrend_1989 = mean(cig_sale_detrended[year == 1989]),
            #in year 1995
            post_detrend_1995 = mean(cig_sale_detrended[year == 1995]),
            #in year 2000
            post_detrend_2000 = mean(cig_sale_detrended[year == 2000])) |> 
  #ungroup our columns
  ungroup() |> 
  #create our treatment indicator for California
  mutate(treated = case_when(state == "California" ~ 1,
                             .default = 0))  |>
  #relocate the treated column after state
  relocate(treated, .after = state)

Calculating the Effects for the De-trended Outcomes with 4 Southern Control States

Code
#run our southern models
south_models <- lapply(outcomes_detrended, lm_function, 
                       iv = "treated", 
                       df = south_cons_detrended)

#set our names
names(south_models) <- c("Average Effect",
                         "Effect in 1989",
                         "Effect in 1995",
                         "Effect in 2000")


#let's see how we do with table_4
table_4_detrended <- marginal_effects_table(model_list = south_models,
                                            key_term = "treated")

#convert to table with gt
 table_4_detrended <- table_4_detrended |> 
  gt()  |> 
  #Add in the title
  tab_header(
    title = "Table 4 (Estimated ATTs): 
             De-Trended - South Controls (AL, AR, LA, MS)") |> 
  gt::tab_options(row.striping.include_table_body = FALSE,
                  quarto.disable_processing = TRUE)


 table_4_detrended
Table 4 (Estimated ATTs): De-Trended - South Controls (AL, AR, LA, MS)
Effects Period term contrast estimate (percent change) std.error statistic p.value s.value conf.low conf.high
Average Effect treated 1 - 0 -21.5 3.9 -5.493 0.000 24.594 -29.2 -13.8
Effect in 1989 treated 1 - 0 -2.7 5.2 -0.526 0.599 0.740 -13.0 7.5
Effect in 1995 treated 1 - 0 -25.9 5.5 -4.723 0.000 18.717 -36.7 -15.2
Effect in 2000 treated 1 - 0 -37.7 11.5 -3.266 0.001 9.841 -60.3 -15.1

Looking at Table 4, we get a very similar overall ATT and similar period-specific effects as we did when we included all 38 other states as controls. This is quite remarkable and suggests that this method may work in situations without a large control group, or with a control group that might be less than ideal. The authors repeat their analysis with a control group of only 4 midwest states and get similar results - which is, once again, quite remarkable with such a small control group that would not appear to be an ideal control for California.

Calculating the Effects for the De-trended Outcome with 4 Midwest Control States

Cleaning and analysis code for the midwest control states

Code
#let's check the midwest
#Set our midwest controls + California
mw_controls <- c("California", "Illinois", "Iowa", "Minnesota", "Ohio")

#create out midwest detrended dataset
mw_controls_detrended <-  smoking |> select(state, year, 
                                            cig_sale_detrended ) |> 
  #filter for our midwest controls plus california
  filter(state %in% mw_controls) |> 
  #groupby state
  group_by(state) |> 
  #create our aggregate means in the pre period
  summarise(#post period mean
            post_detrend = mean(cig_sale_detrended[year >= 1989]),
            #in year 1989
            post_detrend_1989 = mean(cig_sale_detrended[year == 1989]),
            #in year 1995
            post_detrend_1995 = mean(cig_sale_detrended[year == 1995]),
            #in year 2000
            post_detrend_2000 = mean(cig_sale_detrended[year == 2000])) |> 
  #ungroup our columns
  ungroup() |> 
  #create our treatment indicator for California
  mutate(treated = case_when(state == "California" ~ 1,
                             .default = 0))  |> 
  #relocate the treated column after state
  dplyr::relocate(treated, .after = state)

#run our midwest models
mw_models <- lapply(outcomes_detrended, lm_function, 
                    iv = "treated", 
                    df = mw_controls_detrended)

#set our names
names(mw_models) <- c("Average Effect","Effect in 1989",
                      "Effect in 1995","Effect in 2000")


#let's see how we do with table_5
table_5_detrended <- marginal_effects_table(model_list = mw_models,
                                            key_term = "treated")


table_5_detrended <- table_5_detrended |> 
  gt()  |> 
  #Add in the title
  tab_header(
    title = "Table 5 (Estimated ATTs): 
             De-Trended - Midwest Controls (IL, IA, MN, OH)") |> 
  gt::tab_options(row.striping.include_table_body = FALSE,
                  quarto.disable_processing = TRUE)


table_5_detrended
Table 5 (Estimated ATTs): De-Trended - Midwest Controls (IL, IA, MN, OH)
Effects Period term contrast estimate (percent change) std.error statistic p.value s.value conf.low conf.high
Average Effect treated 1 - 0 -19.8 7.9 -2.506 0.012 6.356 -35.2 -4.3
Effect in 1989 treated 1 - 0 -3.9 4.5 -0.878 0.380 1.397 -12.8 4.9
Effect in 1995 treated 1 - 0 -23.9 8.8 -2.733 0.006 7.317 -41.1 -6.8
Effect in 2000 treated 1 - 0 -36.3 13.6 -2.677 0.007 7.072 -62.9 -9.7

Potential Drawbacks to this Approach

While the method is intuitive, simple, and appears to deliver good results with small sample sizes and less-than-ideal control groups, I do see three drawbacks to this approach. That being said, this method might be a great tool to have, especially if you are pressed for time and need to assess the impact of an intervention on a small number of treated units with a small control group.

  1. Parallel trends may not hold: Like other Diff in Diff methods, there is no guarantee that the parallel trends assumption will hold, and transforming the outcome variable through demeaning or de-trending may or may not make this assumption more plausible. There are other approaches to de-trending, such as quadratic or higher-order polynomial de-trending, but it is unclear to me how often transforming the outcome variable will make parallel trends plausible in typical policy evaluation research designs.

  2. Low statistical power: Many Diff in Diff studies are underpowered to detect plausible effect sizes at the sample sizes typically used (Chiu et al. 2025; Egerod and Hollenbach 2024). Unfortunately, this problem is likely worse in situations where a single unit is treated. In this analysis, the estimated ATTs ranged from around -20% for the average effect to around -66% for specific period effects, depending on how the outcome was transformed. Effects this large are unlikely to be realistic in many applied settings, and most interventions will probably not produce them.

    The power analysis in the working paper by Egerod and Hollenbach (2024) suggests that with a logged outcome and a small number of treated units in a typical staggered event study setting (multiple units treated at multiple times), you need an ATT of around 15% to reach the benchmark of 80% power. While I would need to simulate data to confirm this, using 15% as a baseline and taking into account the effect sizes detected with the southern and midwest control groups of 4 units, I think it is reasonable to suggest that an ATT of around 15% to 20% is near the minimum effect size that can be reliably detected in this type of design with a single unit treated at one point in time and a logged outcome.

    You could increase your statistical power by adding controls to the outcome regression or by adding more treated and control units, but my sense is that a large effect size matters more for statistical power here than a larger sample. Unfortunately, unlike in experiments, there is no clear way to increase effect sizes in retrospective observational studies. You are stuck with what you have.

    You might also be able to improve your statistical power with a single or a few treated units by focusing on cumulative effects (Andrew Wheeler makes a persuasive argument for this in synthetic control designs) rather than on the average (ATT) or period-specific effects, but there may not be a straightforward way to construct confidence intervals or standard errors for cumulative effects. The options for doing so, such as the parametric or nonparametric bootstrap, can be computationally intensive and may return confidence intervals that are too narrow with one treated unit.

  3. May need a large number of pre- and post-treatment periods: This method likely works best with a large number of pre- and post-treatment periods, which makes averaging the time series into a cross section more feasible. It may not work as well in the worst-case scenario, where you have a small number of control units and a short time series - i.e. only a couple of pre- and post-treatment periods.
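Point 1 above mentions quadratic de-trending as an alternative to the linear de-trending used in this tutorial. A minimal base-R sketch of what that could look like, assuming columns named `state`, `year`, and `cig_sale` as in the tutorial's data (the helper name and the 1989 treatment-year default are my own):

```r
#fit a quadratic trend to each state's pre-treatment years,
#then subtract the projected trend from the full series
quad_detrend <- function(df, treat_year = 1989) {
  detrended <- lapply(split(df, df$state), function(d) {
    #estimate the trend on pre-treatment years only
    pre <- d[d$year < treat_year, ]
    fit <- lm(cig_sale ~ poly(year, 2, raw = TRUE), data = pre)
    #residualize the whole series against the projected trend
    d$cig_sale_detrended <- d$cig_sale - predict(fit, newdata = d)
    d
  })
  do.call(rbind, detrended)
}
```

The resulting `cig_sale_detrended` column could then be summarized into pre/post averages and fed to the same regression step as above. Whether the quadratic projection is believable far past the treatment date is a judgment call; polynomial trends can extrapolate badly.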

References

Abadie, Alberto. 2021. “Using Synthetic Controls: Feasibility, Data Requirements, and Methodological Aspects.” Journal of Economic Literature 59(2): 391–425.
Abadie, Alberto, Alexis Diamond, and Jens Hainmueller. 2010. “Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California’s Tobacco Control Program.” Journal of the American Statistical Association 105(490): 493–505. doi:10.1198/jasa.2009.ap08746.
Abadie, Alberto, and Javier Gardeazabal. 2003. “The Economic Costs of Conflict: A Case Study of the Basque Country.” American Economic Review 93(1): 113–32.
Arel-Bundock, Vincent, Noah Greifer, and Andrew Heiss. 2024. “How to Interpret Statistical Models Using marginaleffects for R and Python.” Journal of Statistical Software 111(9). doi:10.18637/jss.v111.i09.
Cattaneo, Matias, Yingjie Feng, Filippo Palomba, and Rocio Titiunik. 2025. “scpi: Uncertainty Quantification for Synthetic Control Methods.” Journal of Statistical Software 113(1). https://www.jstatsoft.org/article/view/v113i01.
Chiu, Albert, Xingchen Lan, Ziyi Liu, and Yiqing Xu. 2025. “Causal Panel Analysis Under Parallel Trends: Lessons from a Large Reanalysis Study.” American Political Science Review: 1–22. doi:10.1017/S0003055425000243.
de Chaisemartin, Clément, and Xavier D’Haultfœuille. 2023. “Credible Answers to Hard Questions: Differences-in-Differences for Natural Experiments.” doi:10.2139/ssrn.4487202.
Dunford, Eric. 2025. “Tidysynth: A Tidy Implementation of the Synthetic Control Method.” https://CRAN.R-project.org/package=tidysynth.
Egerod, Benjamin CK, and Florian M. Hollenbach. 2024. “How Many Is Enough? Sample Size in Staggered Difference-in-Differences Designs.” OSF Preprint. https://files.osf.io/v1/resources/ac5ru_v1/providers/osfstorage/6672d409b5a03600dc9ff454?action=download&direct&version=4.
Lee, Soo Jeong, and Jeffrey M. Wooldridge. 2025. “Simple Approaches to Inference with Difference-in-Differences Estimators with Small Cross-Sectional Sample Sizes.” doi:10.2139/ssrn.5325686.
McDermott, Grant. 2026. Ritest: Randomisation Inference Testing. https://github.com/grantmcdermott/ritest.
Roth, Jonathan, Pedro H. C. Sant’Anna, Alyssa Bilinski, and John Poe. 2023. “What’s Trending in Difference-in-Differences? A Synthesis of the Recent Econometrics Literature.” Journal of Econometrics 235(2): 2218–44. doi:10.1016/j.jeconom.2023.03.008.
Wing, Coady, Madeline Yozwiak, Alex Hollingsworth, Seth Freedman, and Kosali Simon. 2024. “Designing Difference-in-Difference Studies with Staggered Treatment Adoption: Key Concepts and Practical Guidelines.” Annual Review of Public Health 45: 485–505. doi:10.1146/annurev-publhealth-061022-050825.
Zeileis, Achim, Susanne Koll, and Nathaniel Graham. 2020. “Various Versatile Variances: An Object-Oriented Implementation of Clustered Covariances in R.” Journal of Statistical Software 95(1). doi:10.18637/jss.v095.i01.