A deep dive into a simple way to do synthetic control with a Difference in Differences setup.
Author
Jeff Milliman
Published
December 21, 2025
Introduction
In this tutorial I am going to run through how to do synthetic control with the new difference in differences method introduced by economists Soo Jeong Lee and Jeffrey Wooldridge, of Introductory Econometrics fame, in R. For a full introduction to their method, I would check out their newly released working paper (Lee and Wooldridge 2025). In addition, the DID Digest substack has a great summary of the paper and the benefits of this method over other options. The authors also apply their method to the more typical event study setting (i.e. staggered diff-in-diff), with multiple units treated at different time periods, which I may return to in a later tutorial. Any errors made during this tutorial are entirely my own.
What is Synthetic Control?
For those unfamiliar with it, synthetic control is a quantitative method for case comparison used in situations in which only one unit or a small number of units experiences an intervention/treatment. In these situations, more traditional methods, such as difference in differences, tend to work poorly.
In short, synthetic control uses control/untreated units to construct weights that attempt to closely approximate the treated unit prior to the treatment/intervention. These weights are used to predict the counterfactual outcome for the treated unit in the post period, approximating the outcome in the absence of the treatment/intervention. Effects are calculated as the difference between the observed outcome and the synthetic control (the predicted counterfactual) in the post-treatment period. For a brief introduction to synthetic control methods, I recommend the chapter on synthetic control in Scott Cunningham’s Causal Inference: The Mixtape, the chapter in Causal Inference for the Brave and True, and the 2021 Abadie overview paper (Abadie 2021).
As an alternative to synthetic control, the authors provide an extremely simple way to do policy analysis/causal inference that relies on the more ubiquitous difference in differences method - i.e. (post treated outcome - pre treated outcome) - (post control outcome - pre control outcome). For resources on difference in differences methods, I recommend this general overview (de Chaisemartin and D’Haultfœuille 2023) and additional resources on recent developments for staggered difference in differences (Roth et al. 2023; Wing et al. 2024).
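The basic DID arithmetic above can be checked on a toy example. The numbers below are hypothetical, and the sketch is in Python purely as a language-neutral illustration (the tutorial's analysis code is in R):

```python
# Toy two-period, two-group DID (hypothetical numbers).
# DID = (post treated - pre treated) - (post control - pre control)
pre_treated, post_treated = 100.0, 90.0   # treated unit falls by 10
pre_control, post_control = 100.0, 95.0   # controls fall by 5

did = (post_treated - pre_treated) - (post_control - pre_control)
print(did)  # -5.0: the treated unit fell 5 units more than the controls did
```

The control group's change stands in for what would have happened to the treated unit absent treatment, so only the excess decline of 5 is attributed to the intervention.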
Preliminary Requirements
For the remainder of this tutorial I will assume that you have:
1) At least an introductory knowledge of synthetic control.
2) At least some familiarity with difference in differences (DID) for causal inference/policy analysis.
Why I am excited about this method
Anyone who has ever run a synthetic control analysis knows that synthetic control is like the wild west. While synthetic control was extremely novel when it was introduced in 2003 (Abadie and Gardeazabal 2003), the method has since expanded to dozens of packages and estimators, many of them using machine learning methods, such as LASSO or ridge regression, to estimate the synthetic control weights, and conformal inference or parametric bootstrapping to construct prediction intervals and conduct inference (Cattaneo et al. 2025). Many of these methods are outside the knowledge base of most policy analysts and social scientists, whose training tends to be in applied statistics or econometrics.
There is also no guarantee that any synthetic control method will generate a good fit to the treated unit in the pre-treatment period, post-treatment inference can be challenging, and there appear to be few clear guidelines as to how a synthetic control analysis should be done with many of these estimators.
Soo Jeong Lee and Jeffrey Wooldridge have come to the rescue with an incredibly simple method that appears to perform remarkably well even in extremely small samples - i.e. 1 treated unit and 4 control units - and compares favorably to synthetic control in their applied examples. Of course, all the usual assumptions required to do difference in differences will apply, but their paper represents a promising step in a simpler direction.
Steps in the process
Transform the outcome variable. First, the authors log the outcome variable to make it more amenable to the demeaning or de-trending process. Next, they recommend either unit specific de-trending or unit specific demeaning to make the parallel trends assumption between the treated unit and the control units more plausible.
Unit Specific Demeaning. To demean your outcome variable, calculate the average of your outcome variable for each unit in the pre-treatment period and then subtract that average from the outcome variable for each unit for your entire time series (both the pre and post periods). This step is necessary to construct the plots. The authors do this by regressing the outcome variable in the pre treatment period on an intercept and a unit level dummy variable (in R - lm(outcome ~ 1 + unit, data = data)). They then subtract this from the entire time series, but you can also do the demeaning process manually. For inference, the authors only appear to run a regression that includes the post period demeaned or de-trended outcome (more on this below). However, the entire demeaned or detrended time series is useful to produce residual plots, which can give us a sense of how good a fit the demeaning or de-trending process produces between the treated and control units.
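As a minimal numeric sketch of the demeaning step (hypothetical data, and in Python purely for illustration; the real implementation in this tutorial is in R):

```python
import numpy as np

# Unit-specific demeaning for ONE unit (hypothetical data):
# average the outcome over the PRE period only, then subtract that
# average from the WHOLE series (pre and post).
y = np.array([10., 11., 12., 11., 11.,   # 5 pre-treatment outcomes
              14., 15., 16.])            # 3 post-treatment outcomes
n_pre = 5

pre_mean = y[:n_pre].mean()      # pre-period average only
y_demeaned = y - pre_mean        # subtracted from the entire series

print(pre_mean)                       # 11.0
print(y_demeaned[:n_pre].mean())      # 0.0 in the pre period by construction
```

In the full panel you would repeat this for every unit, which is what the authors' regression of the pre-period outcome on unit dummies accomplishes in one step.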
Unit Specific De-trending. To de-trend your outcome variable, first split your data set into unit level data sets. Then fit a regression to estimate the time trend in the pre-treatment period for each individual unit, regressing the outcome in the pre-treatment period on an intercept and your time variable. In R this would take the form of lm(outcome ~ 1 + time, data = data). You then use the regression to predict the time trend for the entire time series (pre and post treatment periods) for each unit. The predictions from your models are then subtracted from the outcome variable for each unit for the entire time series to complete the de-trending process.
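The de-trending step can be sketched the same way (again hypothetical data, in Python for illustration only): fit the trend on the pre period, predict the whole series, subtract.

```python
import numpy as np

# Unit-specific de-trending for ONE unit (hypothetical data).
t = np.arange(8)               # time index 0..7
y = 2.0 + 0.5 * t              # a unit growing by 0.5 per period
y[5:] += 1.0                   # a jump of +1 after "treatment" at t = 5
n_pre = 5

# OLS fit of outcome ~ intercept + time on the PRE period only
slope, intercept = np.polyfit(t[:n_pre], y[:n_pre], deg=1)
trend = intercept + slope * t  # predict the trend for ALL periods
y_detrended = y - trend

print(np.round(y_detrended, 6))
# pre-period residuals are ~0; post-period residuals recover the +1 jump
```

Because the linear trend fits the pre period exactly here, the de-trended series isolates the post-treatment deviation, which is exactly what the effect regressions below operate on.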
Check the Pre and Post Period Fit between the Treated Unit and your Control Units. To check how closely your control units “parallel” the treated unit in the pre-treatment period, you can compare the average transformed outcome of the control units to the transformed outcome for the treated unit, either in the pre-treatment period or over the entire time series. If the average outcome for the control units closely parallels the treated unit - i.e. increases or decreases by approximately the same amount from period to period in the pre-treatment period - you can probably consider the pre-treatment fit good enough to continue on to estimating the model.
Calculate the Post Unit Level Averages of the Outcome Variable. For most event studies you usually conduct your analysis on your entire time series. However, the authors collapse the entire time series into a cross-sectional dataset by simply averaging the post treatment outcomes for each unit. To conduct the analysis for specific post treatment periods - i.e. a single year - the authors keep the post treatment period outcome for that specific year as a separate outcome variable.
Estimate the Average (ATT) and Year Specific Effects. Finally, to estimate the average effect for the treated units (ATT), you simply regress the post period transformed outcome variable on an intercept and your treatment variable. In R this would look like lm(post ~ 1 + treated, data = data). For year specific effects, you would swap out the post average outcome with a year average outcome and run the same regression. This is similar to conducting a simple ANOVA post score analysis.
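Because this final regression has only an intercept and a treatment dummy, the treatment coefficient is simply the difference in group means of the transformed post-period outcome. A quick numeric check (hypothetical numbers, Python for illustration):

```python
import numpy as np

# Cross-sectional regression: post outcome ~ intercept + treatment dummy.
# With a single dummy regressor, the slope equals mean(treated) - mean(control).
post = np.array([-0.40, -0.05, 0.02, -0.01, 0.04])  # 1 treated + 4 controls
treated = np.array([1., 0., 0., 0., 0.])

X = np.column_stack([np.ones(5), treated])
beta, *_ = np.linalg.lstsq(X, post, rcond=None)
att = beta[1]                                        # treatment coefficient

diff_in_means = post[treated == 1].mean() - post[treated == 0].mean()
print(round(att, 6), round(diff_in_means, 6))        # identical
```

This is why the authors can collapse the panel to a cross-section: the ATT estimate reduces to comparing post-period averages across groups, with the regression machinery supplying the standard errors.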
Applied Example with Analysis of the California Tobacco Control Program
To benchmark their method against synthetic control, the authors reanalyze the impact of the 1988 California tobacco control program (Proposition 99) on cigarette sales per capita (Abadie, Diamond, and Hainmueller 2010). Following the authors, I will not include covariates in the analysis, although the authors indicate that time invariant controls or averages of pre-treatment time varying controls could be added to the outcome regressions. The data set comes pre-loaded with the tidysynth package (Dunford 2025) and the following list describes the key variables and research design:
Unit of Analysis: State Level (U.S. States).
Treated Unit: California
Control Units: 38 states (excludes 11 other states that also implemented a similar tobacco control program)
Treatment Year: 1988 (1989 is the first post-treatment year)
Time Series: 1970 to 2000 (19 years pre-treatment, 12 years post-treatment)
Outcome Variable: the log of cigarette sales per capita (cigsale)
Loading in and Cleaning the Data
Code
#Uses the smoking synthetic control dataset from tidysynth

#load in packages
library(tidyverse)
#tidysynth for the smoking dataset
library(tidysynth)
#for extracting coefficients/marginal effects
library(marginaleffects)
#for tables
library(gt)

#load in the smoking dataset from tidysynth
data(smoking)

#arrange by state and year
smoking <- smoking |>
  arrange(state, year)

#create our treatment indicator
#treated = 1 when state is california
#treated post = 1 when state is california and post treatment - >= 1989
#Create the outcome variable as the log of cigarette sales
smoking <- smoking |>
  #create the treated unit
  mutate(treated = case_when(state == "California" ~ 1,
                             .default = 0),
         #set the post and treated variable
         treated_post = case_when(state == "California" & year >= 1989 ~ 1,
                                  .default = 0),
         #Log the outcome variable
         log_cigsale = log(cigsale))

head(smoking)
# A tibble: 6 × 10
  state    year cigsale lnincome  beer age15to24 retprice treated treated_post
  <chr>   <dbl>   <dbl>    <dbl> <dbl>     <dbl>    <dbl>   <dbl>        <dbl>
1 Alabama  1970    89.8       NA    NA     0.179     39.6       0            0
2 Alabama  1971    95.4       NA    NA     0.180     42.7       0            0
3 Alabama  1972   101.      9.50    NA     0.181     42.3       0            0
4 Alabama  1973   103.      9.55    NA     0.182     42.1       0            0
5 Alabama  1974   108.      9.54    NA     0.183     43.1       0            0
6 Alabama  1975   112.      9.54    NA     0.184     46.6       0            0
# ℹ 1 more variable: log_cigsale <dbl>
Unit Specific Demeaning
Following the steps in the paper, I will conduct the demeaning process first then move on to the de-trending process, which produces a better pre-treatment fit in their applied examples.
Code
##Unit Specific Demeaning

#filter dataset by the pre period
pre_period <- smoking |>
  filter(year < 1989)

#fit regression for the pre period with the outcome predicted by
#intercept and state dummy
pre_reg <- lm(log_cigsale ~ 1 + state, data = pre_period)

#grab fitted values (predictions) and residuals
pre_period$pred_pre <- fitted(pre_reg)
pre_period$resid_pre <- residuals(pre_reg)

#calculate the predictions by group for the pre treatment period
pre_pred_avg <- pre_period |>
  group_by(state) |>
  summarise(pre_pred_avg = mean(pred_pre)) |>
  ungroup()

#join pre period averages to the smoking dataset
#and subtract from the log of cigsales to demean
smoking <- left_join(smoking, pre_pred_avg, by = "state") |>
  mutate(cig_sale_demeaned = (log_cigsale - pre_pred_avg))

#grab the demeaned outcomes for the regression by
#averaging by state and period
demeaned_outcomes <- smoking |>
  select(state, year, cig_sale_demeaned) |>
  #group by state
  group_by(state) |>
  #create our aggregate means in the pre and post periods
  summarise(
    #post period mean
    post_demeaned = mean(cig_sale_demeaned[year >= 1989]),
    #in year 1989
    post_demeaned_1989 = mean(cig_sale_demeaned[year == 1989]),
    #in year 1995
    post_demeaned_1995 = mean(cig_sale_demeaned[year == 1995]),
    #in year 2000
    post_demeaned_2000 = mean(cig_sale_demeaned[year == 2000])) |>
  #ungroup our columns
  ungroup() |>
  #create our treatment indicator for California
  mutate(treated = case_when(state == "California" ~ 1,
                             .default = 0)) |>
  #relocate the treated column after state
  relocate(treated, .after = state)

#look at our outcomes
head(demeaned_outcomes)
Figure 1: Checking Parallel Trends for the Demeaned Outcome
Code
#plot the demeaned outcome by group (controls and california)
demeaned_plot_df <- smoking |>
  select(year, state, cig_sale_demeaned) |>
  mutate(group = case_when(state == "California" ~ "California",
                           .default = "Control")) |>
  group_by(year, group) |>
  summarise(cig_sale_demeaned = mean(cig_sale_demeaned)) |>
  ungroup() |>
  #Suppress summarise grouping warnings
  suppressWarnings()

#Plot the demeaned outcome by group
figure1 <- ggplot(demeaned_plot_df,
                  aes(x = year, y = cig_sale_demeaned, color = group)) +
  geom_line() +
  geom_vline(xintercept = 1989, linetype = 2) +
  scale_color_manual(values = c("California" = "red", "Control" = "blue")) +
  labs(title = "Figure 1: Removing Pre Treatment Averages, All Controls",
       x = "year",
       y = "log(Cig Sales per Capita)") +
  theme_minimal()

figure1
Looking at the residual plot in Figure 1 - i.e. plotting the observed outcome minus the pre-treatment average, by treatment group - the fit between California and the average of the control group states in the pre-treatment period (1970 to 1988) is not great. We can clearly see that California starts to decline more sharply in the mid 1970s, while the control group declines at a much flatter rate. In this case, it would be hard to argue that the parallel trends assumption holds.
Calculating the Effects for the Demeaned Outcomes
To get the ATT, we can run a simple regression, regressing the average post period outcome on an intercept and the treatment dummy variable.
#Demeaned ATT
demeaned_att <- lm(post_demeaned ~ 1 + treated, data = demeaned_outcomes)
summary(demeaned_att)
Call:
lm(formula = post_demeaned ~ 1 + treated, data = demeaned_outcomes)

Residuals:
     Min       1Q   Median       3Q      Max
-0.23014 -0.08548  0.00000  0.09464  0.22825

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.24616    0.01934 -12.726 4.39e-15 ***
treated     -0.42217    0.12080  -3.495  0.00125 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1192 on 37 degrees of freedom
Multiple R-squared:  0.2482, Adjusted R-squared:  0.2279
F-statistic: 12.21 on 1 and 37 DF,  p-value: 0.001249
Relative to the control states, the cigarette sales per capita (in packs) in California decreased by an average of approximately 42% through the entire post treatment period. To get the effects for a single time period we can run the same regression as above, but we swap out the post average outcome with the outcome for a single post treatment period - i.e. the year 2000.
#Effects for year 2000
effects_2000 <- lm(post_demeaned_2000 ~ 1 + treated, data = demeaned_outcomes)
summary(effects_2000)
Call:
lm(formula = post_demeaned_2000 ~ 1 + treated, data = demeaned_outcomes)

Residuals:
    Min      1Q  Median      3Q     Max
-0.2712 -0.1535  0.0000  0.1384  0.2903

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.35485    0.02632  -13.48 7.48e-16 ***
treated     -0.66732    0.16435   -4.06 0.000244 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1622 on 37 degrees of freedom
Multiple R-squared:  0.3082, Adjusted R-squared:  0.2895
F-statistic: 16.49 on 1 and 37 DF,  p-value: 0.0002441
Relative to the control units, California had approximately 67% fewer cigarette sales per capita (in packs) in the year 2000.
To recreate the results from Table 3 of the paper, which includes the overall ATT and effects for years 1989, 1995, and 2000, I created a couple of functions to fit the models separately and aggregate the effects into a single table with the marginaleffects package (Arel-Bundock, Greifer, and Heiss 2024). I will use these functions to reproduce the tables from the paper.
To make the estimates easier to interpret as percent changes, I multiplied the estimate, standard error, and confidence interval by 100. To keep the results consistent with those from the paper, I will forgo the usual approach of exponentiating the coefficients in a linear model with a logged dependent variable to construct the percent change, and will instead use the simpler approximation of multiplying the treatment coefficient by 100.
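To see how the two interpretations differ, here is the average-effect coefficient from the demeaned regression run through both transformations (a Python sketch, for illustration; only the coefficient value comes from the output above):

```python
import math

# Two ways to read a log-outcome regression coefficient as a percent change.
beta = -0.42217  # treatment coefficient from the demeaned ATT regression above

approx_pct = beta * 100                 # shortcut used in the paper/tutorial
exact_pct = (math.exp(beta) - 1) * 100  # exact transformation

print(round(approx_pct, 1))  # -42.2
print(round(exact_pct, 1))   # -34.4
```

The gap between the two readings grows with the size of the coefficient, so for small effects the *100 shortcut is harmless, while for effects this large the two can differ by several percentage points.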
Code
#calculate the effects for the demeaned outcomes

#set the list of demeaned outcome variables
outcomes_demeaned <- c("post_demeaned",
                       "post_demeaned_1989",
                       "post_demeaned_1995",
                       "post_demeaned_2000")

#create our regression formula to apply to the list of demeaned outcomes
lm_function <- function(outcome, iv, df) {
  mod_formula <- paste(outcome, "~ 1 +", iv)
  reg <- lm(mod_formula, data = df)
}

#create a list of models
demeaned_mods_list <- lapply(outcomes_demeaned, lm_function,
                             iv = "treated", df = demeaned_outcomes)

#set the names of the models in the list
names(demeaned_mods_list) <- c("Average Effect",
                               "Effect in 1989",
                               "Effect in 1995",
                               "Effect in 2000")

#Create function to extract marginal effects from a list of models,
#given your coefficient of interest
marginal_effects_table <- function(model_list, key_term){

  #create function to grab our marginal effects from each model
  extract_effects <- function(model, term) {
    marg_eff <- marginaleffects::avg_comparisons(model = model,
                                                 variables = term) |>
      as.data.frame()
    return(marg_eff)
  }

  #apply the function (extract_effects) to the model list, extract
  #the key coefficient (term)
  effects_table <- purrr::map2_dfr(model_list, key_term, extract_effects)

  #extract the model names from your list of models
  #and rename the column
  names_variables = names(model_list) |>
    #convert into a dataframe
    as.data.frame() |>
    #rename the names column to "Effects Period"
    rename("Effects Period" = 1)

  #cbind names and table of effects
  effects_table <- cbind(names_variables, effects_table) |>
    #drop the predictions - keep only columns 1:10
    select(1:10) |>
    #round all numeric variables to 3 decimal places
    mutate(across(where(is.numeric), round, digits = 3),
           #multiply estimate, std.error, and confidence intervals
           #by 100 to get our percent change
           estimate = (estimate * 100),
           std.error = (std.error * 100),
           conf.low = (conf.low * 100),
           conf.high = (conf.high * 100)) |>
    #rename estimate column to "estimate (percent change)"
    rename("estimate (percent change)" = estimate)

  return(effects_table)
}

#Apply the functions to recreate the table
table_3_demeaned <- marginal_effects_table(model_list = demeaned_mods_list,
                                           key_term = "treated")

#Make the table pretty and set title with gt
table_3_demeaned <- table_3_demeaned |>
  gt() |>
  #Add in the title
  tab_header(title = "Table 3 (Estimated ATTs): Demeaned") |>
  gt::tab_options(row.striping.include_table_body = FALSE,
                  quarto.disable_processing = TRUE)

table_3_demeaned
Table 3 (Estimated ATTs): Demeaned

Effects Period   term      contrast   estimate (percent change)   std.error   statistic   p.value   s.value   conf.low   conf.high
Average Effect   treated   1 - 0                          -42.2        12.1      -3.495     0.000    11.042      -65.9       -18.5
Effect in 1989   treated   1 - 0                          -16.8         9.6      -1.756     0.079     3.660      -35.6         2.0
Effect in 1995   treated   1 - 0                          -48.4        13.7      -3.518     0.000    11.166      -75.3       -21.4
Effect in 2000   treated   1 - 0                          -66.7        16.4      -4.060     0.000    14.316      -98.9       -34.5
A Quick Note on Standard Errors and Uncertainty Calculations
The authors recommend the use of heteroskedasticity-consistent standard errors - “HC3” - if there are a large number of control and/or treated units. To add in the heteroskedasticity-consistent standard errors, you would simply add the vcov = “HC3” argument to the avg_comparisons function - i.e. avg_comparisons(model = model, vcov = "HC3") - in the code above (see the link for more details). Alternatively, you could correct the standard errors on a single regression fit with the lm() function with the sandwich package (Zeileis, Koll, and Graham 2020).
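For intuition, HC3 can also be computed by hand from the regression residuals and leverages. Below is a sketch with made-up data in Python/NumPy rather than R (all variable names are mine); in the tutorial's actual workflow you would just pass vcov = "HC3" as described above.

```python
import numpy as np

# Manual HC3 "sandwich" covariance: (X'X)^-1 X' diag(e_i^2/(1-h_i)^2) X (X'X)^-1
rng = np.random.default_rng(0)
n, n_treated = 40, 5                 # several treated units, since HC3 needs
treated = np.zeros(n)                # every leverage strictly below 1
treated[:n_treated] = 1.0
y = -0.25 - 0.4 * treated + 0.1 * rng.standard_normal(n)  # made-up outcomes

X = np.column_stack([np.ones(n), treated])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ (X.T @ y)
resid = y - X @ beta
h = np.einsum('ij,jk,ik->i', X, XtX_inv, X)   # hat-matrix diagonal (leverage)
weights = resid**2 / (1.0 - h)**2             # HC3 inflates by 1/(1-h)^2
meat = (X * weights[:, None]).T @ X
vcov_hc3 = XtX_inv @ meat @ XtX_inv
se_treated = np.sqrt(vcov_hc3[1, 1])
print(se_treated)
```

Note that with a single treated unit the treated observation has leverage 1 under this design (the dummy fits it perfectly), so HC3 is undefined there; this is one reason the recommendation is tied to having enough control and/or treated units.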
The authors also check the robustness of some of their ATT estimates using randomization inference with 1,000 replications, which can be implemented using the ritest package in R (McDermott 2026). According to the authors, randomization inference is beneficial because it does not rely on the “normality assumption” (Lee and Wooldridge 2025, 8) and is similar to the permutation tests that provide inference in the standard synthetic control design (Abadie 2021). For an overview of the pros and cons of randomization inference in observational studies I recommend the section “Which uncertainty matters? Randomization Inference (RI)” from Claire Palandri’s guide, Causal Inference in Observational Studies, which is posted on her github.
Unit Specific De-trending
To conduct the unit specific de-trending, I will follow the “many models” approach from the first edition of R for Data Science and nest the unit (state) level datasets inside a single data frame before aggregating the predictions and subtracting them from the original outcome (log cigsale).
Code
##Unit De-trending

#nest the smoking dataset, grouping by state
nested_smoking <- smoking |>
  group_by(state) |>
  nest()

#fit a linear model to each nested dataframe
#predict the linear trend in the pre period (year < 1989)
#i.e. predict outcome with intercept + time (year)
pre_trend_reg <- function(df) {
  lm(log_cigsale ~ 1 + year, data = df, subset = (year < 1989))
}

#fit the regression model to each individual nested state data set
nested_smoking <- nested_smoking |>
  mutate(pre_model = purrr::map(data, pre_trend_reg))

#create the prediction function to predict the
#whole time series using the model fit in the pre period
prediction_function <- function(model, newdata) {
  predict(model, newdata = newdata)
}

#apply the prediction function to the list of pre-period models to
#get the predictions for each state's time series
nested_smoking <- nested_smoking |>
  mutate(pred_detrend = purrr::map2(pre_model, data, prediction_function))

#extract the predictions
predictions <- nested_smoking |>
  unnest(pred_detrend) |>
  dplyr::select(state, pred_detrend) |>
  dplyr::rename(state_match = state) |>
  ungroup()

#bind the prediction columns to the original smoking dataset
smoking <- bind_cols(smoking, predictions)

#create our detrended outcome by subtracting
#the predictions from the outcome
smoking <- smoking |>
  mutate(cig_sale_detrended = (log_cigsale - pred_detrend))

#make our group averaged outcome columns for each period
#using the detrended outcome
all_cons_detrended <- smoking |>
  select(state, year, cig_sale_detrended) |>
  #group by state
  group_by(state) |>
  #create our aggregate means by state and period
  summarise(
    #post period mean
    post_detrend = mean(cig_sale_detrended[year >= 1989]),
    #in year 1989
    post_detrend_1989 = mean(cig_sale_detrended[year == 1989]),
    #in year 1995
    post_detrend_1995 = mean(cig_sale_detrended[year == 1995]),
    #in year 2000
    post_detrend_2000 = mean(cig_sale_detrended[year == 2000])) |>
  #ungroup our columns
  ungroup() |>
  #create our treatment indicator for California
  mutate(treated = case_when(state == "California" ~ 1,
                             .default = 0)) |>
  #relocate the treated column after state
  relocate(treated, .after = state)

#glance at our dataset
head(all_cons_detrended)
Figure 2: Checking Parallel Trends for the De-Trended Outcome
Code
#plot the fit for the detrended cig sale
#by group (Control vs California)

#create the detrended dataframe
detrended_plot_df <- smoking |>
  select(year, state, cig_sale_detrended) |>
  mutate(group = case_when(state == "California" ~ "California",
                           .default = "Control")) |>
  group_by(year, group) |>
  summarise(cig_sale_detrended = mean(cig_sale_detrended)) |>
  ungroup()

#recreate figure 2 from the paper
figure2 <- ggplot(detrended_plot_df,
                  aes(x = year, y = cig_sale_detrended, color = group)) +
  geom_line() +
  geom_vline(xintercept = 1989, linetype = 2) +
  labs(title = "Figure 2: Removing Pre-Treatment Linear Trends, All Controls",
       x = "year",
       y = "log(Cig Sales per Capita)") +
  ylim(-1, .25) +
  scale_color_manual(values = c("California" = "red", "Control" = "blue")) +
  theme_minimal()

#plot figure 2
figure2
As we can see from the residual plot in Figure 2, the de-trending approach produces a much better pre-treatment fit between California and the control states. The control group very closely follows the log(cigarette sales per capita) for California (the treated unit) in the pre-treatment period (1970 to 1988). The fit between the average of the control units and California is almost as good as the fit from the original synthetic control analysis in Abadie, Diamond, and Hainmueller (2010), and the parallel trends assumption is much more likely to hold than with the demeaning method.
Calculating the Effects for the De-trended Outcomes
To calculate the average and period specific effects for the de-trended outcomes I will simply reuse the functions that I used previously.
Code
#set our vector of dependent variables for the regression
outcomes_detrended <- c("post_detrend",
                        "post_detrend_1989",
                        "post_detrend_1995",
                        "post_detrend_2000")

#make the detrended models, applying the lm
#function to the all controls detrended dataset
detrended_models <- lapply(outcomes_detrended, lm_function,
                           iv = "treated", df = all_cons_detrended)

#set the names for the models
names(detrended_models) <- c("Average Effect",
                             "Effect in 1989",
                             "Effect in 1995",
                             "Effect in 2000")

#use the function to recreate the estimates from the paper
table_3_detrended <- marginal_effects_table(model_list = detrended_models,
                                            key_term = "treated")

#create table output with gt
table_3_detrended <- table_3_detrended |>
  gt() |>
  #Add in the title
  tab_header(title = "Table 3 (Estimated ATTs): De-Trended") |>
  gt::tab_options(row.striping.include_table_body = FALSE,
                  quarto.disable_processing = TRUE)

table_3_detrended
Table 3 (Estimated ATTs): De-Trended

Effects Period   term      contrast   estimate (percent change)   std.error   statistic   p.value   s.value   conf.low   conf.high
Average Effect   treated   1 - 0                          -22.7         9.4      -2.413     0.016     5.982      -41.1        -4.3
Effect in 1989   treated   1 - 0                           -4.2         5.9      -0.713     0.476     1.071      -15.8         7.4
Effect in 1995   treated   1 - 0                          -28.2        11.2      -2.515     0.012     6.393      -50.2        -6.2
Effect in 2000   treated   1 - 0                          -40.3        15.2      -2.643     0.008     6.926      -70.2       -10.4
With a better pre-treatment fit, the average effect (ATT) closely approximates the ATT from the synthetic control estimate and the synthdid estimate in the Lee and Wooldridge working paper. The average effect is approximately a 22.7% decrease in cigarette sales per capita for California relative to the control states over the entire post-treatment period. The period specific effects grow from a -4.2% change in 1989 to a -40.3% change in 2000.
Figure 3: Gap Plot (De-trended Outcome)
Code
#Figure 3 - gap plot
figure3 <- detrended_plot_df |>
  #pivot detrended outcome wider
  pivot_wider(names_from = group, values_from = cig_sale_detrended) |>
  #create difference column between California and the controls
  mutate(diff = (California - Control)) |>
  #plot the difference
  ggplot() +
  geom_line(aes(x = year, y = diff, group = 1), color = "blue") +
  labs(title = "Figure 3: Gap Between California and Average of All Controls, State-Specific Linear Trend",
       x = "year",
       y = "gap") +
  geom_vline(xintercept = 1989, linetype = 2, color = "black") +
  theme_minimal()

figure3
Figure 3 above reproduces the “gap plot”, which plots the difference in log cigarette sales per capita between California and the control states over the entire time series. As we can see, the gap between California and the control states is roughly zero prior to 1985, and starts to widen in the mid 1980s as California declines at a faster rate relative to the control group. The gap continues to grow in the negative direction in the post-treatment period (1989 onward).
DID Synthetic Control with Only 4 Control Units
Up until now I have run the analysis analogously to a typical synthetic control design, where we use all the available control units (38 states) as the comparison for our treated unit (California). However, as a robustness check, the authors rerun their analysis with two control groups of only four states each (one from the South and one from the Midwest). This is, to put it mildly, a little bit crazy.
As the authors point out, 4 units is an extremely low number to put into a control group and states from the South and the Midwest would appear to be “bad controls” as they are dissimilar from California culturally, economically, and demographically. I’ll reproduce their results with the southern and midwest control groups, but I will focus on the analysis with the de-trended outcome (log cig sale), as this method generates a better pre-treatment fit and results that are closer to those from other estimators than the demeaning method.
DID Synthetic Control with 4 Southern States as the Control Group
Code to clean the data and create the California plus south states control group
Code
#try southern states as controls + California

#Set the controls plus California
south_controls <- c("Alabama", "Arkansas", "California",
                    "Louisiana", "Mississippi")

#make our columns for each period with the detrended dataset
south_cons_detrended <- smoking |>
  select(state, year, cig_sale_detrended) |>
  #filter for our southern controls plus california
  filter(state %in% south_controls) |>
  #group by state
  group_by(state) |>
  #create our aggregate means in the post period
  summarise(
    #post period mean
    post_detrend = mean(cig_sale_detrended[year >= 1989]),
    #in year 1989
    post_detrend_1989 = mean(cig_sale_detrended[year == 1989]),
    #in year 1995
    post_detrend_1995 = mean(cig_sale_detrended[year == 1995]),
    #in year 2000
    post_detrend_2000 = mean(cig_sale_detrended[year == 2000])) |>
  #ungroup our columns
  ungroup() |>
  #create our treatment indicator for California
  mutate(treated = case_when(state == "California" ~ 1,
                             .default = 0)) |>
  #relocate the treated column after state
  relocate(treated, .after = state)
Figure 6: Checking Parallel Trends Between California and the Southern Control Group (De-trended Outcome)
Code
#lets look at the plot of the fit of the southern states vs CA
figure6 <- smoking |>
  select(year, state, cig_sale_detrended) |>
  filter(state %in% south_controls) |>
  mutate(group = case_when(state == "California" ~ "California",
                           .default = "Control")) |>
  group_by(year, group) |>
  summarise(cig_sale_detrended = mean(cig_sale_detrended)) |>
  ungroup() |>
  #recreate figure 6
  ggplot() +
  geom_line(aes(x = year, y = cig_sale_detrended, color = group)) +
  geom_vline(xintercept = 1989, linetype = 2) +
  labs(title = "Figure 6: Removing Pre-Treatment Linear Trends, AL, AR, LA, MS as Controls",
       x = "year",
       y = "log(Cig Sales per Capita)") +
  ylim(-1, .25) +
  scale_color_manual(values = c("California" = "red", "Control" = "blue")) +
  theme_minimal()

figure6
Even with only 4 southern states in the control group, the de-trending method still provides a pretty good fit between California and the 4 southern states.
Calculating the Effects for the De-trended Outcomes with 4 Southern Control States
Code
#run our southern models
south_models <- lapply(outcomes_detrended, lm_function,
                       iv = "treated", df = south_cons_detrended)

#set our names
names(south_models) <- c("Average Effect",
                         "Effect in 1989",
                         "Effect in 1995",
                         "Effect in 2000")

#lets see how we do with table 4
table_4_detrended <- marginal_effects_table(model_list = south_models,
                                            key_term = "treated")

#convert to table with gt
table_4_detrended <- table_4_detrended |>
  gt() |>
  #Add in the title
  tab_header(title = "Table 4 (Estimated ATTs): De-Trended - South Controls (AL, AR, LA, MS)") |>
  gt::tab_options(row.striping.include_table_body = FALSE,
                  quarto.disable_processing = TRUE)

table_4_detrended
Looking at Table 4, we get a very similar overall ATT and similar period specific effects as we did when we included all 38 other states as controls. This is quite remarkable and suggests that this method may work in situations where there is not a large control group, or where the control group might be less than ideal. The authors repeat their analysis with a control group of only 4 midwest states and get similar results, which is, once again, quite remarkable with such a small control group that would not appear to be an ideal comparison for California.
Calculating the Effects for the De-trended Outcome with 4 Midwest Control States
Cleaning and analysis code for the midwest control states
Code
#lets check the midwest

#Set our midwest controls + California
mw_controls <- c("California", "Illinois", "Iowa", "Minnesota", "Ohio")

#create our midwest detrended dataset
mw_controls_detrended <- smoking |>
  select(state, year, cig_sale_detrended) |>
  #filter for our midwest controls plus california
  filter(state %in% mw_controls) |>
  #group by state
  group_by(state) |>
  #create our aggregate means in the post period
  summarise(
    #post period mean
    post_detrend = mean(cig_sale_detrended[year >= 1989]),
    #in year 1989
    post_detrend_1989 = mean(cig_sale_detrended[year == 1989]),
    #in year 1995
    post_detrend_1995 = mean(cig_sale_detrended[year == 1995]),
    #in year 2000
    post_detrend_2000 = mean(cig_sale_detrended[year == 2000])) |>
  #ungroup our columns
  ungroup() |>
  #create our treatment indicator for California
  mutate(treated = case_when(state == "California" ~ 1,
                             .default = 0)) |>
  #relocate the treated column after state
  dplyr::relocate(treated, .after = state)

#run our midwest models
mw_models <- lapply(outcomes_detrended, lm_function,
                    iv = "treated", df = mw_controls_detrended)

#set our names
names(mw_models) <- c("Average Effect",
                      "Effect in 1989",
                      "Effect in 1995",
                      "Effect in 2000")

#lets see how we do with table 5
table_5_detrended <- marginal_effects_table(model_list = mw_models,
                                            key_term = "treated")

table_5_detrended <- table_5_detrended |>
  gt() |>
  #Add in the title
  tab_header(title = "Table 5 (Estimated ATTs): De-Trended - Midwest Controls (IL, IA, MN, OH)") |>
  gt::tab_options(row.striping.include_table_body = FALSE,
                  quarto.disable_processing = TRUE)

table_5_detrended
While the method is intuitive, simple, and appears to deliver good results with small sample sizes and less-than-ideal control groups, I do see three drawbacks to this approach. That being said, this method might be a great tool to have, especially if you are pressed for time and need to assess the impact of an intervention on a small number of treated units with a small control group.
Parallel trends may not hold: Like other Diff in Diff methods, there is no guarantee that the parallel trends assumption will hold, and transforming the outcome variable through demeaning or de-trending may or may not make this assumption more plausible. There are other approaches to de-trending, such as quadratic or polynomial de-trending, but it is unclear to me how often transforming the outcome variable will work in typical policy evaluation research designs.
Low statistical power: Many Diff in Diff studies are underpowered to detect plausible effect sizes with the typical sample sizes used in these studies (Chiu et al. 2025; Egerod and Hollenbach 2024). Unfortunately, this problem is likely worse in situations where a single unit is treated. In this analysis, the average difference between the treatment and control groups (the ATT) was around -20%, with period-specific effects as large as roughly -66%, depending on how the outcome was transformed. Effects this large are unlikely to be realistic in many applied settings, and most interventions will probably not produce such large effect sizes.
The power analysis in the working paper by Egerod and Hollenbach (2024) suggests that with a logged outcome and a small number of treated units in a typical staggered event study setting (multiple units treated at multiple times), you need an ATT of around 15% to reach the benchmark of 80% power. While I would need to simulate data to confirm this, using 15% as a baseline and taking into account the effect sizes that were detected with the southern and midwest control groups of 4 units, I think it is reasonable to suggest that an ATT of around 15% to 20% is likely near the minimum effect size that can be reliably detected in this type of design with a single unit treated at one point in time and a logged outcome.
You could increase your statistical power by adding controls to the outcome regression or by adding more treated and control units, but my sense is that having an intervention with a large effect size is probably more important for statistical power than increasing your sample size. Unfortunately, unlike experiments, there is no clear way to increase effect sizes for retrospective observational studies. You are stuck with what you have.
You might also be able to improve your statistical power with a single or a few treated units by focusing on cumulative effects (Andrew Wheeler makes a persuasive argument for this in synthetic control designs), rather than on the average (ATT) or period specific effects, but there may not be a straightforward way to construct confidence intervals or standard errors for cumulative effects. The options for doing so, such as the parametric or nonparametric bootstrap, can be computationally intensive and may return confidence intervals that are too narrow with one treated unit.
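To make the cumulative-effects idea concrete, here is a minimal sketch in R. The period effects below are made-up illustrative numbers, not estimates from this analysis, and the percentile bootstrap shown is the naive kind that, as noted above, likely understates uncertainty with a single treated unit.

```r
# hypothetical period-specific effects (treated minus counterfactual)
# for five post-treatment years -- illustrative numbers only
period_effects <- c(-5.2, -7.8, -9.1, -11.4, -12.0)

# the cumulative effect is simply the running sum of the period effects
cumulative_effects <- cumsum(period_effects)
cumulative_effects  # final element is the total effect over the post period

# a naive nonparametric bootstrap, resampling period effects with
# replacement; with one treated unit this interval is likely too narrow
set.seed(123)
boot_cumulative <- replicate(2000, {
  sum(sample(period_effects, length(period_effects), replace = TRUE))
})
quantile(boot_cumulative, c(0.025, 0.975))
```

The appeal of the cumulative effect is that it aggregates many small period effects into one larger, easier-to-detect quantity; the hard part, as discussed above, is attaching honest uncertainty to it.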
May need a large number of pre and post treatment periods: This method likely works best with a large number of pre-treatment and post-treatment periods, which makes averaging the time series into a cross section more feasible. However, the method may not work as well in the worst-case scenario, where you have a small number of control units and a short time series - i.e. only a couple of pre- and post-treatment periods.
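As an aside on the de-trending point in the first drawback above, here is a minimal sketch of linear versus quadratic de-trending on a toy series. The data and object names are my own for illustration; the key idea, as in the tutorial, is to fit the trend on the pre-treatment years only and then remove that trend from all years.

```r
set.seed(42)
# toy annual series with a linear downward trend plus noise
toy <- data.frame(year = 1970:2000)
toy$y <- 120 - 1.5 * (toy$year - 1970) + rnorm(nrow(toy), sd = 2)

# fit trends on the pre-treatment period only (treatment in 1989)
pre <- subset(toy, year < 1989)

# linear trend, extrapolated to all years before subtracting
linear_fit <- lm(y ~ year, data = pre)
toy$y_detrended_linear <- toy$y - predict(linear_fit, newdata = toy)

# quadratic trend, same idea with a squared term
quad_fit <- lm(y ~ poly(year, 2, raw = TRUE), data = pre)
toy$y_detrended_quad <- toy$y - predict(quad_fit, newdata = toy)

# by construction, the de-trended pre-period series averages ~zero
mean(toy$y_detrended_linear[toy$year < 1989])
```

Whether the linear or quadratic version makes parallel trends more plausible depends entirely on the data at hand, which is the crux of the drawback discussed above.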
References
Abadie, Alberto. 2021. “Using Synthetic Controls: Feasibility, Data Requirements, and Methodological Aspects.” Journal of Economic Literature 59(2): 391–425.
Abadie, Alberto, Alexis Diamond, and Jens Hainmueller. 2010. “Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California’s Tobacco Control Program.” Journal of the American Statistical Association 105(490): 493–505. doi:10.1198/jasa.2009.ap08746.
Abadie, Alberto, and Javier Gardeazabal. 2003. “The Economic Costs of Conflict: A Case Study of the Basque Country.” American Economic Review 93(1): 113–32.
Arel-Bundock, Vincent, Noah Greifer, and Andrew Heiss. 2024. “How to Interpret Statistical Models Using marginaleffects for R and Python.” Journal of Statistical Software 111. doi:10.18637/jss.v111.i09.
Cattaneo, Matias, Yingjie Feng, Filippo Palomba, and Rocio Titiunik. 2025. “scpi: Uncertainty Quantification for Synthetic Control Methods.” Journal of Statistical Software 113. https://www.jstatsoft.org/article/view/v113i01.
Chiu, Albert, Xingchen Lan, Ziyi Liu, and Yiqing Xu. 2025. “Causal Panel Analysis Under Parallel Trends: Lessons from a Large Reanalysis Study.” American Political Science Review: 1–22. doi:10.1017/S0003055425000243.
de Chaisemartin, Clément, and Xavier D’Haultfœuille. 2023. “Credible Answers to Hard Questions: Differences-in-Differences for Natural Experiments.” doi:10.2139/ssrn.4487202.
Lee, Soo Jeong, and Jeffrey M. Wooldridge. 2025. “Simple Approaches to Inference with Difference-in-Differences Estimators with Small Cross-Sectional Sample Sizes.” doi:10.2139/ssrn.5325686.
Roth, Jonathan, Pedro H. C. Sant’Anna, Alyssa Bilinski, and John Poe. 2023. “What’s Trending in Difference-in-Differences? A Synthesis of the Recent Econometrics Literature.” Journal of Econometrics 235(2): 2218–44. doi:10.1016/j.jeconom.2023.03.008.
Wing, Coady, Madeline Yozwiak, Alex Hollingsworth, Seth Freedman, and Kosali Simon. 2024. “Designing Difference-in-Difference Studies with Staggered Treatment Adoption: Key Concepts and Practical Guidelines.” Annual Review of Public Health 45: 485–505. doi:10.1146/annurev-publhealth-061022-050825.
Zeileis, Achim, Susanne Koll, and Nathaniel Graham. 2020. “Various Versatile Variances: An Object-Oriented Implementation of Clustered Covariances in R.” Journal of Statistical Software 95. doi:10.18637/jss.v095.i01.