Data Analytics for Finance

BM17FI · Rotterdam School of Management

RSM Logo

ASSIGNMENT 04

Difference-in-Differences

Setup¶

Run these cells to prepare your environment.

✅ The environment is cleared and ready.
/Users/casparm4/Github/rsm-data-analytics-in-finance-private/private/assignment
> s/04-assignment
📁 Base directory: /Users/casparm4/Github/rsm-data-analytics-in-finance-private
> /private/assignments/04-assignment
📁 Raw data folder: /Users/casparm4/Github/rsm-data-analytics-in-finance-privat
> e/private/assignments/04-assignment/data/raw
📁 Processed data folder: /Users/casparm4/Github/rsm-data-analytics-in-finance-
> private/private/assignments/04-assignment/data/processed
📁 Output directory: /Users/casparm4/Github/rsm-data-analytics-in-finance-priva
> te/private/assignments/04-assignment/output
📁 Tables folder: /Users/casparm4/Github/rsm-data-analytics-in-finance-private/
> private/assignments/04-assignment/output/tables
📁 Figures folder: /Users/casparm4/Github/rsm-data-analytics-in-finance-private
> /private/assignments/04-assignment/output/figures

Learning Objectives¶

By completing this assignment, you will:

  1. Understand the DiD identification strategy and its assumptions
  2. Set up panel data for difference-in-differences analysis
  3. Estimate canonical DiD and two-way fixed effects models
  4. Visualize parallel trends to assess the key identifying assumption
  5. Apply clustered standard errors appropriate for panel data
  6. Critically evaluate DiD assumptions with small samples

Research Question¶

"Did German automakers experience differential changes in ESG risk scores compared to non-German automakers following the Dieselgate scandal?"

We use RepRisk Index (RRI) scores as our outcome variable. RepRisk is a leading ESG data provider that systematically monitors companies for environmental, social, and governance risks based on media, stakeholder, and third-party sources.

The Dieselgate scandal broke on September 18, 2015, when the U.S. Environmental Protection Agency (EPA) issued a Notice of Violation to Volkswagen for installing "defeat devices" in diesel vehicles to cheat emissions tests.

ℹ️ Background: Difference-in-Differences
Difference-in-Differences (DiD) is a quasi-experimental research design that estimates causal effects by comparing changes over time between a treatment group and a control group.

The key idea: If both groups would have followed parallel trends in the absence of treatment, then the difference in their differences (before vs. after) identifies the causal effect.

DiD Estimator:
τ = (Ytreat,post - Ytreat,pre) - (Ycontrol,post - Ycontrol,pre)

Key Assumption: Parallel trends — in the absence of treatment, the treatment and control groups would have evolved similarly over time.
ℹ️ Background: RepRisk Index (RRI)
The RepRisk Index (RRI) ranges from 0 to 100, where higher values indicate greater ESG risk exposure:
  • 0-25: Low risk exposure
  • 26-49: Medium risk exposure (typical for large multinationals)
  • 50-59: High risk exposure
  • 60-74: Very high risk exposure
  • 75-100: Extremely high risk exposure

Interpretation: An increase in RRI indicates that the company has received more negative media coverage and stakeholder criticism related to ESG issues.
📝 Tasks
In this assignment, you will complete the following graded tasks:
  1. Load the panel dataset and set up the panel structure
  2. Clean the sample by dropping VW group subsidiaries
  3. Create the post-treatment indicator variable
  4. Calculate summary statistics by group and time period
  5. Visualize parallel trends
  6. Estimate the canonical DiD regression
  7. Add control variables
  8. Estimate two-way fixed effects models
  9. Apply clustered standard errors
  10. Create an event study plot
  11. Export a publication-quality regression table
  12. Critically evaluate the DiD assumptions

Section 1: Load and Examine Panel Structure¶

In this section, you'll load the panel dataset, clean the sample by removing VW group subsidiaries, and set up the panel structure for DiD analysis.

Task 1.1: Load the Dataset¶

Load the file auto_firms_panel_did.dta from the raw data folder.

Stata Stata Tip
Use use "$raw/filename.dta", clear to load a Stata dataset.
✅ Dataset loaded successfully

Task 1.2: Examine the Dataset¶

Use describe to examine the structure of the dataset. Then use tabulate to see the distribution of firms by country (german indicator).

Contains data from /Users/casparm4/Github/rsm-data-analytics-in-finance-private
> /private/assignments/04-assignment/data/raw/auto_firms_panel_did.dta
 Observations:         3,840                  
    Variables:            20                  09 Jan 2026 12:18
-------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------
gvkey           str6    %-9s                  
RepRisk_ID      double  %10.0g                
conm            str28   %-9s                  
fic             str3    %-9s                  
sic             str4    %-9s                  
naics           str6    %-9s                  
german          long    %12.0g                
date            double  %td                   
year            double  %10.0g                
month           double  %10.0g                
current_RRI     double  %10.0g                
at              double  %10.0g                
roa             double  %10.0g                
cash_ratio      double  %10.0g                
log_sale        double  %10.0g                
leverage        double  %10.0g                
RepRisk_rating  str3    %-9s                  
social_percen~e str4    %-9s                  
governance_pe~e str4    %-9s                  
environmental~e str4    %-9s                  
-------------------------------------------------------------------------------
Sorted by: 
     german |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      3,480       90.62       90.62
          1 |        360        9.38      100.00
------------+-----------------------------------
      Total |      3,840      100.00

                      |        german
                 conm |         0          1 |     Total
----------------------+----------------------+----------
ANHUI ANKAI AUTOMOB.. |        60          0 |        60 
ANHUI JIANGHUAI AUT.. |        60          0 |        60 
    ASHOK LEYLAND LTD |        60          0 |        60 
ASTON MARTIN LAGOND.. |        60          0 |        60 
              AUDI AG |         0         60 |        60 
          AUTECH CORP |        60          0 |        60 
AVIC SHENYANG AIRCR.. |        60          0 |        60 
         AVTOVAZ PJSC |        60          0 |        60 
BAIC BLUEPARK NEW E.. |        60          0 |        60 
  BAIC MOTOR CORP LTD |        60          0 |        60 
BAYERISCHE MOTOREN .. |         0         60 |        60 
BRILLIANCE CHINA AU.. |        60          0 |        60 
      BYD COMPANY LTD |        60          0 |        60 
CHONG QING CHANGAN .. |        60          0 |        60 
CHONGQING DIMA INDU.. |        60          0 |        60 
DONGFENG MOTOR GROU.. |        60          0 |        60 
     DRB HICOM BERHAD |        60          0 |        60 
        EICHER MOTORS |        60          0 |        60 
           FERRARI NV |        60          0 |        60 
FORD OTOMOTIV SANAY.. |        60          0 |        60 
             GAZ PJSC |        60          0 |        60 
              GB CORP |        60          0 |        60 
GEELY AUTOMOBILE HL.. |        60          0 |        60 
  GREAT WALL MOTOR CO |        60          0 |        60 
GUANGZHOU AUTOMOBIL.. |        60          0 |        60 
 HINDUSTAN MOTORS LTD |        60          0 |        60 
      HINO MOTORS LTD |        60          0 |        60 
 HONDA ATLAS CARS LTD |        60          0 |        60 
   HONDA MOTOR CO LTD |        60          0 |        60 
 HYUNDAI MOTOR CO LTD |        60          0 |        60 
  INDUS MOTORS CO PLC |        60          0 |        60 
     ISUZU MOTORS LTD |        60          0 |        60 
JIANG LING MOTORS C.. |        60          0 |        60 
            KAMAZ PTC |        60          0 |        60 
     KG MOBILITY CORP |        60          0 |        60 
      KIA CORPORATION |        60          0 |        60 
 LIKHACHOV PLANT PJSC |        60          0 |        60 
MAHINDRA & MAHINDRA.. |        60          0 |        60 
               MAN SE |         0         60 |        60 
MARUTI SUZUKI INDIA.. |        60          0 |        60 
     MAZDA MOTOR CORP |        60          0 |        60 
MERCEDES BENZ GROUP.. |         0         60 |        60 
MITSUBISHI MOTORS C.. |        60          0 |        60 
         MORITA HLDGS |        60          0 |        60 
  NISSAN MOTOR CO LTD |        60          0 |        60 
ORIENTAL HOLDINGS BHD |        60          0 |        60 
PORSCHE AUTOMOBIL H.. |         0         60 |        60 
           RENAULT SA |        60          0 |        60 
ROSENBAUER INTERNAT.. |        60          0 |        60 
  SAIC MOTOR CORP LTD |        60          0 |        60 
            SCANIA AB |        60          0 |        60 
         SOLLERS PJSC |        60          0 |        60 
        STELLANTIS NV |        60          0 |        60 
          SUBARU CORP |        60          0 |        60 
  SUZUKI MOTOR CO LTD |        60          0 |        60 
TATA MOTORS PASSENGER |        60          0 |        60 
TOFAS-TURK OTOMOBIL.. |        60          0 |        60 
    TOYOTA MOTOR CORP |        60          0 |        60 
     UMW HOLDINGS BHD |        60          0 |        60 
        VOLKSWAGEN AG |         0         60 |        60 
             VOLVO AB |        60          0 |        60 
WILLIAMS GRAND PRIX.. |        60          0 |        60 
YANGZHOU YAXING COA.. |        60          0 |        60 
   YULON MOTOR CO LTD |        60          0 |        60 
----------------------+----------------------+----------
                Total |     3,480        360 |     3,840 

Task 1.3: Drop VW Group Subsidiaries¶

ℹ️ Background: Sample Construction
For a clean DiD analysis, we need to drop firms that are part of the Volkswagen Group (subsidiaries and holding companies):
  • AUDI AG (gvkey: 101120) — VW subsidiary
  • PORSCHE AUTOMOBIL HOLDING SE (gvkey: 102187) — VW Group holding company
  • SCANIA AB (gvkey: 062557) — VW subsidiary (commercial vehicles)
  • MAN SE (gvkey: 100042) — VW subsidiary (trucks/buses)

Why drop these firms?
  1. Avoid double-counting: Subsidiaries' outcomes are mechanically linked to VW's scandal
  2. Clean identification: We want to compare independent German automakers (VW, BMW, Mercedes-Benz) to non-German peers
  3. Economic interpretation: Holding companies and subsidiaries don't represent separate firms for our research question

Drop the four VW Group firms listed above using the drop if command with the gvkey variable.

Stata Stata Tip
Use drop if inlist(gvkey, "value1", "value2", ...) to remove observations matching any of the listed values.
(240 observations deleted)
✅ VW Group firms dropped
Remaining firms: 

                      |        german
                 conm |         0          1 |     Total
----------------------+----------------------+----------
ANHUI ANKAI AUTOMOB.. |        60          0 |        60 
ANHUI JIANGHUAI AUT.. |        60          0 |        60 
    ASHOK LEYLAND LTD |        60          0 |        60 
ASTON MARTIN LAGOND.. |        60          0 |        60 
          AUTECH CORP |        60          0 |        60 
AVIC SHENYANG AIRCR.. |        60          0 |        60 
         AVTOVAZ PJSC |        60          0 |        60 
BAIC BLUEPARK NEW E.. |        60          0 |        60 
  BAIC MOTOR CORP LTD |        60          0 |        60 
BAYERISCHE MOTOREN .. |         0         60 |        60 
BRILLIANCE CHINA AU.. |        60          0 |        60 
      BYD COMPANY LTD |        60          0 |        60 
CHONG QING CHANGAN .. |        60          0 |        60 
CHONGQING DIMA INDU.. |        60          0 |        60 
DONGFENG MOTOR GROU.. |        60          0 |        60 
     DRB HICOM BERHAD |        60          0 |        60 
        EICHER MOTORS |        60          0 |        60 
           FERRARI NV |        60          0 |        60 
FORD OTOMOTIV SANAY.. |        60          0 |        60 
             GAZ PJSC |        60          0 |        60 
              GB CORP |        60          0 |        60 
GEELY AUTOMOBILE HL.. |        60          0 |        60 
  GREAT WALL MOTOR CO |        60          0 |        60 
GUANGZHOU AUTOMOBIL.. |        60          0 |        60 
 HINDUSTAN MOTORS LTD |        60          0 |        60 
      HINO MOTORS LTD |        60          0 |        60 
 HONDA ATLAS CARS LTD |        60          0 |        60 
   HONDA MOTOR CO LTD |        60          0 |        60 
 HYUNDAI MOTOR CO LTD |        60          0 |        60 
  INDUS MOTORS CO PLC |        60          0 |        60 
     ISUZU MOTORS LTD |        60          0 |        60 
JIANG LING MOTORS C.. |        60          0 |        60 
            KAMAZ PTC |        60          0 |        60 
     KG MOBILITY CORP |        60          0 |        60 
      KIA CORPORATION |        60          0 |        60 
 LIKHACHOV PLANT PJSC |        60          0 |        60 
MAHINDRA & MAHINDRA.. |        60          0 |        60 
MARUTI SUZUKI INDIA.. |        60          0 |        60 
     MAZDA MOTOR CORP |        60          0 |        60 
MERCEDES BENZ GROUP.. |         0         60 |        60 
MITSUBISHI MOTORS C.. |        60          0 |        60 
         MORITA HLDGS |        60          0 |        60 
  NISSAN MOTOR CO LTD |        60          0 |        60 
ORIENTAL HOLDINGS BHD |        60          0 |        60 
           RENAULT SA |        60          0 |        60 
ROSENBAUER INTERNAT.. |        60          0 |        60 
  SAIC MOTOR CORP LTD |        60          0 |        60 
         SOLLERS PJSC |        60          0 |        60 
        STELLANTIS NV |        60          0 |        60 
          SUBARU CORP |        60          0 |        60 
  SUZUKI MOTOR CO LTD |        60          0 |        60 
TATA MOTORS PASSENGER |        60          0 |        60 
TOFAS-TURK OTOMOBIL.. |        60          0 |        60 
    TOYOTA MOTOR CORP |        60          0 |        60 
     UMW HOLDINGS BHD |        60          0 |        60 
        VOLKSWAGEN AG |         0         60 |        60 
             VOLVO AB |        60          0 |        60 
WILLIAMS GRAND PRIX.. |        60          0 |        60 
YANGZHOU YAXING COA.. |        60          0 |        60 
   YULON MOTOR CO LTD |        60          0 |        60 
----------------------+----------------------+----------
                Total |     3,420        180 |     3,600 

Task 1.4: Create the Post-Treatment Indicator¶

Create a variable called post that equals 1 for observations from September 2015 onwards, and 0 otherwise.

The Dieselgate scandal broke on September 18, 2015. Since our data is monthly (end-of-month dates), we treat August 2015 as the last "pre-treatment" month, so post = 1 starting from September 2015.

Stata Stata Tip
Use gen post = (date >= mdy(9, 1, 2015)) to create a dummy variable that equals 1 when the condition is true.

The mdy() function creates a Stata date from month, day, and year.
           | Post-treatment (Sept
           |        2015+)
      year |         0          1 |     Total
-----------+----------------------+----------
      2013 |       720          0 |       720 
      2014 |       720          0 |       720 
      2015 |       480        240 |       720 
      2016 |         0        720 |       720 
      2017 |         0        720 |       720 
-----------+----------------------+----------
     Total |     1,920      1,680 |     3,600 
✅ Post-treatment indicator created

Task 1.5: Set Up the Panel Structure¶

Declare the data as panel data using xtset. Because xtset requires a numeric panel identifier, you must first convert the string variable gvkey into a new numeric variable called gvkey_num using encode. Then declare the panel with gvkey_num as the firm identifier and date (monthly) as the time variable.

Stata Stata Tip
  • encode gvkey, gen(gvkey_num) — Convert the string firm ID to a numeric variable required by xtset
  • xtset gvkey_num date, monthly — Declare the panel structure (firm × month)
  • xtdescribe — Examine panel balance: are all firms observed in all periods?
warning: Variable date had been formatted %td (a daily period), and you asked
         for a monthly period. Are you sure that is what you want? Format has
         been changed. date is now formatted %tm.

Panel variable: gvkey_num (strongly balanced)
 Time variable: date, 3575m10 to 3725m5, but with gaps
         Delta: 1 month

gvkey_num:  1, 2, ..., 60                                    n =         60
    date:  3575m10, 3578m2, ..., 3725m5                      T =         60
           Delta(date) = 1 month
           Span(date)  = 1796 periods
           (gvkey_num*date uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                        60      60      60        60        60      60      60

     Freq.  Percent    Cum. |  Pattern*
 ---------------------------+--------------------------------------------------
> -----------------
       60    100.00  100.00 |  111111111111.111111111111.11111111111.1111111111
> 11.11111111111.11
 ---------------------------+--------------------------------------------------
> -----------------
       60    100.00         |  XXXXXXXXXXXX.XXXXXXXXXXXX.XXXXXXXXXXX.XXXXXXXXXX
> XX.XXXXXXXXXXX.XX
 ------------------------------------------------------------------------------
> -----------------
 *Each column represents 28 periods.

✅ Panel structure declared

Task 1.6: Visualize VW's RepRisk Index Over Time¶

Plot the current_RRI over time for Volkswagen (gvkey == "100737") to see the scandal's effect visually. Add a vertical reference line at the event date (September 18, 2015). Save the plot as vw_rri_timeseries.png in the figures folder.

Stata Stata Tip
  • twoway line varname timevar if condition — Line plot for a subset of observations
  • xline(`=mdy(9,18,2015)') — Vertical reference line at a specific date
  • graph export "$figures/vw_rri_timeseries.png", replace — Save the graph to the figures folder
file /Users/casparm4/Github/rsm-data-analytics-in-finance-private/private/assig
> nments/04-assignment/output/figures/vw_rri_timeseries.png written in PNG form
> at
✅ VW RepRisk plot saved
No description has been provided for this image

Do not delete the code cell below. They contain hidden tests that will check your work after you complete the tasks.

✅ Section 1 tests passed

Section 2: Summary Statistics by Group-Time¶

The essence of DiD is comparing changes over time between treatment and control groups. In this section, we'll create a 2×2 table of means and manually calculate the DiD estimator.

Task 2.1: Create 2×2 Table of Means¶

Calculate the mean current_RRI for each combination of german (treatment) and post (time period). This gives us the four cell means needed for the DiD calculation.

Stata Stata Tip
Use table rowvar colvar, stat(mean outcome) to create a table of means by group.
------------------------------------------------------------------
                                |    Post-treatment (Sept 2015+)  
                                |          0          1      Total
--------------------------------+---------------------------------
german                          |                                 
  0                             |                                 
    Mean                        |      14.56      18.18      16.25
    Standard deviation          |      18.12      19.06      18.65
    Number of nonmissing values |    1824.00    1596.00    3420.00
  1                             |                                 
    Mean                        |      52.70      61.12      56.63
    Standard deviation          |      11.15       9.01      11.02
    Number of nonmissing values |      96.00      84.00     180.00
  Total                         |                                 
    Mean                        |      16.47      20.33      18.27
    Standard deviation          |      19.67      20.90      20.34
    Number of nonmissing values |    1920.00    1680.00    3600.00
------------------------------------------------------------------

Task 2.2: Calculate DiD "By Hand"¶

Calculate the difference-in-differences estimator manually using the four cell means:

$$\tau_{DiD} = (\bar{Y}_{treat,post} - \bar{Y}_{treat,pre}) - (\bar{Y}_{control,post} - \bar{Y}_{control,pre})$$

Use summarize four times (once per cell) to obtain each mean, storing each in a local macro. Then compute the DiD as the difference of differences. Finally, store the result in a global macro called did_manual — this makes it available to later sections for comparison with the regression estimate.

Stata Stata Tip
  • quietly summarize current_RRI if german == 1 & post == 1 then local treat_post = r(mean) — repeat for each of the four group-time cells
  • local did = (`treat_post' - `treat_pre') - (`control_post' - `control_pre') — compute the DiD
  • global did_manual = `did' — store as a global so Section 4 can compare it to the regression coefficient

==== Difference-in-Differences Calculation ====


German firms (treated):
 Post:  61.12
 Pre:  52.70
 Diff:   8.42


Non-German firms (control):
 Post:  18.18
 Pre:  14.56
 Diff:   3.62


DiD Estimate:   4.80
==============================================
✅ DiD calculated manually

Do not delete the code cells below. They contain hidden tests that will check your work after you complete the tasks.

✅ Section 2 tests passed
 Manual DiD estimate:   4.80

Section 3: Parallel Trends Visualization¶

The key identifying assumption for DiD is parallel trends: in the absence of treatment, treated and control groups would have followed similar trajectories. While we cannot directly test this (it's about a counterfactual), we can visually inspect whether the groups moved together before the treatment.

Task 3.1: Create the Parallel Trends Plot¶

Collapse the data to the mean current_RRI for each german × date cell, then create a two-line time series plot showing the average RRI trajectory for German firms (treated) and non-German firms (control) over time. Use preserve and restore to avoid overwriting the firm-level panel data.

The plot should include:

  • One line per group with a legend labeling them "German (treated)" and "Non-German (control)"
  • A vertical reference line at the event date (September 18, 2015)

Save the plot as parallel_trends.png in the figures folder.

Stata Stata Tip
  • preserve / restore — Save and recover the original dataset around the collapse
  • collapse (mean) mean_rri = current_RRI, by(german date) — Reduce to group-time means
  • twoway (line mean_rri date if german==1) (line mean_rri date if german==0) — Overlay two lines
  • legend(order(1 "German (treated)" 2 "Non-German (control)")) — Label the two lines
  • xline(`=mdy(9,18,2015)') — Add a vertical line at the event date
(Note: Below code run with echo to enable preserve/restore functionality.)

. preserve

. collapse (mean) mean_rri = current_RRI, by(german date)

. twoway (line mean_rri date if german == 1, lcolor(purple) lwidth(medium)) (li
> ne mean_rri date if german == 0, lcolor(gray) lwidth(medium)), title("Paralle
> l Trends: RepRisk Index by Treatment Group", size(medium)) xtitle("Date") yti
> tle("Mean RepRisk Index (RRI)") xline(`=mdy(9,18,2015)', lcolor(red) lpattern
> (dash)) legend(order(1 "German (treated)" 2 "Non-German (control)") rows(1) p
> osition(6)) note("Red dashed line: Dieselgate announcement (Sept 18, 2015)") 
> name(parallel_trends, replace)

. graph export "$figures/parallel_trends.png", replace width(1200)
(file /Users/casparm4/Github/rsm-data-analytics-in-finance-private/private/assi
> gnments/04-assignment/output/figures/parallel_trends.png not found)
file /Users/casparm4/Github/rsm-data-analytics-in-finance-private/private/assig
> nments/04-assignment/output/figures/parallel_trends.png written in PNG format

. display "✅ Parallel trends plot saved"
✅ Parallel trends plot saved

. restore

. 
No description has been provided for this image

Do not delete the code cell below. They contain hidden tests that will check your work after you complete the tasks.

✅ Section 3 tests passed

Section 4: Canonical DiD Regression¶

Now let's estimate the canonical DiD regression. The model includes the treatment indicator (german), the post-treatment indicator (post), and their interaction (german × post). The coefficient on the interaction is the DiD estimate.

ℹ️ Background: Canonical DiD Regression
The canonical DiD regression model is:

Yit = β₀ + β₁ × Germani + β₂ × Postt + β₃ × (Germani × Postt) + εit

Where:
  • β₀: Mean outcome for control group in pre-period
  • β₁: Baseline difference between treatment and control groups (pre-period)
  • β₂: Time trend common to both groups
  • β₃: The DiD estimate — differential change for treated group

Task 4.1: Create the Interaction Term¶

Create a new variable treated_post that equals german × post. This is the DiD interaction term.

Stata Stata Tip
Use gen treated_post = german * post to create the interaction term.
    Means, Standard Deviations and Frequencies of German × Post (DiD)

           |   Post-treatment
           |    (Sept 2015+)
    german |         0          1 |     Total
-----------+----------------------+----------
         0 |         0          0 |         0
           |         0          0 |         0
           |      1824       1596 |      3420
-----------+----------------------+----------
         1 |         0          1 | .46666667
           |         0          0 | .50027925
           |        96         84 |       180
-----------+----------------------+----------
     Total |         0        .05 | .02333333
           |         0  .21800984 | .15098086
           |      1920       1680 |      3600
✅ Interaction term created

Task 4.2: Estimate the Canonical DiD Regression¶

Estimate the regression: current_RRI = β₀ + β₁×german + β₂×post + β₃×treated_post + ε

Use robust standard errors by adding the , r option. Store the estimates as did1 for later comparison with richer specifications.

Questions to consider:

  • What is the coefficient on treated_post? Is it statistically significant?
  • Does it match your manual DiD calculation from Section 2?
  • What does this coefficient mean economically (in terms of RepRisk Index points)?
Stata Stata Tip
  • regress outcome var1 var2 var3, r — OLS regression with robust standard errors (r is shorthand for robust)
  • estimates store did1 — Save the regression results under the name did1 so they can be retrieved and tabulated later
Linear regression                               Number of obs     =      3,600
                                                F(3, 3596)        =     900.76
                                                Prob > F          =     0.0000
                                                R-squared         =     0.1968
                                                Root MSE          =     18.238

------------------------------------------------------------------------------
             |               Robust
 current_RRI | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      german |   38.13322   1.209446    31.53   0.000     35.76195    40.50449
        post |   3.619518   .6385266     5.67   0.000     2.367607    4.871428
treated_post |   4.801613   1.626815     2.95   0.003     1.612041    7.991185
       _cons |   14.56469   .4243336    34.32   0.000     13.73273    15.39665
------------------------------------------------------------------------------


Manual DiD estimate:   4.80
Regression DiD estimate:   4.80
✅ Canonical DiD regression estimated

Do not delete the code cells below. They contain hidden tests that will check your work after you complete the tasks.

✅ Section 4 tests passed
 DiD coefficient:   4.80

Section 5: Add Control Variables¶

Adding time-varying control variables can improve precision and address potential omitted variable bias. However, controls must be "good" controls — variables that affect the outcome but are not themselves affected by treatment.

Task 5.1: DiD Regression with Controls¶

Add the following control variables to the regression: log_sale (firm size), leverage, and roa (profitability). Keep the same three DiD variables (german, post, treated_post) and use robust standard errors.

Store the estimates as did2.

Question: Did the DiD coefficient change much after adding controls? What does this tell us?

Stata Stata Tip
regress current_RRI german post treated_post log_sale leverage roa, r

Then store the results with estimates store did2.
⚠️ Important
Be cautious about which controls to include. "Bad controls" — variables that are themselves affected by treatment — can introduce bias. For example, if Dieselgate affected a firm's profitability, controlling for ROA could absorb part of the treatment effect.
Linear regression                               Number of obs     =      3,600
                                                F(6, 3593)        =     872.81
                                                Prob > F          =     0.0000
                                                R-squared         =     0.4507
                                                Root MSE          =     15.089

------------------------------------------------------------------------------
             |               Robust
 current_RRI | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      german |   35.78826     1.1932    29.99   0.000     33.44885    38.12768
        post |   3.505634   .5317459     6.59   0.000      2.46308    4.548188
treated_post |   4.307353   1.510551     2.85   0.004      1.34573    7.268977
    log_sale |   3.598123   .0934028    38.52   0.000     3.414995    3.781251
    leverage |   2.400373   1.128868     2.13   0.034     .1870874    4.613659
         roa |  -33.82314   2.659785   -12.72   0.000    -39.03798    -28.6083
       _cons |  -24.07966   1.089107   -22.11   0.000    -26.21499   -21.94433
------------------------------------------------------------------------------
✅ DiD with controls estimated

Task 5.2: Compare Models¶

Compare the two DiD models did1 and did2 (with and without controls) using estimates table.

----------------------------------------------
    Variable |     did1            did2       
-------------+--------------------------------
      german |  38.133224***    35.788262***  
        post |  3.6195175***    3.5056342***  
treated_post |  4.8016134**     4.3073531**   
    log_sale |                  3.5981234***  
    leverage |                  2.4003733*    
         roa |                 -33.823141***  
       _cons |  14.564693***   -24.079659***  
-------------+--------------------------------
           N |       3600            3600     
          r2 |  .19678931       .45069059     
----------------------------------------------
      Legend: * p<0.05; ** p<0.01; *** p<0.001

Do not delete the code cells below. They contain hidden tests that will check your work after you complete the tasks.

✅ Section 5 tests passed
 DiD coefficient (with controls):   4.31

Section 6: Two-Way Fixed Effects¶

The Two-Way Fixed Effects (TWFE) model is a generalization of DiD that includes firm fixed effects and time fixed effects. This absorbs all time-invariant firm characteristics and all firm-invariant time trends.

ℹ️ Background: Two-Way Fixed Effects
The TWFE model is:

Yit = αi + γt + β × (Germani × Postt) + εit

Where:
  • αi: Firm fixed effects — absorb all time-invariant firm characteristics (including german)
  • γt: Time fixed effects — absorb all firm-invariant time trends (including post)
  • β: The DiD estimate — identified only from the interaction term

Key insight: With firm and time fixed effects, the german and post main effects are absorbed (collinear with the fixed effects), so we only estimate the interaction term.

Task 6.1: Estimate TWFE Model¶

Use reghdfe to estimate the two-way fixed effects model. Absorb firm fixed effects (gvkey_num) and time fixed effects (year#month). Use robust standard errors.

Store the estimates as did3.

Stata Stata Tip
The reghdfe command efficiently estimates models with high-dimensional fixed effects:

reghdfe current_RRI treated_post, absorb(gvkey_num year#month) vce(robust)
(MWFE estimator converged in 2 iterations)

HDFE Linear regression                            Number of obs   =      3,600
Absorbing 2 HDFE groups                           F(   1,   3480) =      15.60
                                                  Prob > F        =     0.0001
                                                  R-squared       =     0.8258
                                                  Adj R-squared   =     0.8199
                                                  Within R-sq.    =     0.0038
                                                  Root MSE        =     8.6335

------------------------------------------------------------------------------
             |               Robust
 current_RRI | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
treated_post |   4.801613   1.215504     3.95   0.000      2.41844    7.184787
       _cons |   18.16046   .1478644   122.82   0.000     17.87055    18.45037
------------------------------------------------------------------------------

Absorbed degrees of freedom:
------------------------------------------------------+
  Absorbed FE | Categories  - Redundant  = Num. Coefs |
--------------+---------------------------------------|
    gvkey_num |        60           0          60     |
   year#month |        60           1          59     |
------------------------------------------------------+
✅ TWFE model estimated
Note: german and post are absorbed by firm and time fixed effects

Do not delete the code cells below. They contain hidden tests that will check your work after you complete the tasks.

✅ Section 6 tests passed
 TWFE DiD coefficient:   4.80

Section 7: Clustered Standard Errors¶

With panel data, observations within the same firm are likely correlated over time. Clustering standard errors at the firm level accounts for this serial correlation.

⚠️ Important: Few Clusters Problem
Cluster-robust standard errors perform well asymptotically (many clusters), but can be unreliable with few clusters. In our case:
  • Only 3 German firms (VW, BMW, Mercedes-Benz) in the treated group
  • Around 57 control firms
  • Total of ~60 clusters

With few treated clusters, clustered standard errors may be too small, leading to over-rejection. Keep this limitation in mind when interpreting results.

Task 7.1: TWFE with Clustered Standard Errors¶

Re-estimate the TWFE model — same specification as Task 6.1 (absorb(gvkey_num year#month)) — but replace vce(robust) with standard errors clustered at the firm level (vce(cluster gvkey_num)).

Store the estimates as did4.

Stata Stata Tip
reghdfe current_RRI treated_post, absorb(gvkey_num year#month) vce(cluster gvkey_num)
(MWFE estimator converged in 2 iterations)

HDFE Linear regression                            Number of obs   =      3,600
Absorbing 2 HDFE groups                           F(   1,     59) =       3.79
Statistics robust to heteroskedasticity           Prob > F        =     0.0563
                                                  R-squared       =     0.8258
                                                  Adj R-squared   =     0.8199
                                                  Within R-sq.    =     0.0038
Number of clusters (gvkey_num) =         60       Root MSE        =     8.6335

                             (Std. err. adjusted for 60 clusters in gvkey_num)
------------------------------------------------------------------------------
             |               Robust
 current_RRI | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
treated_post |   4.801613   2.466501     1.95   0.056    -.1338435     9.73707
       _cons |   18.16046   .0575517   315.55   0.000      18.0453    18.27562
------------------------------------------------------------------------------

Absorbed degrees of freedom:
------------------------------------------------------+
  Absorbed FE | Categories  - Redundant  = Num. Coefs |
--------------+---------------------------------------|
    gvkey_num |        60          60           0    *|
   year#month |        60           1          59     |
------------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation
✅ TWFE with clustered SEs estimated

Do not delete the code cells below. They contain hidden tests that will check your work after you complete the tasks.

✅ Section 7 tests passed

Section 8: Event Study Specification¶

An event study specification allows us to examine the dynamics of the treatment effect over time. Instead of a single post indicator, we interact treatment with time-relative-to-event dummies. This helps visualize whether:

  1. The parallel trends assumption holds in the pre-period (coefficients should be near zero)
  2. The treatment effect is immediate or gradual

Task 8.1: Create Relative Time Variable¶

Create a variable rel_time that measures the number of months relative to the event (September 2015 = 0, the first post-treatment month). Negative values are pre-treatment, positive values are post-treatment.

Stata Stata Tip
Use mofd(date) to convert a date to months since Jan 1960, then subtract the event month.
     Months |
relative to |
event (Sept |
  2015 = 0) |      Freq.     Percent        Cum.
------------+-----------------------------------
        -32 |         60        1.67        1.67
        -31 |         60        1.67        3.33
        -30 |         60        1.67        5.00
        -29 |         60        1.67        6.67
        -28 |         60        1.67        8.33
        -27 |         60        1.67       10.00
        -26 |         60        1.67       11.67
        -25 |         60        1.67       13.33
        -24 |         60        1.67       15.00
        -23 |         60        1.67       16.67
        -22 |         60        1.67       18.33
        -21 |         60        1.67       20.00
        -20 |         60        1.67       21.67
        -19 |         60        1.67       23.33
        -18 |         60        1.67       25.00
        -17 |         60        1.67       26.67
        -16 |         60        1.67       28.33
        -15 |         60        1.67       30.00
        -14 |         60        1.67       31.67
        -13 |         60        1.67       33.33
        -12 |         60        1.67       35.00
        -11 |         60        1.67       36.67
        -10 |         60        1.67       38.33
         -9 |         60        1.67       40.00
         -8 |         60        1.67       41.67
         -7 |         60        1.67       43.33
         -6 |         60        1.67       45.00
         -5 |         60        1.67       46.67
         -4 |         60        1.67       48.33
         -3 |         60        1.67       50.00
         -2 |         60        1.67       51.67
         -1 |         60        1.67       53.33
          0 |         60        1.67       55.00
          1 |         60        1.67       56.67
          2 |         60        1.67       58.33
          3 |         60        1.67       60.00
          4 |         60        1.67       61.67
          5 |         60        1.67       63.33
          6 |         60        1.67       65.00
          7 |         60        1.67       66.67
          8 |         60        1.67       68.33
          9 |         60        1.67       70.00
         10 |         60        1.67       71.67
         11 |         60        1.67       73.33
         12 |         60        1.67       75.00
         13 |         60        1.67       76.67
         14 |         60        1.67       78.33
         15 |         60        1.67       80.00
         16 |         60        1.67       81.67
         17 |         60        1.67       83.33
         18 |         60        1.67       85.00
         19 |         60        1.67       86.67
         20 |         60        1.67       88.33
         21 |         60        1.67       90.00
         22 |         60        1.67       91.67
         23 |         60        1.67       93.33
         24 |         60        1.67       95.00
         25 |         60        1.67       96.67
         26 |         60        1.67       98.33
         27 |         60        1.67      100.00
------------+-----------------------------------
      Total |      3,600      100.00
✅ Relative time variable created

Task 8.2: Estimate Event Study Specification¶

Estimate an event study regression that interacts the german indicator with relative-time dummies. Because a window of ±36 months produces many sparse endpoint categories, first bin rel_time at ±12 months (so observations outside the window are treated as if they were exactly at the boundary). This gives rel_time_binned.

Stata's factor-variable notation (i.) requires non-negative integers, but rel_time_binned runs from −12 to +12. Shift it by 13 to create rel_time_shifted (where the reference period rel_time = −1 maps to rel_time_shifted = 12).

Use rel_time = −1 (August 2015, the last pre-treatment month) as the reference period, and cluster standard errors at the firm level.

Stata Stata Tip
  • Bin: gen rel_time_binned = rel_time, then replace rel_time_binned = -12 if rel_time < -12 (and = 12 for the upper end)
  • Shift: gen rel_time_shifted = rel_time_binned + 13 — maps rel_time −1 → shifted 12 (the reference)
  • Regression: reghdfe current_RRI c.german#ib(12).rel_time_shifted, absorb(gvkey_num date) vce(cluster gvkey_num)
(1,200 real changes made)
(900 real changes made)
(MWFE estimator converged in 2 iterations)
note: 25.rel_time_shifted#c.german omitted because of collinearity

HDFE Linear regression                            Number of obs   =      3,600
Absorbing 2 HDFE groups                           F(  24,     59) =     171.82
Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                  R-squared       =     0.8272
                                                  Adj R-squared   =     0.8201
                                                  Within R-sq.    =     0.0116
Number of clusters (gvkey_num) =         60       Root MSE        =     8.6283

                             (Std. err. adjusted for 60 clusters in gvkey_num)
------------------------------------------------------------------------------
             |               Robust
 current_RRI | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
rel_time_s~d#|
    c.german |
          1  |  -5.155388   1.474523    -3.50   0.001    -8.105902   -2.204875
          2  |    2.77193   1.981815     1.40   0.167    -1.193673    6.737532
          3  |   1.438596   2.310226     0.62   0.536    -3.184155    6.061348
          4  |    4.45614   2.945462     1.51   0.136    -1.437716       10.35
          5  |  -.4385965    1.41894    -0.31   0.758    -3.277889    2.400696
          6  |  -7.403509   3.497139    -2.12   0.038    -14.40127   -.4057492
          7  |  -2.701754    3.29291    -0.82   0.415    -9.290853    3.887344
          8  |   1.175439    1.57676     0.75   0.459     -1.97965    4.330527
          9  |  -3.912281   3.766425    -1.04   0.303    -11.44888    3.624318
         10  |  -13.26316   5.086301    -2.61   0.012    -23.44082   -3.085493
         11  |  -12.19298   5.476883    -2.23   0.030     -23.1522   -1.233765
         12  |  -5.052632   1.735336    -2.91   0.005    -8.525032   -1.580232
         13  |   7.912281     2.3308     3.39   0.001      3.24836     12.5762
         14  |   6.035088   3.135369     1.92   0.059    -.2387705    12.30895
         15  |   7.298246   2.743347     2.66   0.010     1.808821    12.78767
         16  |   6.719298   3.066548     2.19   0.032     .5831491    12.85545
         17  |    3.54386   3.496853     1.01   0.315    -3.453327    10.54105
         18  |  -.0175439   4.239059    -0.00   0.997    -8.499881    8.464794
         19  |    .754386   5.264943     0.14   0.887     -9.78074    11.28951
         20  |  -2.122807   5.360729    -0.40   0.694     -12.8496    8.603987
         21  |  -7.842105   8.306333    -0.94   0.349    -24.46304    8.778829
         22  |  -6.052632   5.429728    -1.11   0.269    -16.91749     4.81223
         23  |  -3.842105   2.959429    -1.30   0.199    -9.763908    2.079698
         24  |  -3.403509   4.546532    -0.75   0.457     -12.5011    5.694081
         25  |          0  (omitted)
             |
       _cons |    18.3845   .0352392   521.71   0.000     18.31399    18.45502
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
   gvkey_num |        60          60           0    *|
        date |        60           1          59     |
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation

Task 8.3: Plot the Event Study Coefficients¶

Extract the interaction coefficients and standard errors from the regression above and plot them as a connected scatter plot with 95% confidence interval bars. Save the plot as event_study_plot.png in the figures folder.

The plot should include:

  • A horizontal zero line (no effect baseline)
  • A vertical reference line between t = −1 and t = 0 to mark the start of the post-treatment period
Stata Stata Tip: Manual Coefficient Extraction
After the regression, use preserve and create variables to hold the plot data:
  • Loop over forvalues i = 1/25, assigning _b[c.german#`i'.rel_time_shifted] and the corresponding _se[] into coef and se variables
  • Set the reference period manually: replace coef = 0 if rel_time_shifted == 12
  • Build 95% CIs: gen ci_top = coef + 1.96*se and ci_bottom
  • Collapse to one row per period with duplicates drop, then plot with twoway (scatter coef rel_time_binned, connect(line)) (rcap ci_top ci_bottom rel_time_binned)
Finally, call restore to recover the full dataset.
(Note: Below code run with echo to enable preserve/restore functionality.)

. preserve

. g coef = .
(3,600 missing values generated)

. g se = .
(3,600 missing values generated)

. forvalues i = 1/25 {
  2.     if `i' != 12 {
  3.         replace coef = _b[c.german#`i'.rel_time_shifted] if rel_time_shift
> ed == `i'
  4.         replace se = _se[c.german#`i'.rel_time_shifted] if rel_time_shifte
> d == `i'
  5.     }
  6. }
(1,260 real changes made)
(1,260 real changes made)
(60 real changes made)
(60 real changes made)
(60 real changes made)
(60 real changes made)
(60 real changes made)
(60 real changes made)
(60 real changes made)
(60 real changes made)
(60 real changes made)
(60 real changes made)
(60 real changes made)
(60 real changes made)
(60 real changes made)
(60 real changes made)
(60 real changes made)
(60 real changes made)
(60 real changes made)
(60 real changes made)
(60 real changes made)
(60 real changes made)
(60 real changes made)
(60 real changes made)
(60 real changes made)
(60 real changes made)
(60 real changes made)
(60 real changes made)
(60 real changes made)
(60 real changes made)
(60 real changes made)
(60 real changes made)
(60 real changes made)
(60 real changes made)
(60 real changes made)
(60 real changes made)
(60 real changes made)
(60 real changes made)
(60 real changes made)
(60 real changes made)
(60 real changes made)
(60 real changes made)
(60 real changes made)
(60 real changes made)
(60 real changes made)
(60 real changes made)
(960 real changes made)
(960 real changes made)

. replace coef = 0 if rel_time_shifted == 12
(60 real changes made)

. replace se = 0 if rel_time_shifted == 12
(60 real changes made)

. g ci_top = coef + 1.96*se

. g ci_bottom = coef - 1.96*se

. keep rel_time_shifted rel_time_binned coef se ci_*

. duplicates drop

Duplicates in terms of all variables

(3,575 observations deleted)

. twoway (scatter coef rel_time_binned, connect(line) mcolor(navy) lcolor(navy)
> ) (rcap ci_top ci_bottom rel_time_binned, lcolor(navy%50)) (function y = 0, r
> ange(-12 12) lcolor(red) lpattern(dash)), xline(-0.5, lcolor(gray) lpattern(d
> ash)) xtitle("Months Relative to Event (Sep 2015 = 0)") ytitle("Coefficient (
> DiD Effect)") title("Event Study: Effect of Dieselgate on German Automakers' 
> RRI") legend(off) caption("Reference period: t = -1 (August 2015). 95% CIs sh
> own." "Dashed vertical line marks first post-treatment month.") name(event_st
> udy, replace)

. graph export "$figures/event_study_plot.png", replace width(1200)
(file /Users/casparm4/Github/rsm-data-analytics-in-finance-private/private/assi
> gnments/04-assignment/output/figures/event_study_plot.png not found)
file /Users/casparm4/Github/rsm-data-analytics-in-finance-private/private/assig
> nments/04-assignment/output/figures/event_study_plot.png written in PNG forma
> t

. restore

. display "✅ Event study plot saved"
✅ Event study plot saved

. 
No description has been provided for this image

Do not delete the code cells below. They contain hidden tests that will check your work after you complete the tasks.

✅ Section 8 tests passed

Section 9: Regression Table¶

Create a publication-quality regression table showing the progression of specifications.

Task 9.1: Export Regression Table¶

Use esttab to export all four DiD models (did1, did2, did3, did4) to a LaTeX file called did_regression_table.tex in the tables folder. The table should show the progression from canonical DiD to TWFE with clustered standard errors.

Stata Stata Tip: esttab Options
Key options:
  • se — Show standard errors
  • star(* 0.10 ** 0.05 *** 0.01) — Significance stars
  • stats(N r2) — Include N and R²
  • mtitles() — Column titles
(file
    /Users/casparm4/Github/rsm-data-analytics-in-finance-private/private/assi
    > gnments/04-assignment/output/tables/did_regression_table.tex not found)
(output written to /Users/casparm4/Github/rsm-data-analytics-in-finance-private
> /private/assignments/04-assignment/output/tables/did_regression_table.tex)
✅ Regression table exported: did_regression_table.tex

Do not delete the code cell below. It contains hidden tests that will check your work after you complete the tasks.

✅ Section 9 tests passed

Section 10: Critical Evaluation (Ungraded)¶

No research design is perfect. In this section, reflect on the limitations of our DiD analysis and what would strengthen our causal claims.

Task 10.1: Discuss Limitations¶

In the markdown cell below, discuss at least three limitations of this DiD analysis. Consider:

  1. Parallel trends assumption: Can we confidently say that German and non-German automakers would have followed similar RRI trends absent Dieselgate?

  2. Small number of treated units: We have only 3 German firms in the treated group. What are the implications for inference?

  3. Other threats to identification: Are there other events around September 2015 that might affect German automakers differently? (e.g., refugee crisis, other policy changes)

  4. Generalizability: Can we generalize findings from the auto industry to other contexts?

  5. Measurement: Is RepRisk a good measure of the "true" ESG risk impact of Dieselgate?

LIMITATIONS:

  1. Small number of treated units: With only 3 German automakers (Volkswagen, BMW, Mercedes-Benz) in the treated group, our statistical power is limited. Standard asymptotic inference (including clustered standard errors) may be unreliable with so few clusters. A single outlier firm could substantially influence the results. More robust methods like randomization inference or wild cluster bootstrap might be appropriate.

  2. Parallel trends assumption: While the parallel trends plot suggests reasonably similar pre-trends, we cannot formally test the assumption (it concerns a counterfactual). The pre-period shows some divergence between groups, and the control group (non-German automakers) is quite heterogeneous — including Japanese, French, Italian, and Korean firms that may face different industry dynamics.

  3. Potential confounders: Other events occurred around September 2015 that might differentially affect German firms:

    • The European refugee crisis peaked in late 2015, with Germany taking a prominent role
    • ECB monetary policy changes
    • Other regulatory developments in the automotive sector These confounders could bias our estimate if they affected German automakers' ESG profiles differently from control firms.
  4. Spillover effects: Our identification assumes no spillover from treated to control firms. However, if Dieselgate increased scrutiny of all automakers' emissions practices, control firms might also have experienced RRI changes — biasing our DiD estimate toward zero.


Section 11: Extension — Spillover Analysis (Ungraded)¶

🚀 Above and Beyond
This section is optional and ungraded. It explores a more nuanced question: Was the reputational damage concentrated at Volkswagen (the scandal firm), or did it spill over to other German automakers (BMW, Mercedes-Benz)?

This analysis helps distinguish between direct effects (VW was caught cheating) and contagion/spillover effects (guilt by national association).

Extension 11.1: Create Treatment Subgroups¶

Create two new treatment indicators:

  • vw: 1 for Volkswagen only (gvkey = "100737")
  • other_german: 1 for other German firms (BMW, Mercedes-Benz)
                      |      Volkswagen
                 conm |         0          1 |     Total
----------------------+----------------------+----------
BAYERISCHE MOTOREN .. |        60          0 |        60 
MERCEDES BENZ GROUP.. |        60          0 |        60 
        VOLKSWAGEN AG |         0         60 |        60 
----------------------+----------------------+----------
                Total |       120         60 |       180 

                      |  Other German (BMW,
                      |       Mercedes)
                 conm |         0          1 |     Total
----------------------+----------------------+----------
BAYERISCHE MOTOREN .. |         0         60 |        60 
MERCEDES BENZ GROUP.. |         0         60 |        60 
        VOLKSWAGEN AG |        60          0 |        60 
----------------------+----------------------+----------
                Total |        60        120 |       180 
✅ Treatment subgroups created

Extension 11.2: Separate DiD for VW vs Other German Firms¶

Run two separate DiD regressions:

  1. VW vs. non-German control firms
  2. Other German firms vs. non-German control firms

Compare the magnitudes. Is the effect larger for VW (the scandal firm)?

(MWFE estimator converged in 2 iterations)

HDFE Linear regression                            Number of obs   =      3,480
Absorbing 2 HDFE groups                           F(   1,     57) =      99.19
Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                  R-squared       =     0.8100
                                                  Adj R-squared   =     0.8034
                                                  Within R-sq.    =     0.0056
Number of clusters (gvkey_num) =         58       Root MSE        =     8.6291

                             (Std. err. adjusted for 58 clusters in gvkey_num)
------------------------------------------------------------------------------
             |               Robust
 current_RRI | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
     vw_post |   9.817982   .9857921     9.96   0.000     7.843968      11.792
       _cons |   16.96066   .0079317  2138.35   0.000     16.94478    16.97654
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
   gvkey_num |        58          58           0    *|
        date |        60           1          59     |
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation
(MWFE estimator converged in 2 iterations)

HDFE Linear regression                            Number of obs   =      3,540
Absorbing 2 HDFE groups                           F(   1,     58) =       1.94
Statistics robust to heteroskedasticity           Prob > F        =     0.1695
                                                  R-squared       =     0.8125
                                                  Adj R-squared   =     0.8060
                                                  Within R-sq.    =     0.0006
Number of clusters (gvkey_num) =         59       Root MSE        =     8.6570

                             (Std. err. adjusted for 59 clusters in gvkey_num)
------------------------------------------------------------------------------
             |               Robust
 current_RRI | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
other_germ~t |   2.293429   1.648419     1.39   0.169    -1.006241    5.593099
       _cons |    17.4979   .0260767   671.02   0.000      17.4457     17.5501
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
   gvkey_num |        59          59           0    *|
        date |        60           1          59     |
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation


==== Spillover Analysis Results ====



----------------------------------------------
    Variable |    vw_only      other_germa~y  
-------------+--------------------------------
     vw_post |  9.8179825***                  
other_germ~t |                  2.2934289     
       _cons |   16.96066***    17.497901***  
-------------+--------------------------------
           N |       3480            3540     
          r2 |    .809981       .81249319     
----------------------------------------------
      Legend: * p<0.05; ** p<0.01; *** p<0.001


Interpretation:
- VW coefficient: Direct effect on scandal firm
- Other German coefficient: Spillover/contagion effect on peers

Extension 11.3: Interpretation¶

⚠️ Caveat: Very Small Sample
With only 1 firm in the "VW" treatment group and 2 firms in the "Other German" group, these estimates are essentially case studies rather than statistical inference. Interpret with extreme caution — the estimates are highly sensitive to firm-specific idiosyncrasies.

If VW shows a larger effect than other German firms, this suggests the reputational damage was concentrated at the scandal firm. If both show similar effects, it suggests broader contagion to the "German automotive" brand.


References¶

  • The Effect (Free Online Textbook): Huntington-Klein, N. (2021). The Effect: An Introduction to Research Design and Causality
  • RepRisk Methodology: RepRisk Research Methodology
  • Dieselgate Timeline: EPA Notice of Violation (September 18, 2015) - EPA Press Release

Data Analytics for Finance

BM17FI · Academic Year 2025–26

Erasmus University Rotterdam

Created by: Caspar David Peter

© 2026 Rotterdam School of Management