De Do Do Do, De Di i Di - Panel Data Methods

Data Analytics for Finance

Caspar David Peter

Rotterdam School of Management, Accounting Department

Panel Data Methods

Before We Start…

Assignment Release and Submission Schedule

Assignment   Release      Deadline
A1           11.02.2026   25.02.2026
A2           11.02.2026   25.02.2026
A3           18.02.2026   04.03.2026
A4           25.02.2026   11.03.2026
A5           04.03.2026   TBA
A6           11.03.2026   TBA

Important Note on Assignment Deadlines

Deadlines for A5-A6 will be announced in the coming weeks. Please stay tuned for updates and check Canvas regularly if you do not attend lectures.

Overview

Recap of previous lecture

Overview

Today’s learning objectives

  • What are good and bad controls in regression analysis?
  • Understand the Difference-in-Differences (DiD) framework for causal inference using panel data
  • Learn how to implement a DiD design, including fixed effects
  • Recognize the importance of appropriate standard error adjustments in DiD analyses
  • Explore advanced DiD topics such as staggered adoption and recent methodological developments

What to do if random assignment is not possible?

We did not answer this question in the previous lecture!

What to do if random assignment is not possible?

Building the OLS model

From Huntington-Klein (2022) chapter 13

Let’s build the OLS model for DiLLMa

  • Outcome variable: exam_score
  • Treatment variable: treated (1 if student uses LLM, 0 otherwise)

What to do if random assignment is not possible?

Building the OLS model

From Huntington-Klein (2022) chapter 13

Let’s build the OLS model for DiLLMa

  • Open paths between treated and exam_score:
    • ability (confounder)
    • study_hours (mediator)
    • attendance_rate (mediator)
    • age (confounder)
    • gender (confounder)
  • Should the effect differ, e.g. for high vs. low study hours?
    • Include an interaction term to test for heterogeneous effects (see the sketch below)
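A minimal Stata sketch of this model (the dataset name is hypothetical; variable names follow the slides). Mediators (study_hours, attendance_rate) stay out of the baseline, and ability is a confounder we cannot include because it is unobservable:

    * Baseline: treatment plus observable confounders (age, gender)
    use dillma.dta, clear    // hypothetical dataset
    regress exam_score i.treated c.age i.gender, vce(robust)

    * Heterogeneous effects: interact treatment with study hours
    regress exam_score i.treated##c.study_hours c.age i.gender, vce(robust)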

Control Variables in Regression Analysis

Control Variables in Regression Analysis

Good vs. Bad controls

Good controls

  • Variables that are not caused by the treatment — they can be correlated with both treatment and outcome
  • Closing a “backdoor path” (confounder) reduces omitted variable bias (ability is a good control, but unobservable)
  • Pure outcome predictors (uncorrelated with treatment) improve precision without biasing estimates

Bad controls

  • Variables that lie on the causal path from treatment to outcome (mediators)
  • Variables that are caused by both treatment and outcome (colliders)
  • Adding every available variable is not a safe strategy

Control Variables in Regression Analysis

Good controls - Examples

Confounder - “Common cause”

Lecture 3 example: Student ability

  • Controlling for ability helps isolate the causal effect of LLM use on scores by blocking this backdoor path: LLM use ← ability → exam score
  • Other good controls could include age, gender, etc. — variables that are correlated with both LLM use and exam scores but are not caused by LLM use

Control Variables in Regression Analysis

Bad controls — Collider

Structure: X → Z ← Y

  • Z is caused by both X and Y — two arrows point into Z
  • Conditioning on Z opens a spurious path between X and Y, creating a phantom association that does not exist in the full population

Example: Office hour attendance

  • Some students attend office hours to double-check “hallucinated” or complicated LLM responses while preparing for the exam (X → Z)
  • Other students attend office hours because they are struggling with the material and expect a lower score (Y → Z)
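A small simulation (a sketch; all names are made up) shows what conditioning on this collider does: x and y are independent by construction, yet controlling for z manufactures an association.

    clear
    set seed 42
    set obs 10000
    gen x = rnormal()                // e.g., LLM use
    gen y = rnormal()                // e.g., exam performance, independent of x
    gen z = x + y + rnormal()        // office hours: caused by BOTH x and y

    regress y x                      // coefficient on x is ~0, as constructed
    regress y x z                    // conditioning on the collider z induces a
                                     // spurious negative coefficient on x (~ -0.5)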

Control Variables in Regression Analysis

Summary

  • Good controls help isolate the causal effect by blocking non-causal paths (confounders)
  • Bad controls can bias estimates by blocking causal paths (mediators) or opening spurious paths (colliders)

“To fix a Confounder, you MUST include it. To avoid a Collider, you MUST exclude it.”

Takeaway: Be deliberate about controls

DiD with two-way fixed effects largely sidesteps the confounder problem by absorbing time-invariant differences between units. But mediators remain a live concern — if you control for a variable that lies on the causal path from your treatment to the outcome, you will underestimate the treatment effect.

Difference-in-Differences (DiD)

Difference-in-Differences (DiD)

Key concepts

What is Difference-in-Differences (DiD)?

Difference-in-Differences (DiD) is a quasi-experimental method that exploits within-group variation over time and cross-group variation to identify a causal effect when random assignment is infeasible.

Parallel Trends Assumption

DiD only recovers the causal effect if the “parallel trends assumption” holds!

Difference-in-Differences (DiD)

Why Differences — Isn’t one difference enough?

Before vs After (time variation)

What is it?

Compare outcomes before and after treatment implementation, e.g. pre- and post-policy change

Why not enough for causal inference?

All variation in Treatment is explained by Time: the treatment effect cannot be separated from other changes over the same period (common shocks, trends)!

Treated vs Control (group variation)

What is it?

Compare outcomes between treated and control groups, e.g. those affected by a policy change vs those not affected

Why not enough for causal inference?

Differences between Treated and Control groups may be driven by time-invariant confounders, e.g. ability, demographics, location, etc.

Combining both allows us to isolate the causal impact of the treatment, a.k.a. the average treatment effect on the treated (ATT)

Difference-in-Differences (DiD)

DiLLMa - Setting continued

  • We observe students’ exam scores before and after LLMs were introduced
  • Some courses allowed LLM use (treatment), others banned it (control)
  • Goal: Estimate the causal effect of allowing LLM use on exam scores

In short

Compare grade changes in allowed vs. banned courses, before and after LLMs became available

DiD isolates the treated group’s response, conditional on the assumption that the untreated group’s changes represent the non-treatment counterfactual for the treated group

Difference-in-Differences (DiD)

Canonical DiD model - The two-by-two design

The two-by-two set-up

               (1) After                  (2) Before                   (1) - (2)
(a) Treatment  \(Y_{treated,\ after}\)    \(Y_{treated,\ before}\)     \(\Delta_{treated}\)
(b) Control    \(Y_{control,\ after}\)    \(Y_{control,\ before}\)     \(\Delta_{control}\)
(a) - (b)      \(\Delta_{after}\)         \(\Delta_{before}\)          DiD

A typical DiD regression looks like this

\[Y = \beta_0 + \beta_1 Treated + \beta_2 After + \beta_3 Treated \times After + \epsilon\]

  • The difference-in-differences regression gives you the same estimate as if you took differences in the group averages

  • It also takes care of any unobserved constant differences between subjects and common time trends (see the estimation sketch below)!
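A minimal estimation sketch (variable names follow the slides); the interaction coefficient is the DiD estimate \(\beta_3\):

    regress exam_score i.treated##i.after, vce(robust)
    display _b[1.treated#1.after]    // the DiD estimate (beta_3)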

Difference-in-Differences (DiD)

Canonical DiD model - The two-by-two design

The two-by-two set-up

               (1) After                                  (2) Before              (1) - (2)
(a) Treatment  \(\beta_0 + \beta_1 + \beta_2 + \beta_3\)  \(\beta_0 + \beta_1\)   \(\beta_2 + \beta_3\)
(b) Control    \(\beta_0 + \beta_2\)                      \(\beta_0\)             \(\beta_2\)
(a) - (b)      \(\beta_1 + \beta_3\)                      \(\beta_1\)             \(\beta_3\)

A typical DiD regression looks like this

\[Y = \beta_0 + \beta_1 Treated + \beta_2 After + \beta_3 Treated \times After + \epsilon\]


Difference-in-Differences (DiD)

Canonical DiD model - The two-by-two design

Example data summary

  • Same variables as in previous lecture, but now panel data with multiple observations per student
  • Treatment assigned at the course level (some instructors allow LLM use, others ban it)
  • Data contains two periods: pre-LLM (before) and post-LLM (after), each consisting of two exam scores per student
  • The unit of analysis is the student-time level
  • The key variables are:
    • treated: Indicator for whether the course allows LLM use (1 = yes, 0 = no)
    • after: Indicator for whether the observation is from the post-LLM period (1 = after, 0 = before)
    • exam_score: The student’s exam score

Difference-in-Differences (DiD)

Canonical DiD model - The two-by-two design

Comparing means

                 After    Before    Difference
Treated group    6.57     7.12      -0.55
Control group    7.42     7.22      0.20
Difference       -0.85    -0.10     -0.75
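The table of means can be reproduced directly, and the DiD estimate follows from simple arithmetic (a sketch; the table syntax assumes Stata 17 or newer):

    table treated after, statistic(mean exam_score)
    display (6.57 - 7.12) - (7.42 - 7.22)    // = -0.55 - 0.20 = -0.75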

Estimation

Difference-in-Differences (DiD)

Two-way fixed effects (TWFE)

Canonical: \(Y = \underbrace{\beta_0 + \beta_1 Treated}_{\alpha_i} + \underbrace{\beta_2 After}_{\alpha_t} + \beta_3 Treated \times After + \epsilon\)

TWFE: \(Y = \alpha_i + \alpha_t + \beta_{DiD}\ Treated \times After + \epsilon\)

What is absorbed by fixed effects?
  • \(\beta_0\) is absorbed into the individual fixed effects \(\alpha_i\) — every unit gets its own intercept
  • \(\beta_1\) disappears - time-invariant differences are absorbed by individual fixed effects
  • \(\beta_2\) disappears - time fixed effects capture common shocks over time
Why use fixed effects?
  • Controls for unobserved heterogeneity across individuals and over time
  • Focuses on within-individual variation to identify treatment effects
  • More robust to omitted variable bias from time-invariant confounders
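A minimal TWFE sketch using the community package reghdfe (ssc install reghdfe); student_id and period are hypothetical identifier names:

    * treated and after are absorbed by the student and time fixed effects;
    * only the interaction is left to estimate
    reghdfe exam_score i.treated#i.after, absorb(student_id period) vce(robust)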

Difference-in-Differences (DiD)

Two-way fixed effects

Estimation with fixed effects

What changes with fixed effects?

  • The DiD estimate remains similar, indicating robustness to fixed effects
  • Controls for unobserved individual heterogeneity and time effects
  • Time-invariant confounders are addressed — ability drops out automatically because it is constant per student and is fully absorbed by the student fixed effect
  • Parallel trends remains the key identifying assumption; fixed effects do not relax it
  • But what about standard errors?

Difference-in-Differences (DiD)

Two-way fixed effects (TWFE)

Under the Hood: What do fixed effects actually do?

  • Any time-invariant characteristic that cannot be observed in the data is captured by the fixed effect
  • Fixed effects only use within-unit variation over time; all cross-sectional differences between units are swept away. This is why time-invariant variables like ability and female drop out automatically: once you subtract each student’s own mean, a constant value cancels out completely (see the sketch below)
  • Use with caution! Do not absorb away the variation you want to measure (e.g. do not use firm FE if your research question is about cross-sectional variation across firms)

Fixed effects solve endogeneity from time-invariant omitted variables. They do not solve endogeneity from time-varying confounders — if something unobserved changes over time and also affects both treatment and outcome, fixed effects won’t help. This is exactly what the parallel trends assumption guards against in DiD.
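To see the demeaning at work, a sketch using names from the running example (student_id, ability):

    * Demean by student: anything constant within a student cancels out
    egen mean_ability = mean(ability), by(student_id)
    gen ability_within = ability - mean_ability
    summarize ability_within    // sd = 0: no within variation left, so ability
                                // cannot be estimated in a student-FE model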

Difference-in-Differences (DiD)

Economic magnitude and Fixed Effects

Why does this matter?

  • How to calculate economic magnitudes, e.g. \[\text{Economic Magnitude} = \beta \times \sigma_X\]
  • If \(\beta\) is identified using only within-group variation, then \(\sigma_X\) should reflect the variation that was actually used for identification — i.e. the within-group standard deviation, not the raw sample standard deviation that includes between-group variation that was purged by the FE
  • The Mismatch: Using raw variation as a counterfactual implies a change that the regression model itself never “saw” or used for identification
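With purely hypothetical numbers (an illustration, not estimates from any dataset used here): suppose \(\beta = 0.5\), the raw sample standard deviation of \(X\) is 2.0, and the within-group standard deviation is 0.8:

\[0.5 \times 2.0 = 1.0 \;\; \text{(raw)} \qquad \text{vs.} \qquad 0.5 \times 0.8 = 0.4 \;\; \text{(within)}\]

The raw-based magnitude overstates the effect of a realistic within-group move by a factor of 2.5.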

Difference-in-Differences (DiD)

Economic magnitude and Fixed Effects

The Issue of Inflation and Extrapolation

  • Overstated Effects: The sample standard deviation is often two to three times larger than the within-group variation that actually identifies the coefficient.
  • Extrapolation Bias: Characterizing a “one-standard-deviation” increase based on raw data often represents an unrealistically large change that rarely occurs within a single firm or group.
  • Modeling Assumptions: Such large counterfactual changes rely heavily on linearity assumptions that may not be testable or defensible for changes that are significantly larger than common variation in the data.

Difference-in-Differences (DiD)

Economic magnitude and Fixed Effects

Identifying the “Effective” Variation

  • Singleton & No-Variation Groups: FE can “mask” the effective sample. Observations in groups with only one entry (singletons) or zero variation in \(X\) do not contribute to the coefficient estimate
  • Shrunken Distributions: After residualizing variables to FE, the remaining variation is much more narrowly distributed
  • Distorted Unit of Change: Because FE regressions focus on predicted changes in \(Y\) in response to deviations from a group’s mean, the raw standard deviation is no longer a meaningful unit for characterizing the magnitude of the effect

Difference-in-Differences (DiD)

Economic magnitude and Fixed Effects

What to do about it?

  • Use Within-Group Scaling: Researchers should characterize economic magnitudes using the within-group standard deviation of the independent variable.
  • Report W-S Ratios: Supplement raw descriptive statistics with standard deviations of variables residualized to the FE to show how much variation was purged.
  • Validate Counterfactuals: Report the frequency of the proposed counterfactual move. If a “one-standard-deviation” move happens less than 10% of the time within a group, it may be an “extreme counterfactual”.
  • Within-\(R^2\): Supplement the total \(R^2\) with a “within-\(R^2\)” to show how much explanatory power comes from the variables of interest versus the FE themselves (see the sketch below).
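A sketch of the residualization and within-\(R^2\) points (reghdfe from SSC; x, y, firm_id, and year are hypothetical names):

    * Residualize x on the fixed effects to see the variation actually used
    reghdfe x, absorb(firm_id year) residuals(x_resid)
    summarize x x_resid    // compare the raw sd with the residualized sd

    * reghdfe reports the within-R2 next to the total R2 in its header
    reghdfe y x, absorb(firm_id year)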

Difference-in-Differences (DiD)

Standard errors

Why do standard errors matter?

  • Standard errors quantify the uncertainty around our estimates
  • Incorrect standard errors can lead to misleading conclusions about statistical significance

What can go wrong? — Classic issues

  • Heteroskedasticity: If the variance of errors is not constant, standard errors may be biased (lecture 3)
    • Use robust standard errors to account for heteroskedasticity
  • Autocorrelation: In DiD settings, errors may be correlated over time within groups, which can also lead to underestimated standard errors if not addressed
    • Use Newey-West or block bootstrap methods to adjust for serial correlation

Difference-in-Differences (DiD)

Standard errors

What can go wrong? — Clustering

  • Clustering: If treatment is assigned at a group level (e.g., course level), outcomes within groups may be correlated, leading to underestimated standard errors if not accounted for (Moulton problem, see Moulton (1990))

When and how to adjust standard errors?

  • Cluster standard errors at the level of treatment assignment (e.g., course level) to account for within-group correlation
  • Use robust standard errors to account for heteroskedasticity
  • Be cautious with a small number of clusters (fewer than ~30), as standard errors may still be biased; consider wild cluster bootstrap methods in such cases (see the sketch below)
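A sketch of both adjustments for DiLLMa (course_id, student_id, and period are hypothetical identifiers; boottest is a community package, ssc install boottest):

    * Cluster at the level of treatment assignment: the course
    reghdfe exam_score i.treated#i.after, absorb(student_id period) ///
        vce(cluster course_id)

    * With few clusters, check the p-value via the wild cluster bootstrap
    boottest 1.treated#1.after, cluster(course_id)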

Difference-in-Differences (DiD)

Standard errors

Example: DiLLMa

  • Treatment is assigned at the course level, but we have multiple students per course
  • Outcomes of students within the same course are likely correlated (e.g., due to shared instructor, materials, peer effects)
  • If we ignore this clustering and use standard errors that assume independence, we will underestimate the true variability of our estimates
  • This can lead to falsely concluding that the treatment effect is statistically significant when it may not be (Type I error)

Difference-in-Differences (DiD)

Standard errors

Example: DiLLMa with clustering

What changes with clustering?

  • Standard errors increase when clustering by course, reflecting the within-course correlation of outcomes
  • No change in point estimates, but the statistical significance of the treatment effect may change (e.g., from significant to non-significant)

Difference-in-Differences (DiD)

Standard errors

Key takeaways on standard errors in DiD settings

  • Always use robust standard errors (addresses heteroskedasticity)
  • Rule of thumb I: Cluster at the level of the treatment/“shock”, e.g. treatment is at the state level (new law) \(\Rightarrow\) cluster standard errors at the state level
  • Be careful: Fewer than \(\approx\) 30 clusters makes clustered standard errors too small
  • Rule of thumb II: Check robustness across different clustering choices \(\Rightarrow\) report the most conservative one

Difference-in-Differences (DiD)

Takeaways so far

  • DiD leverages both cross-sectional and time-series variation to identify causal effects (ATT)
  • The canonical DiD model can be extended with fixed effects to control for time-invariant heterogeneity
  • The parallel trends assumption is crucial for valid DiD inference
  • Proper standard error adjustments are essential for accurate inference in DiD settings
  • Is that all? Not quite…

Limitations

Limitations arise in rollout (staggered) designs, where treatment timing varies across groups; TWFE can perform poorly in such settings…

Difference-in-Differences (DiD)

Staggered adoption — The problem

The canonical DiD assumes a clean 2×2 world: two groups (treated and control) observed in two periods (before and after).

Staggered (rollout) DiD

When units receive treatment at different points in time, we call this a staggered adoption or rollout design. This is extremely common in finance and economics research.

Difference-in-Differences (DiD)

Staggered adoption — Why TWFE can go wrong

The standard Two-Way Fixed Effects (TWFE) estimator implicitly uses all available 2×2 comparisons, including some you probably don’t want:

Goodman-Bacon decomposition

Goodman-Bacon (2021)

The “forbidden comparison” problem

  • Early-treated vs. never-treated ✓ Clean comparison
  • Late-treated vs. never-treated ✓ Clean comparison
  • Early-treated vs. late-treated ⚠️ The already-treated group acts as a “control” — but they’ve already been treated!
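The decomposition can be inspected directly (a sketch using the community package bacondecomp, ssc install bacondecomp; variable names hypothetical):

    * Requires an xtset panel and a binary, absorbing treatment indicator
    xtset firm_id year
    bacondecomp y treat, ddetail    // lists each 2x2 comparison and its weight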

Difference-in-Differences (DiD)

Staggered adoption — The negative weights problem

The negative weights problem

When treatment effects are heterogeneous, TWFE can assign negative weights to some group-time comparisons. This means TWFE can produce a negative estimate even when every single unit has a positive treatment effect. Goodman-Bacon (2021) decomposed TWFE into its component 2×2 DiDs and showed this explicitly.

Why does this happen?

  • Suppose early-treated firms experience a large initial treatment effect that fades over time.
  • TWFE then uses the early-treated group in later periods, when its effect has already faded, as a “control” for the late-treated group.
  • TWFE interprets the fading as a negative effect — even though the treatment was always positive.

The problem is not staggered adoption per se — it is staggered adoption combined with treatment effect heterogeneity.

Difference-in-Differences (DiD)

Staggered adoption — Recent solutions (high-level)

New approaches share a common idea: only compare treated units to clean controls

  • Callaway & Sant’Anna: estimate separate ATTs for each (group, time) cell, then aggregate (Callaway and Sant’Anna 2021)
  • Sun & Abraham: interaction-weighted estimator; heterogeneity-robust (Sun and Abraham 2021)
  • Borusyak, Jaravel & Spiess: imputation-based; efficient (Borusyak, Jaravel, and Spiess 2024)
  • Stacked DiD: manually construct clean 2×2 datasets and stack them (Cengiz et al. 2019; Gormley and Matsa 2011)
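As one example, a Callaway and Sant’Anna sketch using the community package csdid (ssc install csdid); first_treat holds each unit’s first treatment period, 0 for never-treated units (names hypothetical):

    csdid y, ivar(firm_id) time(year) gvar(first_treat) notyet
    estat simple    // overall ATT
    estat event     // event-study aggregation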

Difference-in-Differences (DiD)

Staggered adoption — Practical guidance

Rule of thumb

  • If your setting is a clean 2×2 (one treatment date, one treated group): standard TWFE is fine
  • If your setting has staggered rollout: run and report the Goodman-Bacon decomposition to understand how much of your estimate is driven by “forbidden” comparisons
  • If heterogeneous effects are plausible (they usually are): use Callaway-Sant’Anna or Sun-Abraham as robustness

Difference-in-Differences (DiD)

Beyond your MSc replication

New approaches

How to learn about the new approaches?
  • See e.g. Asjad Naqvi’s GitHub repository, which provides code and resources for implementing the new DiD methods, or YouTube lectures by the original authors
  • Check the papers by Callaway and Sant’Anna (2021), Sun and Abraham (2021), Borusyak, Jaravel, and Spiess (2024), and Cengiz et al. (2019)
  • Go beyond the literature in this course, e.g. de Chaisemartin and D’Haultfœuille (2023)
Why should you care about the new approaches?

A way to extend your MSc replication project is to apply some of the new DiD methods to your data and compare the results to the standard TWFE approach. This can provide insights into the robustness of your findings and demonstrate your ability to apply cutting-edge econometric techniques.

Thank You for Your Attention!

See You in the Next One!

References

Abouk, Rahi, and Scott Adams. 2013. “Texting Bans and Fatal Accidents on Roadways: Do They Work? Or Do Drivers Just React to Announcements of Bans?” American Economic Journal: Applied Economics 5 (2): 179–99.
Baker, Andrew C, David F Larcker, and Charles CY Wang. 2022. “How Much Should We Trust Staggered Difference-in-Differences Estimates?” Journal of Financial Economics 144 (2): 370–95.
Borusyak, Kirill, Xavier Jaravel, and Jann Spiess. 2024. “Revisiting Event-Study Designs: Robust and Efficient Estimation.” Review of Economic Studies 91 (6): 3253–85.
Breuer, Matthias, and Ed deHaan. 2024. “Using and Interpreting Fixed Effects Models.” Journal of Accounting Research 62 (4): 1183–1226.
Callaway, Brantly, and Pedro HC Sant’Anna. 2021. “Difference-in-Differences with Multiple Time Periods.” Journal of Econometrics 225 (2): 200–230.
Cameron, A Colin, and Douglas L Miller. 2015. “A Practitioner’s Guide to Cluster-Robust Inference.” Journal of Human Resources 50 (2): 317–72.
Cengiz, Doruk, Arindrajit Dube, Attila Lindner, and Ben Zipperer. 2019. “The Effect of Minimum Wages on Low-Wage Jobs.” The Quarterly Journal of Economics 134 (3): 1405–54.
Chaisemartin, Clément de, and Xavier D’Haultfœuille. 2023. “Credible Answers to Hard Questions: Differences-in-Differences for Natural Experiments.” Available at SSRN 4487202.
Cinelli, Carlos, Andrew Forney, and Judea Pearl. 2024. “A Crash Course in Good and Bad Controls.” Sociological Methods & Research 53 (3): 1071–1104.
Goodman-Bacon, Andrew. 2021. “Difference-in-Differences with Variation in Treatment Timing.” Journal of Econometrics 225 (2): 254–77.
Gormley, Todd A, and David A Matsa. 2011. “Growing Out of Trouble? Corporate Responses to Liability Risk.” The Review of Financial Studies 24 (8): 2781–821.
Huntington-Klein, Nick. 2022. The Effect: An Introduction to Research Design and Causality. 2nd ed. Chapman & Hall/CRC.
Liu, Zack, and Adam Winegar. 2025. “Economic Magnitudes Within Reason.” Journal of Corporate Finance 83: 102707. https://doi.org/10.1016/j.jcorpfin.2024.102707.
Moulton, Brent R. 1990. “An Illustration of a Pitfall in Estimating the Effects of Aggregate Variables on Micro Units.” The Review of Economics and Statistics, 334–38.
Sun, Liyang, and Sarah Abraham. 2021. “Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects.” Journal of Econometrics 225 (2): 175–99.
Verbeek, Marno. 2021. Panel Methods for Finance: A Guide to Panel Data Econometrics for Financial Applications. De Gruyter.

Appendix

Difference-in-Differences (DiD)

Unconditional means

Event study approach

The parallel trends assumption states that, in the absence of treatment, the average change in the outcome variable would have been the same for both the treatment and control groups.

Difference-in-Differences (DiD)

Bacon decomposition - Infographic


Stata example - If time permits

Stata example

Texting bans and traffic accidents — Background

Cross-sectional variation

Temporal variation

\[Y_{i,t} = \gamma_{State} + \theta_{Month} + \beta_3\, Ban_{i} \times Post_{t} + \epsilon_{i,t}\]
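A hedged sketch of this specification with reghdfe (variable names hypothetical):

    reghdfe accidents i.ban#i.post, absorb(state month) vce(cluster state)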

Stata example

Texting bans and traffic accidents — Main results

Main results

Interpretation

  • 3.7% reduction in accidents in states after passing the ban, compared to states without a ban in place
  • Strong ban states: 8.1% reduction in accidents
  • Weak ban states: 7.5% increase in accidents