De Do Do Do, De Di i Di - Panel Data Methods

Data Analytics for Finance

Caspar David Peter

Rotterdam School of Management, Accounting Department

Panel Data Methods

Before We Start…

Assignment Release and Submission Schedule

Assignment   Release      Deadline
A1           11.02.2026   25.02.2026
A2           11.02.2026   25.02.2026
A3           18.02.2026   04.03.2026
A4           25.02.2026   11.03.2026
A5           04.03.2026   TBA
A6           11.03.2026   TBA

Important Note on Assignment Deadlines

Deadlines for A5-A6 will be announced in the coming weeks. Please stay tuned for updates and check Canvas regularly if you do not attend lectures.

Overview

Recap of previous lecture

Overview

Today’s learning objectives

  • What are good and bad controls in regression analysis?
  • Understand the Difference-in-Differences (DiD) framework for causal inference using panel data
  • Learn how to implement a DiD design, including fixed effects
  • Recognize the importance of appropriate standard error adjustments in DiD analyses
  • Explore advanced DiD topics such as staggered adoption and recent methodological developments

What to do if random assignment is not possible?

We did not answer this question in the previous lecture!

What to do if random assignment is not possible?

Building the OLS model

From Huntington-Klein (2022) chapter 13

Let’s build the OLS model for DiLLMa

  • Outcome variable: exam_score
  • Treatment variable: treated (1 if student uses LLM, 0 otherwise)

What to do if random assignment is not possible?

Building the OLS model

From Huntington-Klein (2022) chapter 13

Let’s build the OLS model for DiLLMa

  • Open paths between treated and exam_score:
    • ability (confounder)
    • study_hours (mediator)
    • attendance_rate (mediator)
    • age (confounder)
    • gender (confounder)
  • Should the effect differ, e.g. for high vs. low study hours?
    • Include an interaction term to test for heterogeneous effects (see the sketch below)
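A minimal Stata sketch of this model (the dataset name is hypothetical; variable names follow the slides). Mediators (study_hours, attendance_rate) stay out of the baseline, and ability is a confounder we cannot include because it is unobservable:

    * Baseline: treatment plus observable confounders (age, gender)
    use dillma.dta, clear    // hypothetical dataset
    regress exam_score i.treated c.age i.gender, vce(robust)

    * Heterogeneous effects: interact treatment with study hours
    regress exam_score i.treated##c.study_hours c.age i.gender, vce(robust)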

Control Variables in Regression Analysis

Control Variables in Regression Analysis

Good vs. Bad controls

Good controls

  • Variables that are not caused by the treatment — they can be correlated with both treatment and outcome
  • Closing a “backdoor path” (confounder) reduces omitted variable bias (ability is a good control, but unobservable)
  • Pure outcome predictors (uncorrelated with treatment) improve precision without biasing estimates

Bad controls

  • Variables that lie on the causal path from treatment to outcome (mediators)
  • Variables that are caused by both treatment and outcome (colliders)
  • Adding every available variable is not a safe strategy

Control Variables in Regression Analysis

Good controls - Examples

Confounder - “Common cause”

Lecture 3 example: Student ability

  • Controlling for ability helps isolate the causal effect of LLM use on scores by blocking this backdoor path: LLM use ← ability → exam score
  • Other good controls could include age, gender, etc. — variables that are correlated with both LLM use and exam scores but are not caused by LLM use

Control Variables in Regression Analysis

Bad controls — Collider

Structure: X → Z ← Y

  • Z is caused by both X and Y — two arrows point into Z
  • Conditioning on Z opens a spurious path between X and Y, creating a phantom association that does not exist in the full population

Example: Office hour attendance

  • Some students attend office hours to double-check “hallucinated” or complicated LLM responses while preparing for the exam (X → Z)
  • Other students attend office hours because they are struggling with the material and expect a lower score (Y → Z)
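A small simulation (a sketch; all names are made up) shows what conditioning on this collider does: x and y are independent by construction, yet controlling for z manufactures an association.

    clear
    set seed 42
    set obs 10000
    gen x = rnormal()                // e.g., LLM use
    gen y = rnormal()                // e.g., exam performance, independent of x
    gen z = x + y + rnormal()        // office hours: caused by BOTH x and y

    regress y x                      // coefficient on x is ~0, as constructed
    regress y x z                    // conditioning on the collider z induces a
                                     // spurious negative coefficient on x (~ -0.5)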

Control Variables in Regression Analysis

Summary

  • Good controls help isolate the causal effect by blocking non-causal paths (confounders)
  • Bad controls can bias estimates by blocking causal paths (mediators) or opening spurious paths (colliders)

“To fix a Confounder, you MUST include it. To avoid a Collider, you MUST exclude it.”

Takeaway: Be deliberate about controls

DiD with two-way fixed effects largely sidesteps the confounder problem by absorbing time-invariant differences between units. But mediators remain a live concern — if you control for a variable that lies on the causal path from your treatment to the outcome, you will underestimate the treatment effect.

Difference-in-Differences (DiD)

Difference-in-Differences (DiD)

Key concepts

What is Difference-in-Differences (DiD)?

Difference-in-Differences (DiD) is a quasi-experimental method that exploits within-group variation over time and cross-group variation to identify a causal effect when random assignment is infeasible.

Parallel Trends Assumption

DiD only recovers the causal effect if the “parallel trends assumption” holds!

Difference-in-Differences (DiD)

Why Differences — Isn’t one difference enough?

Before vs After (time variation)

What is it?

Compare outcomes before and after treatment implementation, e.g. pre- and post-policy change

Why not enough for causal inference?

All variation in Treatment is explained by Time: the treatment effect cannot be separated from other changes over the same period (common shocks, trends)!

Treated vs Control (group variation)

What is it?

Compare outcomes between treated and control groups, e.g. those affected by a policy change vs those not affected

Why not enough for causal inference?

Differences between Treated and Control groups may be driven by time-invariant confounders, e.g. ability, demographics, location, etc.

Combining both allows us to isolate the causal impact of the treatment, a.k.a. the average treatment effect on the treated (ATT)

Difference-in-Differences (DiD)

DiLLMa - Setting continued

  • We observe students’ exam scores before and after LLMs were introduced
  • Some courses allowed LLM use (treatment), others banned it (control)
  • Goal: Estimate the causal effect of allowing LLM use on exam scores

In short

Compare grade changes in allowed vs. banned courses, before and after LLMs became available

DiD isolates the treated group’s response, conditional on the assumption that the untreated group’s changes represent the non-treatment counterfactual for the treated group

Difference-in-Differences (DiD)

Canonical DiD model - The two-by-two design

The two-by-two set-up

               (1) After                  (2) Before                   (1) - (2)
(a) Treatment  \(Y_{treated,\ after}\)    \(Y_{treated,\ before}\)     \(\Delta_{treated}\)
(b) Control    \(Y_{control,\ after}\)    \(Y_{control,\ before}\)     \(\Delta_{control}\)
(a) - (b)      \(\Delta_{after}\)         \(\Delta_{before}\)          DiD

A typical DiD regression looks like this

\[Y = \beta_0 + \beta_1 Treated + \beta_2 After + \beta_3 Treated \times After + \epsilon\]

  • The difference-in-differences regression gives you the same estimate as if you took differences in the group averages

  • It also takes care of any unobserved constant differences between subjects and common time trends (see the estimation sketch below)!
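A minimal estimation sketch (variable names follow the slides); the interaction coefficient is the DiD estimate \(\beta_3\):

    regress exam_score i.treated##i.after, vce(robust)
    display _b[1.treated#1.after]    // the DiD estimate (beta_3)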

Difference-in-Differences (DiD)

Canonical DiD model - The two-by-two design

The two-by-two set-up

               (1) After                                  (2) Before              (1) - (2)
(a) Treatment  \(\beta_0 + \beta_1 + \beta_2 + \beta_3\)  \(\beta_0 + \beta_1\)   \(\beta_2 + \beta_3\)
(b) Control    \(\beta_0 + \beta_2\)                      \(\beta_0\)             \(\beta_2\)
(a) - (b)      \(\beta_1 + \beta_3\)                      \(\beta_1\)             \(\beta_3\)

A typical DiD regression looks like this

\[Y = \beta_0 + \beta_1 Treated + \beta_2 After + \beta_3 Treated \times After + \epsilon\]


Difference-in-Differences (DiD)

Canonical DiD model - The two-by-two design

Example data summary

  • Same variables as in previous lecture, but now panel data with multiple observations per student
  • Treatment assigned at the course level (some instructors allow LLM use, others ban it)
  • Data contains two periods: pre-LLM (before) and post-LLM (after), each consisting of two exam scores per student
  • The unit of analysis is the student-time level
  • The key variables are:
    • treated: Indicator for whether the course allows LLM use (1 = yes, 0 = no)
    • after: Indicator for whether the observation is from the post-LLM period (1 = after, 0 = before)
    • exam_score: The student’s exam score

Difference-in-Differences (DiD)

Canonical DiD model - The two-by-two design

Comparing means

                 After    Before    Difference
Treated group    6.57     7.12      -0.55
Control group    7.42     7.22      0.20
Difference       -0.85    -0.10     -0.75
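The table of means can be reproduced directly, and the DiD estimate follows from simple arithmetic (a sketch; the table syntax assumes Stata 17 or newer):

    table treated after, statistic(mean exam_score)
    display (6.57 - 7.12) - (7.42 - 7.22)    // = -0.55 - 0.20 = -0.75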

Estimation

Difference-in-Differences (DiD)

Two-way fixed effects (TWFE)

Canonical: \(Y = \underbrace{\beta_0 + \beta_1 Treated}_{\alpha_i} + \underbrace{\beta_2 After}_{\alpha_t} + \beta_3 Treated \times After + \epsilon\)

TWFE: \(Y = \alpha_i + \alpha_t + \beta_{DiD}\ Treated \times After + \epsilon\)

What is absorbed by fixed effects?
  • \(\beta_0\) is absorbed into the individual fixed effects \(\alpha_i\) — every unit gets its own intercept
  • \(\beta_1\) disappears - time-invariant differences are absorbed by individual fixed effects
  • \(\beta_2\) disappears - time fixed effects capture common shocks over time
Why use fixed effects?
  • Controls for unobserved heterogeneity across individuals and over time
  • Focuses on within-individual variation to identify treatment effects
  • More robust to omitted variable bias from time-invariant confounders
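A minimal TWFE sketch using the community package reghdfe (ssc install reghdfe); student_id and period are hypothetical identifier names:

    * treated and after are absorbed by the student and time fixed effects;
    * only the interaction is left to estimate
    reghdfe exam_score i.treated#i.after, absorb(student_id period) vce(robust)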

Difference-in-Differences (DiD)

Two-way fixed effects

Estimation with fixed effects

What changes with fixed effects?

  • The DiD estimate remains similar, indicating robustness to fixed effects
  • Controls for unobserved individual heterogeneity and time effects
  • Time-invariant confounders are addressed — ability drops out automatically because it is constant per student and is fully absorbed by the student fixed effect
  • Parallel trends remains the key identifying assumption; fixed effects do not relax it
  • But what about standard errors?

Difference-in-Differences (DiD)

Two-way fixed effects (TWFE)

Under the Hood: What do fixed effects actually do?

  • Any time-invariant characteristic that cannot be observed in the data is captured by the fixed effect
  • Fixed effects only use within-unit variation over time; all cross-sectional differences between units are swept away. This is why time-invariant variables like ability and female drop out automatically: once you subtract each student’s own mean, a constant value cancels out completely (see the sketch below)
  • Use with caution! Do not absorb away the variation you want to measure (e.g. do not use firm FE if your research question is about cross-sectional variation across firms)

Fixed effects solve endogeneity from time-invariant omitted variables. They do not solve endogeneity from time-varying confounders — if something unobserved changes over time and also affects both treatment and outcome, fixed effects won’t help. This is exactly what the parallel trends assumption guards against in DiD.
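To see the demeaning at work, a sketch using names from the running example (student_id, ability):

    * Demean by student: anything constant within a student cancels out
    egen mean_ability = mean(ability), by(student_id)
    gen ability_within = ability - mean_ability
    summarize ability_within    // sd = 0: no within variation left, so ability
                                // cannot be estimated in a student-FE model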

Difference-in-Differences (DiD)

Economic magnitude and Fixed Effects

Why does this matter?

  • How to calculate economic magnitudes, e.g. \[\text{Economic Magnitude} = \beta \times \sigma_X\]
  • If \(\beta\) is identified using only within-group variation, then \(\sigma_X\) should reflect the variation that was actually used for identification — i.e. the within-group standard deviation, not the raw sample standard deviation that includes between-group variation that was purged by the FE
  • The Mismatch: Using raw variation as a counterfactual implies a change that the regression model itself never “saw” or used for identification
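With purely hypothetical numbers (an illustration, not estimates from any dataset used here): suppose \(\beta = 0.5\), the raw sample standard deviation of \(X\) is 2.0, and the within-group standard deviation is 0.8:

\[0.5 \times 2.0 = 1.0 \;\; \text{(raw)} \qquad \text{vs.} \qquad 0.5 \times 0.8 = 0.4 \;\; \text{(within)}\]

The raw-based magnitude overstates the effect of a realistic within-group move by a factor of 2.5.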

Difference-in-Differences (DiD)

Economic magnitude and Fixed Effects

The Issue of Inflation and Extrapolation

  • Overstated Effects: The sample standard deviation is often two to three times larger than the within-group variation that actually identifies the coefficient.
  • Extrapolation Bias: Characterizing a “one-standard-deviation” increase based on raw data often represents an unrealistically large change that rarely occurs within a single firm or group.
  • Modeling Assumptions: Such large counterfactual changes rely heavily on linearity assumptions that may not be testable or defensible for changes that are significantly larger than common variation in the data.

Difference-in-Differences (DiD)

Economic magnitude and Fixed Effects

Identifying the “Effective” Variation

  • Singleton & No-Variation Groups: FE can “mask” the effective sample. Observations in groups with only one entry (singletons) or zero variation in \(X\) do not contribute to the coefficient estimate
  • Shrunken Distributions: After residualizing variables to FE, the remaining variation is much more narrowly distributed
  • Distorted Unit of Change: Because FE regressions focus on predicted changes in \(Y\) in response to deviations from a group’s mean, the raw standard deviation is no longer a meaningful unit for characterizing the magnitude of the effect

Difference-in-Differences (DiD)

Economic magnitude and Fixed Effects

What to do about it?

  • Use Within-Group Scaling: Researchers should characterize economic magnitudes using the within-group standard deviation of the independent variable.
  • Report W-S Ratios: Supplement raw descriptive statistics with standard deviations of variables residualized to the FE to show how much variation was purged.
  • Validate Counterfactuals: Report the frequency of the proposed counterfactual move. If a “one-standard-deviation” move happens less than 10% of the time within a group, it may be an “extreme counterfactual”.
  • Within-\(R^2\): Supplement the total \(R^2\) with a “within-\(R^2\)” to show how much explanatory power comes from the variables of interest versus the FE themselves (see the sketch below).
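A sketch of the residualization and within-\(R^2\) points (reghdfe from SSC; x, y, firm_id, and year are hypothetical names):

    * Residualize x on the fixed effects to see the variation actually used
    reghdfe x, absorb(firm_id year) residuals(x_resid)
    summarize x x_resid    // compare the raw sd with the residualized sd

    * reghdfe reports the within-R2 next to the total R2 in its header
    reghdfe y x, absorb(firm_id year)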

Difference-in-Differences (DiD)

Standard errors

Why do standard errors matter?

  • Standard errors quantify the uncertainty around our estimates
  • Incorrect standard errors can lead to misleading conclusions about statistical significance

What can go wrong? — Classic issues

  • Heteroskedasticity: If the variance of errors is not constant, standard errors may be biased (lecture 3)
    • Use robust standard errors to account for heteroskedasticity
  • Autocorrelation: In DiD settings, errors may be correlated over time within groups, which can also lead to underestimated standard errors if not addressed
    • Use Newey-West or block bootstrap methods to adjust for serial correlation

Difference-in-Differences (DiD)

Standard errors

What can go wrong? — Clustering

  • Clustering: If treatment is assigned at a group level (e.g., course level), outcomes within groups may be correlated, leading to underestimated standard errors if not accounted for (Moulton problem, see Moulton (1990))

When and how to adjust standard errors?

  • Cluster standard errors at the level of treatment assignment (e.g., course level) to account for within-group correlation
  • Use robust standard errors to account for heteroskedasticity
  • Be cautious with a small number of clusters (fewer than ~30), as standard errors may still be biased; consider wild cluster bootstrap methods in such cases (see the sketch below)
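A sketch of both adjustments for DiLLMa (course_id, student_id, and period are hypothetical identifiers; boottest is a community package, ssc install boottest):

    * Cluster at the level of treatment assignment: the course
    reghdfe exam_score i.treated#i.after, absorb(student_id period) ///
        vce(cluster course_id)

    * With few clusters, check the p-value via the wild cluster bootstrap
    boottest 1.treated#1.after, cluster(course_id)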

Difference-in-Differences (DiD)

Standard errors

Example: DiLLMa

  • Treatment is assigned at the course level, but we have multiple students per course
  • Outcomes of students within the same course are likely correlated (e.g., due to shared instructor, materials, peer effects)
  • If we ignore this clustering and use standard errors that assume independence, we will underestimate the true variability of our estimates
  • This can lead to falsely concluding that the treatment effect is statistically significant when it may not be (Type I error)

Difference-in-Differences (DiD)

Standard errors

Example: DiLLMa with clustering

What changes with clustering?

  • Standard errors increase when clustering by course, reflecting the within-course correlation of outcomes
  • No change in point estimates, but the statistical significance of the treatment effect may change (e.g., from significant to non-significant)

Difference-in-Differences (DiD)

Standard errors

Key takeaways on standard errors in DiD settings

  • Always use robust standard errors (addresses heteroskedasticity)
  • Rule of thumb I: Cluster at the level of the treatment/“shock”, e.g. treatment is at the state level (new law) \(\Rightarrow\) cluster standard errors at the state level
  • Be careful: Fewer than \(\approx\) 30 clusters makes clustered standard errors too small
  • Rule of thumb II: Check robustness across different clustering choices \(\Rightarrow\) report the most conservative one

Difference-in-Differences (DiD)

Takeaways so far

  • DiD leverages both cross-sectional and time-series variation to identify causal effects (ATT)
  • The canonical DiD model can be extended with fixed effects to control for time-invariant heterogeneity
  • The parallel trends assumption is crucial for valid DiD inference
  • Proper standard error adjustments are essential for accurate inference in DiD settings
  • Is that all? Not quite…

Limitations

Limitations arise in rollout (staggered) designs, where treatment timing varies across groups; TWFE can perform poorly in such settings…

Difference-in-Differences (DiD)

Staggered adoption — The problem

The canonical DiD assumes a clean 2×2 world: two groups (treated and control) observed in two periods (before and after).

Staggered (rollout) DiD

When units receive treatment at different points in time, we call this a staggered adoption or rollout design. This is extremely common in finance and economics research.

Difference-in-Differences (DiD)

Staggered adoption — Why TWFE can go wrong

The standard Two-Way Fixed Effects (TWFE) estimator implicitly uses all available 2×2 comparisons, including some you probably don’t want:

Goodman-Bacon decomposition

Goodman-Bacon (2021)

The “forbidden comparison” problem

  • Early-treated vs. never-treated ✓ Clean comparison
  • Late-treated vs. never-treated ✓ Clean comparison
  • Early-treated vs. late-treated ⚠️ The already-treated group acts as a “control” — but they’ve already been treated!
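The decomposition can be inspected directly (a sketch using the community package bacondecomp, ssc install bacondecomp; variable names hypothetical):

    * Requires an xtset panel and a binary, absorbing treatment indicator
    xtset firm_id year
    bacondecomp y treat, ddetail    // lists each 2x2 comparison and its weight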

Difference-in-Differences (DiD)

Staggered adoption — The negative weights problem

The negative weights problem

When treatment effects are heterogeneous, TWFE can assign negative weights to some group-time comparisons. This means TWFE can produce a negative estimate even when every single unit has a positive treatment effect. Goodman-Bacon (2021) decomposed TWFE into its component 2×2 DiDs and showed this explicitly.

Why does this happen?

  • Suppose early-treated firms experience a large initial treatment effect that fades over time.
  • TWFE then uses the early-treated group in later periods, when its effect has already faded, as a “control” for the late-treated group.
  • TWFE interprets the fading as a negative effect — even though the treatment was always positive.

The problem is not staggered adoption per se — it is staggered adoption combined with treatment effect heterogeneity.

Difference-in-Differences (DiD)

Staggered adoption — Recent solutions (high-level)

New approaches share a common idea: only compare treated units to clean controls

  • Callaway & Sant’Anna: estimate separate ATTs for each (group, time) cell, then aggregate (Callaway and Sant’Anna 2021)
  • Sun & Abraham: interaction-weighted estimator; heterogeneity-robust (Sun and Abraham 2021)
  • Borusyak, Jaravel & Spiess: imputation-based; efficient (Borusyak, Jaravel, and Spiess 2024)
  • Stacked DiD: manually construct clean 2×2 datasets and stack them (Cengiz et al. 2019; Gormley and Matsa 2011)
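As one example, a Callaway and Sant’Anna sketch using the community package csdid (ssc install csdid); first_treat holds each unit’s first treatment period, 0 for never-treated units (names hypothetical):

    csdid y, ivar(firm_id) time(year) gvar(first_treat) notyet
    estat simple    // overall ATT
    estat event     // event-study aggregation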

Difference-in-Differences (DiD)

Staggered adoption — Practical guidance

Rule of thumb

  • If your setting is a clean 2×2 (one treatment date, one treated group): standard TWFE is fine
  • If your setting has staggered rollout: run and report the Goodman-Bacon decomposition to understand how much of your estimate is driven by “forbidden” comparisons
  • If heterogeneous effects are plausible (they usually are): use Callaway-Sant’Anna or Sun-Abraham as robustness

Difference-in-Differences (DiD)

Beyond your MSc replication

New approaches

How to learn about the new approaches?
  • See e.g. Asjad Naqvi’s GitHub repository, which provides code and resources for implementing the new DiD methods, or YouTube lectures by the original authors
  • Check the papers by Callaway and Sant’Anna (2021), Sun and Abraham (2021), Borusyak, Jaravel, and Spiess (2024), and Cengiz et al. (2019)
  • Go beyond the literature in this course, e.g. de Chaisemartin and D’Haultfœuille (2023)
Why should you care about the new approaches?

A way to extend your MSc replication project is to apply some of the new DiD methods to your data and compare the results to the standard TWFE approach. This can provide insights into the robustness of your findings and demonstrate your ability to apply cutting-edge econometric techniques.

Thank You for Your Attention!

See You in the Next One!

References

Abouk, Rahi, and Scott Adams. 2013. “Texting Bans and Fatal Accidents on Roadways: Do They Work? Or Do Drivers Just React to Announcements of Bans?” American Economic Journal: Applied Economics 5 (2): 179–99.
Baker, Andrew C, David F Larcker, and Charles CY Wang. 2022. “How Much Should We Trust Staggered Difference-in-Differences Estimates?” Journal of Financial Economics 144 (2): 370–95.
Borusyak, Kirill, Xavier Jaravel, and Jann Spiess. 2024. “Revisiting Event-Study Designs: Robust and Efficient Estimation.” Review of Economic Studies 91 (6): 3253–85.
Breuer, Matthias, and Ed deHaan. 2024. “Using and Interpreting Fixed Effects Models.” Journal of Accounting Research 62 (4): 1183–1226.
Callaway, Brantly, and Pedro HC Sant’Anna. 2021. “Difference-in-Differences with Multiple Time Periods.” Journal of Econometrics 225 (2): 200–230.
Cameron, A Colin, and Douglas L Miller. 2015. “A Practitioner’s Guide to Cluster-Robust Inference.” Journal of Human Resources 50 (2): 317–72.
Cengiz, Doruk, Arindrajit Dube, Attila Lindner, and Ben Zipperer. 2019. “The Effect of Minimum Wages on Low-Wage Jobs.” The Quarterly Journal of Economics 134 (3): 1405–54.
Chaisemartin, Clément de, and Xavier D’Haultfœuille. 2023. “Credible Answers to Hard Questions: Differences-in-Differences for Natural Experiments.” Available at SSRN 4487202.
Cinelli, Carlos, Andrew Forney, and Judea Pearl. 2024. “A Crash Course in Good and Bad Controls.” Sociological Methods & Research 53 (3): 1071–1104.
Goodman-Bacon, Andrew. 2021. “Difference-in-Differences with Variation in Treatment Timing.” Journal of Econometrics 225 (2): 254–77.
Gormley, Todd A, and David A Matsa. 2011. “Growing Out of Trouble? Corporate Responses to Liability Risk.” The Review of Financial Studies 24 (8): 2781–821.
Huntington-Klein, Nick. 2022. The Effect: An Introduction to Research Design and Causality. 2nd ed. Chapman & Hall/CRC.
Liu, Zack, and Adam Winegar. 2025. “Economic Magnitudes Within Reason.” Journal of Corporate Finance 83: 102707. https://doi.org/10.1016/j.jcorpfin.2024.102707.
Moulton, Brent R. 1990. “An Illustration of a Pitfall in Estimating the Effects of Aggregate Variables on Micro Units.” The Review of Economics and Statistics, 334–38.
Sun, Liyang, and Sarah Abraham. 2021. “Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects.” Journal of Econometrics 225 (2): 175–99.
Verbeek, Marno. 2021. Panel Methods for Finance: A Guide to Panel Data Econometrics for Financial Applications. De Gruyter.

Appendix

Difference-in-Differences (DiD)

Unconditional means

Event study approach

The parallel trends assumption states that, in the absence of treatment, the average change in the outcome variable would have been the same for both the treatment and control groups.

Difference-in-Differences (DiD)

Bacon decomposition - Infographic


Stata example - If time permits

Stata example

Texting bans and traffic accidents — Background

Cross-sectional variation

Temporal variation

\[Y_{i,t} = \gamma_{State} + \theta_{Month} + \beta_3\, Ban_{i} \times Post_{t} + \epsilon_{i,t}\]
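A hedged sketch of this specification with reghdfe (variable names hypothetical):

    reghdfe accidents i.ban#i.post, absorb(state month) vce(cluster state)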

Stata example

Texting bans and traffic accidents — Main results

Main results

Interpretation

  • 3.7% reduction in accidents in states after passing the ban, compared to states without a ban in place
  • Strong ban states: 8.1% reduction in accidents
  • Weak ban states: 7.5% increase in accidents