Bringing It All Together

Data Analytics for Finance

Caspar David Peter

Rotterdam School of Management, Accounting Department

Before We Start…

Assignment Release and Submission Schedule

| Assignment | Release | Deadline | Graded |
|---|---|---|---|
| A1 | 11.02.2026 | 25.02.2026 | \(\checkmark\) |
| A2 | 11.02.2026 | 25.02.2026 | \(\checkmark\) |
| A3 | 18.02.2026 | 04.03.2026 | \(\checkmark\) |
| A4 | 25.02.2026 | 11.03.2026 | |
| A5 | 04.03.2026 | 18.03.2026 | |
| A6 | 11.03.2026 | 25.03.2026 | |

Assignment 6!

A6 is the replication exercise. You will replicate Huck (2024) using Dutch data. Everything we discuss today feeds directly into it. Deadline: March 25, 2026.

Overview

Today’s agenda

Part 1: Where We’ve Been

  • Recap of Lectures 1–5
  • The empirical researcher’s toolkit

Part 2: The Paper

  • Huck (2024) in RFS
  • Every section mapped to your toolkit

Part 3: Your Turn

  • Assignment 6: Dutch replication
  • Walkthrough

Part 4: Wrap-Up

  • Thesis connection
  • The role of AI in your workflow

Overview

Learning objectives

By the end of today, you will be able to

  • Recognize how the course’s methods connect to published research
  • Understand the research design and identification strategy of Huck (2024)
  • Interpret and critically evaluate replication results in a new setting
  • Write a discussion of findings, limitations, and extensions
  • Connect this exercise to your MSc thesis

Why this matters

You are about to do exactly what thesis Module 2 asks: read a published paper, replicate the methodology with new data, and evaluate whether findings generalize.

Part 1: Where We’ve Been

Where We’ve Been

The journey in one picture

Each lecture added a layer to your empirical toolkit. Today we see how they work together in a single published paper and in your own replication.

Lecture 1: From Messy Data to Research Insights

The foundation: You can’t analyze what you can’t structure

Key takeaways

  • Research starts with data wrangling, not regressions
  • Raw data is messy: missing values, inconsistent formats, multiple sources
  • The pipeline: import ⤏ clean ⤏ transform ⤏ merge ⤏ analyze
  • Tools: Stata, merge, reshape, collapse, gen/replace

From this course

  • Compustat Global via WRDS
  • The “tidy data” principle: one row per observation unit
  • Optional: SQL DB for efficient data management

Remember this?

In every assignment, you spent 60–70% of your time on data preparation. That’s research. Facts!

Lecture 2: Data Exploration & Visualization

Look before you run regressions

Key takeaways

  • Visualization is diagnosis, not decoration
  • Summary statistics reveal distributions, outliers, patterns
  • Histograms, scatter plots, time series — each serves a purpose
  • Always check your data before running regressions

The principle

If your regression tells you something your visualization doesn’t support, trust the visualization and debug the regression.

Lecture 3: Regressions & Identification

The sign-flip — OVB can reverse your conclusions

Key takeaways

  • Identification = isolating causal variation
  • Endogeneity: X correlated with the error term
    • Selection bias: who self-selects into treatment
    • Omitted variable bias: what you forgot to control for
  • Experiments as the gold standard
  • OLS: the workhorse, but only as good as your identification

The DiLLMa lesson

OVB doesn’t just bias the size of an estimate; it can flip the sign!
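A minimal sketch of the sign flip, using a made-up six-observation dataset (not course data). Within each group of a binary confounder `z`, `y` falls with `x`, yet the pooled slope is positive because `z` raises both `x` and `y`. The helper names `slope` and `demean_by_group` and all numbers are illustrative assumptions.

```python
from statistics import mean

def slope(x, y):
    """Bivariate OLS slope of y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def demean_by_group(values, groups):
    """Subtract the group mean from each value (partials out a group dummy)."""
    means = {g: mean(v for v, h in zip(values, groups) if h == g)
             for g in set(groups)}
    return [v - means[g] for v, g in zip(values, groups)]

# Hypothetical data: the confounder z shifts both x and y upward.
z = [0, 0, 0, 1, 1, 1]
x = [1, 2, 3, 4, 5, 6]
y = [3, 2, 1, 8, 7, 6]

naive = slope(x, y)  # regression that omits z
controlled = slope(demean_by_group(x, z), demean_by_group(y, z))  # controls for z

print(f"naive slope:       {naive:+.3f}")
print(f"controlling for z: {controlled:+.3f}")
```

Demeaning by group is equivalent to including a dummy for `z` in the regression, which is why the second slope recovers the within-group relationship.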

Lecture 4: Panel Data Methods

Control for what you can’t measure

Key takeaways

  • Panel data: multiple units observed over time
  • Fixed effects absorb unobserved heterogeneity
    • Entity FE: time-invariant characteristics
    • Time FE: common shocks across units
  • Difference-in-Differences: treatment vs control, before vs after
  • Parallel trends assumption as the key identification requirement

From this course

  • Dieselgate as a natural experiment
  • Within-firm variation: same VW before vs after scandal
  • Staggered adoption and modern DiD advances (Sun & Abraham, Callaway & Sant’Anna)
  • Clustering standard errors by the right dimension

The intuition

If you can’t run an experiment and can’t observe all confounders, panel methods let you difference them out.
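The 2×2 version of this idea can be sketched in a few lines. The numbers below are invented for illustration and are not the Dieselgate data; `cell_mean` is a hypothetical helper.

```python
from statistics import mean

# Hypothetical observations: (treated, post, outcome)
obs = [
    (1, 0, 100.0), (1, 0, 104.0),   # treated group, before
    (1, 1, 90.0),  (1, 1, 94.0),    # treated group, after
    (0, 0, 80.0),  (0, 0, 84.0),    # control group, before
    (0, 1, 82.0),  (0, 1, 86.0),    # control group, after
]

def cell_mean(treated, post):
    """Average outcome in one of the four treated/post cells."""
    return mean(y for t, p, y in obs if t == treated and p == post)

change_treated = cell_mean(1, 1) - cell_mean(1, 0)   # after minus before, treated
change_control = cell_mean(0, 1) - cell_mean(0, 0)   # after minus before, control

# Under parallel trends, the control group's change is the counterfactual
# change for the treated group, so the difference of differences is the effect.
did = change_treated - change_control
print(did)
```

Any level difference between the groups that is constant over time drops out in the first difference, and any shock common to both groups drops out in the second.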

Lecture 5: Event Studies & Fama–MacBeth

Specialized tools for finance

Event Studies

  • Short-window test of market reactions
  • Estimation window ⤏ expected return model ⤏ abnormal returns ⤏ CARs
  • The “counterfactual” is the factor model prediction
  • Same logic as DiD, but at daily frequency
  • Leads and lags reveal timing of information incorporation
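The estimation-window-to-CAR chain above can be sketched as follows, assuming a single-factor market model as the expected return model (the slides leave the factor model open). All returns are made up, and the toy estimation-window data lie exactly on a line so the result is easy to check by hand.

```python
from statistics import mean

def market_model(stock, market):
    """OLS intercept and slope of stock returns on market returns."""
    ms, mm = mean(stock), mean(market)
    beta = (sum((m - mm) * (s - ms) for s, m in zip(stock, market))
            / sum((m - mm) ** 2 for m in market))
    alpha = ms - beta * mm
    return alpha, beta

# Hypothetical daily returns in percent.
est_market = [0.5, -0.2, 0.1, 0.8, -0.6, 0.3]   # estimation window
est_stock = [0.9, -0.5, 0.1, 1.5, -1.3, 0.5]
evt_market = [0.2, -0.1, 0.4]                   # event window
evt_stock = [0.1, 2.5, 0.9]

# Step 1: fit the expected return model on the estimation window only.
alpha, beta = market_model(est_stock, est_market)

# Step 2: abnormal return = actual return minus the model's prediction.
abnormal = [s - (alpha + beta * m) for s, m in zip(evt_stock, evt_market)]

# Step 3: cumulate abnormal returns over the event window.
car = sum(abnormal)
print(round(car, 4))
```

Keeping the event window out of the estimation sample matters: otherwise the event itself contaminates the counterfactual.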

Fama–MacBeth

  • Two-step procedure for cross-sectional pricing tests
  • Step 1: Cross-sectional regression each period
  • Step 2: Time-series average of coefficients
  • Handles cross-sectional correlation by construction
  • The standard approach for testing factor premia
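A minimal sketch of the two steps, assuming one characteristic and a tiny invented panel (three assets, four periods). The helper `cs_slope` is hypothetical, and the standard error shown is the plain time-series version without any autocorrelation adjustment.

```python
from math import sqrt
from statistics import mean, stdev

def cs_slope(x, y):
    """Slope from a cross-sectional OLS of returns y on characteristic x."""
    mx, my = mean(x), mean(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

# Hypothetical data: period -> (characteristic per asset, return per asset)
panel = {
    1: ([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]),   # slope 1.0
    2: ([0.0, 1.0, 2.0], [2.0, 2.0, 2.0]),   # slope 0.0
    3: ([0.0, 1.0, 2.0], [0.0, 2.0, 4.0]),   # slope 2.0
    4: ([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]),   # slope 1.0
}

# Step 1: one cross-sectional regression per period.
lambdas = [cs_slope(x, y) for x, y in panel.values()]

# Step 2: time-series average of the slopes, with a standard error based on
# how much the period-by-period slopes vary.
lam_bar = mean(lambdas)
se = stdev(lambdas) / sqrt(len(lambdas))
t_stat = lam_bar / se
print(lam_bar, round(t_stat, 3))
```

Because inference uses only the time series of slopes, correlation across assets within a period never enters the standard error formula.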

The Empirical Researcher’s Toolkit

Matching research questions to methods

| Research question type | Data structure | Method | Lecture |
|---|---|---|---|
| Does X predict Y? | Cross-section | OLS with controls | L3 |
| Does X cause Y? (observable confounders) | Cross-section | OLS + careful controls | L3 |
| Does X cause Y? (unobservable confounders) | Panel | Fixed effects, DiD | L4 |
| How did the market react to event E? | Daily returns | Event study (CARs) | L5 |
| Does characteristic Z earn a risk premium? | Panel of returns | Fama–MacBeth | L5 |
| Does this US result hold in the Netherlands? | Same spec | Replication | L6 |

Key insight

The method follows from the question and the data structure, not the other way around.

The Empirical Researcher’s Toolkit

The complete methods toolkit

| Lecture | Method | Identifies | Data structure | Key assumption |
|---|---|---|---|---|
| L3 | OLS + controls | Cross-sectional relationships | Cross-section | No omitted variables |
| L4 | DiD / Panel FE | Causal effect of treatment | Panel | Parallel trends |
| L5 | Event study | Market reaction to specific events | Daily panel | Market efficiency + correct factor model |
| L5 | Fama–MacBeth | Systematic return predictability | Repeated cross-sections | Cross-sections are independent draws |

Key insight

The art of empirical finance is choosing the method whose assumptions are most credible for your specific research question and data.

The Empirical Researcher’s Toolkit

Standard errors: A unifying thread

The standard error story has been building across the course:

  • L3: Heteroskedasticity-robust SE → solves non-constant variance
  • L4: Cluster-robust SE → handles within-group correlation (Moulton problem)
  • L5: Fama–MacBeth SE → handles cross-sectional correlation of residuals

All three address the same fundamental issue: when residuals are not independent and identically distributed, naive standard errors are wrong, and typically too small. The solution is always to match the inference method to the correlation structure of the data.

The common thread

The choice of standard error is not a technicality; it is a statement about what you believe is independent in your data. Getting it wrong can flip your conclusions from significant to insignificant (or vice versa).

Part 2: The Paper through the Lens of Your Toolkit

The Paper

Overview

Citation

Huck (2024) — “The Psychological Externalities of Investing: Evidence from Stock Returns and Crime”

The Review of Financial Studies, 37(7), 2273–2314.

The question

Do stock market returns affect the psychological well-being of both investors and noninvestors?

Why this paper?

  • Published in a top-5 finance journal
  • Uses methods you’ve learned in Lectures 1–5
  • Clean identification strategy
  • Surprising result: the stock market affects people who don’t even own stocks!

The Paper

The idea

Crime as a measure of psychological well-being

  • Traditional approach: surveys (subjective well-being questions)
  • Problem: surveys are low-frequency, susceptible to framing, and rarely measure wealth effects
  • Huck’s innovation: use daily crime data as a revealed response to psychological distress

Precedent

Card and Dahl (2011) use intimate partner violence as a proxy for emotional shocks from NFL game outcomes. Same logic: if people feel worse, some act out in ways that show up in the data.

The Paper

Contrasting predictions (Hypothesis)

The contrasting predictions are what make this interesting — it’s not just “does X cause Y” but “does X cause Y differently for subgroups?”

The Paper

Relative wealth (Mechanism)

Investors

  • Stock market rises ⤏ absolute wealth increases
  • Also better off relative to average person
  • Psychological well-being improves
  • Violent crime decreases

Noninvestors

  • Stock market rises ⤏ no change in own wealth
  • But now relatively worse off compared to investors
  • Psychological well-being declines
  • Violent crime increases

This is consistent with relative status models (Abel 1990; Gali 1994) and the “keeping up with the Joneses” literature.

The Paper

Data: The pipeline you already know

Three data sources

  1. Crime: FBI NIBRS: 55M+ incidents, 2,700 agencies, 1991–2015
  2. Income: ACS Census: median income by city/county (proxy for investor status)
  3. Returns: Kenneth French’s website

Cleaning decisions

  • Filter inconsistent reporters
  • Remove agencies with <50 report days and small/remote agencies (<2,000 pop)
  • Crime day starts at 6 AM (before market opens)

Variable construction

  • Crime rates: incidents per 100M population
  • Income terciles: high/medium/low as investor proxy
  • Standardized returns: \(r_t / \sigma_{t,252}\)
  • Violent crime: assaults + homicides only
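The standardized return \(r_t / \sigma_{t,252}\) can be sketched as below. One assumption is labeled loudly: the trailing window here excludes day \(t\) itself, which the slide does not specify. The function name and the toy numbers are illustrative.

```python
from statistics import stdev

def standardized_returns(returns, window=252):
    """Divide each return by the SD of the `window` returns before it.

    Assumption: the window covers days t-window .. t-1 and excludes day t.
    Days without a full trailing window get None.
    """
    out = []
    for t, r in enumerate(returns):
        if t < window:
            out.append(None)            # not enough history yet
        else:
            sigma = stdev(returns[t - window:t])
            out.append(r / sigma)
    return out

# Toy example with a 4-day window instead of 252 so it can be checked by hand.
rets = [1.0, -1.0, 1.0, -1.0, 2.0]
z = standardized_returns(rets, window=4)
print(z)
```

Scaling by trailing volatility makes a 1% move in a calm period count for more than a 1% move in a turbulent one, so coefficients read as the effect of a one-standard-deviation return.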

Link to L1–L2

This is the data pipeline from Lectures 1–2. Three raw sources ⤏ clean ⤏ merge ⤏ construct variables. Huck devotes a substantial part of the paper to describing these choices. Remember when 60–70% of your assignment time was data prep?

The Paper

Data: Income as investor proxy

Why income terciles?

  • Malloy, Moskowitz, and Vissing-Jørgensen (2009) show probability of holding stocks increases significantly with income
  • Top tercile ≈ likely investors
  • Bottom tercile ≈ likely noninvestors
  • Available for all agencies

Why violent crime specifically?

  • Aggregate crime ⤏ measurement problem: theft in rich areas likely committed by low-income individuals
  • Violent crime (assaults, homicides): typically occurs at home, offender known to victim
  • 54% of assault victims in middle/high-income households
  • 44% assaulted at or near their home
  • Minimizes the misattribution problem between offender income and location income

The Paper

Descriptive evidence

Huck (2024), Table 2 — Summary statistics (crime rates per 100M, daily 1991–2015)
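| | Average | Median | SD |
|---|---|---|---|
| Crime rate (all) | 22,659 | 17,675 | 18,565 |
| Crime rate, high-income | 18,347 | 14,547 | 14,577 |
| Crime rate, medium-income | 24,215 | 19,618 | 18,894 |
| Crime rate, low-income | 27,043 | 21,527 | 21,784 |
| Violent crime rate (all) | 4,676 | 2,437 | 7,140 |
| Violent crime rate, high-income | 3,370 | 1,524 | 5,455 |
| Violent crime rate, medium-income | 5,077 | 3,021 | 7,199 |
| Violent crime rate, low-income | 6,093 | 3,637 | 8,748 |
| Market return (std.) | 0.038 | 0.082 | 1.044 |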

The Paper

Panel FE regression

The model

\[y_{i,t} = \beta \cdot r_t + \gamma \cdot X_{i,t} + \theta_{i,a(t)} + \mu_{i,m(t)} + \omega_{i,w(t)} + \delta_{i,d(t)} + T_t + \varepsilon_{i,t}\]

| Term | What it is | Role / what it absorbs |
|---|---|---|
| \(y_{i,t}\) | Crime rate for agency \(i\) on day \(t\) | Outcome variable |
| \(r_t\) | Standardized daily market return | Regressor of interest (coefficient \(\beta\)) |
| \(X_{i,t}\) | Weather, pollution, sports outcomes, celestial | Daily confounders |
| \(\theta_{i,a(t)}\) | Agency × year FE | Local business cycles, demographics |
| \(\mu_{i,m(t)}\) | Agency × month-of-year FE | Local seasonal crime patterns |
| \(\omega_{i,w(t)}\) | Agency × week-of-month FE | Payday effects |
| \(\delta_{i,d(t)}\) | Agency × day-of-week FE | Weekend effects |
| \(T_t\) | Turn-of-month + holiday FE | Calendar anomalies |

Standard errors clustered by time and agency.

The Paper

Why so many fixed effects?

The identification challenge

The concern: maybe something else that happens on the same day drives both stock returns and crime (e.g., weather, sports, holidays).

Huck’s defense

  • Agency × year: absorbs local economic conditions that change annually
  • Agency × month: absorbs seasonal crime patterns (summer peak, winter trough)
  • Agency × day-of-week: absorbs weekend/weekday differences
  • Explicit controls for weather, NFL/MLB, moon phases, daylight savings

The key insight

Identification occurs within each agency-year. After removing all these patterns, the remaining variation in crime is compared with daily market return fluctuations.

Link to L4

This is the fixed effects logic taken seriously. Each layer of FE removes another source of confounding. The residual variation is what identifies the effect.

The Paper

Differential effects by income (Core test)

The model is estimated separately for high-income (investor) and low-income (noninvestor) locations:

Results (violent crime)

Adapted from Huck (2024), Table 3
| | High-income | Low-income |
|---|---|---|
| Market return | −12.47*** | +15.22** |
| (t-statistic) | (−3.291) | (2.043) |
| % of avg | −0.37% | +0.25% |
| t(HIGH=LOW) | 3.313 | |

Interpretation

  • 1 SD market return ↑ ⤏
    • 37 bps decrease in violent crime for investors
    • 25 bps increase in violent crime for noninvestors
  • Effects are statistically different from each other (t = 3.31)
  • Signs are opposite — consistent with relative wealth theory
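The “% of avg” figures can be reproduced from numbers already on the slides, assuming each coefficient is divided by the average violent crime rate of the same income group in Table 2 (an inference from the reported values, not a formula quoted from the paper):

\[\frac{-12.47}{3{,}370} \approx -0.0037 = -0.37\% \ (37 \text{ bps}), \qquad \frac{15.22}{6{,}093} \approx 0.0025 = 0.25\% \ (25 \text{ bps})\]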

The Paper

Economic magnitude

How big is this?

  • A 1% increase in the stock market (≈ 1 SD) corresponds to:
    • An additional 76–179 crimes per day across the US
    • $7.1–16.8 million in daily costs to society
    • $1.8–4.2 billion annualized
  • For investors, the 37 bps decrease in violent crime is comparable to Engelberg and Parsons’s (2016) finding of a 21 bps decrease in mental health admissions in California after market increases
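The annualized range is consistent with scaling the daily cost by the number of trading days. Assuming roughly 252 trading days per year (the slide does not state the multiplier):

\[\$7.1\text{M} \times 252 \approx \$1.8\text{B}, \qquad \$16.8\text{M} \times 252 \approx \$4.2\text{B}\]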

Key takeaway

The stock market has real externalities beyond the portfolios it directly affects. This is what makes it a finance paper with social implications.

The Paper

Is this really same-day? (Timing and identification)

Leads and lags (Table 4)

| | t−3 | t−2 | t−1 | t | t+1 | t+2 | t+3 |
|---|---|---|---|---|---|---|---|
| High | 1.2 | 6.1 | −6.4 | −12.5*** | −3.0 | −0.1 | 3.8 |
| Low | 6.4 | −9.1 | −3.8 | 15.2** | 5.0 | 4.7 | 2.1 |

Only the contemporaneous (day \(t\)) coefficients are significant!

What this tells us

  • No pre-trends: market returns on t−1, t−2, t−3 don’t predict crime
  • No delayed effects: market returns don’t affect crime in coming days
  • Within the day: effect is strongest after markets close
  • This is consistent with: see news ⤏ emotional response ⤏ same-day behavior

The Paper

Defending the identification (Robustness)

Good empirical work doesn’t just show one result; it shows that the result survives scrutiny.

What Huck tests

  • Alternative investor proxies: IRS dividend data, SCF probit model — same results
  • Alcohol channel: ruled out (crime-return relationship not driven by drinking)
  • At-home violence: strongest for male-on-female domestic violence — location matches offender
  • Earnings surprises: local firm announcements with daily FE — absorbs all common daily shocks

Why this matters

Each robustness check addresses a specific threat to identification:

  • Alternative proxies ⤏ measurement concern
  • Alcohol ⤏ omitted variable / alternative mechanism
  • At-home violence ⤏ misattribution concern
  • Earnings surprises ⤏ common shock concern

The Paper

Earnings surprises

Why this test is particularly convincing

  • Uses local firm earnings surprises (not market-wide returns)
  • Includes daily fixed effects that absorb any common daily shock
  • Earnings announcements are salient reports on past performance ⤏ should be unrelated to other daily news
  • Exploits the local bias of investors: people pay more attention to firms headquartered nearby

Result

Positive local earnings surprises are associated with:

  • Decreased violent crime in high-income (investor) areas
  • Increased violent crime in low-income (noninvestor) areas

Same contrasting pattern — even after removing all common daily variation.

The Paper

What makes this paper work?

| Paper element | Skill/Tool | You learned this in |
|---|---|---|
| Three data sources ⤏ merged panel | Data wrangling, merge, reshape | L1 |
| Summary statistics, time series plots | EDA, distributional analysis | L2 |
| OLS regression, interpreting coefficients | OLS, hypothesis testing | L3 |
| Agency × year FE, agency × seasonal FE | Panel fixed effects, TWFE | L4 |
| Leads/lags analysis (event window logic) | Event study timing | L5 |
| Interaction terms (return × income group) | Differential effects, DiD intuition | L4 |
| Clustered standard errors | Correct inference | L3–L5 |
| Robustness to alternative specifications | Defending identification | L3–L5 |

The point

You have the entire toolkit to read, understand, and replicate this paper. That’s what you’ll do in Assignment 6.

Part 3: Your Turn — The Dutch Replication

Assignment 6

What stays, what changes?

What stays the same

  • The research question and hypothesis
  • The empirical specification structure
  • Income terciles as investor proxy
  • Interaction terms for differential effects
  • Fixed effects and robustness checks

What changes

  • Country: Netherlands instead of US
  • Market: AEX index instead of the US market return (from Kenneth French’s website)
  • Crime data: CBS municipality data instead of FBI NIBRS
  • Frequency: Monthly instead of daily
  • Geography: Municipalities instead of cities
  • Income: CBS municipality median income instead of ACS

Important difference

Monthly frequency is the biggest limitation. We cannot test Huck’s timing analysis (Tables 4–5) or identify same-day effects. This is a key limitation you’ll discuss in Section 14.

Assignment 6

Part A: The data pipeline

Three raw datasets

  1. crime_data.dta: Municipality × month panel of crime incidents by type
  2. income_data.dta: Cross-sectional average median income per municipality (2011–2018)
  3. aex_daily.dta: Daily AEX index closing prices

Your pipeline

  • Section 2: Load and examine
  • Section 3: Calculate standardized returns (trailing 252-day SD, then aggregate to monthly)
  • Section 4: Reshape crime data (long ⤏ wide)
  • Section 5: Merge all three datasets
  • Section 6: Create analysis variables (terciles, interactions)

Remember: Part A is 60–70% of the work. That’s normal.
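Section 6 of the pipeline (terciles and interactions) can be sketched as follows. The municipality labels, incomes, and the simple rank-based cut-offs are illustrative assumptions; the assignment’s own Stata code may split ties or group sizes differently.

```python
def assign_terciles(income_by_muni):
    """Label each municipality 'low', 'medium', or 'high' by income rank."""
    ranked = sorted(income_by_muni, key=income_by_muni.get)  # poorest first
    n = len(ranked)
    labels = {}
    for rank, muni in enumerate(ranked):
        if rank < n / 3:
            labels[muni] = "low"
        elif rank < 2 * n / 3:
            labels[muni] = "medium"
        else:
            labels[muni] = "high"
    return labels

def interactions(ret, label):
    """Return x High and Return x Low; medium income is the reference group."""
    return {
        "ret_x_high": ret * (label == "high"),
        "ret_x_low": ret * (label == "low"),
    }

# Hypothetical median incomes (in thousands of euros) for six municipalities.
income = {"A": 21.0, "B": 35.0, "C": 27.5, "D": 24.0, "E": 40.0, "F": 30.0}
tercile = assign_terciles(income)

# Interaction terms for one municipality-month with a standardized return of 0.8.
row = interactions(0.8, tercile["E"])
print(tercile, row)
```

Because income is cross-sectional here, each municipality keeps the same tercile in every month; only the return varies over time.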

Assignment 6

Part B: The analysis

Sections 7–8: Descriptive statistics & visualization

Before regressions, you create:

  • Summary statistics by income tercile
  • Time series of crime rates (trending down? seasonal patterns?)
  • Crime rates overlaid with AEX returns
  • Crime trends split by income group

Link to L2

This is the “look before you regress” principle. Your descriptive statistics should reveal patterns that your regressions will formalize.

Assignment 6

Baseline panel regression (Section 9)

The baseline model estimates the average effect of AEX returns on crime across all municipalities:

\[\text{Crime}_{it} = \beta \cdot r_t^{AEX} + \alpha_i + \varepsilon_{it}\]

  • Municipality fixed effects (\(\alpha_i\)) absorb time-invariant differences
  • This tests: “Do stock returns predict crime on average?”
  • But it does not test Huck’s key hypothesis about differential effects
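What “absorbing” municipality fixed effects means can be sketched with the within transformation: demean the return and the crime rate within each municipality, then run pooled OLS on the demeaned data. The data and the function `within_slope` are hypothetical; in practice you would use Stata’s panel commands rather than code this by hand.

```python
from statistics import mean

def within_slope(panel):
    """Fixed-effects (within) slope for a single regressor.

    `panel` maps a unit id to a list of (x, y) pairs. Each unit's own means
    are removed before pooling, so level differences between units drop out.
    """
    num = den = 0.0
    for rows in panel.values():
        mx = mean(x for x, _ in rows)
        my = mean(y for _, y in rows)
        num += sum((x - mx) * (y - my) for x, y in rows)
        den += sum((x - mx) ** 2 for x, _ in rows)
    return num / den

# Hypothetical: two municipalities with very different crime levels
# but the same response to returns.
returns = [-1.0, 0.0, 1.0]
panel = {
    "muni_1": [(r, 50.0 + 2.0 * r) for r in returns],
    "muni_2": [(r, 10.0 + 2.0 * r) for r in returns],
}

beta = within_slope(panel)
print(beta)
```

The levels 50 and 10 play the role of \(\alpha_i\): they never enter the estimate because each municipality is compared only with itself over time.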

Assignment 6

The core test (Section 10)

Our specification (with interactions)

\[\text{Crime}_{it} = \beta_0 + \beta_1 \text{Returns}_t + \beta_2 (\text{Returns}_t \times \text{High}_{i}) + \beta_3 (\text{Returns}_t \times \text{Low}_{i}) + \alpha_i + \varepsilon_{it}\]

Where:

  • \(\beta_1\) = Effect in medium-income areas (reference group)
  • \(\beta_1 + \beta_2\) = Effect in high-income areas
  • \(\beta_1 + \beta_3\) = Effect in low-income areas

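Key test: Is \(\beta_2 < 0\) (high-income effect below the medium-income reference) and \(\beta_3 > 0\) (low-income effect above it)? Huck’s opposite-sign prediction concerns the total effects: \(\beta_1 + \beta_2 < 0\) and \(\beta_1 + \beta_3 > 0\).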
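Reading the interaction model means adding coefficients, which is easy to get wrong under time pressure. A small sketch with placeholder coefficients (invented for illustration, not estimates from any data):

```python
def implied_effects(b1, b2, b3):
    """Total effect of returns on crime in each income group.

    b1 is the effect in the medium-income reference group; b2 and b3 are the
    coefficients on Returns x High and Returns x Low.
    """
    return {"medium": b1, "high": b1 + b2, "low": b1 + b3}

# Placeholder values only.
effects = implied_effects(b1=0.5, b2=-2.0, b3=1.5)
print(effects)
```

The sums are point estimates only. Their standard errors depend on the covariance between the coefficients, so testing a total effect requires a linear-combination test (for example `lincom` in Stata) rather than reading a single row of the regression table.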

Assignment 6

Two-way fixed effects & robustness (Sections 11 & 12)

Two-way fixed effects (TWFE)

\[\text{Crime}_{it} = \beta_2 (\text{Returns}_t \times \text{High}_{i}) + \beta_3 (\text{Returns}_t \times \text{Low}_{i}) + \alpha_i + \gamma_t + \varepsilon_{it}\]

Where:

  • \(\alpha_i\) = Municipality fixed effects
  • \(\gamma_t\) = Time (year-month) fixed effects
  • Note: The main return effect \(\beta_1 \text{Returns}_t\) is absorbed by time FE because all municipalities have the same return in month \(t\)
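The absorption can be checked directly: removing month means from a variable that is identical across municipalities leaves nothing, while the interaction still varies within a month. The tiny panel, the `high` dummy, and the helper name are made up for illustration.

```python
from statistics import mean

def demean_by_month(values, months):
    """Subtract each month's cross-sectional mean (what time FE do)."""
    means = {m: mean(v for v, k in zip(values, months) if k == m)
             for m in set(months)}
    return [v - means[m] for v, m in zip(values, months)]

# Hypothetical panel: two municipalities, three months, one common return.
munis = ["muni_1", "muni_2"] * 3
months = ["2015m1", "2015m1", "2015m2", "2015m2", "2015m3", "2015m3"]
ret = [0.4, 0.4, -1.1, -1.1, 0.7, 0.7]
high = {"muni_1": 1, "muni_2": 0}          # muni_1 is the high-income unit

ret_x_high = [r * high[m] for r, m in zip(ret, munis)]

ret_within = demean_by_month(ret, months)            # all zeros: absorbed
inter_within = demean_by_month(ret_x_high, months)   # still varies

print(ret_within)
print(inter_within)
```

This is why the TWFE specification can only identify differences in the return effect across income groups, not the level of the effect itself.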

Section 12: Robustness checks

  • Log crime: addresses skewed distribution
  • Assault only: closer to Huck’s violent crime measure
  • High-volatility months: does the effect strengthen when market moves are larger?

Assignment 6

The publication-quality table (Section 13)

Section 13 asks you to create a single table combining all specifications.

Part 3B: Interpretation, Limitations & Thesis

Section 14: Interpreting Your Findings

What a good summary looks like

Structure to follow

  1. Main result: Do you find differential effects?
  2. Signs: Consistent with Huck’s theory?
  3. Statistical significance: Are key coefficients significant?
  4. Economic magnitude: How large are effects?
  5. Robustness: Do results hold across specifications?
  6. Comparison to Huck: Same or different from US results?

Common mistakes to avoid

🤦‍♀️ Just listing coefficients without interpretation

🤦‍♀️ Over-claiming causality (“AEX returns cause crime”)

🤦‍♀️ Ignoring null results (“we didn’t find it, so we won’t discuss it”)

💰 Null results are informative! If you don’t find the same pattern, that’s a finding worth discussing.

Section 14: Interpreting Your Findings

What if you don’t replicate Huck’s results?

Think about why:

  • Monthly ≠ daily: effects may wash out at lower frequency
  • AEX is a smaller market with fewer household investors
  • CBS crime data has different reporting standards than FBI NIBRS
  • Dutch stock participation rate may differ from US
  • Welfare state ⤏ less relative deprivation?
  • Cultural differences in violence as stress response
  • Netherlands is geographically smaller ⤏ less local variation

For your thesis

A well-explained null finding shows deeper understanding than a weak positive with no discussion.

Section 14: Limitations

How to write honest limitations

The formula: What ⤏ Why it matters ⤏ How it differs from Huck

Good limitations

“Monthly data cannot test within-day timing effects (Huck’s Tables 4–5), limiting our ability to distinguish same-day from lagged effects.”

“Municipality-level income is an imperfect proxy for investor participation; Huck can check his income proxy against alternative measures (IRS dividend data, an SCF probit model).”

“A single market return means no cross-sectional variation in ‘treatment’ within a given month — all municipalities face the same AEX return.”

Bad limitations

“We don’t have enough data” ⤏ Too vague. What data? Why does it matter?

“Results might be wrong” ⤏ Undermines everything without explaining what could be wrong.

“The Netherlands is different from the US” ⤏ How is it different, and why does that affect your specific test?

Section 14: Limitations

The key limitations for Assignment 6

| Limitation | Consequence | vs. Huck |
|---|---|---|
| Monthly frequency | Cannot identify contemporaneous effects; timing tests impossible | Huck uses daily + hourly data |
| Single market return | No cross-sectional treatment variation within months | Huck also has local earnings surprises |
| Ecological fallacy | Municipality income ≠ individual investor status | Same issue, but Huck has finer geography |
| Standard errors | Few time periods (~96 months), spatial correlation | Huck has ~6,000 trading days |
| Crime reporting | Measurement varies by municipality and crime type | NIBRS is standardized across agencies |
| Selection/attrition | Municipality mergers change boundaries over time | Huck filters for consistent reporters |

Section 14: What Would Strengthen the Analysis?

Think creatively

Better data

  • Daily crime data (if available from CBS)
  • Individual brokerage account data by municipality (AFM, DEGIRO, or similar)
  • Survey data on stock market awareness by region

Natural experiments

  • Neo-broker entry: Trade Republic, BUX, DEGIRO expansion ⤏ DiD design
  • Tax policy changes affecting participation (e.g., box 3 reforms)
  • COVID market crash (extreme, salient event)

Alternative outcomes

  • Health data (hospital admissions, mental health)
  • Google Trends for financial distress keywords
  • Consumer confidence by region

Extensions

  • Cross-country comparison (UK has better crime data)
  • Crypto markets (more volatile, younger investors, higher awareness)
  • Social media sentiment as mediator

Sections 15 to 16: The Thesis Connection

You just completed a mini-thesis

Your MSc thesis modules

  1. Module 1: Literature review & research question
  2. Module 2: Replication of published results with new/similar data
  3. Module 3: Extension or new analysis building on Module 2

What you did in A6

  • Read a published paper ⤏ Module 1
  • Replicated the methodology with Dutch data ⤏ Module 2
  • Discussed extensions and improvements ⤏ Module 3 ideas

The direct connection

Assignment 6 is Module 2. Your improvement suggestions in Section 14.3 are Module 3 ideas. The skills you practiced — data wrangling, panel regression, robustness, interpretation — are exactly what your thesis demands.

Part 4: Wrap-Up

The Role of AI in Your Thesis Workflow

You need to understand the method to guide the tool

What AI is good at

  • Writing Stata/R/Python code from clear instructions
  • Debugging syntax errors
  • Formatting tables and figures
  • Summarizing literature

What AI cannot do for you

  • Choose your identification strategy
  • Decide which fixed effects to include and why
  • Interpret a null result
  • Write an honest limitations discussion
  • Defend your choices in a thesis defense

“If you can’t explain why you’re including municipality fixed effects, the LLM can’t save you in your thesis defense.”

The value chain

Understanding ⤏ Design ⤏ Implementation. AI helps most with the last step, least with the first. This course focused on the first two — because those are what make you a researcher, not a code typist.

What You Can Now Do

The skills you built

Technical skills

  • Build a data pipeline from raw sources to analysis-ready panels
  • Run and interpret OLS, panel FE, DiD, event studies, and Fama–MacBeth regressions
  • Create publication-quality tables and figures
  • Conduct and explain robustness checks

Research skills

  • Read and critically evaluate an empirical finance paper
  • Choose methods appropriate to your research question and data
  • Identify threats to identification and address them
  • Write honest interpretation and limitation discussions
  • Replicate published work and evaluate generalizability

Practical Thesis Advice

Three things to remember

1. Start with the data, not the regression

If you can’t describe your data structure, you’re not ready to estimate anything. Invest time in Part A.

2. Replicate before you extend

Get the baseline working before you try new specifications. Confirm that your data pipeline produces sensible numbers.

3. Limitations aren’t weakness — they’re depth

Every published paper has limitations. Acknowledging them shows you understand the research design. Not acknowledging them suggests you don’t.

Final Assignment Reminders

A6 practical details

  • Deadline: March 25, 2026
  • Structure: Part A (data prep) ⤏ Part B (analysis) ⤏ Part C (interpretation)
  • Grading: (Auto-)graded through Section 13; Sections 14–16 are ungraded (“food for thought”)

Thank You!

It’s been a pleasure. Good luck with Assignment 6 and your thesis!

References

Abel, Andrew B. 1990. “Asset Prices Under Habit Formation and Catching up with the Joneses.” The American Economic Review 80 (2): 38–42. http://www.jstor.org/stable/2006539.
Card, David, and Gordon B Dahl. 2011. “Family Violence and Football: The Effect of Unexpected Emotional Cues on Violent Behavior.” The Quarterly Journal of Economics 126 (1): 103–43.
Engelberg, Joseph, and Christopher A Parsons. 2016. “Worrying about the Stock Market: Evidence from Hospital Admissions.” The Journal of Finance 71 (3): 1227–50.
Gali, Jordi. 1994. “Keeping up with the Joneses: Consumption Externalities, Portfolio Choice, and Asset Prices.” Journal of Money, Credit and Banking 26 (1): 1–8.
Huck, John R. 2024. “The Psychological Externalities of Investing: Evidence from Stock Returns and Crime.” The Review of Financial Studies 37 (7): 2273–2314.
Malloy, Christopher J, Tobias J Moskowitz, and Annette Vissing-Jørgensen. 2009. “Long-Run Stockholder Consumption Risk and Asset Returns.” The Journal of Finance 64 (6): 2427–79.