Website Data, Replication, and Research Design
Takeaway #1
Website data offers unprecedented longitudinal access, BUT requires careful attention to coverage gaps and data quality
Takeaway #2
Website-based measures complement traditional disclosure metrics BUT validity questions remain - interpret with caution
Do firms that would benefit from import relief attempt to decrease earnings through earnings management during ITC investigations?
Managers of domestic producers significantly decrease reported earnings during import relief investigations
Critical Feature
The ITC does not verify any information in the audited financial statements or 10-Ks, nor do they make any adjustments to these data… The ITC does not attempt to adjust the financial data for accounting procedures used or for accrual decisions made by the firms’ managers
Managers of domestic producers make accounting choices that reduce reported earnings during ITC investigation periods compared to non-investigation periods
\[ \frac{\text{Total accruals}_{it}}{AT_{i,t-1}} = \alpha \frac{1}{AT_{i,t-1}} + \beta_1 \frac{\Delta REV_{it}}{AT_{i,t-1}} + \beta_2 \frac{PPE_{it}}{AT_{i,t-1}} + \epsilon_{it}\]
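A minimal sketch of how this model might be estimated the firm-specific, time-series way: fit the regression on a firm's pre-investigation years, then treat the prediction error in the investigation year as discretionary accruals. All numbers are simulated, and the variable names (`lag_at`, `d_rev`, `ppe`, `ta`) and the 1985 event year are assumptions for illustration, not the study's data.

```r
# Simulated data for ONE hypothetical firm: 15 estimation years plus 1 event year.
set.seed(1991)
n_years <- 16
firm <- data.frame(
  year   = 1970:1985,
  lag_at = seq(100, 250, length.out = n_years),  # lagged total assets
  d_rev  = rnorm(n_years, mean = 10, sd = 5)     # change in revenues
)
firm$ppe <- 0.6 * firm$lag_at + rnorm(n_years, sd = 5)  # gross PP&E

# Total accruals built from assumed "true" coefficients plus noise.
firm$ta <- 2 + 0.10 * firm$d_rev - 0.08 * firm$ppe + rnorm(n_years, sd = 1)

# Scale every term by lagged total assets, as in the equation.
firm$ta_s    <- firm$ta / firm$lag_at
firm$inv_at  <- 1 / firm$lag_at
firm$d_rev_s <- firm$d_rev / firm$lag_at
firm$ppe_s   <- firm$ppe / firm$lag_at

est   <- firm[firm$year < 1985, ]   # estimation period
event <- firm[firm$year == 1985, ]  # investigation year (Year 0)

# No regular intercept: the 1/AT term takes its place.
fit <- lm(ta_s ~ 0 + inv_at + d_rev_s + ppe_s, data = est)

# Discretionary accruals = actual minus predicted "normal" accruals.
event$da <- event$ta_s - predict(fit, newdata = event)
print(coef(fit))
print(event$da)
```

Under the earnings management hypothesis, `event$da` would tend to be negative in the investigation year. Here it is only simulation noise, since no manipulation was built into the data.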
Key Implication
Managers systematically manipulate accruals downward specifically during investigation periods, providing strong support for the earnings management hypothesis
Regulatory rule explicitly uses accounting profitability
All contracting parties benefit from appearing injured
Regulator doesn’t adjust for earnings management
Time-series control for firm-specific “normal” accruals
Clear prediction: decrease in investigation year only
Multiple industries, multiple years
Addresses free-rider problem with petitioner subsample
Key Implication
This setting provides a specific motive for earnings management that is not typically present in other contracting situations where all parties have incentives to monitor and adjust for manipulation
Disclaimer
This is a re-interpretation of Jones (1991), not a strict replication. Analyses are preliminary and unreviewed; treat all findings as exploratory. Results may change as data and methods are refined.
Here is a comparison of key dimensions between the original Jones (1991) study and my replication approach:
| Dimension | Jones Original | My Replication |
|---|---|---|
| Estimation approach | Time-series (firm-specific) | Cross-sectional (industry) |
| Data requirements | Long firm history | Industry observations |
| What it controls for | Firm-specific effects | Industry dynamics |
| Statistical power | Lower (fewer obs/firm) | Higher (pooled data) |
| Parameter interpretation | Firm-specific “normal” | Industry-specific “normal” |
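To make the "Cross-sectional (industry)" column concrete, here is a rough sketch that fits the same accruals regression separately within each industry for a single year and takes the residuals as discretionary accruals. The data are simulated, and the industry labels, sample sizes, and column names are hypothetical placeholders rather than the actual replication sample.

```r
# Simulated single-year cross-section: 3 hypothetical industries, 20 firms each.
set.seed(42)
n <- 60
panel <- data.frame(
  firm     = seq_len(n),
  industry = rep(c("steel", "footwear", "autos"), each = 20),
  lag_at   = runif(n, 50, 500)  # lagged total assets
)
panel$d_rev <- rnorm(n, 0.05, 0.10) * panel$lag_at  # change in revenues
panel$ppe   <- runif(n, 0.3, 0.8) * panel$lag_at    # gross PP&E
panel$ta    <- 1 + 0.10 * panel$d_rev - 0.07 * panel$ppe +
  rnorm(n, sd = 0.02) * panel$lag_at                # total accruals

# Scale by lagged total assets.
panel$ta_s    <- panel$ta / panel$lag_at
panel$inv_at  <- 1 / panel$lag_at
panel$d_rev_s <- panel$d_rev / panel$lag_at
panel$ppe_s   <- panel$ppe / panel$lag_at

# Fit one regression per industry; residuals are the discretionary accruals.
fit_industry <- function(d) {
  fit <- lm(ta_s ~ 0 + inv_at + d_rev_s + ppe_s, data = d)
  d$da <- resid(fit)
  d
}
panel_da <- do.call(rbind, lapply(split(panel, panel$industry), fit_industry))

# Average discretionary accruals by industry.
print(aggregate(da ~ industry, data = panel_da, FUN = mean))
```

The design trade-off matches the table: each regression pools many firms, so it needs no long firm history, but the "normal" accrual process is assumed to be common to all firms in an industry.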
Investigation announced: June 1985 (calendar)
Firm fiscal year ends: Dec 1984
Firm fiscal year ends: Dec 1985
Which "year" is Year 0?
# See phd-lecture-analyses.R script ...
time = case_when(
  # Fiscal year ends before investigation decision - no adjustment needed
  fyear == event_year & month(datadate) < year_zero_month ~ time,
  # Fiscal year ends after investigation decision - shift time by +1
  TRUE ~ time + 1
)
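The alignment question can be illustrated with a small self-contained sketch. It is one possible reading of the excerpt above, not the code in the lecture script: Year 0 is taken to be the last fiscal year that ends before the investigation month, and the shift is applied to all of a firm's years. The two firms, their dates, and the use of the calendar year of `datadate` (rather than a database fiscal-year variable) are assumptions.

```r
# Two hypothetical firms: A has a December fiscal year-end, B a March one.
firms <- data.frame(
  firm     = rep(c("A", "B"), each = 3),
  datadate = as.Date(c("1984-12-31", "1985-12-31", "1986-12-31",
                       "1984-03-31", "1985-03-31", "1986-03-31")),
  stringsAsFactors = FALSE
)
event_year      <- 1985  # investigation announced in calendar 1985 ...
year_zero_month <- 6     # ... in June

firms$fye_year  <- as.integer(format(firms$datadate, "%Y"))
firms$fye_month <- as.integer(format(firms$datadate, "%m"))

# If the fiscal year ends before the investigation month, the event-year
# fiscal year is Year 0 (no shift). Otherwise the previous fiscal year is
# Year 0, so every year of that firm is shifted by +1.
firms$shift <- ifelse(firms$fye_month < year_zero_month, 0L, 1L)
firms$time  <- firms$fye_year - event_year + firms$shift

print(firms[, c("firm", "datadate", "time")])
```

Under this rule, firm A's fiscal year ending Dec 1984 and firm B's fiscal year ending Mar 1985 are both Year 0, because each is the last set of annual statements completed before June 1985.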
Key Insight
The timing “discrepancy” isn’t a bug—it’s a feature revealing that firms may manage earnings in anticipation of investigations
Key Insight
Institutional knowledge is key here, and it is not yet fully utilized! Knowledge of investigation types is crucial for sample selection and interpretation of results. There is still plenty of room for improvement.
What we’ll look at:
⚠️ Would time-series give identical results?
⚠️ The earnings management literature has advanced
🤔 Why do effects appear in Year -1 to Year 0?
The Bigger Question
Replication is validating, but not publishable. What NEW insight can we add 35 years later?
Research Question
Do firms strategically manage website disclosures during import relief investigations?
Data source: Cleaned website text from Haans and Mertens (2024)
Method: Keyword frequency analysis
Analysis: Event study around investigation dates
Spoiler: There’s sometimes something there, but it’s messy
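As a rough illustration of the method listed above, the sketch below computes a keyword rate (keyword hits divided by total words) for each website snapshot and averages it by event time. The snapshot texts are invented toy sentences, and the keyword stems are hypothetical. Neither comes from the Haans and Mertens (2024) data, so the output shows mechanics only, not findings.

```r
# Toy snapshots: two hypothetical firms observed in event years -1, 0, +1.
snapshots <- data.frame(
  firm = c("A", "A", "A", "B", "B", "B"),
  time = c(-1, 0, 1, -1, 0, 1),
  text = c(
    "Unfair imports injure domestic producers and cost jobs.",
    "Import competition caused injury; we face dumping and layoffs.",
    "We launched a new product line this year.",
    "Our products are sold worldwide.",
    "Dumping by foreign producers continues to injure our business.",
    "Visit our new showroom."
  ),
  stringsAsFactors = FALSE
)

# Hypothetical "injury" keyword stems, matched as word prefixes.
keywords <- c("injur", "dumping", "layoff", "unfair")

# Share of tokens in a text that start with any of the stems.
keyword_rate <- function(text, stems) {
  tokens <- strsplit(tolower(text), "[^a-z]+")[[1]]
  tokens <- tokens[nzchar(tokens)]
  hits <- sum(vapply(stems, function(s) sum(startsWith(tokens, s)), integer(1)))
  hits / length(tokens)
}

snapshots$rate <- vapply(snapshots$text, keyword_rate, numeric(1),
                         stems = keywords, USE.NAMES = FALSE)

# Event-study style summary: mean keyword rate by event time.
by_time <- aggregate(rate ~ time, data = snapshots, FUN = mean)
print(by_time)
```

Dividing by the token count keeps long and short pages comparable, but prefix matching is crude: a stem like `injur` cannot tell a strategic injury narrative from a neutral mention of the investigation.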
Wayback coverage varies wildly
Keywords might be too crude/noisy
Don’t know who controls website content
Website timing issues remain
Important Context
This is NOT a finished analysis. It will most likely change before the lecture takes place! This is a “should I keep going?” check.
What we’ll look at:
📊 There IS variation in website language around investigations
📈 “Injury” keywords increase in Years -1 to 0
☣️ The signal is weak and noisy: Very sensitive to sample restrictions and keyword choices
🤷 Not clear if it’s strategic or mechanical
🎯 Attribution problem
⏰ Timing ambiguity
Critical Assessment
There’s a signal. Maybe. But I have five serious concerns that could kill this project.
Are keywords capturing “strategic injury narrative” or just “we’re in an investigation”?
Only later firm-years and events have snapshots available. And when do websites get updated? Quarterly? Ad hoc?
Websites controlled by whom? CFO? Marketing? Compliance?
Is any website-(accruals-)investigation correlation real or just noise?
Which other EM measures exist and fit the purpose?
Reality Check
Any ONE of these could sink the project. Are they fixable, or fatal?
Break into groups of 2-3. Choose ONE prompt (10 minutes discussion):
Pick two of my five concerns. For each:
You have 3 more months and access to better NLP tools.
What ONE thing would you change/add to make this stronger?
Consider:
Be specific: Not “better analysis” but “sentiment analysis using FinBERT on earnings call transcripts to validate website tone”
Instead of keyword frequency, what ELSE could you measure on websites?
Possibilities:
Pick one and explain:
Assume one of the concerns kills the website idea.
What OTHER data source could you use to test “multi-channel disclosure strategy during investigations”?
Consider:
Evaluate: Would your alternative be better or worse than websites? Why?
For Your Own Research
Think of this repo structure as a template. Clear organization, documented decisions, reproducible code—these habits will save you (and your coauthors) countless hours.
✅ Low correlations = websites capture different information
✅ Multi-stakeholder (not just investors)
✅ Real-time vs. quarterly
✅ Voluntary vs. mandated
😕 Low correlations = measurement noise or invalidity
😕 Size ≠ quality (could be marketing fluff)
😕 Unaudited, unstandardized
😕 Designer decisions, not CFO strategy
Which is it?
Earnings Management Then and Now: Revisiting Jones (1991)