Data Analytics for Finance
BM17FI · Rotterdam School of Management
Data Visualization
Learning Objectives
In this assignment, you will learn to:
- Create time series plots with twoway line
- Customize graph appearance (colors, labels, legends, titles)
- Add reference lines and shaded regions for events
- Create multi-panel graphs comparing firms
- Examine distributions with histograms
- Export publication-quality figures as PNG files
Context: Visualizing the Dieselgate Impact
On September 18, 2015, the U.S. Environmental Protection Agency issued a Notice of Violation to Volkswagen for emissions cheating. In this assignment, you'll create visualizations to explore how this event affected stock prices of German automakers.
Effective data visualization is crucial in finance for:
- Communication: Presenting findings to stakeholders
- Pattern recognition: Identifying trends and anomalies
- Hypothesis generation: Visualizations often reveal unexpected relationships
Exercises
- Load cleaned data and verify panel structure
- Create a time series plot of VW stock prices (2013-2017)
- Plot all German automakers on the same graph for comparison
- Create an event window visualization with reference lines
- Examine the distribution of daily returns with a histogram
- Compare return distributions across firms
- Calculate and plot cumulative returns over time
- Export all graphs as publication-quality PNG files
Setup
Clear Environment
We start by clearing Stata's memory and disabling pagination.
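A minimal sketch of this step, assuming you run it at the top of your do-file or notebook before anything else is loaded:

```stata
clear all        // remove data, stored results, programs, and other objects from memory
set more off     // do not pause long output with a --more-- prompt
```

Running clear all first means that later checks on the number of observations refer only to the data you load in Section 1.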
✅ The environment is cleared and ready.
Set Graph Scheme
We'll use the stcolor scheme for clean, publication-ready graphs.
- stcolor: Clean, minimalist (recommended for this assignment)
- s2color: Stata's default color scheme before Stata 18
- economist: Mimics The Economist magazine style
- s1rcolor: White background with color
- plottig: Tight layout with grid
- lean2: Minimalist black and white
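Setting the scheme could look like the sketch below. It assumes a Stata version that ships the stcolor scheme (Stata 18 or later); on older versions the command will report that the scheme is not found.

```stata
set scheme stcolor          // applies to graphs drawn for the rest of the session
display "`c(scheme)'"       // shows the scheme currently in effect
```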
✅ Graph scheme set to stcolor.
Set File Paths
We define global macros for all data and output directories.
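One possible way to set up the folders is sketched here. The macro names $raw and $figures are the ones used later in this assignment; $base, $processed, and $output are assumed names, and the base path (the current working directory) is a placeholder you should adapt to your own machine.

```stata
* Hypothetical layout: adjust the base path to your own machine
global base      "`c(pwd)'"
global raw       "$base/data/raw"
global processed "$base/data/processed"
global output    "$base/output"
global figures   "$output/figures"

display "Raw data folder: $raw"
display "Figures folder:  $figures"
```

Because every other path is built from $base, moving the project only requires changing one line.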
/Users/casparm4/Github/rsm-data-analytics-in-finance-private/private/assignments/02-assignment
📁 Base directory: /Users/casparm4/Github/rsm-data-analytics-in-finance-private/private/assignments/02-assignment
📁 Raw data folder: /Users/casparm4/Github/rsm-data-analytics-in-finance-private/private/assignments/02-assignment/data/raw
📁 Processed data folder: /Users/casparm4/Github/rsm-data-analytics-in-finance-private/private/assignments/02-assignment/data/processed
📁 Output directory: /Users/casparm4/Github/rsm-data-analytics-in-finance-private/private/assignments/02-assignment/output
📁 Figures folder: /Users/casparm4/Github/rsm-data-analytics-in-finance-private/private/assignments/02-assignment/output/figures
Section 1: Load Data and Setup
Task 1.1: Load Cleaned Dataset
What you'll do: Load the cleaned dataset from Assignment 1 (auto_firms_g_daily_clean.dta located in the "raw" folder) containing daily stock prices for 4 German automakers.
Why this matters: This dataset is the foundation for all our visualizations. It contains the panel data structure (firms × time) that we'll explore graphically.
What to expect: The dataset contains ~5,200 observations across 4 firms over the 2013-2017 period.
Hint: The command use "$raw/filename.dta", clear loads a Stata dataset. The clear option removes any data currently in memory.
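Applied to this task, the load step might look like the following sketch. It assumes the $raw global already points to the folder that holds auto_firms_g_daily_clean.dta.

```stata
use "$raw/auto_firms_g_daily_clean.dta", clear

* Report the dimensions for a quick sanity check
display "Number of observations: " _N
display "Number of variables:    " c(k)
```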
---- CHECKPOINT: data loaded ----
Number of observations: 5216
Number of variables: 13
Task 1.2: Examine Data Structure
What you'll do: Use describe and summarize to understand the data structure and verify key variables.
Why this matters: Before creating visualizations, you should always verify that the data matches your expectations.
Key variables to check:
- gvkey: Firm identifier
- conm: Company name
- date: Trading date (Stata date format)
- prccd: Closing price in EUR
- ret: Daily log return
What to expect: 4 firms, dates formatted as "01jan2013" style, prices in EUR.
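A sketch of the inspection commands, assuming the dataset from Task 1.1 is in memory. The final tabulate is an extra check, not required by the task, that counts observations per firm.

```stata
describe                      // storage types, display formats, variable labels
summarize prccd ret date      // ranges and number of non-missing values
list gvkey conm in 1/20       // first rows of the firm identifiers
tabulate conm                 // observations per company
```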
Contains data from /Users/casparm4/Github/rsm-data-analytics-in-finance-private
> /private/assignments/02-assignment/data/raw/auto_firms_g_daily_clean.dta
Observations: 5,216
Variables: 13 6 Jan 2026 15:28
-------------------------------------------------------------------------------
Variable Storage Display Value
name type format label Variable label
-------------------------------------------------------------------------------
gvkey long %12.0g Compustat company identifier
conm str28 %28s Company name
date float %td Trading date
year float %9.0g Year
month float %9.0g Month
prccd float %9.0g Closing price (local currency)
ret float %9.0g Daily log return
ajexdi float %9.0g
sic int %8.0g
naics long %12.0g
fic str3 %9s
isin str12 %12s
sedol str7 %9s
-------------------------------------------------------------------------------
Sorted by: gvkey date
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
prccd | 5,216 100.8507 39.92719 38.645 247.55
ret | 4,172 .0003003 .0144726 -.1842681 .0801751
date | 5,216 20270.7 527.0549 19359 21182
+--------------------------------+
| gvkey conm |
|--------------------------------|
1. | 17828 MERCEDES BENZ GROUP AG |
2. | 17828 MERCEDES BENZ GROUP AG |
3. | 17828 MERCEDES BENZ GROUP AG |
4. | 17828 MERCEDES BENZ GROUP AG |
5. | 17828 MERCEDES BENZ GROUP AG |
|--------------------------------|
6. | 17828 MERCEDES BENZ GROUP AG |
7. | 17828 MERCEDES BENZ GROUP AG |
8. | 17828 MERCEDES BENZ GROUP AG |
9. | 17828 MERCEDES BENZ GROUP AG |
10. | 17828 MERCEDES BENZ GROUP AG |
|--------------------------------|
11. | 17828 MERCEDES BENZ GROUP AG |
12. | 17828 MERCEDES BENZ GROUP AG |
13. | 17828 MERCEDES BENZ GROUP AG |
14. | 17828 MERCEDES BENZ GROUP AG |
15. | 17828 MERCEDES BENZ GROUP AG |
|--------------------------------|
16. | 17828 MERCEDES BENZ GROUP AG |
17. | 17828 MERCEDES BENZ GROUP AG |
18. | 17828 MERCEDES BENZ GROUP AG |
19. | 17828 MERCEDES BENZ GROUP AG |
20. | 17828 MERCEDES BENZ GROUP AG |
+--------------------------------+
---- CHECKPOINT: data structure examined ----
Variables confirmed
✅ Test passed: Data loaded with expected dimensions.
Task 1.3: Declare Panel Structure
What you'll do: Set the panel structure using xtset to identify firms and time periods.
Why this matters: Even though we're not using time-series operators in this assignment, declaring the panel structure:
- Verifies the data is properly sorted
- Allows Stata's panel-aware commands to work
- Makes the data structure explicit
Panel structure:
- Panel variable: gvkey (identifies which firm)
- Time variable: date (identifies when)
What to expect: Stata will report the number of panels (firms) and time periods per panel.
Syntax: xtset panelvar timevar
- panelvar: Variable identifying panel units (firms) → gvkey
- timevar: Variable identifying time periods → date
Use xtdescribe to examine panel characteristics.
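With the variables of this dataset, the declaration could be written as in this short sketch (it assumes the data from Task 1.1 is in memory):

```stata
xtset gvkey date     // panel variable first, then time variable
xtdescribe           // participation pattern and number of periods per firm
```

If a firm-date pair appeared twice, xtset would stop with an error about repeated time values, so a successful run also confirms that gvkey and date uniquely identify each observation.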
Panel variable: gvkey (strongly balanced)
Time variable: date, 01jan2013 to 29dec2017, but with gaps
Delta: 1 day
gvkey: 17828, 100022, ..., 100737 n = 4
date: 01jan2013, 02jan2013, ..., 29dec2017 T = 1304
Delta(date) = 1 day
Span(date) = 1824 periods
(gvkey*date uniquely identifies each observation)
Distribution of T_i: min 5% 25% 50% 75% 95% max
1304 1304 1304 1304 1304 1304 1304
Freq. Percent Cum. | Pattern*
---------------------------+--------------------------------------------------
> ------------------------------------------------------
4 100.00 100.00 | XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
> XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX5
---------------------------+--------------------------------------------------
> ------------------------------------------------------
4 100.00 | XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
> XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
------------------------------------------------------------------------------
> ------------------------------------------------------
*Each column represents 18 periods.
---- CHECKPOINT: panel structure declared ----
Panel data structure set successfully
✅ Test passed: Panel structure correctly declared.
Section 2: Basic Time Series Plot
Task 2.1: Create VW Price Time Series
What you'll do: Create a line plot showing Volkswagen's stock price over the entire 2013-2017 period.
Why this matters: Time series plots are the foundation of financial data visualization. They reveal:
- Trends: Long-term price movements
- Volatility: Periods of stability vs. turbulence
- Events: Sharp changes that warrant investigation
What to expect: You'll see VW's price trajectory from 2013-2017, with a dramatic drop in September 2015.
The twoway line command creates line plots:
twoway line yvar xvar if condition, ///
title("Graph title") ///
subtitle("Graph subtitle") ///
xtitle("X-axis label") ytitle("Y-axis label")
Key options:
- if: Filter observations (e.g., if gvkey == 100737)
- title(): Main graph title
- subtitle(): Subtitle
- xtitle(): X-axis label
- ytitle(): Y-axis label
- note(): E.g. data source(s)
- ///: Line continuation for readability
---- CHECKPOINT: VW price plot created ----
Graph displayed
✅ Test passed: Time series plot created successfully.
Task 2.2: Improve Graph Appearance
What you'll do: Enhance the graph with better formatting and save it to a file.
Why this matters: Small improvements in graph formatting dramatically increase readability:
- Descriptive titles help readers understand context
- Proper axis labels include units
- Saving graphs allows you to use them in reports
What to add:
- Title: "Volkswagen Stock Price (2013-2017)"
- Subtitle: "Daily Closing Prices (2013-2017)"
- X-axis label: "Date"
- Y-axis label: "Price (EUR)"
What to expect: A polished graph saved as vw_price_timeseries.png in your figures folder.
Use graph export to save graphs:
graph export "$figures/filename.png", replace width(1200)
Options:
- replace: Overwrite existing file
- width(): Image width in pixels (1200-1600 recommended)
- Supported formats: .png, .pdf, .eps, .svg
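Putting the formatting and export steps together, one possible version is sketched below. It assumes the dataset is in memory, that 100737 is Volkswagen's gvkey as listed in this assignment, and that the $figures folder already exists.

```stata
twoway line prccd date if gvkey == 100737, ///
    title("Volkswagen Stock Price (2013-2017)") ///
    subtitle("Daily Closing Prices (2013-2017)") ///
    xtitle("Date") ytitle("Price (EUR)")

* Export the graph that was drawn last
graph export "$figures/vw_price_timeseries.png", replace width(1200)
```

graph export writes the most recently drawn graph, so it should follow the twoway command directly.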
file /Users/casparm4/Github/rsm-data-analytics-in-finance-private/private/assignments/02-assignment/output/figures/vw_price_timeseries.png written in PNG format
---- CHECKPOINT: graph exported ----
Saved to: /Users/casparm4/Github/rsm-data-analytics-in-finance-private/private/assignments/02-assignment/output/figures/vw_price_timeseries.png
✅ Test passed: Graph exported successfully.
Section 3: Multi-Firm Comparison Plot
Task 3.1: Plot All Firms on Same Graph
What you'll do: Create a single graph showing stock prices for all 4 German automakers simultaneously.
Why this matters: Comparing multiple series on one graph reveals:
- Relative performance: Which firms outperformed or underperformed
- Correlation: Do prices move together or independently?
- Peer effects: Did the Dieselgate scandal affect other German automakers?
What to expect: Four colored lines on one graph, one for each firm (VW, BMW, Mercedes-Benz, MAN).
twoway ///
(line prccd date if gvkey == 100737, lcolor(navy)) ///
(line prccd date if gvkey == 100022, lcolor(maroon)) ///
(line prccd date if gvkey == 017828, lcolor(forest_green)) ///
(line prccd date if gvkey == 100042, lcolor(orange)), ///
legend(label(1 "Volkswagen AG") label(2 "BMW AG") ///
label(3 "Mercedes-Benz Group") label(4 "MAN SE"))
Each set of parentheses creates one line. Use lcolor() to assign distinct colors, and legend(label(...)) to label each line with the full company name.
Firm identifiers:
- 100737: Volkswagen AG → navy
- 100022: BMW AG → maroon
- 017828: Mercedes-Benz Group → forest_green
- 100042: MAN SE → orange
Add axis labels with xtitle("Date") and ytitle("Price (EUR)").
---- CHECKPOINT: multi-firm plot created ----
Graph displayed with 4 firms
Task 3.2: Refine Legend and Save
What you'll do: Improve the legend placement and save the comparison graph.
Why this matters: A well-placed legend enhances readability without obscuring data. Legend options include:
- Position (inside or outside the plot area)
- Number of columns
- Title
What to expect: A polished multi-firm comparison saved as german_autos_comparison.png. Keep the same colors, axis labels, and legend labels (full company names) from Task 3.1.
Legend options:
- position(6): Position: 6=bottom, 3=right, 11=top-left
- cols(2): Number of columns in legend
- size(small): Legend text size
- off: Hide legend entirely
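The legend options go inside the same legend() call as the labels. The sketch below is one way to combine them with the Task 3.1 graph; the particular placement (bottom, two columns, small text) is an illustrative choice, not the only acceptable one.

```stata
twoway ///
    (line prccd date if gvkey == 100737, lcolor(navy)) ///
    (line prccd date if gvkey == 100022, lcolor(maroon)) ///
    (line prccd date if gvkey == 17828,  lcolor(forest_green)) ///
    (line prccd date if gvkey == 100042, lcolor(orange)), ///
    xtitle("Date") ytitle("Price (EUR)") ///
    legend(label(1 "Volkswagen AG") label(2 "BMW AG") ///
           label(3 "Mercedes-Benz Group") label(4 "MAN SE") ///
           position(6) cols(2) size(small))

graph export "$figures/german_autos_comparison.png", replace width(1200)
```

Because gvkey is stored as a number, 17828 and 017828 refer to the same firm.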
file /Users/casparm4/Github/rsm-data-analytics-in-finance-private/private/assignments/02-assignment/output/figures/german_autos_comparison.png written in PNG format
---- CHECKPOINT: comparison graph exported ----
Saved to: /Users/casparm4/Github/rsm-data-analytics-in-finance-private/private/assignments/02-assignment/output/figures/german_autos_comparison.png
✅ Test passed: Comparison graph exported.
Section 4: Event Window Visualization
Task 4.1: Define Event Date
What you'll do: Define the Dieselgate event date (September 18, 2015) as a local macro for use in graphs.
Why this matters: Using macros for important dates makes your code:
- Reusable: Change the date once, update all graphs
- Readable: event_date is clearer than "td(18sep2015)"
- Standard practice: Professional Stata code uses macros for key parameters
What to expect: A local macro containing the numeric Stata date for September 18, 2015.
local varname = value      // Store a value
display `varname'          // Use the value (note the backtick)
Stata date functions:
- td(18sep2015): Convert date to Stata numeric
- td(1aug2015): August 1, 2015
- td(30nov2015): November 30, 2015
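For this task, the three dates can be stored as in the sketch below. The macro names event_date, window_start, and window_end are the ones this assignment refers to; the %td format in display is only there to print the numeric dates in readable form.

```stata
local event_date   = td(18sep2015)   // EPA Notice of Violation
local window_start = td(01aug2015)
local window_end   = td(30nov2015)

display "Event date: " %td `event_date'
display "Window: " %td `window_start' " to " %td `window_end'
```

Local macros exist only within the run that defines them, so define them in the same do-file or cell as the graph commands that use them.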
---- CHECKPOINT: event dates defined ----
Event date: 18sep2015
Window: 01aug2015 to 30nov2015
Task 4.2: Create Event Window Plot with Vertical Line
What you'll do: Zoom the graph to August-November 2015 and add a vertical line at the event date.
Why this matters: Event studies require zooming to the relevant period to see:
- Pre-event stability: Normal price behavior before the shock
- Event impact: Immediate market reaction
- Post-event dynamics: Recovery or continued decline
A vertical line clearly marks when the event occurred.
What to expect: A focused plot showing the four-month window (August to November 2015) around September 18, 2015, with a vertical line marking the EPA announcement. Use the same firm colors, legend labels, and axis labels from Section 3. Restrict the date range using the window_start and window_end local macros you defined in Task 4.1.
Add a vertical reference line with xline():
twoway ..., ///
xline(`event_date', lpattern(dash) lcolor(red)) ///
ttext(100 `event_date' "Event", placement(e))
Options:
- xline(value): Draw vertical line at x=value
- lpattern(dash): Dashed line
- lcolor(red): Line color
- if date >= ... & date <= ...: Restrict date range
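A possible event window plot is sketched here. It redefines the locals so the block runs on its own, and it stores the date restriction in a helper macro named win, which is an assumption made for brevity rather than part of the assignment. The export line uses the file name from Task 4.3.

```stata
local event_date   = td(18sep2015)
local window_start = td(01aug2015)
local window_end   = td(30nov2015)
* Hypothetical helper: the date restriction shared by all four lines
local win "date >= `window_start' & date <= `window_end'"

twoway ///
    (line prccd date if gvkey == 100737 & `win', lcolor(navy)) ///
    (line prccd date if gvkey == 100022 & `win', lcolor(maroon)) ///
    (line prccd date if gvkey == 17828  & `win', lcolor(forest_green)) ///
    (line prccd date if gvkey == 100042 & `win', lcolor(orange)), ///
    xline(`event_date', lpattern(dash) lcolor(red)) ///
    xtitle("Date") ytitle("Price (EUR)") ///
    legend(label(1 "Volkswagen AG") label(2 "BMW AG") ///
           label(3 "Mercedes-Benz Group") label(4 "MAN SE"))

graph export "$figures/event_window_zoom.png", replace width(1200)
```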
---- CHECKPOINT: event window plot created ----
Graph shows Aug-Nov 2015 with event marker
Task 4.3: Export Event Window Plot
What you'll do: Save the event window visualization.
What to expect: event_window_zoom.png saved to your figures folder.
file /Users/casparm4/Github/rsm-data-analytics-in-finance-private/private/assignments/02-assignment/output/figures/event_window_zoom.png written in PNG format
---- CHECKPOINT: event window exported ----
Saved to: /Users/casparm4/Github/rsm-data-analytics-in-finance-private/private/assignments/02-assignment/output/figures/event_window_zoom.png
✅ Test passed: Event window graph exported.
Section 5: Returns Distribution
Task 5.1: Create Histogram of Daily Returns
What you'll do: Create a histogram of daily log returns across all firms and time periods.
Why this matters: Histograms reveal the distribution shape:
- Normality: Are returns normally distributed? (important for statistical tests)
- Fat tails: Are extreme returns more common than normal distribution predicts?
- Skewness: Is the distribution symmetric?
What to expect: A bell-shaped distribution centered near zero, with some outliers.
The histogram command creates distribution plots:
histogram varname, ///
normal ///
bin(30) ///
title("Title")
Key options:
- normal: Overlay normal density curve
- bin(n): Number of bins (if omitted, Stata chooses a number based on the sample size)
- frequency: Show counts (default is density)
- percent: Show percentages
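As an illustration, the histogram could be drawn as below and paired with summary statistics that put numbers on the visual impression. The title text is an assumption; the skewness and kurtosis check is an optional extra, using the fact that a normal distribution has skewness 0 and kurtosis 3.

```stata
histogram ret, normal bin(50) ///
    xtitle("Daily Log Return") ytitle("Density") ///
    title("Distribution of Daily Log Returns")

* Numerical companions to the picture
quietly summarize ret, detail
display "Skewness: " %6.3f r(skewness) "   Kurtosis: " %6.3f r(kurtosis)
```

A kurtosis well above 3 would indicate fatter tails than the normal overlay suggests.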
(bin=50, start=-.1842681, width=.00528886)
---- CHECKPOINT: returns histogram created ----
Histogram displayed with normal overlay
Task 5.2: Export Returns Histogram
What you'll do: Save the returns distribution histogram as returns_histogram.png in your figures folder.
file /Users/casparm4/Github/rsm-data-analytics-in-finance-private/private/assignments/02-assignment/output/figures/returns_histogram.png written in PNG format
---- CHECKPOINT: histogram exported ----
Saved to: /Users/casparm4/Github/rsm-data-analytics-in-finance-private/private/assignments/02-assignment/output/figures/returns_histogram.png
✅ Test passed: Histogram exported.
Section 6: Firm-Specific Returns Distribution
Task 6.1: Compare Return Distributions by Firm
What you'll do: Create histograms of returns for each firm separately to compare volatility.
Why this matters: Different firms may have different volatility profiles:
- VW likely has more extreme returns (especially around September 2015)
- BMW and Mercedes may show more stable distributions
- Comparing distributions reveals these differences
What to expect: Four separate histograms (or overlaid) showing that VW has wider tails than peers.
Create a panel of histograms with by():
histogram ret, by(conm, note("")) normal bin(40) ///
xtitle("Daily Log Return") ytitle("Density")
This creates a panel of histograms, one for each value of conm (company name). Use bin(40) for a good level of detail, and add axis labels for clarity.
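To back up the visual comparison with numbers, a summary table by firm can help. This is an optional sketch, not part of the required output; it assumes the dataset is in memory.

```stata
* Spread and tail statistics of daily log returns, one row per firm
tabstat ret, by(conm) statistics(n sd min max kurtosis) format(%9.4f)
```

A larger standard deviation, more extreme minimum, or higher kurtosis for one firm would be consistent with its histogram showing wider tails.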
---- CHECKPOINT: firm-specific histograms created ----
Panel of 4 histograms displayed
Section 7: Cumulative Returns Plot
Formula: $\text{Cumulative Return}_t = \sum_{s=1}^{t} r_s$ (the sum of daily log returns $r_s$ from the start of the sample to day $t$)
Why cumulative log returns?
- Additivity: Log returns sum correctly across time periods (simple returns don't)
- Comparability: Starting all firms at 0 shows relative performance clearly
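- Interpretation: A cumulative log return of 0.50 corresponds to total growth of exp(0.50) - 1 ≈ 65%; a log return is close to the percentage change only when it is small (for example, 0.05 ≈ 5.1%)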
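The conversion from a cumulative log return to total growth is exp(log return) - 1. A quick check in Stata, using two illustrative values:

```stata
display "Log return 0.05 -> total growth " %6.4f (exp(0.05) - 1)   // 0.0513
display "Log return 0.50 -> total growth " %6.4f (exp(0.50) - 1)   // 0.6487
```

The gap between the log return and the actual growth widens as returns get larger, which matters when reading the cumulative return plot over several years.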
Task 7.1: Calculate Cumulative Returns
What you'll do: Calculate cumulative log returns for each firm, normalized to start at 0.
Why this matters: Cumulative returns show total performance over time:
- Which firm performed best/worst over the full sample?
- When did major divergences occur?
- How much wealth would an investor have gained/lost?
Formula: Cumulative return = sum of daily log returns within each firm
What to expect: A new variable cumret showing each firm's total return from start of sample.
bysort gvkey (date): gen cumret = sum(ret)
This sums ret within each gvkey, sorted by date. The cumulative sum resets for each firm. Note that sum() treats missing returns as zero, so cumret is defined for every observation.
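The behavior of the running sum is easiest to see on a toy panel. The sketch below uses made-up numbers and hypothetical firm ids 1 and 2; it replaces the data in memory, so run it in a separate session or reload the assignment data afterwards.

```stata
clear
input long gvkey int day double ret
1 1 .
1 2 0.01
1 3 -0.02
2 1 .
2 2 0.03
end

* Running sum within firm, in day order; missing ret counts as zero
bysort gvkey (day): gen double cumret = sum(ret)
list, sepby(gvkey)
```

Each firm starts at 0 on its first day because the missing first return contributes nothing, and the sum restarts when gvkey changes.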
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
cumret | 5,216 .2332495 .1720732 -.1559657 .8104563
---- CHECKPOINT: cumulative returns calculated ----
Cumret variable created for all firms
✅ Test passed: Cumulative returns calculated.
Task 7.2: Plot Cumulative Returns
What you'll do: Create a time series plot showing cumulative returns for all 4 firms, then save it.
Why this matters: Cumulative returns show total investment performance over time. A horizontal reference line at zero helps distinguish gains from losses.
What to expect: Four lines starting near 0 in 2013, diverging over time, with VW showing a sharp drop in September 2015. Save the graph as cumulative_returns.png in your figures folder.
Use the same firm colors (navy, maroon, forest_green, orange), legend labels (full company names), and legend positioning from Section 3. Add axis labels: "Date" for the x-axis and "Cumulative Log Return" for the y-axis. Include a horizontal dashed reference line at zero using yline(0, lpattern(dash) lcolor(gs10)).
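One way to assemble these pieces is sketched below. It assumes cumret from Task 7.1 exists and uses bottom placement with two columns for the legend as an example of the positioning from Section 3.

```stata
twoway ///
    (line cumret date if gvkey == 100737, lcolor(navy)) ///
    (line cumret date if gvkey == 100022, lcolor(maroon)) ///
    (line cumret date if gvkey == 17828,  lcolor(forest_green)) ///
    (line cumret date if gvkey == 100042, lcolor(orange)), ///
    yline(0, lpattern(dash) lcolor(gs10)) ///
    xtitle("Date") ytitle("Cumulative Log Return") ///
    legend(label(1 "Volkswagen AG") label(2 "BMW AG") ///
           label(3 "Mercedes-Benz Group") label(4 "MAN SE") ///
           position(6) cols(2) size(small))

graph export "$figures/cumulative_returns.png", replace width(1200)
```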
file /Users/casparm4/Github/rsm-data-analytics-in-finance-private/private/assignments/02-assignment/output/figures/cumulative_returns.png written in PNG format
---- CHECKPOINT: cumulative returns plot exported ----
Saved to: /Users/casparm4/Github/rsm-data-analytics-in-finance-private/private/assignments/02-assignment/output/figures/cumulative_returns.png
✅ Test passed: Cumulative returns plot exported.
Section 8: Publication-Quality Export
Task 8.1: Review Graph Export Best Practices
What you'll do: Review the key principles for creating publication-quality graphs.
Why this matters: Professional presentations and publications require high-quality visualizations that are:
- Readable: Clear labels, appropriate font sizes
- Scalable: High resolution for printing
- Consistent: Uniform style across all figures
- Informative: Titles and notes provide context
- Resolution: Use width(1200) or higher (1200-1600 pixels)
- Format: PNG for web/presentations, PDF/EPS for print publications
- Scheme: Use a consistent scheme across all figures (stcolor recommended)
- Labels: Always include axis labels with units
- Titles: Descriptive title + subtitle if needed
- Legend: Clear labels, positioned to not obscure data
- Notes: Data source and key details in note
Task 8.2: Verify All Figures Exported
What you'll do: List all files in your figures folder to confirm all 5 required graphs were created.
Expected files:
- vw_price_timeseries.png
- german_autos_comparison.png
- event_window_zoom.png
- returns_histogram.png
- cumulative_returns.png
- List files in a directory (here: the figures folder) on macOS or Linux: Use !ls -lh "$figures"
- ls = "list" command that shows files and directories
- -l = long format (shows detailed information like permissions, size, date)
- -h = human-readable sizes (e.g., "1.2M" instead of "1234567")
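The ls command is a Unix shell command, so it is not available on every system. As an alternative sketch that works inside Stata on any platform, you can check each expected file with confirm file; the loop below assumes the five file names listed in this task.

```stata
foreach f in vw_price_timeseries german_autos_comparison event_window_zoom ///
             returns_histogram cumulative_returns {
    capture confirm file "$figures/`f'.png"
    if _rc == 0 {
        display "found:   `f'.png"
    }
    else {
        display "MISSING: `f'.png"
    }
}
```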
---- CHECKPOINT: figures exported ----
All figures should appear in the list below
total 1168
-rw-r--r--@ 1 casparm4 staff 183K Feb 10 09:24 cumulative_returns.png
-rw-r--r--@ 1 casparm4 staff 115K Feb 10 09:24 event_window_zoom.png
-rw-r--r--@ 1 casparm4 staff 156K Feb 10 09:24 german_autos_comparison.png
-rw-r--r--@ 1 casparm4 staff 47K Feb 10 09:24 returns_histogram.png
-rw-r--r--@ 1 casparm4 staff 80K Feb 10 09:24 vw_price_timeseries.png
✅ Test passed: All 5 figures exported successfully.
Summary
Congratulations! You have successfully:
✅ Loaded and verified the cleaned panel dataset from Assignment 1
✅ Created a time series plot showing VW stock prices over 2013-2017
✅ Compared all 4 German automakers on a multi-line plot
✅ Zoomed to the event window and added a vertical reference line
✅ Examined the distribution of daily returns with histograms
✅ Compared return distributions across firms
✅ Calculated and plotted cumulative returns over time
✅ Exported all 5 graphs as publication-quality PNG files
Key Concepts Learned
Time Series Visualization:
- Creating line plots with twoway line
- Multi-line plots with color-coded legends
- Date range filtering with if date >= ... & date <= ...
Event Visualization:
- Adding vertical reference lines with xline()
- Defining dates with the td() function
- Using local macros for reusable parameters
Distribution Analysis:
- Histograms with the histogram command
- Normal density overlays
- Panel histograms with the by() option
Publication Quality:
- Consistent graph schemes (stcolor)
- Descriptive titles, axis labels, and notes
- High-resolution export with graph export
- Cumulative calculations with bysort and sum()
These visualization skills are fundamental for:
- Exploratory data analysis: Understanding patterns before modeling
- Communicating findings: Presenting results to stakeholders
- Event studies: Visualizing market reactions to corporate events
The graphs you created today will be essential for Assignment 3, where you'll use regression analysis to quantify the effects you've visualized.
BM17FI · Academic Year 2025–26
Created by: Caspar David Peter
© 2026 Rotterdam School of Management