The economist’s data analysis pipeline.
…three main relationships between data points.
The most effective summarization tool depends on the relationship between the data points.
| | Cross-Sectional | Time-Series | Panel Data |
|---|---|---|---|
| Focus | Multiple units, one time point | One unit, many time points | Multiple units, many time points |
| Shape | Wide format | Long format | Long format |
| Ex. | Household income, 2025 | US GDP, 10 years | Household income, 10 years |
> we’ve spent the first part of the class on cross-sectional data
> we’ll spend a bit of time on panel and geographic data later
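To make the three structures concrete, here is a minimal sketch in plain Python. The household names and income values are made up for illustration, and `wide_to_long` is a hypothetical helper, not part of the course materials. It shows a small panel stored wide, reshaped to long, and then sliced into a cross-section and a time series.

```python
# Hypothetical household incomes (made-up values for illustration only).
# Wide format: one row per household, one column per year.
wide = [
    {"household": "A", "2024": 52000, "2025": 54000},
    {"household": "B", "2024": 61000, "2025": 60500},
]


def wide_to_long(rows, id_key):
    """Turn one row per unit into one row per (unit, year) pair."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key == id_key:
                continue  # the identifier is carried along, not reshaped
            long_rows.append(
                {id_key: row[id_key], "year": int(key), "income": value}
            )
    return long_rows


long_rows = wide_to_long(wide, "household")
for r in long_rows:
    print(r)

# A cross-section is a single time point sliced out of the panel.
cross_section_2025 = [r for r in long_rows if r["year"] == 2025]

# A time series is a single unit followed across time points.
series_a = [r for r in long_rows if r["household"] == "A"]
```

The long layout keeps one observation per row, which is why it suits data with a time dimension: adding another year adds rows rather than columns.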
Let’s identify the variable type for each dataset.
household_incomes.csv
household_savings.csv
Monthly_Coffee_Prices.csv
What information should we use to set prices in January 2026?
> it’s difficult to know… do we choose the mode?
> let’s just plot the price against time
What information should we use to set prices in January 2026?
> let’s indicate with a line that these points are in sequence
What information should we use to set prices in January 2026?
Do you notice a trend in price?
> there was a positive trend in 2021
> we can zoom out to get a bigger picture
Do you notice a trend in price?
> how have prices changed since 2000?
> prices have increased somewhat, with many periods of decrease
What information should we use to set prices in January 2026?
> with background shading, it’s easier to see periods with a negative trend in price
Let’s use a linegraph to examine the trends in coffee prices.
Coffee_Prices.csv
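As a rough companion to the linegraph, the sketch below finds runs of consecutive price decreases, which are the kind of spans one might shade in the background. It assumes that a "negative trend" simply means month-over-month declines. The prices are made up and are not taken from Coffee_Prices.csv, and `declining_spans` is a hypothetical helper.

```python
# Hypothetical monthly prices in dollars per pound (made-up values,
# NOT taken from Coffee_Prices.csv).
prices = [
    ("2021-01", 1.20), ("2021-02", 1.24), ("2021-03", 1.29),
    ("2021-04", 1.27), ("2021-05", 1.22), ("2021-06", 1.25),
    ("2021-07", 1.31), ("2021-08", 1.30),
]


def declining_spans(series):
    """Return (start, end) labels for each run of consecutive decreases."""
    spans = []
    start = None
    end = None
    # Compare each point with the one before it; order matters in a time series.
    for (prev_label, prev_price), (label, price) in zip(series, series[1:]):
        if price < prev_price:
            if start is None:
                start = prev_label  # the decline begins at the previous point
            end = label
        elif start is not None:
            spans.append((start, end))  # the run of declines has ended
            start = None
    if start is not None:
        spans.append((start, end))  # the series ended while still declining
    return spans


spans = declining_spans(prices)
print(spans)
```

Because each point is compared with its predecessor, the result depends on the observations being in time order, which is the same reason the points are joined with a line.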
What information should we use to set prices in January 2026?
> could there be seasonal trends within the larger trend?
What information should we use to set prices in January 2026?
> a boxplot gives us a picture of the prices just in January
> let’s compare this to other months
In addition to the overall trend, are there monthly patterns?
> let’s be more specific…
In which month was the record highest price set?
> look at the maximums
In which month was the record highest price set?
In which season are prices most spread out?
> look at the ranges
In which season are prices most spread out?
What is the trend in median price?
> look at the medians…
What is the trend in median price?
> look at the medians… pretty difficult to see
What is the trend in median price?
What is the difference between the largest and the smallest median price per pound?
> something like $1.30 - $1.21 = $0.09
Linegraphs show trends; multi-boxplots show between-period patterns.
Let’s use a multi-boxplot to examine the seasonal patterns of coffee prices.
Coffee_Prices.csv
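The questions above (maximums, ranges, medians, and the gap between the largest and smallest median) can also be answered numerically by grouping prices by month, which is what each box in a multi-boxplot summarizes. This is a minimal sketch using only the Python standard library. The observations are made up and are not taken from Coffee_Prices.csv, so the resulting numbers will not match the slides.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (year, month, price per pound) observations: made-up values,
# NOT taken from Coffee_Prices.csv.
observations = [
    (2023, 1, 1.20), (2024, 1, 1.26), (2025, 1, 1.23),
    (2023, 2, 1.18), (2024, 2, 1.35), (2025, 2, 1.30),
    (2023, 3, 1.22), (2024, 3, 1.24), (2025, 3, 1.21),
]

# Group prices by calendar month, pooling across years (one box per month).
by_month = defaultdict(list)
for year, month, price in observations:
    by_month[month].append(price)

# Summarize each month with the statistics the boxplot questions ask about.
summary = {}
for month, values in sorted(by_month.items()):
    summary[month] = {
        "median": median(values),
        "max": max(values),
        "range": round(max(values) - min(values), 2),
    }

# Month holding the record highest price ("look at the maximums").
record_month = max(summary, key=lambda m: summary[m]["max"])

# Month with the most spread-out prices ("look at the ranges").
widest_month = max(summary, key=lambda m: summary[m]["range"])

# Gap between the largest and smallest monthly median.
medians = [s["median"] for s in summary.values()]
median_gap = round(max(medians) - min(medians), 2)

print(summary)
print(record_month, widest_month, median_gap)
```

Grouping by month pools observations across years, so this view deliberately sets the long-run trend aside in order to compare months with one another.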