The economist’s data analysis skillset.
Has American politics become more geographically divided along economic lines?
Question: Has the relationship between income and voting changed?
Two datasets, each answering a different question
Dataset 1: Income by County
Dataset 2: Elections by County
Compare county distributions between 2000 and 2024.
This is a Part 2.2 (or 1.4) question: a numerical variable across categories.
Counties shifted Republican on average
> median dropped from 41% to 29% — but why were elections still close?
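
A minimal sketch of the comparison, assuming a long table with one row per county-year. The column names `year` and `dem_share` and all numbers are made up for illustration; they are not the course data.

```python
import pandas as pd

# Toy data: invented Democratic vote shares (%) for five hypothetical counties per year.
elections = pd.DataFrame({
    "year": [2000] * 5 + [2024] * 5,
    "dem_share": [35.0, 40.0, 42.0, 48.0, 60.0, 20.0, 25.0, 30.0, 45.0, 70.0],
})

# One numerical variable (dem_share) summarized within each category (year).
medians = elections.groupby("year")["dem_share"].median()
print(medians)
```

The median treats every county equally regardless of how many people live in it, which is worth keeping in mind when reading the takeaway above.
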
Let's try to explain the paradox.
Question: Do larger counties lean Democratic?
This is a Part 2.1 question — a relationship between two numerical variables.
Large counties vote more Democratic
> Democratic-leaning counties are larger (correlation: 0.49)
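
A correlation like this can be computed with a single pandas call. The sketch below uses invented numbers and hypothetical column names (`population`, `dem_share`); whether the slide's 0.49 uses raw or transformed population is not stated, so this only shows the mechanics.

```python
import pandas as pd

# Toy data: invented population and Democratic share (%) for five hypothetical counties.
counties = pd.DataFrame({
    "population": [5_000, 20_000, 80_000, 300_000, 1_200_000],
    "dem_share": [25.0, 30.0, 38.0, 55.0, 65.0],
})

# Pearson correlation between two numerical variables.
r = counties["population"].corr(counties["dem_share"])
print(round(r, 2))
```
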
Large counties tend to be richer. Is income related to voting?
Question: Is the Democratic lean related to income?
Richer counties vote more Democratic
> correlation: 0.25 — but was it always this way?
Have higher income counties always leaned Democratic?
Prior tools fall short:
> neither the distribution comparison (Part 2.2) nor the plain scatter plot (Part 2.1) shows how the relationship between two numerical variables varies by category
Color points by year to see how the relationship differs
We need a scatter plot that shows bivariate relationships separately by category.
The relationship has flipped
> in 2000, richer counties leaned Republican; by 2024, they lean Democratic
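
One way to check a sign flip numerically is to fit a separate line within each year, which is what a scatter plot colored by year draws. This is a sketch on invented data; the column names `income` and `dem_share` and the helper names `slope` and `plot_by_year` are assumptions, not the course's code.

```python
import numpy as np
import pandas as pd

# Toy data: invented income (thousands of dollars) and Democratic share (%).
df = pd.DataFrame({
    "year": [2000] * 4 + [2024] * 4,
    "income": [30, 40, 50, 60, 30, 40, 50, 60],
    "dem_share": [50.0, 45.0, 42.0, 38.0, 25.0, 32.0, 41.0, 50.0],
})

def slope(group):
    # Slope of the least-squares line dem_share ~ income within one group.
    return np.polyfit(group["income"], group["dem_share"], 1)[0]

# One slope per category (year).
slopes = {year: slope(group) for year, group in df.groupby("year")}
print(slopes)

def plot_by_year(data):
    # Scatter plus one regression line per year; requires seaborn to be installed.
    import seaborn as sns
    return sns.lmplot(data=data, x="income", y="dem_share", hue="year")
```

Calling `plot_by_year(df)` draws the figure; the slopes give the same comparison as numbers, negative in one year and positive in the other for this toy data.
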
Each question built on the previous
Combining income and election data
To answer Q2-Q4, we needed data from multiple sources.
> the merge matches 3,106 of ~3,200 counties — 97% success
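
A merge of this kind, with a check on how many rows matched, might look like the following. The key column `fips` and all values are hypothetical; the actual key used in the course data is not shown on the slide.

```python
import pandas as pd

# Toy data: invented county identifiers, incomes, and vote shares.
income = pd.DataFrame({
    "fips": ["01001", "01003", "01005", "02013"],
    "income": [52.0, 61.0, 38.0, 70.0],
})
elections = pd.DataFrame({
    "fips": ["01001", "01003", "01005", "99999"],
    "dem_share": [27.0, 22.0, 45.0, 50.0],
})

# Keep every income row; indicator=True adds a _merge column recording the match status.
merged = income.merge(elections, on="fips", how="left", indicator=True)

matched = int((merged["_merge"] == "both").sum())
match_rate = matched / len(merged)
print(matched, match_rate)
```

Inspecting the unmatched rows (`merged[merged["_merge"] != "both"]`) is usually the quickest way to see why a merge falls short of 100%.
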
A template for final projects
What this unit adds to your toolkit
| Block | Part 2.3 |
|---|---|
| Variables | Numerical + Categorical |
| Structures | Cross-section, Panel |
| Operations | Merge, Reshape (melt), Add hue |
| Visualizations | Scatter by category (lmplot with hue) |
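
As an illustration of the reshape step, the sketch below melts a wide table (one column per election year) into the long format that `hue` expects. The layout, the column names, and the values are assumptions made for this example.

```python
import pandas as pd

# Toy wide table: one row per hypothetical county, one column per year.
wide = pd.DataFrame({
    "fips": ["01001", "01003"],
    "2000": [30.0, 25.0],
    "2024": [27.0, 22.0],
})

# Reshape to long: one row per county-year.
long = wide.melt(id_vars="fips", var_name="year", value_name="dem_share")
long["year"] = long["year"].astype(int)
print(long)
```
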