ECON 0150 | Economic Data Analysis

The economist’s data analysis skillset.


Part 1.5 | Panel Data (Wide Format)

Recap Part 1.4 | Long Format Panel Data

Each row represents an observation of an entity at one point in time


Code Year Consumption
AUT 1990 10.47
AUT 1991 10.07
AUT 1992 9.27
AUT 1993 10.13
AUT 1994 8.21

> this is data on coffee consumption per capita, 34 countries, 1990 to 2019

Question

Are countries drinking more coffee in 2019 than in 1999?


Code Year Consumption
AUT 1990 10.47
AUT 1991 10.07
AUT 1992 9.27
AUT 1993 10.13
AUT 1994 8.21

> how can we use this data to answer our question?

> it’s challenging… the information we need is spread across different rows

Panel Data: All-Country Line Plot

Are countries drinking more coffee?

> readable, but not great for answering our question

Panel Data: Coffee Consumption Per Capita

Is the world drinking more coffee?

> compared to what…?

Panel Data: Coffee Consumption Per Capita

Is the world drinking more coffee?

> this is still pretty unclear: histograms aren’t great for comparison

> lets use a multi-boxplot

Panel Data: Multi-Boxplots

Is the world drinking more coffee?

> this is better: it looks like the distribution is shifted higher!

> lets examine the years in between to see how the distribution evolved

Panel Data: Multi-Boxplots

Is the world drinking more coffee?

> lets ask some smaller more focussed questions

Panel Data: Multi-Boxplots

Which years show at least half consuming less than 5 kg per cap?

Panel Data: Multi-Boxplots

Which years show at least half consuming less than 5 kg per cap?

> focus on the medians

Panel Data: Multi-Boxplots

Which years show at least half consuming less than 5 kg per cap?

> … when the median is above 5 kg per cap

Panel Data: Multi-Boxplots

Which years saw the largest jump in the median?

> … a little difficult to see

Panel Data: Multi-Boxplots

Which years saw the largest jump in the median?

> … a little difficult to see

Panel Data: Multi-Boxplots

Is the country with the lowest consumption consuming more today?

Panel Data: Multi-Boxplots

Is the country with the lowest consumption consuming more today?

> focus on the minimums

> yes!

Panel Data: Multi-Boxplots

What patterns do we observe about the maximums?

> same with the maximums

Panel Data: Multi-Boxplots

Which years did more than 25% consume less than 5 kg?

Panel Data: Multi-Boxplots

Which years did more than 25% consume less than 5 kg?

> look at the 25%

Panel Data: Multi-Boxplots

Which years did more than 25% consume less than 5 kg?

> look at the 25% and compare it to 5 kg per cap

Panel Data: Multi-Boxplots

Which years did more than 25% consume less than 5 kg?

> all of them

Panel Data: Multi-Boxplots

Which year saw the greatest difference between any two countries?

> look at the range

Panel Data: Multi-Boxplots

Which year saw the greatest difference between any two countries?

> look at the range

Panel Data: Multi-Boxplots

Which year saw the greatest difference between any two countries?

> look at the range and select the largest

Panel Data: Multi-Boxplots

In which year did most countries increase their coffee consumption?

> not visible in the figure!

Panel Data: Relationships Between Years

How many countries increased their coffee consumption between 1999 and 2019?

> also not visible with this figure!

Panel Data: Relationships Between Years

How many countries increased their coffee consumption between 1999 and 2019?

> better, but this figure still doesn’t let us keep track of countries between years…

Panel Data: Relationships Between Years

How many countries increased their coffee consumption between 1999 and 2019?

> we need a way to compare each country’s consumption in 1999 vs 2019

> but in long format, this is hard — the data is spread across different rows

Wide Format Panel Data

What if each year was its own column?


Code 1999 2004 2009 2014 2019
AUT 8.43 7.31 6.37 7.97 7.93
BGR 2.65 2.83 3.30 3.12 3.64
HRV 4.48 5.16 5.10 5.21 5.62
CYP 3.48 3.53 4.05 4.13 5.62
CZE 3.26 3.56 3.02 5.69 4.74

> now comparing 1999 vs 2019 is just comparing two columns!

Wide Format

Each year is a column

Long Format

Each observation is a row

Two Formats, Same Data

Panel data can be stored in two ways


  • Wide format: Each time period is a separate column
  • Long format: Each observation is a separate row
  • Same information, different shapes
  • Different shapes make different tasks easier

Panel Data: Relationships Between Years

How many countries increased their coffee consumption between 1999 and 2019?

Panel Data: Relationships Between Years

How many countries increased their coffee consumption between 1999 and 2019?

> a scatter plot can visualize changes between two points in time

Panel Data: Relationships Between Years

How many countries increased their coffee consumption between 1999 and 2019?

> a 45 degree line shows all the possible points with no change

Panel Data: Relationships Between Years

How many countries increased their coffee consumption between 1999 and 2019?

> points above the line increased

Panel Data: Relationships Between Years

How many countries decreased their coffee consumption between 1999 and 2019?

> points below the line decreased

Panel Data: Relationships Between Years

Does the data confirm that the world is drinking more coffee?

> we can use colors to visualize both increases and decreases

Exercise 1.5 | Scatterplots

Is the world drinking more coffee?

We’re going to use a scatterplot to visually examine how countries’ coffee consumption changed between 1999 and 2019.

  • Data: Coffee_Per_Cap.csv

Exercise 1.5 | Scatterplots

Is the world drinking more coffee?

# Wide Format Scatterplot
sns.scatterplot(percap, x='1999', y='2019')

Exercise 1.5 | Scatterplots

Is the world drinking more coffee?

# Wide Format Scatterplot
sns.scatterplot(percap, x='1999', y='2019')

Counting Changes

How many countries increased vs decreased?

> we can see visually that most points are above the 45° line

> but how do we count exactly how many?

Counting Changes

How many countries increased vs decreased?

  1. Create a change column: Subtract old from new
Code 1999 2019 change
AUT 8.43 7.93 -0.50
BGR 2.65 3.64 0.99
HRV 4.48 5.62 1.14
  1. Filter: Select rows where change > 0 (or < 0)
  2. Count: Use len() to count the filtered rows

> positive change = increased; negative change = decreased

Exercise 1.5 | Counting Changes

How many countries increased their coffee consumption?

# Create a change column
percap['change'] = percap['2019'] - percap['1999']
percap.head()


# Count countries that increased (change > 0)
increased = percap[percap['change'] > 0]
len(increased)


# Count countries that decreased (change < 0)
decreased = percap[percap['change'] < 0]
len(decreased)

Part 1.5 | Panel Data (Wide Format)



Summary

  • Wide format: Each time period is a column
  • Multi-boxplots compare distributions across time
  • Scatterplots with 45° lines track individual changes
  • Filtering with df[df['col'] > 0] counts subsets

Building Blocks

What this unit adds to your toolkit

Block Part 1.5
Variables Numerical
Structures Panel (wide format)
Operations Filter
Visualizations Multi-boxplot, Scatterplot with 45° line

Part 1 Building Blocks

How your toolkit grew

Block 1.1 1.2 1.3 1.4 1.5
Variables Categorical + Numerical
Structures Cross-section + Timeseries + Panel (long) + Panel (wide)
Operations Count + Bin, Mean, SD, Quartiles + Real price transform + Groupby + Filter
Visualizations Bar, Pie + Histogram, Boxplot + Line plot, Multi-boxplot + Multi-line, Facets + Scatterplot w/ 45°

Part 1 Toolkit

Everything you know

Block Part 1
Variables Categorical (binary, nominal, ordinal), Numerical
Structures Cross-section, Timeseries, Panel (long & wide)
Operations Count, Bin, Mean, SD, Quartiles, Real price transform, Groupby, Filter
Visualizations Bar chart, Pie chart, Histogram, Boxplot, Stripplot, Line plot, Multi-boxplot, Multi-line, Facets, Scatterplot w/ 45° line