The economist’s data analysis pipeline.
Panel data comes in one of two formats.
Long-format panel data lists each entry as a row, using a column (e.g. Shop) to record the group.
| transaction_id | Hours | Shop |
|---|---|---|
| 7 | 12 | Shop A |
| 11 | 15 | Shop A |
| 19 | 14 | Shop A |
| 32 | 16 | Shop A |
| 33 | 19 | Shop A |
Wide-format panel data lists each group as a row, using a separate column to record each entry.
| Code | 1999 | 2019 |
|---|---|---|
| AUT | 8.430589 | 7.925747 |
| BGR | 2.652661 | 3.638313 |
| HRV | 4.480790 | 5.623266 |
| CYP | 3.477888 | 5.615070 |
| CZE | 3.255587 | 4.739563 |
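The two layouts hold the same information, so one can be reshaped into the other. The following is a minimal sketch with pandas, assuming the country code is stored as a regular column; the names `wide`, `long`, `back`, `Year`, and `KgPerCap` are illustrative and do not come from the dataset.

```python
import pandas as pd

# Two rows copied from the wide-format table in these slides.
wide = pd.DataFrame({
    "Code": ["AUT", "BGR"],
    "1999": [8.430589, 2.652661],
    "2019": [7.925747, 3.638313],
})

# Wide -> long: one row per (country, year) entry.
long = wide.melt(id_vars="Code", var_name="Year", value_name="KgPerCap")

# Long -> wide: one row per country, one column per year.
back = long.pivot(index="Code", columns="Year", values="KgPerCap").reset_index()
back.columns.name = None  # drop the leftover "Year" label on the column index

print(long)
print(back)
```

Long format tends to suit seaborn calls that take `x`, `y`, and `hue`, while wide format can be passed to some functions as a block of columns.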
Use Coffee_Sales_Receips.csv to help inform where to hire a barista.
Q. Which coffee shop is the busiest?
> a bar chart makes it easy to compare shops’ busyness
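One way to draw such a bar chart is to count rows per shop. This is a sketch only: the `sales` frame below is a made-up stand-in for the receipts file, the shop names other than Shop A are assumed, and the `Agg` backend is selected just so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line in a notebook
import pandas as pd
import seaborn as sns

# Toy stand-in for the receipts file: one row per transaction (values are invented).
sales = pd.DataFrame({
    "Shop": ["Shop A", "Shop A", "Shop A", "Shop B", "Shop B", "Shop C"],
    "Hours": [7, 8, 8, 9, 12, 15],
})

# Count transactions per shop; the largest count marks the busiest shop.
per_shop = sales["Shop"].value_counts()
busiest = per_shop.idxmax()

# countplot draws one bar per shop, with height equal to the number of rows.
ax = sns.countplot(data=sales, x="Shop")
```

With the real file, `sales` would come from reading the CSV instead of the toy frame.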
Q. What time of day is the busiest?
> a histogram makes it easy to compare transactions by time of day
> does this mean the morning shift at Shop A is the busiest?
Q. Which shift is the busiest?
> an overlaid histogram can show all three groups
> does this show the data clearly?
> instead, let’s use a line graph
Use Coffee_Sales_Receips.csv to help inform where to hire a barista.
| | Hours | Shop |
|---|---|---|
| 0 | 12 | Shop A |
| 1 | 15 | Shop A |
| 2 | 14 | Shop A |
| 3 | 16 | Shop A |
| 4 | 19 | Shop A |
> you’ll see a few more columns in your dataset
> this is Long-Format Panel Data: transactions are all in the same column
Q. Which shift is the busiest?
# Create hourly counts by shop
hourly_counts = sales.groupby(['Shop', 'Hours']).size().reset_index(name='Count')

| | Shop | Hours | Count |
|---|---|---|---|
| 0 | Shop A | 7 | 1383 |
| 1 | Shop A | 8 | 1632 |
| 2 | Shop A | 9 | 1693 |
| 3 | Shop A | 10 | 1711 |
| 4 | Shop A | 11 | 1136 |
# Multiple-Line Graph
sns.histplot(sales, x='Hours', hue='Shop', bins=range(0,24,1), element='poly')
# Multiple-Line Graph, unfilled so overlapping shops stay visible
sns.histplot(sales, x='Hours', hue='Shop', bins=range(0,24,1), element='poly', fill=False)
Is the world drinking more coffee?
Let’s examine whether the world is drinking more coffee today than in the 1990s.
Coffee_Per_Cap.csv
> compared to what…?
> this is still pretty unclear: histograms aren’t great for comparison
> let’s use a multi-boxplot
> this is better: it looks like the distribution is shifted higher!
> let’s examine the years in between to see how the distribution evolved
> let’s ask some smaller, more focused questions
Which years show at least half consuming less than 5 kg per cap?
> focus on the medians
> … the years in which the median is below 5 kg per cap
Which years saw the largest jump in the median?
> … a little difficult to see
Is the country with the lowest consumption consuming more today?
> focus on the minimums
> yes!
What patterns do we observe about the maximums?
> same with the maximums
In which years did more than 25% of countries consume less than 5 kg per cap?
> look at the 25th percentile (the first quartile) and compare it to 5 kg per cap
> all of them
Which year saw the greatest difference between any two countries?
> look at the range
> look at the range and select the largest
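The landmarks read off the boxplots (minimum, 25th percentile, median, maximum, range) can also be computed directly. This is a sketch on the five-country excerpt of Coffee_Per_Cap.csv shown in these slides, so its answers describe only those five rows and not the full dataset; the names `summary`, `below_5_median`, and `widest_year` are illustrative.

```python
import pandas as pd

# Five-country excerpt shown in these slides (kg per capita).
percap = pd.DataFrame({
    "Code": ["AUT", "BGR", "HRV", "CYP", "CZE"],
    "1999": [8.430589, 2.652661, 4.480790, 3.477888, 3.255587],
    "2009": [6.371562, 3.296419, 5.100831, 4.050500, 3.016104],
    "2019": [7.925747, 3.638313, 5.623266, 5.615070, 4.739563],
})
years = ["1999", "2009", "2019"]

# One row per year, one column per boxplot landmark.
summary = pd.DataFrame({
    "min": percap[years].min(),
    "q25": percap[years].quantile(0.25),
    "median": percap[years].median(),
    "max": percap[years].max(),
})
summary["range"] = summary["max"] - summary["min"]

# Years in which the median sits below 5 kg per capita.
below_5_median = summary.index[summary["median"] < 5].tolist()

# Year with the widest gap between the highest and lowest consumer.
widest_year = summary["range"].idxmax()

print(summary)
```

Comparing consecutive rows of `summary["median"]` gives the size of each jump in the median, which is hard to judge by eye from the figure.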
Is the world drinking more coffee?
We’re going to use a set of boxplots to visually compare, across years, the distributions of coffee consumption per capita among coffee-importing countries.
Coffee_Per_Cap.csv

| | Code | 1999 | 2009 | 2019 |
|---|---|---|---|---|
| 0 | AUT | 8.430589 | 6.371562 | 7.925747 |
| 2 | BGR | 2.652661 | 3.296419 | 3.638313 |
| 3 | HRV | 4.480790 | 5.100831 | 5.623266 |
| 4 | CYP | 3.477888 | 4.050500 | 5.615070 |
| 5 | CZE | 3.255587 | 3.016104 | 4.739563 |
> this is Wide-Format Panel Data: each year is in a separate column
With wide-format panel data, the seaborn call looks a little different: the year columns are passed in directly, and each column becomes its own box.
# Wide Format Multi-Boxplot
sns.boxplot(percap[['1999','2004','2009','2014','2019']], orient='h', whis=(0, 100))
In which year did most countries increase their coffee consumption?
> not visible in the figure!
How many countries increased their coffee consumption between 1999 and 2019?
> also not visible with this figure!
> better, but this figure still doesn’t let us keep track of countries between years…
> a scatter plot can visualize changes between two points in time
> a 45 degree line shows all the possible points with no change
> points above the line increased
How many countries decreased their coffee consumption between 1999 and 2019?
> points below the line decreased
Does the data confirm that the world is drinking more coffee?
> we can use colors to visualize both increases and decreases
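The scatter plot, 45-degree line, and coloring described in these notes can be sketched as follows. It uses only the five-country excerpt of Coffee_Per_Cap.csv shown in these slides, so the counts apply to that excerpt alone; the `Change` column and its labels are illustrative names, and the `Agg` backend is selected just so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Five-country excerpt shown in these slides (kg per capita).
percap = pd.DataFrame({
    "Code": ["AUT", "BGR", "HRV", "CYP", "CZE"],
    "1999": [8.430589, 2.652661, 4.480790, 3.477888, 3.255587],
    "2019": [7.925747, 3.638313, 5.623266, 5.615070, 4.739563],
})

# Label each country by the direction of its change between the two years.
percap["Change"] = np.where(percap["2019"] > percap["1999"], "increased", "decreased")

# Each point is one country: x is 1999 consumption, y is 2019 consumption.
ax = sns.scatterplot(data=percap, x="1999", y="2019", hue="Change")

# 45-degree line: points on it had the same consumption in both years.
ax.axline((0, 0), slope=1, color="gray", linestyle="--")

# Counting the labels answers the "how many" questions without reading the plot.
counts = percap["Change"].value_counts()
print(counts)
```

Points above the dashed line are the countries labeled "increased", and points below it are the ones labeled "decreased".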
Is the world drinking more coffee?
We’re going to use a scatterplot to visually examine how countries’ coffee consumption changed between 1999 and 2019.
Coffee_Per_Cap.csv