The economist’s data analysis pipeline.
… use the appropriate summary tool for the variable type
Q. Which age group has the most Starbucks customers?
> the bin sizes aren’t even, making it hard to interpret
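A minimal sketch of equal-width age bins in Python. The helpers `age_group` and `count_by_age_group` are hypothetical, the 10-year width and starting age of 0 are illustrative choices, and the ages are made up rather than taken from the Starbucks data.

```python
from collections import Counter

def age_group(age: int, width: int = 10, start: int = 0) -> str:
    """Map a whole-year age to an equal-width bin label such as '40-49'."""
    lo = start + ((age - start) // width) * width
    return f"{lo}-{lo + width - 1}"

def count_by_age_group(ages, width=10, start=0):
    """Count customers per equal-width age bin (hypothetical helper)."""
    return Counter(age_group(a, width, start) for a in ages)

# Illustrative ages only, not the Starbucks data.
ages = [23, 27, 31, 44, 45, 52, 58, 61]
print(count_by_age_group(ages))
```

Because every bin spans the same number of years, bar heights can be compared directly, which is exactly what uneven bins prevent.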
> but what if we want to distinguish between a 55-year-old and a 60-year-old?
> what if we take this even further?
> what if we compare 44-year-olds to 45-year-olds?
Q. Do 44- or 45-year-olds spend more at Starbucks?
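> we can go too far: with bins this narrow, each one holds fewer customers, so differences between neighboring ages can be dominated by statistical noise. how do we fix the problem?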
> increase the sample size or the bin width!
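One way to see why both fixes work, under the simplifying assumption that each of the $n$ customers independently lands in a given bin with probability $p$: the bin count $X$ is then binomial, and its spread relative to its expected size is

```latex
X \sim \mathrm{Binomial}(n, p), \qquad
\mathbb{E}[X] = np, \qquad
\operatorname{SD}(X) = \sqrt{np(1-p)}, \qquad
\frac{\operatorname{SD}(X)}{\mathbb{E}[X]} = \sqrt{\frac{1-p}{np}} \approx \frac{1}{\sqrt{np}} \quad (p \text{ small})
```

A larger sample raises $n$, and a wider bin raises $p$; either way the expected count $np$ grows and the relative noise shrinks.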
Q. Which age group has the most Starbucks customers?
> a larger sample has less noise!
> larger bins also reduce noise!
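A small simulation sketch of the two effects: a larger sample and wider bins. It draws hypothetical whole-year ages uniformly from 18 to 77 (an assumption made purely so that every bin has the same true share), then measures noise as the coefficient of variation of the bin counts: their standard deviation divided by their mean. Since the true counts are all equal, any variation is pure noise. `bin_noise` is a hypothetical helper.

```python
import random
import statistics

def bin_noise(n: int, width: int, seed: int = 0) -> float:
    """Coefficient of variation of equal-width bin counts for n simulated ages."""
    rng = random.Random(seed)
    lo, hi = 18, 78                      # hypothetical age range [18, 78), 60 years wide
    counts = [0] * ((hi - lo) // width)  # width should divide 60 evenly
    for _ in range(n):
        age = rng.randrange(lo, hi)      # uniform whole-year age
        counts[(age - lo) // width] += 1
    return statistics.pstdev(counts) / statistics.mean(counts)

for n in (500, 50_000):
    for width in (1, 10):
        print(f"n={n:>6}, width={width:>2}: CV={bin_noise(n, width):.3f}")
```

Running it should show the coefficient of variation falling as either `n` or `width` grows.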
… use the right summary tool for the variable type
Q. Which age group among those making $40k or less has the most Starbucks customers?
Let's use the data to examine whether customers between 45 and 55 years old spend the most among customers making less than $40k.
Data: Starbucks_Customer_Profiles_40k.csv
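A minimal sketch of the exercise using Python's `csv` module. The column names `age` and `income` (in dollars) are assumptions, since the header of Starbucks_Customer_Profiles_40k.csv is not shown here; rename them to match the real file. The hypothetical helper takes any iterable of row dictionaries, so it works on the file via `csv.DictReader` or on a few made-up rows for testing.

```python
import csv
from collections import Counter

def customers_by_age_group(rows, max_income=40_000, width=10):
    """Count customers with income <= max_income in equal-width age bins.

    Assumes hypothetical 'age' and 'income' columns holding numbers as text.
    """
    counts = Counter()
    for row in rows:
        if float(row["income"]) <= max_income:
            lo = (int(float(row["age"])) // width) * width
            counts[f"{lo}-{lo + width - 1}"] += 1
    return counts

def load_rows(path):
    """Read every row of a CSV file as a dict keyed by the header."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

# Hypothetical usage on the real file (column names are assumptions):
# counts = customers_by_age_group(load_rows("Starbucks_Customer_Profiles_40k.csv"))
# print(counts.most_common(1))

# Tiny made-up rows, for illustration only.
sample = [
    {"age": "47", "income": "35000"},
    {"age": "52", "income": "38000"},
    {"age": "31", "income": "39000"},
    {"age": "50", "income": "72000"},   # excluded: income above the cutoff
]
print(customers_by_age_group(sample))
```

The cutoff uses `<=` to match "$40k or less"; for a strict "less than $40k" cutoff, change it to `<`.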

Q. Which countries drank an average amount of coffee?
> histogram bins make it impossible to see the exact values
Q. Which countries drank the most coffee in 1999?
> again, histograms make it difficult to read off summary statistics such as the median or the quartiles
> as we’ll see, boxplots can tell us about quartiles
> but boxplots are still pretty unclear for our question
> here we overlay the individual data points on the boxplot
> each point represents a country’s coffee consumption
> each element of the boxplot marks one value of the five-number summary: the minimum, the first quartile, the median, the third quartile, and the maximum
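As a sketch, the five summary values a basic boxplot draws can be computed with Python's `statistics` module. Quartile conventions differ between tools; `method="inclusive"` is one common choice (it matches NumPy's default linear interpolation), so the exact quartiles may differ slightly from those in a given chart. Many plotting libraries also extend the whiskers only to 1.5 times the interquartile range rather than to the minimum and maximum. `five_number_summary` is a hypothetical helper.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

print(five_number_summary([5, 1, 4, 2, 3]))  # → (1, 2.0, 3.0, 4.0, 5)
```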
Q. Which countries consumed more than 8 kg per capita?
> we can highlight the relevant subsets of the data
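Highlighting a subset amounts to filtering the data. A sketch, assuming consumption is stored as a dictionary mapping country name to kilograms per capita; the placeholder names and values are hypothetical, not the 1999 figures, and `above_threshold` is a hypothetical helper.

```python
def above_threshold(consumption: dict[str, float], threshold: float = 8.0) -> list[str]:
    """Countries whose per-capita consumption (kg) exceeds the threshold, highest first."""
    selected = [c for c, kg in consumption.items() if kg > threshold]
    return sorted(selected, key=consumption.get, reverse=True)

# Placeholder data for illustration only.
example = {"Country A": 9.4, "Country B": 6.1, "Country C": 8.3, "Country D": 3.0}
print(above_threshold(example))  # → ['Country A', 'Country C']
```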
Q. Which country consumed the most coffee per capita?
> we can read the exact values off the boxplot's five summary markers
> Finland consumed the most coffee per capita in 1999
Q. Which country consumed the least coffee per capita?
> Russia consumed the least coffee per capita in 1999
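Reading the maximum and minimum off the plot corresponds to asking which country holds the largest and smallest value. A sketch with the same assumed dictionary layout and hypothetical placeholder data; `extremes` is a hypothetical helper.

```python
def extremes(consumption: dict[str, float]) -> tuple[str, str]:
    """Return (highest consumer, lowest consumer) by per-capita kg."""
    return max(consumption, key=consumption.get), min(consumption, key=consumption.get)

# Placeholder data for illustration only.
example = {"Country A": 9.4, "Country B": 6.1, "Country C": 8.3, "Country D": 3.0}
print(extremes(example))  # → ('Country A', 'Country D')
```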
Q. Which country had the median consumption?
> the US!
Q. Which country consumed more than exactly 25% of the other countries (the first quartile)?
> Slovakia!
Q. Which country consumed more than exactly 75% of the other countries (the third quartile)?
> the Netherlands!
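The "more than exactly 25% (or 75%)" questions are about rank: among n countries sorted from lowest to highest, the country at position k (counting from 0) consumes more than k of the other n − 1 countries. A sketch, assuming the same dictionary layout and hypothetical placeholder data. When p × (n − 1) is not a whole number, no single country sits exactly at that quartile; this hypothetical helper rounds to the nearest rank, whereas a boxplot interpolates between the two neighboring values.

```python
def country_at_fraction(consumption: dict[str, float], p: float) -> str:
    """Country that consumes more than a fraction p of the other countries.

    Uses the nearest rank when p * (n - 1) is not a whole number.
    """
    ranked = sorted(consumption, key=consumption.get)  # lowest to highest
    k = round(p * (len(ranked) - 1))
    return ranked[k]

# Placeholder data for illustration only.
example = {"Country A": 9.4, "Country B": 6.1, "Country C": 8.3,
           "Country D": 3.0, "Country E": 7.2}
for p in (0.25, 0.5, 0.75):
    print(p, country_at_fraction(example, p))
```

With p = 0 and p = 1 the same helper returns the lowest and highest consumers.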
Boxplots show quartiles; stripplots show the data.
Show the distribution of coffee consumption per capita in 2019.
Let's use a boxplot and a stripplot to examine the distribution of coffee consumption per capita among coffee-importing countries in 2019.
Data: Coffee_Per_Cap_2019.csv
