The economist’s data analysis pipeline.
… use the appropriate summary tool for the variable type
Q. Which state has the most locations?
> pay attention to which of these two figures makes it easier to answer the question
> it’s pretty easy to see that it’s CA from both of these figures
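Both figures rest on one frequency count of a nominal variable. Here is a minimal pandas sketch of that count. It assumes the shop data sits in a DataFrame with a `STATE` column, which is the column name used in the `np.where` line later in this section. The rows below are toy values, not the real dataset.

```python
import pandas as pd

# Toy stand-in for Coffee_Shops.csv: one row per shop, STATE holds the state code
shops = pd.DataFrame({"STATE": ["CA", "CA", "CA", "WA", "WA", "FL", "NY"]})

# Frequency table of a nominal variable: number of shops per state
counts = shops["STATE"].value_counts()

# The state with the most locations is the label with the largest count
top_state = counts.idxmax()
print(counts)
print("Most locations:", top_state)
```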
Q. Does FL or WA have more shops?
> a bar graph is much easier to read
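One way to draw the bar version with matplotlib follows, as a sketch using toy counts. Every bar starts at zero, so comparing FL with WA comes down to comparing two heights rather than two pie angles. The off-screen `Agg` backend and the output filename are assumptions for running as a script. In a notebook you can drop the `matplotlib.use` line.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Toy shop counts per state (illustrative values, not the real data)
counts = pd.Series({"CA": 3, "WA": 2, "FL": 1, "NY": 1})

fig, ax = plt.subplots()
# One bar per state, all starting at zero, so FL vs. WA is a comparison of heights
ax.bar(counts.index, counts.values)
ax.set_ylabel("Number of shops")
fig.savefig("shops_by_state.png")  # hypothetical output filename
```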
Q. How many shops are in FL?
> now it takes a second to read the bar graph…
Q. How many shops are in FL?
> we can make the bar graph easier to read by placing the number near the bar
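In matplotlib, `Axes.bar_label` (available from matplotlib 3.4) writes each bar's value at its tip. This is one way to place the number next to the bar. The sketch uses toy counts, and `padding` is the gap in points between bar and label.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Toy shop counts per state (illustrative values only)
counts = pd.Series({"CA": 3, "WA": 2, "FL": 1, "NY": 1})

fig, ax = plt.subplots()
bars = ax.bar(counts.index, counts.values)
# Print each bar's value just above it so readers need not trace the y-axis
labels = ax.bar_label(bars, padding=2)
```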
Q. How many shops are in the state with the second most locations?
> removing clutter guides your eye to the important information
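One possible decluttering pass is sketched below with toy counts. Once each bar carries its value, the y-axis ticks and the box around the plot repeat information the labels already give, so this version hides them. Which elements count as clutter is a design judgment, not a fixed rule.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Toy shop counts per state (illustrative values only)
counts = pd.Series({"CA": 3, "WA": 2, "FL": 1, "NY": 1})

fig, ax = plt.subplots()
bars = ax.bar(counts.index, counts.values, color="grey")
ax.bar_label(bars)

# Values are printed on the bars, so the y-axis ticks are redundant
ax.set_yticks([])
# Remove the frame except the baseline the bars stand on
for side in ("top", "right", "left"):
    ax.spines[side].set_visible(False)
```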
Q. How many shops are in the state with the second most locations?
> states have no inherent order, but sorting can make comparisons easier
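Sorting is a single pandas call, shown here on toy counts. Once the counts are ranked, "the state with the second most locations" is simply position 1. Passing `ranked.index` and `ranked.values` to `ax.bar` would draw the bars in this sorted order.

```python
import pandas as pd

# Toy shop counts per state, in arbitrary (e.g. alphabetical or file) order
counts = pd.Series({"WA": 2, "CA": 3, "NY": 1, "FL": 1})

# States have no natural order, so rank them by count, largest first
ranked = counts.sort_values(ascending=False)

# After sorting, "second most" is just the second position
second_state = ranked.index[1]
second_count = ranked.iloc[1]
print(second_state, second_count)  # WA 2
```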
Q. How does CA compare to the whole?
> instead of a nominal categorical variable, this is binary (CA / Other)
Q. How does CA compare to the whole?
> this question is much easier to answer when we visualize just the two categories
> here both the pie and the bar communicate the data effectively
Q. How does CA compare to the whole?
> if the question is about percentages, a pie chart may work best
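For a percentage question, matplotlib's `Axes.pie` can print each wedge's share directly through its `autopct` format string. The sketch below uses toy CA/Other counts, not figures from the dataset.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Toy binary counts: CA versus every other state combined (illustrative only)
binary_counts = pd.Series({"CA": 3, "Other": 9})

fig, ax = plt.subplots()
# autopct labels each wedge with its share of the total, to one decimal place
wedges, texts, autotexts = ax.pie(
    binary_counts.values,
    labels=list(binary_counts.index),
    autopct="%1.1f%%",
)
```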
… use the right summary tool for the variable type
Let's visualize coffee shops by state.
Coffee_Shops.csv
Let's visualize the main variable in each dataset.
employment_status.csv
household_savings.csv
household_incomes.csv
Summarize Coffee_Shops.csv as a nominal categorical variable.
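A sketch of one nominal summary follows: a frequency table with counts and shares for each category. It assumes a `STATE` column, as in the `np.where` line. In the notebook the DataFrame would come from `pd.read_csv("Coffee_Shops.csv")`. The rows here are toy values.

```python
import pandas as pd

# Toy stand-in for pd.read_csv("Coffee_Shops.csv")
shops = pd.DataFrame({"STATE": ["CA", "CA", "CA", "WA", "WA", "FL", "NY", "CA"]})

# Nominal variable: report how often each category occurs, as a count and a share
summary = pd.DataFrame({
    "count": shops["STATE"].value_counts(),
    "share": shops["STATE"].value_counts(normalize=True),
})
print(summary)
```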

Summarize Coffee_Shops.csv as a binary categorical variable.
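import numpy as np
# Assumes `shops` is the DataFrame loaded from Coffee_Shops.csv, with a STATE column
# Create a binary categorical variable: 'CA' for California shops, 'Other' for every other state
shops['CA'] = np.where(shops['STATE'] == 'CA', 'CA', 'Other')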
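A binary variable is fully described by the share in one of its two categories. The sketch below builds the `CA` column the same way as above, using toy rows, and then summarizes it with counts and the CA share.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the shops DataFrame (illustrative rows only)
shops = pd.DataFrame({"STATE": ["CA", "CA", "CA", "WA", "WA", "FL", "NY", "CA"]})
shops["CA"] = np.where(shops["STATE"] == "CA", "CA", "Other")

# Counts of the two categories, then the share of shops in CA
print(shops["CA"].value_counts())
ca_share = (shops["CA"] == "CA").mean()
print(f"CA share: {ca_share:.1%}")  # CA share: 50.0%
```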
Summarize employment_status.csv.

Summarize household_savings.csv.

Summarize household_incomes.csv.
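The column names in employment_status.csv, household_savings.csv, and household_incomes.csv are not given here. The sketch below therefore uses a hypothetical helper, `summarize`, that applies the "right summary tool for the variable type" rule to whatever column holds the main variable. In this version, any two-valued variable is treated as binary, numeric columns get `describe()`, and everything else gets a frequency table. The example Series are toy values.

```python
import pandas as pd


def summarize(series: pd.Series) -> pd.Series:
    """Hypothetical helper: choose a summary tool from the variable type."""
    if series.nunique(dropna=True) == 2:
        # Binary variable: the share of each of the two categories says it all
        return series.value_counts(normalize=True)
    if pd.api.types.is_numeric_dtype(series):
        # Numerical variable: count, mean, std, min, quartiles, max
        return series.describe()
    # Nominal categorical variable: how often each category occurs
    return series.value_counts()


# Toy inputs standing in for the main variable of each dataset
incomes = pd.Series([32_000, 45_000, 51_000, 60_000, 250_000])
status = pd.Series(["employed", "unemployed", "employed", "retired"])
flag = pd.Series(["CA", "Other", "Other", "Other"])

print(summarize(incomes))
print(summarize(status))
print(summarize(flag))
```

Checking for two values first means that a 0/1 numeric column is summarized as binary rather than with a mean and quartiles. That is a design choice you may want to change.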
