| STATE |
|---|
| CA |
| NY |
| TX |
| WA |
| FL |
| ... |
The economist’s data analysis skillset.
We cannot typically understand our data without summarizing it.
The main differentiator between a good and a bad summarization tool is whether it’s appropriate for the data.
… use the appropriate summary tool for the variable type
Q. Which state has the most locations?
| STATE |
|---|
| CA |
| NY |
| TX |
| WA |
| FL |
| ... |
> it’s difficult to see from the data alone
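A minimal sketch of why counting helps, assuming the data arrives as one state code per shop. The `states` list is a made-up toy column for illustration, not the real contents of Coffee_Shops.csv.

```python
from collections import Counter

# Hypothetical toy column: one entry per shop (NOT the real dataset).
states = ["CA", "NY", "TX", "WA", "FL", "CA", "CA", "TX", "WA", "CA"]

# Tally how many times each state appears.
counts = Counter(states)

# most_common(1) returns a list holding the single largest (state, count) pair.
top_state, top_count = counts.most_common(1)[0]
print(top_state, top_count)
```

Scanning the raw column gives no quick answer; the tally does.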
Q. Which state has the most locations?
> pay attention to which of these two figures makes the question easier to answer
> it’s pretty easy to see that it’s CA from both of these figures
Q. Does FL or WA have more shops?
> pay attention to which of these two figures makes the question easier to answer
> a bar graph is much easier to read
Q. How many shops are in FL?
> pay attention to which of these two figures makes the question easier to answer
> now it takes a second to read the bar graph…
Q. How many shops are in FL?
> pay attention to which of these two figures makes the question easier to answer
> we can make the bar graph easier to read by placing the number near the bar
Q. How many shops are in the state with the second most locations?
> removing clutter guides your eye to the important information
Q. How many shops are in the state with the second most locations?
> states have no inherent order, but sorting can make comparisons easier
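Sorting by count can be sketched as follows. The numbers in `counts` are hypothetical placeholders, chosen only so that the ranking has a clear second place.

```python
# Hypothetical counts per state (illustrative values, not the real data).
counts = {"CA": 4, "NY": 1, "TX": 3, "WA": 2, "FL": 1}

# Order the categories from most to fewest shops.
# sorted() is stable, so tied states keep their original relative order.
ranked = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)

# The second entry answers "the state with the second most locations".
second_state, second_count = ranked[1]
print(second_state, second_count)
```

Once the categories are ranked, the question becomes a lookup by position instead of a visual search.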
Q. How does CA compare to the whole?
> instead of a nominal categorical variable, this is binary (CA / Other)
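Recoding the nominal variable into a binary one might look like this. The `states` list is again a made-up toy column, and the label "Other" follows the note above.

```python
from collections import Counter

# Hypothetical toy column: one entry per shop (NOT the real dataset).
states = ["CA", "NY", "TX", "WA", "FL", "CA", "CA", "TX", "WA", "CA"]

# Collapse every non-CA state into a single "Other" category.
binary = ["CA" if state == "CA" else "Other" for state in states]

# Counting works exactly as before, just with two categories.
binary_counts = Counter(binary)
print(binary_counts)
```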
Q. How does CA compare to the whole?
> this question is much easier to see when visualizing the two categories
> here both the pie and the bar communicate the data effectively
Q. How does CA compare to the whole?
> if the question is about percentages, a pie chart may work best
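When the question is about percentages, the counts need one more step: dividing by the total. A small sketch, with hypothetical counts for the two categories:

```python
# Hypothetical binary counts (illustrative values, not the real data).
binary_counts = {"CA": 4, "Other": 6}

# Each category's share of the whole, as a percentage.
total = sum(binary_counts.values())
shares = {category: 100 * n / total for category, n in binary_counts.items()}

for category, share in shares.items():
    print(f"{category}: {share:.0f}%")
```

These shares are what a pie chart encodes as slice angles.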
… use the right summary tool for the variable type
Every visualization follows three steps
What we just did
| Step | Action |
|---|---|
| SELECT | All coffee shops |
| TRANSFORM | Count by state |
| ENCODE | Category → position; Count → bar length |
> for categorical variables, TRANSFORM almost always means counting
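The three steps in the table can be sketched in a few lines. This is only an illustration under stated assumptions: `rows` is made-up data standing in for Coffee_Shops.csv, `encode` is a hypothetical helper, and a row of `#` characters stands in for the bar a plotting library would draw.

```python
from collections import Counter

# Hypothetical rows standing in for Coffee_Shops.csv (values are made up).
toy_states = ["CA", "NY", "TX", "WA", "FL", "CA", "CA", "TX", "WA", "CA"]
rows = [{"STATE": state} for state in toy_states]

# SELECT: keep all coffee shops, and only the variable of interest.
selected = [row["STATE"] for row in rows]

# TRANSFORM: count by state.
counts = Counter(selected)

# ENCODE: one line per category (position), count shown as bar length,
# with the number printed next to the bar.
def encode(counts):
    """Hypothetical helper: render each count as a text bar."""
    return [f"{state} {'#' * n} {n}" for state, n in counts.items()]

chart = encode(counts)
print("\n".join(chart))
```

Swapping the `encode` step for a real plotting call changes the picture but not the SELECT and TRANSFORM steps before it.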
Let's visualize coffee shops by state.
Coffee_Shops.csv
Let's visualize the main variable in each dataset.
employment_status.csv
household_savings.csv
household_incomes.csv
Summarize Coffee_Shops.csv as a nominal categorical variable.

Summarize Coffee_Shops.csv as a binary categorical variable.
Summarize employment_status.csv.

Summarize household_savings.csv.

Summarize household_incomes.csv.
