The economist’s data analysis pipeline.
We cannot typically understand our data without summarizing it.
What separates a good summarization tool from a bad one is the dimension of the data and what you’re trying to understand about the data.
Data comes in all shapes, sizes, and types.
Variable Type
Data Structure
Number of Variables
… data that’s best recorded in categories
Binary: only two categories
Nominal: categories cannot be ordered / ranked
Ordinal: categories have order / rank but not a meaningful scale
… data that’s best recorded in numerical form
Discrete: countable numbers with meaningful intervals
Continuous: quantities measurable on the reals
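The taxonomy above can be sketched as a small decision rule. This is an illustrative heuristic only, not a standard algorithm: the function name `classify_variable` and its `ordered` flag are hypothetical, and the flag has to come from the analyst because whether categories have a rank is a fact about what the data means, not something visible in the raw values.

```python
from numbers import Integral, Real


def classify_variable(values, ordered=False):
    """Heuristically label a column with one of the five variable types.

    `ordered` is an assumption supplied by the analyst: it says whether
    the categories have a meaningful rank.
    """
    distinct = set(values)
    # bool is a subclass of int in Python, so keep it out of the numeric branch.
    numeric = all(
        isinstance(v, Real) and not isinstance(v, bool) for v in distinct
    )
    if numeric:
        # Whole-number values are treated as counts, anything else as a measurement.
        if all(isinstance(v, Integral) for v in distinct):
            return "discrete"
        return "continuous"
    if len(distinct) == 2:
        return "binary"
    return "ordinal" if ordered else "nominal"
```

A rule like this is only a first pass. For example, a binary variable coded as 0/1 would be labeled discrete here, so the final call still rests on knowing what the numbers stand for.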
Let’s identify the variable type for each dataset.
employment_status.csv
employment_sector.csv
economic_optimism.csv
household_size.csv
household_income.csv
What type of variables are contained in employment_status.csv?
Binary Categorical: two categories (Employed, Unemployed)

What type of variables are contained in employment_sector.csv?
Nominal: no inherent order (Agriculture, Services, Unemployed)

What type of variables are contained in economic_optimism.csv?
Ordinal: meaningful order without meaningful intervals (Very Pessimistic to Very Optimistic)

What type of variables are contained in household_size.csv?
Discrete Numerical: countable numbers with meaningful intervals (Number of Children in a Household)

What type of variables are contained in household_income.csv?
Continuous Numerical: quantities measurable on the reals (Income in USD)
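Why the variable type matters for summarization can be sketched with Python's standard library. The sample values below are hypothetical, invented for illustration, and are not read from the CSV files. The three intermediate labels of the optimism scale are also an assumption, since only its endpoints are given above.

```python
from collections import Counter
import statistics

# Hypothetical sample values, invented for illustration only.
sector = ["Services", "Agriculture", "Services", "Unemployed"]
optimism = ["Pessimistic", "Neutral", "Optimistic", "Neutral", "Very Optimistic"]
children = [0, 2, 1, 3, 2]

# Nominal: categories have no order, so counts (and the mode) are the natural summary.
sector_counts = Counter(sector)
most_common_sector = sector_counts.most_common(1)[0][0]

# Ordinal: order is meaningful, so a median works, but a mean would
# wrongly assume equal spacing between categories.
# Assumed scale: only the two endpoints appear in the text above.
SCALE = ["Very Pessimistic", "Pessimistic", "Neutral", "Optimistic", "Very Optimistic"]
ranks = [SCALE.index(answer) for answer in optimism]
median_optimism = SCALE[statistics.median_low(ranks)]

# Discrete numerical: intervals are meaningful, so a mean is well defined.
mean_children = statistics.mean(children)

print(most_common_sector, median_optimism, mean_children)
```

`statistics.median_low` is used rather than `statistics.median` so that the result is always an actual position on the scale, even with an even number of responses.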
