
The economist’s data analysis skillset.
Let's use the general linear model to test for differences in wages by gender.
Questions:
The simplest model with just a gender indicator.

\[\text{Wage} = \beta_0 + \beta_1 \times \text{Male} + \varepsilon\]
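When the only regressor is a 0/1 indicator, OLS reproduces the group means. β₀ is the average outcome for the Male = 0 group, and β₁ is the male average minus the female average. The sketch below checks this on simulated data. The DataFrame `demo` and all of its values are hypothetical; only the column names INCLOG10 and MALE are borrowed from the slides' code.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical simulated data: column names mirror the slides, values are made up
rng = np.random.default_rng(0)
n = 500
male = rng.integers(0, 2, size=n)  # 0/1 gender indicator
inclog10 = 4.3 + 0.1 * male + rng.normal(0, 0.2, size=n)
demo = pd.DataFrame({"INCLOG10": inclog10, "MALE": male})

fit = smf.ols("INCLOG10 ~ MALE", data=demo).fit()

# Average outcome in each group (index 0 = MALE == 0, index 1 = MALE == 1)
means = demo.groupby("MALE")["INCLOG10"].mean()

# Intercept equals the Male = 0 mean; MALE coefficient equals the difference in means
print(fit.params["Intercept"], means.loc[0])
print(fit.params["MALE"], means.loc[1] - means.loc[0])
```

So Model 1 is simply a comparison of group averages; its t-test on β₁ is a test of whether the two group means differ.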
Implementing the basic gender gap model

```python
import statsmodels.formula.api as smf

# Fit the model with just the male indicator
# (`data` is assumed to be a DataFrame with columns INCLOG10 and MALE)
model1 = smf.ols('INCLOG10 ~ MALE', data=data).fit()
print(model1.summary().tables[1])
```

Adding education as a control variable.

\[\text{Wage} = \beta_0 + \beta_1 \times \text{Education} + \beta_2 \times \text{Male} + \varepsilon\]
> β₀ is the base wage for women (Male = 0) with no post-middle school education (Education = 0)
> β₂ represents the gender wage gap: it is added to the intercept for males only
> The model assumes parallel lines: the same return to education (β₁) for everyone
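The parallel-lines assumption can be seen directly in the model's predictions: at every education level, the predicted male minus female gap is the same number, β₂. A minimal sketch on simulated data, assuming EDU is a hypothetical integer count of years of post-middle school education (0 to 8); the DataFrame `demo` and all values are made up for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical simulated data (EDU coding 0-8 is an assumption for illustration)
rng = np.random.default_rng(1)
n = 500
edu = rng.integers(0, 9, size=n)
male = rng.integers(0, 2, size=n)
inclog10 = 4.0 + 0.05 * edu + 0.1 * male + rng.normal(0, 0.2, size=n)
demo = pd.DataFrame({"INCLOG10": inclog10, "EDU": edu, "MALE": male})

fit = smf.ols("INCLOG10 ~ EDU + MALE", data=demo).fit()

# Predicted outcomes for women and men at each education level
grid = pd.DataFrame({"EDU": np.arange(0, 9)})
pred_f = fit.predict(grid.assign(MALE=0))
pred_m = fit.predict(grid.assign(MALE=1))

# Parallel lines: the male-female gap is identical at every EDU value and equals beta_2
gap = pred_m - pred_f
print(gap.round(4).tolist())
print(round(fit.params["MALE"], 4))
```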
Implementing the gender fixed effect model

```python
import statsmodels.formula.api as smf

# Fit the model with education as a control plus the male indicator
model2 = smf.ols('INCLOG10 ~ EDU + MALE', data=data).fit()
print(model2.summary().tables[1])
```
What if education benefits genders differently?

\[\text{Wage} = \beta_0 + \beta_1 \times \text{Education} + \beta_2 \times \text{Education} \times \text{Male} + \varepsilon\]
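Because this model has no separate Male term, both groups share the intercept β₀; only the slope differs, with women getting β₁ and men getting β₁ + β₂. A minimal sketch on hypothetical simulated data (the DataFrame `demo`, the EDU range 0 to 8, and all values are assumptions) confirms this from the fitted predictions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical simulated data for illustration only
rng = np.random.default_rng(2)
n = 500
edu = rng.integers(0, 9, size=n)
male = rng.integers(0, 2, size=n)
inclog10 = 4.0 + 0.04 * edu + 0.02 * edu * male + rng.normal(0, 0.2, size=n)
demo = pd.DataFrame({"INCLOG10": inclog10, "EDU": edu, "MALE": male})

fit = smf.ols("INCLOG10 ~ EDU + EDU:MALE", data=demo).fit()
b = fit.params

# Predictions for women (rows 0-1) and men (rows 2-3) at EDU = 0 and EDU = 1
grid = pd.DataFrame({"EDU": [0, 1, 0, 1], "MALE": [0, 0, 1, 1]})
pred = np.asarray(fit.predict(grid))

# Shared intercept: at EDU = 0 both groups get the same prediction
same_start = np.isclose(pred[0], pred[2])
# Slopes: women b["EDU"], men b["EDU"] + b["EDU:MALE"]
slope_f = pred[1] - pred[0]
slope_m = pred[3] - pred[2]
print(same_start, slope_f, slope_m)
```

Forcing a common starting point is a strong restriction; if men and women differ at zero education, the interaction term may absorb part of that level difference.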
Implementing the education-gender interaction model

```python
import statsmodels.formula.api as smf

# Fit model with education and an education-by-male interaction (no separate male term)
model3 = smf.ols('INCLOG10 ~ EDU + EDU:MALE', data=data).fit()
print(model3.summary().tables[1])
```

Combining fixed effects and interactions

\[\text{Wage} = \beta_0 + \beta_1 \times \text{Education} + \beta_2 \times \text{Male} + \beta_3 \times \text{Education} \times \text{Male} + \varepsilon\]
Implementing the full gender difference model

```python
import statsmodels.formula.api as smf

# Fit full model with both the male indicator and the education-by-male interaction
model4 = smf.ols('INCLOG10 ~ EDU + MALE + EDU:MALE', data=data).fit()
print(model4.summary().tables[1])
```

> Allows for differences in both baseline wages (β₂) and returns to education (β₃)
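With a single binary group indicator fully interacted, the point estimates match fitting a separate regression for each group: women get (β₀, β₁) and men get (β₀ + β₂, β₁ + β₃). The sketch below checks this on hypothetical simulated data (the DataFrame `demo` and its values are made up) and also shows the formula shorthand `EDU * MALE`, which expands to `EDU + MALE + EDU:MALE`.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical simulated data for illustration only
rng = np.random.default_rng(3)
n = 600
edu = rng.integers(0, 9, size=n)
male = rng.integers(0, 2, size=n)
inclog10 = 3.9 + 0.04 * edu + 0.08 * male + 0.01 * edu * male + rng.normal(0, 0.2, size=n)
demo = pd.DataFrame({"INCLOG10": inclog10, "EDU": edu, "MALE": male})

full = smf.ols("INCLOG10 ~ EDU + MALE + EDU:MALE", data=demo).fit()
# Formula shorthand: EDU * MALE expands to EDU + MALE + EDU:MALE
short = smf.ols("INCLOG10 ~ EDU * MALE", data=demo).fit()

# Separate regressions within each group
women = smf.ols("INCLOG10 ~ EDU", data=demo[demo["MALE"] == 0]).fit()
men = smf.ols("INCLOG10 ~ EDU", data=demo[demo["MALE"] == 1]).fit()

b = full.params
print("women:", b["Intercept"], b["EDU"])
print("men:  ", b["Intercept"] + b["MALE"], b["EDU"] + b["EDU:MALE"])
```

Only the point estimates coincide; the standard errors generally differ, because the pooled model assumes one common error variance for both groups.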
Different models answer different questions
Model 1: Fixed Effect
Model 2: Fixed Effect with Control
Model 3: Interaction Only
Model 4: Fixed Effect and Interaction
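Fitting all four specifications to the same data and lining up the coefficients makes it easier to see which question each one answers. A minimal sketch on hypothetical simulated data (the DataFrame `demo` and all values are made up); the formulas are the ones used in the slides, and a blank (NaN) cell means that model does not include the term.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical simulated data for illustration only
rng = np.random.default_rng(4)
n = 600
edu = rng.integers(0, 9, size=n)
male = rng.integers(0, 2, size=n)
inclog10 = 3.9 + 0.04 * edu + 0.08 * male + 0.01 * edu * male + rng.normal(0, 0.2, size=n)
demo = pd.DataFrame({"INCLOG10": inclog10, "EDU": edu, "MALE": male})

formulas = {
    "Model 1": "INCLOG10 ~ MALE",
    "Model 2": "INCLOG10 ~ EDU + MALE",
    "Model 3": "INCLOG10 ~ EDU + EDU:MALE",
    "Model 4": "INCLOG10 ~ EDU + MALE + EDU:MALE",
}
fits = {name: smf.ols(f, data=demo).fit() for name, f in formulas.items()}

# One column per model; terms a model omits show up as NaN
coefs = pd.DataFrame({name: r.params for name, r in fits.items()})
coefs = coefs.reindex(["Intercept", "MALE", "EDU", "EDU:MALE"])
coefs.loc["R-squared"] = [r.rsquared for r in fits.values()]
print(coefs.round(3))
```

Because Models 1 to 3 are each nested in Model 4, Model 4's R-squared can never be lower than theirs; a higher R-squared alone does not settle which model fits your research question.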
General linear model for analyzing group differences
Part 5.1 | Categorical Controls (‘Fixed Effects’)
Part 5.2 | Interactions
Model choice should be guided by your research question