
The economist’s data analysis skillset.
Do people wait longer later in the day?

> but in general we don’t ask many questions about vertical intercepts

Let's compare two models.

> a slope (β₁) improves model fit (MSE; ‘wrongness’) when there’s a relationship
> the intercept is no longer the mean
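To see why the intercept stops being the mean, use the standard least-squares identity for a line with one predictor. This is the textbook OLS result, not a formula written on the slides:

\[\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}\]

So \(\hat{\beta}_0 = \bar{y}\) only when \(\hat{\beta}_1 = 0\), which is the univariate model, or when \(\bar{x} = 0\).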
Which model minimizes ‘wrongness’ (Mean Squared Error)?

> Model C minimizes MSE!
GLM selects the \(\beta_1\) with the smallest MSE.
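A minimal sketch of “pick the slope with the smallest MSE”, assuming simulated data (the numbers below are hypothetical, not the lecture’s wait-time data). It tries many candidate slopes by brute force, giving each its best intercept, and compares the winner with the closed-form least-squares slope from `np.polyfit`.

```python
import numpy as np

# Hypothetical simulated data: NOT the lecture's wait-time dataset
rng = np.random.default_rng(0)
x = rng.uniform(0, 300, size=200)            # minutes after open
y = 4.0 + 0.01 * x + rng.normal(0, 1, 200)   # wait time plus noise

def mse(b0, b1):
    """Mean squared error ('wrongness') of the line y = b0 + b1 * x."""
    return np.mean((y - (b0 + b1 * x)) ** 2)

# Closed-form least squares: for deg=1, polyfit returns [slope, intercept]
b1_ols, b0_ols = np.polyfit(x, y, deg=1)

# Brute force: for each candidate slope, the best intercept is ybar - b1 * xbar
candidates = np.linspace(-0.05, 0.05, 2001)
mses = [mse(y.mean() - b1 * x.mean(), b1) for b1 in candidates]
b1_grid = candidates[np.argmin(mses)]

print(f"OLS slope:  {b1_ols:.4f}")
print(f"grid slope: {b1_grid:.4f}")
```

The grid search lands on the grid point nearest the least-squares slope, which is the point: the fitted slope is the one that minimizes MSE.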

> this slope (β₁) gives the best guess of the relationship between x and y
> but what if the true slope is zero … could this slope be just sampling error?
Like before, if we take many samples, we get slightly different slopes and slightly different fits.

The slope coefficient follows a normal distribution centered on the population slope.

> the slopes follow a normal distribution around the population relationship!
> this lets us perform a t-test on the slope!
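A small simulation sketch of the sampling distribution, assuming a hypothetical population with slope 0.01 and normal noise (all parameter values are made up for illustration). Each loop draws a fresh sample and records its fitted slope.

```python
import numpy as np

# Hypothetical population (assumed values, for illustration only)
rng = np.random.default_rng(1)
beta0, beta1, sigma, n = 4.0, 0.01, 1.0, 50

slopes = []
for _ in range(5000):
    # Draw a new sample from the same population
    x = rng.uniform(0, 300, size=n)
    y = beta0 + beta1 * x + rng.normal(0, sigma, size=n)
    # Keep only the fitted slope (polyfit returns [slope, intercept])
    b1_hat = np.polyfit(x, y, deg=1)[0]
    slopes.append(b1_hat)
slopes = np.array(slopes)

print(f"mean of sample slopes: {slopes.mean():.5f}  (population slope {beta1})")
print(f"sd of sample slopes:   {slopes.std(ddof=1):.5f}")
```

A histogram of `slopes` would show the bell shape centered on the population slope.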

> we don’t know the entire distribution, just our sample slope

> center the distribution on our null
> check the distance from the sample
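The distance from the null is usually measured in standard errors. These are the standard simple-regression formulas (not spelled out on the slides), where \(\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i\):

\[t = \frac{\hat{\beta}_1 - 0}{\operatorname{SE}(\hat{\beta}_1)}, \qquad \operatorname{SE}(\hat{\beta}_1) = \sqrt{\frac{\hat{\sigma}^2}{\sum_i (x_i - \bar{x})^2}}, \qquad \hat{\sigma}^2 = \frac{1}{n-2} \sum_i (y_i - \hat{y}_i)^2\]

If \(\beta_1 = 0\), this \(t\) follows a t distribution with \(n - 2\) degrees of freedom.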

> the p-value is the probability of a slope at least as far from the null as our sample’s, if the null were true

> p-value: the ‘surprisingness’ of our sample if \(\beta_1 = 0\)
> the probability of seeing a slope at least this extreme by chance if there is no relationship
> a small p-value is evidence against the null hypothesis (\(\beta_1 = 0\))
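A sketch of the slope’s t-test computed by hand and checked against SciPy’s `scipy.stats.linregress`, assuming simulated (hypothetical) data. The two-sided p-value is the probability, under \(\beta_1 = 0\), of a t-statistic at least as far from zero as ours.

```python
import numpy as np
from scipy import stats

# Hypothetical simulated data, for illustration only
rng = np.random.default_rng(2)
n = 40
x = rng.uniform(0, 300, size=n)
y = 4.0 + 0.005 * x + rng.normal(0, 1, size=n)

fit = stats.linregress(x, y)

# Distance of the sample slope from the null (0), in standard errors
t_stat = (fit.slope - 0) / fit.stderr
# Two-sided p-value from a t distribution with n - 2 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_stat), df=n - 2)

print(f"slope = {fit.slope:.4f}, t = {t_stat:.2f}")
print(f"p-value (by hand) = {p_manual:.4g}, p-value (linregress) = {fit.pvalue:.4g}")
```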
Many possible models we might observe by chance if the null (\(\beta_1 = 0\)) were true.

> how plausible is it that this slope was drawn from the null slopes?
> p-value: the probability of a slope at least as extreme as ours under the null (\(\beta_1=0\))
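One way to picture “many possible models we might observe by chance” is a permutation test. This is a swapped-in, simulation-based alternative to the t-test, not the method on the slides: shuffling \(y\) breaks any link with \(x\), so the refitted slopes mimic draws from the null. The data are simulated and hypothetical.

```python
import numpy as np

# Hypothetical simulated data, for illustration only
rng = np.random.default_rng(3)
n = 40
x = rng.uniform(0, 300, size=n)
y = 4.0 + 0.005 * x + rng.normal(0, 1, size=n)

def fitted_slope(x, y):
    """Least-squares slope of y on x (polyfit returns [slope, intercept])."""
    return np.polyfit(x, y, deg=1)[0]

observed = fitted_slope(x, y)

# Null slopes: refit after shuffling y, which mimics beta_1 = 0
null_slopes = np.array([fitted_slope(x, rng.permutation(y)) for _ in range(5000)])

# Share of null slopes at least as far from 0 as ours (two-sided)
p_perm = np.mean(np.abs(null_slopes) >= abs(observed))
print(f"observed slope = {observed:.4f}, permutation p-value = {p_perm:.4f}")
```

With enough shuffles, this share is usually close to the t-test p-value for the same data.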
Are wealthier countries happier?
What wait time should we expect at 100 minutes after open?

> you can find this with a calculator!
> plug \(x=100\) into the equation \(y = 4.31 + 0.011 x\)
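The “calculator” step as a tiny function. It uses only the fitted line from the slide, \(y = 4.31 + 0.011x\); the name `predict_wait` is hypothetical.

```python
# Coefficients of the fitted line from the slide: y = 4.31 + 0.011 * x
B0, B1 = 4.31, 0.011

def predict_wait(minutes_after_open: float) -> float:
    """Predicted wait time at a given number of minutes after open."""
    return B0 + B1 * minutes_after_open

# Round to hide floating-point noise in the last digits
print(round(predict_wait(100), 2))  # 5.41
```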
What wait time should we expect at 200 minutes after open?
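Plugging \(x = 200\) into the same fitted line:

\[\hat{y} = 4.31 + 0.011 \times 200 = 4.31 + 2.2 = 6.51\]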

Are wealthier countries happier?
How much does wait time increase for each minute after open?

> \(\beta_1\) tells us how much \(y\) increases with every 1 unit increase in \(x\)
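Why \(\beta_1\) is the change per unit of \(x\): compare the model’s predictions at \(x + 1\) and \(x\).

\[\hat{y}(x+1) - \hat{y}(x) = \left[\beta_0 + \beta_1 (x+1)\right] - \left[\beta_0 + \beta_1 x\right] = \beta_1\]

For the wait-time line \(y = 4.31 + 0.011x\), the prediction rises by 0.011 for each additional minute after open.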
How much does happiness increase for each additional $1,000 of per capita GDP?
GLM performs a t-test on all model coefficients.
Univariate (Part 3): \(y = \beta_0 + \epsilon\)
Numerical Predictor: \(y = \beta_0 + \beta_1 x + \epsilon\)
Categorical Predictor (next time): \(y = \beta_0 + \beta_1 x + \epsilon\)
Multivariate GLM (Part 5): \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \epsilon\)
GLM applies the same t-test idea to any coefficient.
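A sketch of the same recipe applied to both coefficients of a one-predictor model, assuming simulated (hypothetical) data and SciPy 1.6 or later (needed for `intercept_stderr` on the `linregress` result). Each test is estimate divided by standard error, compared with a t distribution with \(n - 2\) degrees of freedom.

```python
import numpy as np
from scipy import stats

# Hypothetical simulated data, for illustration only
rng = np.random.default_rng(4)
n = 60
x = rng.uniform(0, 300, size=n)
y = 4.0 + 0.01 * x + rng.normal(0, 1, size=n)

fit = stats.linregress(x, y)  # intercept_stderr requires SciPy >= 1.6
df = n - 2

results = {}
for name, est, se in [("intercept", fit.intercept, fit.intercept_stderr),
                      ("slope", fit.slope, fit.stderr)]:
    # Same test for every coefficient: (estimate - 0) / standard error
    t_stat = est / se
    p = 2 * stats.t.sf(abs(t_stat), df=df)
    results[name] = (t_stat, p)
    print(f"{name:9s}: estimate = {est:.4f}, t = {t_stat:.2f}, p = {p:.3g}")
```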
GLM is the workhorse statistical tool in empirical economics.
Labor Economics: relationship between education and wages.
\[\text{wage} = \beta_0 + \beta_1 \text{education} + \epsilon\]
Policy Analysis: relationship between minimum wages and employment.
\[\text{employment} = \beta_0 + \beta_1 \text{minimum wage} + \epsilon\]
Political Economy: relationship between neighbor’s party and voter turnout.
\[\text{voted} = \beta_0 + \beta_1 \text{neighborhood politics} + \epsilon\]
Summary
GLM Framework:
Numerical Predictors:
Same Distribution:
Same Interpretation:
Extending the GLM framework
Next Up: categorical predictors
Later: multivariate GLM (Part 5)
> all built on the same statistical foundation