Building economic models with data.
The General Linear Model just draws lines through data points.
We just developed the simplest GLM!
What is a GLM?
Linear Model Equation: \(y = mx + b\)
How do we choose the line?
Mean Squared Error: \(MSE = \frac{1}{n} \sum_i \epsilon_i^2\)
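To make the MSE formula concrete, here is a minimal sketch that scores two candidate lines against the same points. The data and both candidate lines are made up for illustration, and the helper name `mse` is an assumption, not something from the exercise.

```python
import numpy as np

def mse(y, y_hat):
    """Mean squared error: the average of the squared residuals."""
    residuals = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
    return float(np.mean(residuals ** 2))

# Hypothetical data, made up for illustration only.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])  # lies exactly on y = 2x + 1

mse_good = mse(y, 2.0 * x + 1.0)  # candidate line with m = 2, b = 1
mse_flat = mse(y, 0.0 * x + 5.0)  # candidate line with m = 0, b = 5

print(mse_good, mse_flat)
```

The line with the smaller MSE sits closer to the points on average, which is the sense in which we "choose" a line.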
A model with no \(x\) term (in effect, \(x = 0\)).
The simplest GLM is using only an intercept term: \(y=b\).
> what should we choose for b to minimize the model’s error?
Open Exercise 3.4 and complete Questions 1-2.
Load the data and compute summary statistics.
> are the values centered near zero?
Complete Questions 3-5. Which value of \(b\) minimizes the model’s error?
Compute the MSE for \(b = 2.5\), \(b = 1.5\), and \(b = \bar{y}\).
> which value of \(b\) gives the smallest MSE?
The sample mean minimizes the MSE.
We minimize the MSE by choosing \(b\) to be equal to the sample mean \(\bar{y}\).
> when the MSE is minimized, it equals the variance of \(y\) (computed with \(1/n\))!
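A quick numerical check of this claim, as a sketch: the values below are hypothetical wait times, not the Exercise 3.4 data, and `mse_intercept_only` is an illustrative helper name.

```python
import numpy as np

# Hypothetical wait times in minutes (not the Exercise 3.4 data).
y = np.array([1.0, 2.0, 2.0, 3.0, 4.0])

def mse_intercept_only(y, b):
    """MSE of the model y = b, which predicts b for every observation."""
    return float(np.mean((y - b) ** 2))

y_bar = float(y.mean())
candidates = {"b = 2.5": 2.5, "b = 1.5": 1.5, "b = y_bar": y_bar}
results = {label: mse_intercept_only(y, b) for label, b in candidates.items()}

for label, value in results.items():
    print(f"{label}: MSE = {value:.3f}")

# np.var defaults to ddof=0 (divide by n), which matches the MSE at b = y_bar.
variance = float(np.var(y))
print(f"variance of y: {variance:.3f}")
```

On this toy data the sample mean gives the smallest of the three MSE values, and that minimum matches the divide-by-n variance. The usual sample variance divides by n - 1 instead, so it comes out slightly larger.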
Like before, if we take many samples, we get slightly different means and slightly different fits.
The fitted intercepts are centered on the true mean and, scaled by their standard error, follow a t-distribution.
> we only observe one sample mean, so we center the distribution there
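The idea of "many samples, slightly different means" can be simulated. This is a sketch under stated assumptions: the population is taken to be normal, and the true mean, standard deviation, and sample size are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical population: values chosen for illustration only.
true_mean, true_sd = 2.0, 1.0
n, n_samples = 30, 5000

# Each row is one sample of size n; each row mean is one fitted intercept.
samples = rng.normal(loc=true_mean, scale=true_sd, size=(n_samples, n))
sample_means = samples.mean(axis=1)

mean_of_means = float(sample_means.mean())
spread_of_means = float(sample_means.std(ddof=1))
theoretical_se = true_sd / np.sqrt(n)

print("average of sample means:", mean_of_means)
print("spread of sample means: ", spread_of_means)
print("sigma / sqrt(n):        ", theoretical_se)
```

The sample means cluster around the true mean, and their spread is close to the standard error \(\sigma / \sqrt{n}\). In practice we see only one row of this table, which is why the distribution is centered on the single mean we observed.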
We center the sampling distribution on our observed sample mean.
> what is the probability of seeing this if the average wait time is 1.8 minutes?
The p-value: the probability of a sample mean at least as extreme as ours, given the null.
> here we’re centering the t-distribution on the observed sample mean
> as before, this is mathematically equivalent to centering on the null
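The equivalence of the two centerings can be checked numerically. This sketch assumes SciPy is available and uses invented wait times; only the null value of 1.8 minutes comes from the question above.

```python
import numpy as np
from scipy import stats

# Hypothetical wait times in minutes (not the course data).
wait = np.array([2.1, 1.9, 2.6, 2.4, 1.7, 2.8, 2.2, 2.0, 2.5, 2.3])
null_mean = 1.8  # the null value from the question above

n = wait.size
y_bar = wait.mean()
se = wait.std(ddof=1) / np.sqrt(n)  # standard error of the sample mean
dof = n - 1                         # degrees of freedom

# View 1: t-distribution centered on the null; how extreme is y_bar?
t_stat = (y_bar - null_mean) / se
p_null_centered = 2 * stats.t.sf(abs(t_stat), dof)

# View 2: t-distribution centered on y_bar; how extreme is the null value?
dist = stats.t(dof, loc=y_bar, scale=se)
p_mean_centered = 2 * min(dist.cdf(null_mean), dist.sf(null_mean))

# Cross-check against SciPy's one-sample t-test.
result = stats.ttest_1samp(wait, popmean=null_mean)
print(p_null_centered, p_mean_centered, result.pvalue)
```

All three numbers agree: shifting the t-distribution from the null to the sample mean changes the picture, not the tail area.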
A t-test is a linear model with only an intercept: \(y = \beta_0 + \epsilon\)
> the estimate \(\hat{\beta}_0 = \bar{y}\) (the sample mean) minimizes the sum of squared errors
> the p-value tells us the probability of data at least this extreme under the default null (\(\beta_0 = 0\) in most software)
> the best guess of the true mean is \(\hat{\beta}_0\)
> this is the simplest version of an OLS regression model
Complete Questions 6-7. Fit the intercept-only model in Python.
Fit the model \(y \sim 1\) and print the coefficient table.
Compare the intercept coefficient to the sample mean from Question 1.
> the intercept equals the sample mean — the model’s best prediction is \(\bar{y}\)
> what null hypothesis does the p-value test?
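One way the fit might look in Python, as a sketch: it assumes the statsmodels formula interface and uses simulated data in a column named `y`, so the numbers will not match Exercise 3.4. The exercise may use a different library or column name.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

# Simulated, hypothetical data (not the Exercise 3.4 data).
rng = np.random.default_rng(seed=1)
data = pd.DataFrame({"y": rng.normal(loc=2.2, scale=0.5, size=40)})

# "y ~ 1" asks for an intercept and no predictors.
model = smf.ols("y ~ 1", data=data).fit()
print(model.summary().tables[1])  # coefficient table

intercept = model.params["Intercept"]
t_model = model.tvalues["Intercept"]
p_model = model.pvalues["Intercept"]

# The reported p-value tests H0: intercept = 0,
# the same null as a one-sample t-test against 0.
ttest = stats.ttest_1samp(data["y"], popmean=0.0)
print(intercept, data["y"].mean())
print(t_model, ttest.statistic)
```

The intercept matches the sample mean, and the t statistic and p-value in the coefficient table match a one-sample t-test against zero. To test a different null, such as 1.8, one option is to subtract that value from `y` before fitting.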
Bivariate General Linear Model
In Part 4 we will explore:
> all built on the same statistical foundation we explored today