Building economic models with data.
The simplest economic model predicts employment should fall.
Question. How do we learn about the population mean (\(\mu\)) if all we observe is \(\bar{x} = 0.59\) FTE?
The sample mean follows a normal distribution around the true mean (\(\mu\)).
The sample mean \(\bar{x}\) of a large sample from (nearly) any distribution approximately follows a normal distribution centered on the population mean \(\mu\) with standard deviation \(SE = \sigma / \sqrt{n}\).
\[\bar{x} \sim N\Big(\mu, \frac{\sigma}{\sqrt{n}}\Big)\]
We can calculate all kinds of probabilities once we know \(\mu\) and \(\sigma\). Here, suppose \(\mu=0\) and \(\sigma = 9.8\).
Question. With \(n=331\), what’s the probability \(\bar{x}\) is within \(2 \times SE\) of \(\mu\)?
> \(P(\mu - 2 \times SE \leq \bar{x} \leq \mu + 2 \times SE) \approx 0.95\)
This interval will contain about 95% of our sample means.
\(\sigma = 9.8\) and \(n=331\)
Question: what’s the probability \(\bar{x}\) is closer than \(1 \times SE\) to \(\mu\)?
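Both questions can be checked numerically. The sketch below is a minimal illustration, assuming the normal approximation holds and using the slide values \(\mu = 0\), \(\sigma = 9.8\), \(n = 331\). The helper name `prob_within` is ours, not part of the original slides.

```python
import math
from statistics import NormalDist

mu = 0.0      # population mean assumed on the slide
sigma = 9.8   # population standard deviation from the slide
n = 331       # sample size from the slide

se = sigma / math.sqrt(n)            # standard error of the sample mean
sampling_dist = NormalDist(mu, se)   # approximate distribution of x-bar


def prob_within(k):
    """P(mu - k*SE <= x-bar <= mu + k*SE) under the normal approximation."""
    return sampling_dist.cdf(mu + k * se) - sampling_dist.cdf(mu - k * se)


print(f"SE = {se:.2f}")
print(f"P(within 2 SE) = {prob_within(2):.3f}")
print(f"P(within 1 SE) = {prob_within(1):.3f}")
```

The probabilities depend only on the multiplier \(k\), not on the particular \(\mu\) or \(\sigma\), which is why "about 95%" and "about 68%" apply to any sample mean.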
100 samples, each with \(\pm SE\) CI centered on \(\mu\).
Each dot is a sample mean \(\bar{x}\) from a different sample.
> the confidence interval around \(\mu\) contains most but not all sample means \(\bar{x}\)
> about 68 out of 100 intervals capture the truth
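A picture like this can be reproduced with a short simulation. This is a rough sketch, not the code behind the original figure: it assumes employment changes are normally distributed with \(\mu = 0\) and \(\sigma = 9.8\), and the seed is arbitrary.

```python
import math
import random

random.seed(0)  # arbitrary seed, only so the run is repeatable

mu, sigma, n = 0.0, 9.8, 331   # slide values; normality is an assumption
se = sigma / math.sqrt(n)


def sample_mean():
    """Mean of one simulated sample of n employment changes."""
    return sum(random.gauss(mu, sigma) for _ in range(n)) / n


# Draw 100 samples and count how many sample means land within 1 SE of mu.
means = [sample_mean() for _ in range(100)]
inside = sum(abs(m - mu) <= se for m in means)
print(f"{inside} of 100 sample means fall within 1 SE of mu")
```

The count changes from run to run, but it should hover around 68.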
Simulate employment changes and calculate the 68% confidence interval.
Generate some sample data.
Calculate sample statistics.
> if we took many samples, 68% of the time \(\bar{x}\) would land inside this interval
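The steps above might look like the following in Python. This is a stand-in sketch under stated assumptions, not the original slide code: the data are simulated from a normal distribution with \(\mu = 0\) and \(\sigma = 9.8\), and the variable names are ours.

```python
import math
import random
import statistics

random.seed(1)  # arbitrary seed, only so the run is repeatable

mu, sigma, n = 0.0, 9.8, 331   # slide values; normality is an assumption

# Generate some sample data: one simulated survey of n restaurants.
changes = [random.gauss(mu, sigma) for _ in range(n)]

# Calculate sample statistics.
x_bar = statistics.mean(changes)
s = statistics.stdev(changes)    # sample standard deviation (n - 1 divisor)
se = sigma / math.sqrt(n)        # standard error, using the known sigma

# 68% interval centered on mu.
lower, upper = mu - se, mu + se
print(f"x-bar = {x_bar:.2f}, S = {s:.2f}, SE = {se:.2f}")
print(f"68% interval around mu: [{lower:.2f}, {upper:.2f}]")
print("x-bar inside interval:", lower <= x_bar <= upper)
```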
We centered the confidence interval on \(\mu\). But we don’t know \(\mu\).
The distance between \(\bar{x}\) and \(\mu\) is the same in both directions.
If \(\bar{x}\) is within \(1\times SE\) of \(\mu\), then \(\mu\) is within \(1\times SE\) of \(\bar{x}\).
> instead of asking where \(\bar{x}\) lands relative to \(\mu\)…
> we ask where \(\mu\) could be relative to our observed \(\bar{x}\)
> if \(\bar{x}\) is in the \(CI\) centered on \(\mu\), then \(\mu\) is in the \(CI\) centered on \(\bar{x}\)
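The symmetry argument can be written out in one line. Starting from the statement that \(\bar{x}\) is within \(1 \times SE\) of \(\mu\), and rearranging the same inequality:

\[|\bar{x} - \mu| \leq SE \iff \mu - SE \leq \bar{x} \leq \mu + SE \iff \bar{x} - SE \leq \mu \leq \bar{x} + SE\]

The same steps work for any multiple of \(SE\), so the 95% interval can be re-centered on \(\bar{x}\) in exactly the same way.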
Card and Krueger observed \(\bar{x} = 0.59\) from one sample of 331 restaurants.
They drew one sample and centered the CI on \(\bar{x}\):
\[\bar{x} \pm 2 \times SE\]
\[= 0.59 \pm 2 \times 0.54\]
\[= [-0.49, \; 1.67]\]
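The arithmetic is easy to verify. A minimal check, using the rounded slide values \(\bar{x} = 0.59\) and \(SE = 0.54\):

```python
x_bar = 0.59   # observed mean change in FTE, from the slide
se = 0.54      # standard error, rounded as on the slide

margin = 2 * se
lower, upper = x_bar - margin, x_bar + margin
print(f"[{lower:.2f}, {upper:.2f}]")   # prints [-0.49, 1.67]
```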
Question. Is \(\mu\) inside our interval?
Is \(\mu\) inside our interval?
Our study is one of these bars. Which one is it?
Our interval either contains \(\mu\) or it doesn’t. We don’t know which.
Our “95% confidence” means that the method works 95% of the time, not that this interval has a 95% chance of containing \(\mu\).
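What "the method works 95% of the time" means can be illustrated by repeating the study many times in a simulation where we do know \(\mu\). This is a sketch under assumptions: `true_mu` is a hypothetical value chosen for the simulation, the data are drawn from a normal distribution, and \(\sigma\) is treated as known.

```python
import math
import random

random.seed(2)  # arbitrary seed, only so the run is repeatable

true_mu = 0.0          # hypothetical true mean, known only inside the simulation
sigma, n = 9.8, 331    # slide values
se = sigma / math.sqrt(n)
n_studies = 1000

hits = 0
for _ in range(n_studies):
    sample = [random.gauss(true_mu, sigma) for _ in range(n)]
    x_bar = sum(sample) / n
    # Interval centered on this study's x-bar.
    if x_bar - 2 * se <= true_mu <= x_bar + 2 * se:
        hits += 1

coverage = hits / n_studies
print(f"{coverage:.1%} of intervals contain the true mean")
```

Each individual interval either contains `true_mu` or it does not. The 95% describes the long-run share of intervals that do.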
What if we don’t know \(\sigma\) either?
What if Card and Krueger had only surveyed 15 restaurants?
How close would \(S\) be to \(\sigma\) with only 15 restaurants? Let’s draw 1,000 samples.
Just as \(\bar{x}\) is distributed around \(\mu\), \(S\) is distributed around \(\sigma\).
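The 1,000-sample experiment can be sketched as follows. It assumes normally distributed employment changes with \(\sigma = 9.8\); the mean used to generate the data is a hypothetical choice that does not affect \(S\).

```python
import random
import statistics

random.seed(3)  # arbitrary seed, only so the run is repeatable

mu = 0.0        # hypothetical mean for the simulation
sigma = 9.8     # slide value
n = 15          # the small-sample scenario

# Draw 1,000 samples of 15 restaurants and record each sample's S.
sds = [
    statistics.stdev([random.gauss(mu, sigma) for _ in range(n)])
    for _ in range(1000)
]

print(f"smallest S = {min(sds):.1f}")
print(f"largest  S = {max(sds):.1f}")
print(f"average  S = {statistics.mean(sds):.1f}  (sigma = {sigma})")
```

With only 15 observations, individual values of \(S\) can miss \(\sigma\) by a wide margin, even though they are centered close to it.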
…accounts for the extra uncertainty in \(S\) around \(\sigma\).
Now we can quantify our uncertainty about the true employment effect.
The Card and Krueger sample (\(\bar{x} = 0.59\), \(S \approx 9.8\), \(n = 331\)):
\[95\% \text{ CI}: \quad 0.59 \pm 1.967 \times 0.54 = [-0.47, \; 1.65]\]
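To see how much the multiplier matters at this sample size, the sketch below compares the interval built with the 1.967 multiplier quoted above against one built with the normal multiplier. The standard library has no t-distribution, so 1.967 is taken from the slide rather than computed.

```python
import math
from statistics import NormalDist

x_bar, s, n = 0.59, 9.8, 331   # slide values
se = s / math.sqrt(n)

t_crit = 1.967                        # multiplier quoted on the slide
z_crit = NormalDist().inv_cdf(0.975)  # normal multiplier, for comparison

intervals = {}
for label, crit in [("t", t_crit), ("z", z_crit)]:
    intervals[label] = (x_bar - crit * se, x_bar + crit * se)
    low, high = intervals[label]
    print(f"{label} (multiplier {crit:.3f}): [{low:.2f}, {high:.2f}]")
```

With 331 restaurants the two multipliers are nearly identical. The gap grows as \(n\) shrinks, which is where the extra uncertainty in \(S\) matters most.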
From estimating a range for \(\mu\) to testing a specific claim about \(\mu\).
We formalize this approach by setting up a “null hypothesis”.
Null Hypothesis (\(H_0\)): The minimum wage had no effect on employment (\(\mu = 0\)).
Testing Question:
Where does our sample fall on the null distribution?
If \(\mu = 0\), the sampling distribution of \(\bar{x}\) is centered on 0.
What proportion of the null distribution is at least as extreme as our sample?
The shaded area is the probability of getting a result as extreme as ours, if \(\mu = 0\).
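The shaded area can be approximated directly from the null distribution. This sketch makes two assumptions that should be read as such: it counts both tails as "as extreme", and it uses a normal curve in place of the t-distribution, which is a close approximation at \(n = 331\).

```python
import math
from statistics import NormalDist

x_bar, s, n = 0.59, 9.8, 331   # slide values
mu_0 = 0.0                     # mean under the null hypothesis

se = s / math.sqrt(n)
null_dist = NormalDist(mu_0, se)   # sampling distribution of x-bar if mu = 0

# Area in both tails at least as far from mu_0 as our observed x-bar.
distance = abs(x_bar - mu_0)
p_value = null_dist.cdf(mu_0 - distance) + (1 - null_dist.cdf(mu_0 + distance))
print(f"approximate two-sided p-value = {p_value:.2f}")
```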
How many standard errors is our sample mean from the null?
The t-statistic measures how many standard errors our sample mean is from the null value:
\[t = \frac{\bar{x} - \mu_0}{S/\sqrt{n}}\]
\[t = \frac{0.59 - 0}{9.8/\sqrt{331}} \approx \frac{0.59}{0.54} \approx 1.09\]
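Where: \(\bar{x}\) is the sample mean, \(\mu_0\) is the value of \(\mu\) under the null hypothesis, \(S\) is the sample standard deviation, and \(n\) is the sample size.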
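The t-statistic is a one-line calculation. The sketch below uses the unrounded standard error, so the result can differ from the slide's 1.09 in the second decimal. The comparison against 1.967 reuses the 95% multiplier from the confidence interval above as the cutoff, which is our choice of decision rule for illustration.

```python
import math

x_bar = 0.59   # sample mean, from the slide
mu_0 = 0.0     # value of mu under the null hypothesis
s = 9.8        # sample standard deviation, from the slide
n = 331        # sample size, from the slide

se = s / math.sqrt(n)
t_stat = (x_bar - mu_0) / se

t_crit = 1.967                 # 95% multiplier quoted earlier in the slides
reject = abs(t_stat) > t_crit

print(f"t = {t_stat:.2f}")
print("reject H0 at the 5% level:", reject)
```

A sample mean about one standard error from zero is well within the range we would expect if \(\mu = 0\), so this test does not reject the null.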
A caution about hypothesis testing
Statistical significance:
Practical significance:
> always consider the magnitude of the effect, not just the p-value
The hypothesis testing framework is powerful and general.
> the tools we built today apply to every model we’ll see in Parts 4 and 5