ECON 0150 | Economic Data Analysis

Building economic models with data.


Part 3.3 | Quantifying Uncertainty

New Jersey Minimum Wage Study

The simplest economic model predicts employment should fall.


  • NJ raised its minimum wage in 1992.
  • Card and Krueger measured how 331 NJ restaurants’ employment changed.


  • Employment increased by 0.59 FTE workers per store on average.
  • This is the sample mean (\(\bar{x}\)) from \(n=331\) restaurants.


Question. How do we learn about the population mean (\(\mu\)) if all we observe is \(\bar{x} = 0.59\) FTE?

Central Limit Theorem: Refresher

The sample mean follows a normal distribution around the true mean (\(\mu\)).


For a large enough sample, the sample mean \(\bar{x}\) of (nearly) any distribution approximately follows a normal distribution centered on the population mean \(\mu\) with standard deviation \(SE = \sigma / \sqrt{n}\).

\[\bar{x} \sim N\Big(\mu, \frac{\sigma}{\sqrt{n}}\Big)\]

  • We don’t need to know the shape of the population distribution to know the shape of the sampling distribution.
  • The sampling distribution (what we draw our sample mean from) is centered on the population mean.
  • The variability in sample means gets smaller as the sample size grows.
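
A quick simulation can make the three points above concrete. This is a minimal sketch, not part of the original study: the skewed exponential population, the seed, and the number of repeated samples are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

n = 331            # sample size, matching the NJ study
n_samples = 5_000  # number of repeated samples (an arbitrary choice)

# Assumed population: exponential with mean 1 and standard deviation 1 (skewed, not normal).
samples = rng.exponential(scale=1.0, size=(n_samples, n))
sample_means = samples.mean(axis=1)

predicted_se = 1.0 / np.sqrt(n)          # sigma / sqrt(n)
observed_sd = sample_means.std(ddof=1)   # spread of the simulated sample means

print(f"mean of sample means: {sample_means.mean():.3f} (population mean is 1)")
print(f"sd of sample means:   {observed_sd:.4f}")
print(f"sigma / sqrt(n):      {predicted_se:.4f}")
```

Even though the population is skewed, the sample means should cluster around 1 with a spread close to \(\sigma / \sqrt{n}\).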

Confidence Intervals: Known \(\mu\) and \(\sigma\)

We can calculate many probabilities about \(\bar{x}\) once we know \(\mu=0\) and \(\sigma = 9.8\).


Question. With \(n=331\), what’s the probability \(\bar{x}\) is within \(2 \times SE\) of \(\mu\)?

> \(P(\mu - 2 \times SE \leq \bar{x} \leq \mu + 2 \times SE) \approx 0.95\)


Across repeated samples, about 95% of sample means fall inside this interval.
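
The probability in the question can be checked numerically. A minimal sketch, assuming the slide's values \(\mu = 0\), \(\sigma = 9.8\), \(n = 331\) and a normal sampling distribution; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

mu, sigma, n = 0, 9.8, 331   # values assumed known on this slide
se = sigma / np.sqrt(n)      # standard error of the sample mean

# Probability that the sample mean lands within 2 standard errors of mu.
prob_2se = (stats.norm.cdf(mu + 2 * se, loc=mu, scale=se)
            - stats.norm.cdf(mu - 2 * se, loc=mu, scale=se))

# Multiplier that gives exactly 95% under the normal distribution.
z_exact = stats.norm.ppf(0.975)

print(f"P(within 2 SE) = {prob_2se:.4f}")
print(f"exact 95% multiplier = {z_exact:.3f}")
```

The multiplier 2 is a convenient rounding of the exact value, which is why the slide writes "\(\approx 0.95\)".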

Exercise 3.3 | Confidence Intervals: Known \(\mu\) & \(\sigma\)

\(\sigma = 9.8\) and \(n=331\)

Question: what’s the probability \(\bar{x}\) is closer than \(1 \times SE\) to \(\mu\)?

import numpy as np
from scipy import stats
mu, sigma, n = 0, 9.8, 331
se = sigma / np.sqrt(n)
stats.norm.cdf(mu + 1*se, loc=mu, scale=se) - stats.norm.cdf(mu - 1*se, loc=mu, scale=se)

Confidence Intervals are About Probability

100 samples, each with \(\pm SE\) CI centered on \(\mu\).

Each dot is a sample mean \(\bar{x}\) from a different sample.

> the confidence interval around \(\mu\) contains most but not all sample means \(\bar{x}\)

> about 68 out of 100 intervals capture the truth
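
The picture described above can be reproduced with a short simulation. This is a sketch under assumptions: a normal population with \(\mu = 0\) and \(\sigma = 9.8\), and an arbitrary seed, so the exact count will differ from the figure.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed (an arbitrary choice)

mu, sigma, n = 0, 9.8, 331      # values assumed known on this slide
se = sigma / np.sqrt(n)

# Draw 100 independent samples and record each sample mean.
sample_means = rng.normal(mu, sigma, size=(100, n)).mean(axis=1)

# Count the sample means that land within 1 SE of mu.
inside = np.abs(sample_means - mu) <= se
n_inside = int(inside.sum())

print(f"{n_inside} of 100 sample means fall within 1 SE of mu")
```

The count should be somewhere near 68, with run-to-run variation because only 100 samples are drawn.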

Exercise 3.3 | Simulating Confidence Intervals

Simulate employment changes and calculate the 68% confidence interval.

Generate some sample data.

import numpy as np
sample = np.random.normal(0, 9.8, 331)  # 331 simulated employment changes

Calculate sample statistics.

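x_bar = np.mean(sample)   # sample mean (varies from sample to sample)
mu = 0                    # population mean, assumed known here
se = 9.8 / np.sqrt(331)   # standard error: sigma / sqrt(n)
mu - se, mu + se          # 68% interval centered on mu
(-0.5386567157307265, 0.5386567157307265)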

> if we took many samples, 68% of the time \(\bar{x}\) would land inside this interval

The Problem

We centered the confidence interval on \(\mu\). But we don’t know \(\mu\).



  • If we knew the true effect of the minimum wage, we wouldn’t need the data.
  • So how is a confidence interval centered on \(\mu\) useful?

The Centerpoint Flip

The distance between \(\bar{x}\) and \(\mu\) is the same in both directions.


If \(\bar{x}\) is within \(1\times SE\) of \(\mu\), then \(\mu\) is within \(1\times SE\) of \(\bar{x}\).


> instead of asking where \(\bar{x}\) lands relative to \(\mu\)

> we ask where \(\mu\) could be relative to our observed \(\bar{x}\)

> if \(\bar{x}\) is in the \(CI\) centered on \(\mu\), then \(\mu\) is in the \(CI\) centered on \(\bar{x}\)
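
A short derivation of why the flip works, using only the definition of "within \(1 \times SE\)". Each line is a rearrangement of the one before it:

\[|\bar{x} - \mu| \leq SE\]

\[\iff \mu - SE \leq \bar{x} \leq \mu + SE\]

\[\iff \bar{x} - SE \leq \mu \leq \bar{x} + SE\]

The first form says \(\bar{x}\) is in the interval centered on \(\mu\); the last says \(\mu\) is in the interval centered on \(\bar{x}\). They are the same event, so they have the same probability.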

New Jersey: 95% Confidence Interval

Card and Krueger observed \(\bar{x} = 0.59\) from one sample of 331 restaurants.

They drew one sample and centered the CI on \(\bar{x}\):

\[\bar{x} \pm 2 \times SE\]

\[= 0.59 \pm 2 \times 0.54\]

\[= [-0.49, \; 1.67]\]
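
The arithmetic above can be reproduced in a few lines. A minimal sketch, assuming (as this slide does) that \(\sigma = 9.8\) is known and using the rounded multiplier 2; the variable names are illustrative.

```python
import numpy as np

x_bar, sigma, n = 0.59, 9.8, 331   # summary values from the slides
se = sigma / np.sqrt(n)            # standard error, about 0.54

ci_lower = x_bar - 2 * se
ci_upper = x_bar + 2 * se

print(f"SE = {se:.2f}")
print(f"95% CI = [{ci_lower:.2f}, {ci_upper:.2f}]")
```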


Question. Is \(\mu\) inside our interval?

Confidence Intervals: Interpretation

Is \(\mu\) inside our interval?

Our study is one of these bars. Which one is it?

Our interval either contains \(\mu\) or it doesn’t. We don’t know which.

Our “95% confidence” means that the method produces an interval containing \(\mu\) in about 95% of repeated samples, not that this particular interval has a 95% chance of containing \(\mu\).


Our confidence is in the method, not the specific interval we estimated!
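
One way to see that the confidence belongs to the method is to repeat the study many times in a simulation where \(\mu\) is known. This is a sketch under assumptions: a normal population with \(\mu = 0\) and \(\sigma = 9.8\), the rounded multiplier 2, and an arbitrary seed and number of repetitions.

```python
import numpy as np

rng = np.random.default_rng(3)  # fixed seed (an arbitrary choice)

mu, sigma, n = 0, 9.8, 331      # mu is known only because we are simulating
se = sigma / np.sqrt(n)
n_studies = 10_000              # hypothetical number of repeated studies

# One sample mean per simulated study, each with its own interval.
sample_means = rng.normal(mu, sigma, size=(n_studies, n)).mean(axis=1)
lower = sample_means - 2 * se
upper = sample_means + 2 * se

# Each interval either contains mu or it does not.
covers = (lower <= mu) & (mu <= upper)
coverage = covers.mean()

print(f"share of intervals containing mu: {coverage:.3f}")
```

Every individual entry of `covers` is simply true or false; the roughly 95% figure only appears when looking across many repetitions.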

Confidence Intervals: Unknown \(\sigma\)

What if we don’t know \(\sigma\) either?


  • We used \(\bar{x}\) to estimate \(\mu\). Can we use \(S\) to estimate \(\sigma\)?
  • Yes, but \(S\) is itself a random variable, varying from sample to sample.
  • Replacing \(\sigma\) with \(S\) introduces a new source of uncertainty.

Sampling Variability: \(S\)

What if Card and Krueger had only surveyed 15 restaurants?

How close would \(S\) be to \(\sigma\) with only 15 restaurants? Let’s draw 1,000 samples.


Just as \(\bar{x}\) is distributed around \(\mu\), \(S\) is distributed around \(\sigma\).
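
The 1,000-sample experiment can be sketched as follows. The normal population with \(\sigma = 9.8\) and the seed are assumptions made for illustration; the real employment data need not be normal.

```python
import numpy as np

rng = np.random.default_rng(2)  # fixed seed (an arbitrary choice)

sigma, n_small, n_draws = 9.8, 15, 1_000

# 1,000 samples of 15 restaurants each, from an assumed normal population.
samples = rng.normal(0, sigma, size=(n_draws, n_small))

# Sample standard deviation S for each sample (ddof=1 divides by n - 1).
s_values = samples.std(axis=1, ddof=1)

low, high = np.percentile(s_values, [2.5, 97.5])
print(f"average S: {s_values.mean():.2f} (sigma is {sigma})")
print(f"middle 95% of S values: [{low:.2f}, {high:.2f}]")
```

With only 15 observations, individual values of \(S\) can miss \(\sigma\) by a wide margin, which is the extra uncertainty the next slide addresses.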

The t-Distribution

…accounts for the extra uncertainty in \(S\) around \(\sigma\).



  • The t-Dist accounts for the extra uncertainty when we use \(S\) instead of \(\sigma\).
  • When \(n\) is large, \(S\) is close to \(\sigma\) and the t-Dist approaches Normal.
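
Comparing critical values shows how much wider the t-distribution makes a 95% interval. A minimal sketch using `scipy.stats`; the two sample sizes are the ones mentioned in these slides.

```python
from scipy import stats

# 95% two-sided critical value under the normal distribution.
z_crit = stats.norm.ppf(0.975)

# The same critical value under the t-distribution with n - 1 degrees of freedom.
t_crit = {n: stats.t.ppf(0.975, df=n - 1) for n in (15, 331)}

print(f"normal:      {z_crit:.3f}")
for n, value in t_crit.items():
    print(f"t (n = {n:>3}): {value:.3f}")
```

The multiplier is noticeably larger for \(n = 15\) and nearly identical to the normal value for \(n = 331\).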

Putting It All Together

Now we can quantify our uncertainty about the true employment effect.

  1. Center the sampling distribution on \(\bar{x} = 0.59\).
  2. Width comes from \(S / \sqrt{n}\), the sample standard error.
  3. Shape is the t-distribution with \(n - 1\) degrees of freedom.



The Card and Krueger study (\(\bar{x} = 0.59\), \(S \approx 9.8\), \(n = 331\)):

\[95\% \text{ CI}: \quad 0.59 \pm 1.967 \times 0.54 = [-0.47, \; 1.65]\]
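
The three steps can be carried out directly from the summary statistics. A minimal sketch using the rounded values on this slide (\(\bar{x} = 0.59\), \(S \approx 9.8\), \(n = 331\)); the variable names are illustrative.

```python
import numpy as np
from scipy import stats

x_bar, s, n = 0.59, 9.8, 331             # rounded summary values from the slide

se = s / np.sqrt(n)                      # sample standard error
t_crit = stats.t.ppf(0.975, df=n - 1)    # 95% two-sided critical value

ci_lower = x_bar - t_crit * se
ci_upper = x_bar + t_crit * se

# Cross-check with scipy's built-in interval helper.
check_lower, check_upper = stats.t.interval(0.95, n - 1, loc=x_bar, scale=se)

print(f"t critical value = {t_crit:.3f}")
print(f"95% CI = [{ci_lower:.2f}, {ci_upper:.2f}]")
```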

Testing the Theory

From estimating a range for \(\mu\) to testing a specific claim about \(\mu\).


  • Economic theory predicts employment should fall (or at least not change).
  • Card and Krueger observed \(\bar{x} = +0.59\). Is that consistent with no effect?
  • Instead of asking where \(\mu\) might be, we test a specific value: \(\mu = 0\).

The Null Hypothesis

We formalize this approach by setting up a “null hypothesis.”


Null Hypothesis (\(H_0\)): The minimum wage had no effect on employment

  • \(H_0: \mu = 0\)



Testing Question:

  • How “surprising” is our data if \(H_0\) were true?

NJ Employment Change

Where does our sample fall on the null distribution?

If \(\mu = 0\), the sampling distribution of \(\bar{x}\) is centered on 0.



Questions. Is 0.59 far from 0? How would we know?

Quantifying Surprise: The p-value

What proportion of the null distribution is at least as extreme as our sample?

If \(\mu = 0\), the sampling distribution of \(\bar{x}\) is centered on 0.

The shaded area is the probability of getting a result at least as extreme as ours, if \(\mu = 0\).


\(p = 0.28\): if \(H_0\) were true, about 28% of samples would look at least this extreme.
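
The shaded area can be computed from the summary statistics. This is a sketch that assumes a two-sided test and the rounded values \(\bar{x} = 0.59\), \(S \approx 9.8\), \(n = 331\), so the result should land close to the reported p-value, with small differences possible from rounding.

```python
import numpy as np
from scipy import stats

x_bar, s, n = 0.59, 9.8, 331   # rounded summary values from the slides
mu_0 = 0                       # value of mu under the null hypothesis

se = s / np.sqrt(n)
distance = abs(x_bar - mu_0) / se   # distance from the null, in standard errors

# Two-sided p-value: area in both tails beyond that distance.
p_value = 2 * stats.t.sf(distance, df=n - 1)

print(f"p-value = {p_value:.3f}")
```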

The t-Statistic

How many standard errors is our sample mean from the null?

The t-statistic measures how many standard errors our sample mean is from the null value:

\[t = \frac{\bar{x} - \mu_0}{S/\sqrt{n}}\]

\[t = \frac{0.59 - 0}{9.8/\sqrt{331}} = \frac{0.59}{0.54} = 1.09\]

Where:

  • \(\bar{x}\) is our sample mean (0.59 FTE)
  • \(\mu_0\) is our null value (0)
  • \(S\) is our sample standard deviation (9.8)
  • \(n\) is our sample size (331)
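
With raw data in hand, the formula can be compared against `scipy.stats.ttest_1samp`. The data below are hypothetical: simulated draws with a similar mean and spread, not the Card and Krueger sample, so the printed numbers will not match the slide.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)  # fixed seed (an arbitrary choice)

# Hypothetical data standing in for the 331 restaurant-level changes.
sample = rng.normal(0.59, 9.8, size=331)

mu_0 = 0                     # null value
x_bar = sample.mean()        # sample mean
s = sample.std(ddof=1)       # sample standard deviation
n = sample.size

# t-statistic from the formula on this slide.
t_manual = (x_bar - mu_0) / (s / np.sqrt(n))

# The same test using scipy's one-sample t-test.
result = stats.ttest_1samp(sample, popmean=mu_0)

print(f"manual t = {t_manual:.3f}")
print(f"scipy  t = {result.statistic:.3f}, p = {result.pvalue:.3f}")
```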

Statistical vs. Practical Significance

A caution about hypothesis testing


Statistical significance:

  • Formal rejection of the null hypothesis (p < 0.05)
  • Only tells us that data this extreme would be unlikely if the effect were exactly zero

Practical significance:

  • Whether the effect size matters in the real world
  • Card and Krueger: the direction contradicted theory, even though the NJ change alone wasn’t statistically significant

> always consider the magnitude of the effect, not just the p-value
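
A small experiment shows why the p-value alone is not enough. The sketch below holds the effect size fixed at the slide's \(\bar{x} = 0.59\) and \(S \approx 9.8\) and only changes the sample size; the larger sample sizes are hypothetical.

```python
import numpy as np
from scipy import stats

x_bar, s = 0.59, 9.8   # same effect size and spread in every scenario

p_values = {}
for n in (331, 3_310, 33_100):   # the last two sample sizes are hypothetical
    se = s / np.sqrt(n)
    t_stat = x_bar / se          # null value is 0
    p_values[n] = 2 * stats.t.sf(abs(t_stat), df=n - 1)
    print(f"n = {n:>6}: t = {t_stat:.2f}, p = {p_values[n]:.4f}")
```

The estimated effect is identical in each row. Only the amount of data changes, and with it the p-value, so whether 0.59 FTE per store matters is a separate question from whether it is statistically significant.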

Looking Ahead

The hypothesis testing framework is powerful and general.


  • Part 3.4 | The Simplest Linear Model
  • Part 4 | Bivariate Models

> the tools we built today apply to every model we’ll see in Parts 4 and 5