
The economist’s data analysis skillset.
If all we see is the sample, how do we learn about a population?
If we know the random variable, we can learn many things about the population.

> when we know the probability function, we can calculate everything exactly
> but what can we know about the population if we only see the sample?
But if all we see is the sample, what can we know about a population?
> how do we learn about \(\mu\) if all we have is \(n\), \(\bar{x}\), and \(S\)?
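
As a concrete starting point, here is a minimal sketch of the three quantities a sample gives us: \(n\), \(\bar{x}\), and \(S\). The sample itself (10 rolls of a fair die) and the fixed seed are assumptions made for illustration only.

```python
import random
import statistics

random.seed(0)  # assumption: fixed seed so the sketch is reproducible

# Hypothetical sample: 10 rolls of a fair six-sided die.
sample = [random.randint(1, 6) for _ in range(10)]

n = len(sample)                # sample size
x_bar = statistics.mean(sample)  # sample mean
s = statistics.stdev(sample)     # sample standard deviation (n - 1 denominator)

print(f"n = {n}, x_bar = {x_bar:.2f}, S = {s:.2f}")
```

Everything that follows is about how much \(\bar{x}\) can tell us about \(\mu\), given only these numbers.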
Let’s pretend we don’t know the probability function for dice.
Let’s start with something boring.
Your samples have a lot of variability!
> this variability is exactly what we would expect from a fair die
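
The "boring" case can be sketched as a simulation in which every sample has size \(n = 1\), so each sample mean is just a single roll. The number of samples and the seed below are assumptions chosen for illustration.

```python
import random
from collections import Counter

random.seed(1)  # assumption: fixed seed so the run is reproducible

NUM_SAMPLES = 6000  # hypothetical number of samples, each of size n = 1

# Each "sample" is one roll, so its sample mean is the roll itself.
sample_means = [random.randint(1, 6) for _ in range(NUM_SAMPLES)]

counts = Counter(sample_means)
shares = {face: counts[face] / NUM_SAMPLES for face in range(1, 7)}

# Compare the observed share of each face with the fair-die value of 1/6.
for face, share in shares.items():
    print(f"{face}: {share:.3f}  (fair die: {1 / 6:.3f})")
```

With \(n = 1\), the sample means are spread roughly evenly across all six faces, just like the die itself.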
Let’s pretend we don’t know the probability function for dice.
Next is something slightly less boring.
Like before, each sample has a slightly different sample mean.
> there’s a lot of variability in your sample means!
> what do you expect to see when we plot these sample means (\(\bar{x}\))?
The variability in the sample mean with a larger sample size.
> our sample means are more bunched (like a pyramid) in the middle! why?
> there are more ways to get a mean of 7/2 (six ways to roll a sum of 7) than a mean of 2/2 (one way to roll a sum of 2)!
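
The counting argument can be checked by listing all 36 equally likely outcomes of two dice. This is a sketch of the enumeration, not necessarily how the slide's plot was produced.

```python
from collections import Counter
from itertools import product

faces = range(1, 7)

# Count how many of the 36 equally likely (first, second) rolls give each sum.
ways = Counter(a + b for a, b in product(faces, repeat=2))

# Print a text histogram of the sample mean (sum / 2) for n = 2.
for total in sorted(ways):
    mean = total / 2
    print(f"mean {mean:>4}: {ways[total]} way(s) " + "#" * ways[total])
```

The histogram rises in a straight line to a peak at a mean of 3.5 and falls in a straight line afterwards, which is the pyramid shape.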
Let’s pretend we don’t know the probability function for dice.
Next is something even less boring.
The variability in the sample mean with a larger sample size.
> there’s even more variability in your sample means!
> what do you expect to see when we plot these sample means (\(\bar{x}\))?
The variability in the sample mean with a larger sample size.
> what do you notice with the shape with n=3?
> there’s some curvature to the shape
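
To see where the curvature comes from, the exact distribution of the sample mean for \(n = 3\) can be enumerated the same way. The helper name `sample_mean_distribution` is hypothetical, introduced only for this sketch.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def sample_mean_distribution(n, faces=range(1, 7)):
    """Exact probability of each possible sample mean for n fair dice."""
    faces = list(faces)
    # Count the ways to reach each sum over all len(faces) ** n outcomes.
    ways = Counter(sum(rolls) for rolls in product(faces, repeat=n))
    total = len(faces) ** n
    return {Fraction(s, n): Fraction(w, total) for s, w in sorted(ways.items())}


dist3 = sample_mean_distribution(3)

# One '#' per outcome (out of 216) that produces each sample mean.
for mean, prob in dist3.items():
    print(f"{float(mean):.2f}  {'#' * round(float(prob) * 216)}")
```

Unlike the \(n = 2\) case, the step between neighbouring bars is no longer constant, so the sides of the histogram bend instead of forming straight lines.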
Let’s pretend we don’t know the probability function for dice.
Next is something very un-boring.
The variability in the sample mean with a larger sample size.
> there are even more ways your sample could look!
> what do you expect to see when we plot these sample means (\(\bar{x}\))?
What happens when we really increase the sample size?
> what do you notice with the shape with n=30?
> the distribution of sample means gets tighter and more bell-shaped
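
With \(n = 30\) there are far too many outcomes to list, so a simulation is the practical route. The sketch below assumes 5000 repeated samples and a fixed seed; both are illustrative choices.

```python
import random
import statistics

random.seed(2)  # assumption: fixed seed so the run is reproducible

N = 30              # rolls per sample
NUM_SAMPLES = 5000  # hypothetical number of repeated samples


def sample_mean(n):
    """Roll a fair die n times and return the average."""
    return statistics.mean([random.randint(1, 6) for _ in range(n)])


means = [sample_mean(N) for _ in range(NUM_SAMPLES)]

center = statistics.mean(means)   # where the sample means pile up
spread = statistics.stdev(means)  # how tightly they cluster

# For a bell-shaped curve, roughly 68% of values lie within one
# standard deviation of the center.
share_within_one_sd = sum(abs(m - center) <= spread for m in means) / NUM_SAMPLES

print(f"center = {center:.3f}, spread = {spread:.3f}")
print(f"share within one spread of the center = {share_within_one_sd:.3f}")
```

The spread of these sample means is far smaller than the spread of a single roll (about 1.71), and the share within one standard deviation is close to what a bell curve would give.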
> what is this probability function in red?
If we know the random variable, we can learn many things about the population.

> when we know the probability function, we can calculate everything exactly
If we take multiple samples, we get different sample means.
Each sample gives us a different estimate of the population mean.

If we take multiple samples, their means will vary, but by much less than the individual observations in the original distribution.

> why? think about rolling two dice… it’s much less likely to get a 2 than a 7
As sample size grows, the distribution of the sample means approaches a normal distribution.

As sample size grows, the normal distribution that the sample means approach gets narrower.

The normal distribution that the sample means approach is centered on the population mean!

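> the sample mean \(\bar{x}\) approximately follows a normal distribution around the truth 😱
\[\bar{x} \sim N\Big(\mu, \frac{\sigma}{\sqrt{n}}\Big)\]
Here \(\sigma/\sqrt{n}\) is the standard deviation of \(\bar{x}\) (its standard error), not its variance.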
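
As a worked example of \(\sigma/\sqrt{n}\), the sketch below computes \(\mu\) and \(\sigma\) for a fair die directly from its probability function and then the resulting spread of \(\bar{x}\) for the sample sizes used in this deck.

```python
import math

faces = range(1, 7)
p = 1 / 6  # each face of a fair die is equally likely

# Population mean and standard deviation from the probability function.
mu = sum(x * p for x in faces)
sigma = math.sqrt(sum((x - mu) ** 2 * p for x in faces))

# Standard deviation of the sample mean for each sample size.
standard_errors = {n: sigma / math.sqrt(n) for n in (1, 2, 3, 30)}

print(f"mu = {mu:.3f}, sigma = {sigma:.3f}")
for n, se in standard_errors.items():
    print(f"n = {n:>2}: sigma / sqrt(n) = {se:.3f}")
```

Because of the square root, shrinking the spread by half takes four times as many observations.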
This works for (nearly) any distribution shape as sample size increases.

The distribution of sample means approaches a normal distribution as sample size increases, regardless of the shape of the population’s distribution (provided its variance is finite).
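
To illustrate that the population's shape does not matter much, here is a sketch using a strongly skewed population instead of dice. The exponential distribution with rate 1 (mean 1, standard deviation 1), the sample size of 50, and the seed are all assumptions chosen for this example.

```python
import math
import random
import statistics

random.seed(3)  # assumption: fixed seed so the run is reproducible

N = 50              # observations per sample
NUM_SAMPLES = 4000  # hypothetical number of repeated samples
MU = 1.0            # mean of an exponential distribution with rate 1
SIGMA = 1.0         # its standard deviation

means = [
    statistics.mean([random.expovariate(1.0) for _ in range(N)])
    for _ in range(NUM_SAMPLES)
]

# If the sample means are approximately normal around MU, about 95% of
# them should fall within 1.96 standard errors of MU.
se = SIGMA / math.sqrt(N)
within = sum(abs(m - MU) <= 1.96 * se for m in means) / NUM_SAMPLES

print(f"average of sample means = {statistics.mean(means):.3f}")
print(f"share within 1.96 standard errors of mu = {within:.3f}")
```

Even though individual exponential draws are heavily right-skewed, the share of sample means within 1.96 standard errors of \(\mu\) lands close to the 95% that a normal distribution predicts.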
