The economist’s data analysis skillset.
What can we infer about those not in our data?
Which group sleeps longer?
> everyone in Group A sleeps longer than anyone in Group B
Which group sleeps longer?
> these distributions overlap… let's compare them more precisely
Where is the “center” of each group?
Mean: The average value \[\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i\]
Group A mean: 7.14 hours
Group B mean: 6.98 hours
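A minimal sketch of the mean formula in Python. The `sleep_hours` list is hypothetical illustration data, not the values behind the group means reported here.

```python
# Hypothetical sleep durations in hours (NOT the deck's data).
sleep_hours = [6.5, 7.0, 8.0, 7.5, 6.0]


def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)


x_bar = mean(sleep_hours)
print(f"mean = {x_bar:.2f} hours")
```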
Which group sleeps longer?
> Group A sleeps longer on average
> but some in Group B sleep longer than most in Group A!
How spread out is the data?
Range: difference between the largest and smallest value in the data
How spread out is the data?
Mean Deviation: average difference between each value and the mean (positive and negative differences cancel, so this is always zero)
\[ \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})\]
How spread out is the data?
Mean Absolute Deviation: average absolute difference between each value and the mean
\[ \frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|\]
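A quick numeric check of the two deviation measures above, as a sketch on hypothetical data (the `sleep_hours` values are made up for illustration). The signed deviations sum to zero, which is why the absolute value is needed.

```python
# Hypothetical sleep durations in hours (NOT the deck's data).
sleep_hours = [6.5, 7.0, 8.0, 7.5, 6.0]
n = len(sleep_hours)
x_bar = sum(sleep_hours) / n

# Mean deviation: signed differences from the mean cancel out.
mean_deviation = sum(x - x_bar for x in sleep_hours) / n

# Mean absolute deviation: dropping the sign keeps the distances from the mean.
mean_abs_deviation = sum(abs(x - x_bar) for x in sleep_hours) / n

print(f"mean deviation          = {mean_deviation:.2f}")
print(f"mean absolute deviation = {mean_abs_deviation:.2f}")
```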
How spread out is the data?
Variance: average squared difference from the mean
\[ \mathrm{Var}_X = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2\]
How spread out is the data?
Standard Deviation: square root of the variance, a measure of spread in the same units as the data \[S_X = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}\]
Group A std dev: 1.50 hours
Group B std dev: 0.78 hours
> Group A has more variability - some sleep much less, some much more
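The spread measures can be sketched in a few lines of Python. This follows the divide-by-`n` formulas used in this deck, and `sleep_hours` is hypothetical data, not the groups plotted here.

```python
import math
import statistics

# Hypothetical sleep durations in hours (NOT the deck's data).
sleep_hours = [6.5, 7.0, 8.0, 7.5, 6.0]
n = len(sleep_hours)
x_bar = sum(sleep_hours) / n

# Range: largest value minus smallest value.
data_range = max(sleep_hours) - min(sleep_hours)

# Variance: average squared difference from the mean (dividing by n).
variance = sum((x - x_bar) ** 2 for x in sleep_hours) / n

# Standard deviation: square root of the variance, back in hours.
std_dev = math.sqrt(variance)

print(f"range    = {data_range:.2f} hours")
print(f"variance = {variance:.2f} hours^2")
print(f"std dev  = {std_dev:.2f} hours")

# The standard library's pvariance/pstdev also divide by n.
print(statistics.pvariance(sleep_hours), statistics.pstdev(sleep_hours))
```

Note that `statistics.variance` and `statistics.stdev` divide by `n - 1` instead, so they give slightly larger values than the formulas above.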
Each group is a sample of 50 people, one group from each of two different counties.
Old question: “Which group sleeps longer?” (about the data)
New question: “Which county sleeps longer?” (about the population)
The data is a sample drawn from a population.
We observe samples but want to understand populations.
What is data? A sample.
Random Variable: a quantity whose value is generated by a random process in a population
Probability (Mass/Density) Function: a function that assigns a probability to each possible outcome
Observation: a realization of a random variable
Sample: a collection of observations
A random variable generates our data.
> data is a sample drawn from a random variable
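One way to picture this is to let the computer play the role of the random variable. The sketch below assumes a normal distribution with mean 7.2 and standard deviation 1.5 (the values this deck uses for County A) and draws 50 observations. The seed and variable names are arbitrary choices for illustration.

```python
import random

# Assumed population: normal with mean 7.2 and std dev 1.5 hours.
MU, SIGMA = 7.2, 1.5
SAMPLE_SIZE = 50

rng = random.Random(0)  # fixed seed so the draw is reproducible

# Each call to gauss() is one observation: a realization of the random variable.
sample = [rng.gauss(MU, SIGMA) for _ in range(SAMPLE_SIZE)]

print(f"{len(sample)} observations, first three: {sample[:3]}")
```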
Random variables can have many kinds of probability functions.
We can answer many kinds of probability questions when we know the distribution.
County A’s probability function:
\[x_i \sim N(\mu = 7.2,\ \sigma = 1.5)\]
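With the distribution known, probability questions become simple calculations. The sketch below uses Python's `statistics.NormalDist` with County A's parameters. The specific questions (less than 6 hours, between 7 and 8 hours, more than 9 hours, the 90th percentile) are hypothetical examples chosen for illustration.

```python
from statistics import NormalDist

# County A's sleep hours: normal with mean 7.2 and std dev 1.5.
county_a = NormalDist(mu=7.2, sigma=1.5)

# P(X < 6): probability a resident sleeps less than 6 hours.
p_less_than_6 = county_a.cdf(6.0)

# P(7 < X < 8): probability of sleeping between 7 and 8 hours.
p_between_7_and_8 = county_a.cdf(8.0) - county_a.cdf(7.0)

# P(X > 9): probability of sleeping more than 9 hours.
p_more_than_9 = 1 - county_a.cdf(9.0)

# Hours of sleep that 90% of residents fall below.
q90 = county_a.inv_cdf(0.90)

print(f"P(X < 6)     = {p_less_than_6:.3f}")
print(f"P(7 < X < 8) = {p_between_7_and_8:.3f}")
print(f"P(X > 9)     = {p_more_than_9:.3f}")
print(f"90th percentile = {q90:.2f} hours")
```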

What can we say about an unknown population if all we see is the sample?
What we observe:
What we want to know:
What can we say about an unknown population if all we see is the sample?
The sample statistics (\(\bar{x}, s\)) are estimates of the population parameters (\(\mu, \sigma\)), and in general they are not equal to them.
\[\bar{x} \neq \mu\] \[s \neq \sigma\]
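A small simulation makes the gap between statistics and parameters visible. This sketch assumes the population is normal with mean 7.2 and standard deviation 1.5 (County A's values in this deck) and draws several samples of 50. The number of samples and the seed are arbitrary illustration choices.

```python
import random
import statistics

# Assumed population parameters (County A's values in this deck).
MU, SIGMA = 7.2, 1.5
SAMPLE_SIZE = 50
NUM_SAMPLES = 5  # arbitrary, for illustration

rng = random.Random(42)  # fixed seed so the run is reproducible

sample_means = []
sample_std_devs = []
for _ in range(NUM_SAMPLES):
    sample = [rng.gauss(MU, SIGMA) for _ in range(SAMPLE_SIZE)]
    sample_means.append(statistics.mean(sample))
    # pstdev divides by n, matching the standard deviation formula in this deck.
    sample_std_devs.append(statistics.pstdev(sample))

print(f"population: mu = {MU}, sigma = {SIGMA}")
for x_bar, s in zip(sample_means, sample_std_devs):
    print(f"sample:     x_bar = {x_bar:.2f}, s = {s:.2f}")
```

Each sample gives a slightly different mean and standard deviation, close to the population values but not equal to them.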
What can we say about an unknown population if all we see is the sample?
> we can answer questions about an unknown population using just a sample