
The economist’s data analysis workflow.
We often want to model relationships through time.

Data related in time has a special problem: today’s value often depends on past values.
We can check whether values in a time series are related to their own past values.
A lag plot shows the value today (\(t\)) against the value yesterday (\(t-1\)).
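
As a rough sketch of what a lag plot measures, the snippet below pairs each value with the previous one and computes their correlation. The series is simulated (a random walk with an arbitrary seed), not the course data, and the names `y`, `lagged`, and `lag1_corr` are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a random walk, where each value is yesterday's value plus noise.
rng = np.random.default_rng(0)
y = pd.Series(rng.normal(size=500).cumsum(), name="y")

# Pair each value with the previous one; the first row has no "yesterday" and is dropped.
lagged = pd.DataFrame({"today": y, "yesterday": y.shift(1)}).dropna()

# Correlation between today and yesterday (the lag-1 autocorrelation).
lag1_corr = lagged["today"].corr(lagged["yesterday"])
print(f"lag-1 autocorrelation: {lag1_corr:.3f}")
```

A scatter of `today` against `yesterday` is the lag plot itself; `pandas.plotting.lag_plot(y)` draws the same picture when matplotlib is installed.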

> autocorrelation tells us today’s value depends on yesterday’s value
The standard approach has problems with time series.
\[\text{Y} = \beta_0 + \beta_1 \cdot \text{t} + \varepsilon\]

> you can see it will often be wrong in the same direction repeatedly
The GLM’s confidence intervals require that the error terms are independent.
\[\text{Y} = \beta_0 + \beta_1 \cdot \text{t} + \varepsilon\]

> this ‘levels’ model shows strong patterns in residuals (autocorrelation)
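
One way to check residuals for this pattern is sketched below. It assumes simulated data (a linear trend plus errors that carry over 90% of the previous error) and uses `np.polyfit` as a stand-in for the regression fit; the Durbin-Watson statistic is computed by hand from its definition.

```python
import numpy as np

# Hypothetical data: a linear trend plus AR(1) errors.
rng = np.random.default_rng(1)
n = 500
t = np.arange(n)
e = np.zeros(n)
for i in range(1, n):
    e[i] = 0.9 * e[i - 1] + rng.normal()
y = 2.0 + 0.5 * t + e

# Fit the levels model Y = b0 + b1 * t by least squares.
b1, b0 = np.polyfit(t, y, 1)
resid = y - (b0 + b1 * t)

# Lag-1 correlation of the residuals: near 0 if the errors are independent.
resid_corr = np.corrcoef(resid[1:], resid[:-1])[0, 1]

# Durbin-Watson statistic: about 2 with independent errors,
# toward 0 with positive autocorrelation.
dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)
print(f"residual lag-1 correlation: {resid_corr:.2f}, Durbin-Watson: {dw:.2f}")
```

A residual correlation well above zero (and a Durbin-Watson value well below 2) signals that the model is wrong in the same direction for several periods in a row.
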
Examine a levels model of the relationship between GDP and unemployment.
\[\text{Y} = \beta_0 + \beta_1 \times \text{t} + \varepsilon\]
We can fix some issues of autocorrelation by looking at changes instead of levels.
\[\Delta \text{Y}_t = \text{Y}_t - \text{Y}_{t-1}\]
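
A small worked example of the definition, using made-up numbers: `Series.diff()` subtracts the previous value from each value, which is the same as subtracting the series shifted by one period.

```python
import pandas as pd

# Hypothetical levels for four periods.
y = pd.Series([100, 103, 101, 106], name="y")

# First differences: Y_t - Y_{t-1}. The first period has no predecessor, so it is NaN.
dy = y.diff()

# The same calculation written out with an explicit lag.
dy_manual = y - y.shift(1)

print(dy.tolist())  # [nan, 3.0, -2.0, 5.0]
```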

> first differences (correctly in this case) show no relationship
> what would first differences look like if there WAS a positive trend?
What would first differences look like if there WAS a positive trend?

> the vertical intercept \(\beta_0\) is positive!
> the slope coefficient \(\beta_1\) is zero!
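
The two notes above can be checked with a noise-free sketch. The trend \(Y = 5 + 2t\) is an arbitrary illustration, and `np.polyfit` stands in for the regression of the differences on time.

```python
import numpy as np

# Hypothetical series with an exact positive linear trend: Y = 5 + 2t.
t = np.arange(20)
y = 5.0 + 2.0 * t

# First differences: every change equals the trend slope, 2.
dy = np.diff(y)

# Regress the differences on time (differences start at t = 1).
slope, intercept = np.polyfit(t[1:], dy, 1)
print(f"intercept: {intercept:.3f}, slope: {abs(slope):.3f}")
```

The trend in the levels shows up as a positive intercept in the differences model (here the trend slope, 2), while the slope on time is zero because the changes themselves do not grow.
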
First differences reduce but do not eliminate the problem of autocorrelation.

Examine a first difference model of the relationship between GDP and unemployment.
Sometimes we want to measure how two variables move together.
\[\Delta \text{Y}_t = \beta_0 + \beta_1 \times \Delta \text{X}_t + \varepsilon_t\]

Relating changes in X to changes in Y.
\[\Delta \text{Y}_t = \beta_0 + \beta_1 \times \Delta \text{X}_t + \varepsilon_t\]
Examine a double first difference model of the relationship between GDP and unemployment.
# Step 1. Create first difference variables
data['gdp_diff'] = data['gdp'].diff()
data['unemployment_diff'] = data['unemployment'].diff()
# Step 2. Fit the differences model (the formula API drops the NaN row left by diff())
model3 = smf.ols('gdp_diff ~ unemployment_diff', data=data).fit()
print(model3.summary())

> \(\beta_1\) now represents the short-term relationship between changes in X and Y

Proportional changes provide interpretable coefficients:
\[g_Y = \frac{\text{Y}_t - \text{Y}_{t-1}}{\text{Y}_{t-1}} = \frac{\Delta \text{Y}_t}{\text{Y}_{t-1}}\]
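
A short worked example of the formula, with made-up numbers. `Series.pct_change()` returns the change as a fraction of the previous value (0.10 means 10%), which matches dividing the first difference by the lagged level.

```python
import pandas as pd

# Hypothetical levels for three periods.
y = pd.Series([100.0, 110.0, 99.0], name="y")

# Growth rate as a fraction: (Y_t - Y_{t-1}) / Y_{t-1}.
g = y.pct_change()

# The same calculation written out from the formula.
g_manual = y.diff() / y.shift(1)

print(g.round(4).tolist())  # [nan, 0.1, -0.1]
```

Multiply by 100 to report the result in percent.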

Is the growth in Y related to the growth in X?
\[g_Y = \beta_0 + \beta_1 \times g_X + \varepsilon_t\]

Growth rate models keep the advantages of first differences and are unit-free, so variables measured on very different scales can be compared.
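
To illustrate the scaling point, the sketch below measures the same hypothetical GDP series in dollars and in billions of dollars. The figures are invented for the example; only the comparison matters.

```python
import math
import pandas as pd

# Hypothetical GDP in dollars, and the same series in billions of dollars.
gdp_dollars = pd.Series([2.0e12, 2.1e12, 2.05e12])
gdp_billions = gdp_dollars / 1e9

# First differences depend on the unit of measurement...
diff_ratio = gdp_dollars.diff().iloc[1] / gdp_billions.diff().iloc[1]

# ...while growth rates are identical in either unit.
growth_gap = (gdp_dollars.pct_change() - gdp_billions.pct_change()).abs().max()

print(f"difference ratio: {diff_ratio:.0f}, largest growth-rate gap: {growth_gap:.1e}")
```

A coefficient from a first difference model changes whenever a variable is rescaled, whereas a growth rate model gives the same coefficient in either unit.
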
Examine a growth rates model of the relationship between GDP and unemployment.
# Step 1. Calculate growth rates (proportional changes)
data['gdp_growth'] = data['gdp'].pct_change()  # a fraction: 0.02 means 2%; multiply by 100 for percent
data['unemployment_growth'] = data['unemployment'].pct_change()
# Step 2. Fit the growth rate model
model4 = smf.ols('gdp_growth ~ unemployment_growth', data=data).fit()
print(model4.summary())

> \(\beta_1\) is now expressed in growth-rate terms: a one percentage point rise in the growth of X is associated with a \(\beta_1\) percentage point change in the growth of Y