Question at hand: How will Obama's 46% approval rating affect his party's candidate for the 2016 presidential election?
We can fit a line through any cloud of points we please, but when we have only a sample of data, a trend detected in that sample does not necessarily exist in the population at large.
Goal: use statistics calculated from data to make inferences about the nature of parameters.
In regression,
Classical tools of inference:
Reigning theory: voters will punish candidates from the President's party at the ballot box when unemployment is high.
Some evidence of a negative linear relationship between unemployment level and change in party support - or is there?
\(H_0:\) There is no relationship between unemployment level and change in party support.
\(H_0: \beta_1 = 0\)
If there is no relationship, the pairing between \(X\) and \(Y\) is artificial and we can randomize:
ump_shuffled$unemp <- sample(ump_shuffled$unemp)
qplot(x = unemp, y = change, col = party, data = ump_shuffled)
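A single shuffle gives only one draw from the "no relationship" world. A minimal sketch of repeating the shuffle many times and comparing the observed slope to the resulting slopes is given here. Because the `ump` data are not reproduced in these notes, the sketch uses a simulated stand-in (`ump_sim`, a hypothetical data frame with made-up values), so its numbers say nothing about the real elections.

```r
# Hypothetical stand-in for `ump`: 27 simulated elections (made-up values).
set.seed(42)
n <- 27
ump_sim <- data.frame(unemp = runif(n, min = 3, max = 12))
ump_sim$change <- -6.7 - 1.0 * ump_sim$unemp + rnorm(n, sd = 9)

# Slope estimated from the original pairing of X and Y.
obs_slope <- coef(lm(change ~ unemp, data = ump_sim))[["unemp"]]

# Shuffle unemp repeatedly. Each shuffle breaks any real X-Y link,
# so these slopes approximate the distribution of b_1 under H_0.
n_perm <- 2000
null_slopes <- replicate(n_perm, {
  shuffled <- ump_sim
  shuffled$unemp <- sample(shuffled$unemp)
  coef(lm(change ~ unemp, data = shuffled))[["unemp"]]
})

# Two-sided permutation p-value: the share of shuffled slopes
# at least as far from zero as the observed slope.
perm_p <- mean(abs(null_slopes) >= abs(obs_slope))
perm_p
```

With the real data, replacing `ump_sim` by `ump` should give a permutation p-value that can be compared with the t-based p-value from `summary()`.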
m0 <- lm(change ~ unemp, data = ump)
summary(m0)
##
## Call:
## lm(formula = change ~ unemp, data = ump)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -14.011  -7.861  -0.183   7.389  16.140
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   -6.714      5.457   -1.23     0.23
## unemp         -1.001      0.872   -1.15     0.26
##
## Residual standard error: 9.11 on 25 degrees of freedom
## Multiple R-squared:  0.0501, Adjusted R-squared:  0.0121
## F-statistic: 1.32 on 1 and 25 DF,  p-value: 0.262
\[ \frac{b - \beta}{SE} \sim t_{df = n - p}\]
t_stat <- (-1.0010 - 0)/0.8717
pt(t_stat, df = 27 - 2) * 2
## [1] 0.262
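The same t distribution can also be turned into a confidence interval for \(\beta_1\). The sketch below assumes a 95% level (a choice made here for illustration) and reuses the estimate and standard error from the `summary(m0)` output; the variable names `b1`, `se_b1`, and `ci` are illustrative.

```r
# Estimate and standard error copied from the summary(m0) output.
b1 <- -1.0010
se_b1 <- 0.8717
df <- 27 - 2  # n - p, matching the 25 residual degrees of freedom

# Critical value of the t distribution for a 95% interval (assumed level).
t_crit <- qt(0.975, df = df)

# Interval: estimate plus or minus critical value times standard error.
ci <- c(lower = b1 - t_crit * se_b1, upper = b1 + t_crit * se_b1)
round(ci, 2)
```

The interval contains zero, which agrees with the p-value of 0.262: these data do not rule out \(\beta_1 = 0\).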