We use \(s_x, s_y, \textrm{ and } R\) to calculate \(b_1\):
\[ b_1 = R \frac{s_y}{s_x} \]
If the line of best fit must pass through \((\bar{x}, \bar{y})\), what is \(b_0\)?
Since \((11.35, 86.01)\) is on the line, the following relationship holds.
\[ 86.01 = b_0 - 0.9 (11.35) \]
Then just solve for \(b_0\).
\[ b_0 = 86.01 + 0.9 (11.35) = 96.22\]
More generally:
\[ b_0 = \bar{y} - b_1 \bar{x} \]
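As a quick sanity check, we can plug the summary statistics from this example into the formula. Using the unrounded slope estimate \(-0.898\) (rather than the rounded \(-0.9\)) recovers the intercept that R reports:

```r
# Check b0 = ybar - b1 * xbar using the numbers from the example
x_bar <- 11.35    # mean poverty rate (from the slides)
y_bar <- 86.01    # mean graduation rate (from the slides)
b1    <- -0.898   # slope estimate (from the regression output)

b0 <- y_bar - b1 * x_bar
b0  # about 96.2, matching the (Intercept) estimate
```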
m1 <- lm(Graduates ~ Poverty, data = poverty)
summary(m1)
## 
## Call:
## lm(formula = Graduates ~ Poverty, data = poverty)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -5.954 -1.820  0.544  1.515  6.199 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   96.202      1.343   71.65  < 2e-16 ***
## Poverty       -0.898      0.114   -7.86  3.1e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.5 on 49 degrees of freedom
## Multiple R-squared:  0.558, Adjusted R-squared:  0.549 
## F-statistic: 61.8 on 1 and 49 DF,  p-value: 3.11e-10
The lm object

attributes(m1)
## $names
##  [1] "coefficients"  "residuals"     "effects"       "rank"         
##  [5] "fitted.values" "assign"        "qr"            "df.residual"  
##  [9] "xlevels"       "call"          "terms"         "model"        
## 
## $class
## [1] "lm"
m1$coef
## (Intercept)     Poverty 
##      96.202      -0.898
m1$fit
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15 
## 83.1 88.7 84.3 80.0 84.7 87.8 89.2 88.9 81.1 85.3 85.3 86.7 85.6 86.1 88.4 
##   16   17   18   19   20   21   22   23   24   25   26   27   28   29   30 
## 88.7 87.8 84.4 80.9 86.1 89.6 87.6 87.0 90.4 80.4 87.6 83.9 87.7 88.7 91.2 
##   31   32   33   34   35   36   37   38   39   40   41   42   43   44   45 
## 89.2 80.2 83.6 84.4 85.5 87.1 83.0 86.1 87.9 87.0 84.1 87.0 83.5 82.5 87.9 
##   46   47   48   49   50   51 
## 87.3 88.4 86.5 81.8 88.5 87.7
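The fitted values can also be obtained with predict(), which additionally handles new observations. Since the poverty data set itself is not reproduced here, the sketch below uses a small hypothetical data frame in its place; only the function calls, not the numbers, carry over to the real data.

```r
# Hypothetical mini data set standing in for the poverty data
toy <- data.frame(Poverty   = c(8, 10, 12, 14, 16),
                  Graduates = c(89, 87, 85, 84, 82))

fit <- lm(Graduates ~ Poverty, data = toy)

# Fitted values for the observed cases (same as fit$fitted.values)
predict(fit)

# Prediction for a new case, e.g. a state with 11% poverty
predict(fit, newdata = data.frame(Poverty = 11))  # 86.25 for this toy data
```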
The slope describes the estimated difference in the \(y\) variable when the explanatory variable \(x\) for a case is one unit larger.
m1$coef[2]
## Poverty 
##  -0.898
For each additional percentage point of a state's residents living below the poverty level, we expect the state's percentage of high school graduates to be 0.898 lower.
Be cautious: with observational data you have evidence of an association, not of a causal link; the association can still be used for prediction.
The intercept is the estimated \(y\) value that will be taken by a case with an \(x\) value of zero.
m1$coef[1]
## (Intercept) 
##        96.2
While necessary for prediction, the intercept often has no meaningful interpretation.
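Putting the two estimated coefficients together gives the prediction equation \(\hat{y} = b_0 + b_1 x\). A small sketch using the estimates from the output above (the 12% poverty rate is an illustrative value, not from the data):

```r
# Prediction equation built from the estimated coefficients above
b0 <- 96.202   # intercept estimate from the regression output
b1 <- -0.898   # slope estimate from the regression output

predicted_graduates <- function(poverty) b0 + b1 * poverty

# Predicted graduation rate for a hypothetical state with 12% poverty
predicted_graduates(12)  # about 85.4
```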
m1 <- lm(runs ~ at_bats, data = mlb11)
ggplot(m1, aes(x = .fitted, y = .resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  xlab("Fitted values") +
  ylab("Residuals")
ggplot(m1, aes(x = .resid)) +
  geom_histogram(binwidth = 25) +
  xlab("Residuals")

ggplot(m1, aes(sample = .resid)) +
  geom_point(stat = "qq")