Suppose that \(8\%\) of college students are vegetarians.
False. Here we want to use the criteria that \[ \begin{align} n p &\geq 10 \\ n (1-p) &\geq 10 \end{align} \] With \(n=60\), the first condition is not satisfied. \[ \begin{align} n p &= 60 \times 0.08 \\ &=4.8 \end{align} \]
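A minimal Python sketch of this check (not part of the original solution):

```python
# Success-failure check for n = 60, p = 0.08
n, p = 60, 0.08
print(n * p, n * (1 - p))                   # 4.8 and 55.2
print(n * p >= 10 and n * (1 - p) >= 10)    # False: condition not met
```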
True. Because the body of the distribution (around \(p\)) is so close to the lower bound, the distribution will be skewed. The figure in exercise 2.11 on page 115 shows an example of such a sampling distribution. As the sample size increases, the spread of the sampling distribution becomes small relative to the bounds, and the distribution becomes more symmetrically distributed around \(p\).
We can construct a confidence interval around our point estimate (\(\hat{p} = 12\%\)). Note that this situation just meets the criteria discussed in part (a) (\(125 \times 0.08=10\)). For any arbitrary estimator, \(\hat{\theta}\), the confidence interval is \(\hat{\theta} \pm z^{*} \times SE\), where \(z^{*}\) is the critical value corresponding to our confidence level, and SE is the standard error of the sampling distribution. Since we are working with a single proportion, the standard error is \[ SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \] For \(n=125\) and \(\hat{p}=0.12\) the standard error is \[ \begin{align} SE &= \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \\ &= \sqrt{\frac{0.12 \times (1-0.12)}{125}} \\ &= 0.029 \end{align} \]
This gives us a \(95\%\) confidence interval of \[ \begin{align} \hat{p} &\pm z^{*} \times SE \\ \hat{p} &\pm 1.96 \times 0.029 \\ \hat{p} &\pm 0.057~{\rm or} \\ &(0.063,\ 0.177) \end{align} \] This interval contains the null proportion of \(8\%\). Thus, we do not consider our point estimate to be unusual.
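As a quick numeric check (not part of the original solution), a minimal Python sketch of this interval, assuming NumPy and SciPy are available:

```python
import numpy as np
from scipy import stats

n, p_hat = 125, 0.12
se = np.sqrt(p_hat * (1 - p_hat) / n)     # ~0.029
z_star = stats.norm.ppf(0.975)            # ~1.96 for a 95% interval
lower, upper = p_hat - z_star * se, p_hat + z_star * se
print(round(lower, 3), round(upper, 3))   # ~(0.063, 0.177)
```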
Or with a hypothesis test \[ \begin{align} H_{0} &: p = 0.08 \\ H_{A} &: p \neq 0.08 \end{align} \] With a z-score \[ \begin{align} z &= \frac{(\hat{p} - 0.08)}{SE} \\ &= \frac{(\hat{p} - 0.08)}{ \sqrt{ \frac{\hat{p} (1-\hat{p})}{n} } } \\ &= \frac{(0.12 - 0.08)}{ \sqrt{ \frac{0.12 (1-0.12)}{125} } } \\ &= 1.38 \end{align} \] With this z-score we fail to reject the null hypothesis, i.e. this value is not unusual.
Or, using the same hypothesis test as in part (c), we can calculate a z-score \[ \begin{align} z &= \frac{(\hat{p} - 0.08)}{SE} \\ &= \frac{(\hat{p} - 0.08)}{ \sqrt{ \frac{\hat{p} (1-\hat{p})}{n} } } \\ &= \frac{(0.12 - 0.08)}{ \sqrt{ \frac{0.12 (1-0.12)}{250} } } \\ &= 1.95 \end{align} \] This puts the observation at the 97.4th percentile, for a two-sided p-value of about 0.05. In other words, suggestive, but only borderline evidence that this observation is unusual.
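A minimal Python sketch of both tests (not part of the original solution; it keeps the text's convention of using \(\hat{p}\) in the standard error):

```python
import numpy as np
from scipy import stats

def one_prop_z(p_hat, p_null, n):
    """z statistic and two-sided p-value for a single proportion."""
    se = np.sqrt(p_hat * (1 - p_hat) / n)
    z = (p_hat - p_null) / se
    return z, 2 * (1 - stats.norm.cdf(abs(z)))

print(one_prop_z(0.12, 0.08, 125))   # z ~ 1.38, p ~ 0.17  (part c)
print(one_prop_z(0.12, 0.08, 250))   # z ~ 1.95, p ~ 0.05  (part d)
```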
The constant of proportionality that relates the two standard errors is the square root of the ratio of the sample sizes (\(\sqrt{\frac{n_{2}}{n_{1}}}\)), not the ratio of the sample sizes (\(\frac{n_{2}}{n_{1}}\)). In the case of parts (c) \(\&\) (d), the sample size doubles from 125 to 250, so the standard error is reduced by a factor of \(\frac{1}{\sqrt{2}} = 0.707\).
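This follows directly from the standard error formula, holding \(p\) fixed: \[ \frac{SE_{2}}{SE_{1}} = \frac{\sqrt{p(1-p)/n_{2}}}{\sqrt{p(1-p)/n_{1}}} = \sqrt{\frac{n_{1}}{n_{2}}} = \frac{1}{\sqrt{2}} \quad {\rm when} \quad n_{2} = 2 n_{1} \]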
This problem looks at a randomized drug trial for HIV positive women giving birth. The sample size is \(n_{Tot} = 240\) with \(n_{Nev}=120\) and \(n_{Lop}=120\). We also know that the counts of virologic failure were \(n_{Nev, failure} =26\) and \(n_{Lop, failure}=10\).
Drug | Failure | No Failure |
---|---|---|
Nevirapine | 26 | 94 |
Lopinavir | 10 | 110 |
Under the null hypothesis \(p_{Nev} = p_{Lop}\), so the \((p_{Nev} - p_{Lop})\) term in the numerator is zero: \[ \begin{align} z &= \frac{(\hat{p}_{Nev} - \hat{p}_{Lop}) - (p_{Nev} - p_{Lop})}{SE} \\ &= \frac{(\hat{p}_{Nev} - \hat{p}_{Lop})}{\sqrt{\frac{p_{Nev}(1-p_{Nev})}{n_{Nev}} + \frac{p_{Lop}(1-p_{Lop})}{n_{Lop}}}} \end{align} \]
Now at this stage, \(p_{Nev}\) and \(p_{Lop}\) can be replaced with \(\hat{p}_{Nev}\) and \(\hat{p}_{Lop}\) in the standard error, or the pooled standard error can be used: \[ SE = \sqrt{\hat{p}_{pooled}(1-\hat{p}_{pooled}) \left( \frac{1}{n_{1}} + \frac{1}{n_{2}} \right)} \\ {\rm with} \\ \hat{p}_{pooled} = \frac{\hat{p}_{1} n_{1} + \hat{p}_{2} n_{2}}{n_{1}+n_{2}} \\ \] If we use the former, \[ \begin{align} z &= \frac{(0.22 - 0.08)}{\sqrt{\frac{0.22(1-0.22)}{120} + \frac{0.08(1-0.08)}{120}}} \\ &= 3.1 \end{align} \] With a z-score of 3.1, there is a probability of \((1-0.999) \times 2 \approx 0.002\) of seeing a difference in proportions this large if the null hypothesis were true. Therefore, we reject the null hypothesis and conclude that the proportion of virologic failure does depend on treatment.
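For reference, a minimal Python sketch of this two-proportion z-test (not part of the original solution; it assumes NumPy and SciPy are available):

```python
import numpy as np
from scipy import stats

# Rounded proportions as used in the solution above; the exact counts
# (26/120 and 10/120) give a z of roughly 2.9, with the same conclusion.
p_nev, p_lop = 0.22, 0.08
n_nev, n_lop = 120, 120

# Unpooled standard error (the "former" approach in the text)
se = np.sqrt(p_nev * (1 - p_nev) / n_nev + p_lop * (1 - p_lop) / n_lop)
z = (p_nev - p_lop) / se
p_value = 2 * (1 - stats.norm.cdf(abs(z)))
print(round(z, 2), round(p_value, 4))   # ~3.1, ~0.002
```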
The professor expected \(n \times p_{j}\) students in each of the \(j\) categories, with \(n=126\), thus: \[ \begin{align} Expected_{purchase} &= 126 \times 0.60 = 75.6\\ Expected_{print~web} &= 126 \times 0.25 = 31.5 \\ Expected_{read~online} &= 126 \times 0.15 = 18.9 \\ \end{align} \]
The expected counts in each cell are greater than 5, there are three categories, and independence seems reasonable.
The \(\chi^{2}\) statistic is: \[ \chi^2 = \sum_{j=1}^{k} \frac{(Obs_{j}-Exp_{j})^{2}}{Exp_{j}} \] Which in this case is \[ \begin{align} \chi^2 &= \sum_{j=1}^{k} \frac{(Obs_{j}-Exp_{j})^{2}}{Exp_{j}} \\ &=\frac{(71-75.6)^{2}}{75.6} + \frac{(30-31.5)^{2}}{31.5} + \frac{(25-18.9)^{2}}{18.9} \\ &= 0.28 + 0.07 + 1.97 \\ &= 2.32 \end{align} \] Since there are three categories, there are two degrees of freedom. The p-value associated with this \(\chi^{2}\) value with two degrees of freedom is greater than 0.3.
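As a quick numeric check (not part of the original solution), a minimal Python sketch using SciPy's goodness-of-fit test:

```python
import numpy as np
from scipy import stats

observed = np.array([71, 30, 25])
expected = 126 * np.array([0.60, 0.25, 0.15])   # 75.6, 31.5, 18.9

chi2, p_value = stats.chisquare(observed, f_exp=expected)
print(round(chi2, 2), round(p_value, 2))        # ~2.32, p ~ 0.31
```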
Based on the above p-value, we fail to reject the null hypothesis. That is, if the null hypothesis were true, counts like those the professor observed in class would not be surprising.
The appropriate test in this case is a \(\chi^{2}\) test for independence.
The overall proportion of women with clinical depression is \(P(Yes~Depression) = \frac{2607}{50739} = 0.051\). Thus, the proportion of women without clinical depression is \(P(No~Depression) = \frac{48132}{50739} = 0.949\).
We know that the expected count for this cell is \[ \begin{align} n_{i} \times p_{j} &= \frac{n_{column~j}}{n_{total}} \times n_{row~i} \\ &= \frac{6617}{50739} \times 2607 \\ &= 339.99 \end{align} \] The contribution to the total \(\chi^{2}\) from this cell is \[ \begin{align} z^{2} &= \frac{(Observed - Expected)^{2}}{Expected} \\ &= \frac{(373-339.99)^{2}}{339.99} \\ &= 3.21 \end{align} \]
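A minimal Python sketch of this cell calculation (not part of the original solution):

```python
# Expected count and chi-square contribution for the cell discussed above
# (row total 2,607, column total 6,617, observed count 373).
row_total, col_total, grand_total = 2607, 6617, 50739
observed = 373

expected = row_total * col_total / grand_total      # ~339.99
contribution = (observed - expected) ** 2 / expected
print(round(expected, 2), round(contribution, 2))   # ~339.99, ~3.21
```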
We can compare the \(\chi^{2}\) = 20.93 to a \(\chi^{2}\) distribution to get the p-value. We have two rows and five columns for four degrees of freedom. \[ \begin{align} df &= (n_{rows} - 1) \times (n_{columns} - 1) \\ &= (2 - 1) \times (5 - 1) \\ &= 4 \end{align} \] To reject the null hypothesis with a significance of \(0.001\) we would need a \(\chi^{2}\) of at least 18.47. It looks like we have that.
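A minimal Python sketch (assuming SciPy is available) of the critical value and p-value lookup:

```python
from scipy import stats

chi2, df = 20.93, 4
critical = stats.chi2.ppf(1 - 0.001, df)   # ~18.47: critical value at 0.001
p_value = stats.chi2.sf(chi2, df)          # ~0.0003: p-value for chi2 = 20.93
print(round(critical, 2), round(p_value, 4))
```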
We reject the null hypothesis.
Yes, this study tells us that clinical depression is not independent of the amount of coffee consumed. The full nature of this relationship, and therefore any health policy recommendations, will require further study.