The question asks for hypotheses to test whether the clinic's ART success rate is significantly higher than the rate reported by the CDC. Here the hypotheses are \[ H_{0}: p = 0.31 \\ H_{A}: p > 0.31 \] I have heavily stressed two-tailed tests, so the following is also an acceptable alternative hypothesis: \[ H_{A}: p \neq 0.31 \]
The p-value can then be calculated by finding the proportion of simulation results that are equal to or more extreme than the test statistic (\(\hat{p}=0.4\)).
By comparing the test statistic with a visualization of the null distribution we can estimate the p-value. In this case the p-value is quite high, maybe 0.3 or so, and we fail to reject the null hypothesis.
Kind of harsh. That said, the result does not support the claim made by the clinic.
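The simulation-based p-value described above can be sketched in R roughly as follows. The sample size `n = 30` is an assumption made for illustration (this section does not state it), as are the number of simulations and the seed, so the resulting value is only indicative.

```r
# Minimal sketch of a simulation-based p-value for H0: p = 0.31.
set.seed(2024)            # arbitrary seed, for reproducibility only
n      <- 30              # ASSUMED sample size; not given in this section
p_null <- 0.31            # success rate under the null hypothesis
p_hat  <- 0.40            # observed sample proportion

# Simulate the number of successes among n patients, many times, under H0
sim_counts <- rbinom(10000, size = n, prob = p_null)

# One-sided p-value: share of simulations at least as extreme as observed
obs_count <- round(p_hat * n)
p_value   <- mean(sim_counts >= obs_count)
p_value
```

For the two-sided alternative, one would also count the simulations that fall at least as far below 0.31 as the observed proportion lies above it.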
Each point is a different point estimate (\(\hat{p}\)). We can see from the figure that, as \(n\) increases, the spread of the distribution of sample proportions gets narrower, i.e. its variance gets smaller.
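The narrowing can also be checked numerically. The sketch below simulates sample proportions for a few sample sizes and compares their standard deviation with the theoretical standard error \(\sqrt{p(1-p)/n}\). The proportion `p = 0.31`, the sample sizes, and the number of simulations are illustrative assumptions, not values taken from the figure.

```r
# Sketch: standard deviation of simulated sample proportions for several n.
set.seed(1)
p <- 0.31                         # illustrative proportion (assumption)
sample_sizes <- c(10, 100, 1000)  # illustrative sample sizes (assumption)

sim_sd <- sapply(sample_sizes, function(n) {
  p_hats <- rbinom(5000, size = n, prob = p) / n  # 5000 simulated p-hats
  sd(p_hats)
})

# Compare with the theoretical standard error sqrt(p * (1 - p) / n)
data.frame(n = sample_sizes,
           simulated_sd = sim_sd,
           theoretical_se = sqrt(p * (1 - p) / sample_sizes))
```

Each tenfold increase in \(n\) should shrink the spread by a factor of roughly \(\sqrt{10}\).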
The proportion of a standard normal curve in this region is \[ \begin{align} P(Z > -1.13) &= 1 - P(Z \leq -1.13) \\ &= 1 - 0.13 \\ &= 0.87 \end{align} \]
1-pnorm(-1.13)
## [1] 0.8707619
library(ggplot2)
# Standard normal density with the region Z > -1.13 shaded
p1 <- ggplot(data = data.frame(x = c(-3, 3)), aes(x)) +
stat_function(fun = dnorm, n = 101, args = list(mean = 0, sd = 1)) + ylab("") +
scale_y_continuous(breaks = NULL) + xlim(c(-4,4)) + stat_function(fun = dnorm, xlim = c(-1.13,10),geom = "area")
p1
\[ \begin{align} P(Z < 0.18) &= 0.57 \\ \end{align} \]
pnorm(0.18)
## [1] 0.5714237
p1 <- ggplot(data = data.frame(x = c(-3, 3)), aes(x)) +
stat_function(fun = dnorm, n = 101, args = list(mean = 0, sd = 1)) + ylab("") +
scale_y_continuous(breaks = NULL) + xlim(c(-4,4)) + stat_function(fun = dnorm, xlim = c(-10,0.18),geom = "area")
p1
\[ \begin{align} P(Z > 8) &= 1 - P(Z \leq 8) \\ &= 6.66\times10^{-16} \\ &\approx 0 \end{align} \]
1-pnorm(8)
## [1] 6.661338e-16
p1 <- ggplot(data = data.frame(x = c(-3, 3)), aes(x)) +
stat_function(fun = dnorm, n = 101, args = list(mean = 0, sd = 1)) + ylab("") +
scale_y_continuous(breaks = NULL) + xlim(c(-4,4)) + stat_function(fun = dnorm, xlim = c(8,10),geom = "area")
p1
\[ \begin{align} P(|Z| < 0.5) &= P(Z \leq 0.5) - P(Z \leq -0.5) \\ &= 0.69 - 0.31 \\ &= 0.38. \end{align} \]
pnorm(-0.5)
## [1] 0.3085375
pnorm(0.5)
## [1] 0.6914625
p1 <- ggplot(data = data.frame(x = c(-3, 3)), aes(x)) +
stat_function(fun = dnorm, n = 101, args = list(mean = 0, sd = 1)) + ylab("") +
scale_y_continuous(breaks = NULL) + xlim(c(-4,4)) + stat_function(fun = dnorm, xlim = c(-0.5,0.5),geom = "area")
p1
\[ \begin{align} X_{men, 30-34} &\sim N(\mu=4313,\sigma=583) \\ X_{women, 25-29} &\sim N(\mu=5261,\sigma=807) \end{align} \]
\[ \begin{align} Z_{Leo} &= \frac{x-\mu}{\sigma} \\ &= \frac{4948-4313}{583} \\ &= 1.09 \\ Z_{Mary} &= \frac{x-\mu}{\sigma} \\ &= \frac{5513-5261}{807} \\ &= 0.31 \end{align} \]
(4948-4313)/583
## [1] 1.089194
(5513-5261)/807
## [1] 0.3122677
These Z-scores tell us where in a standard normal distribution these times would be located.
It looks like Mary did better than Leo with respect to their age/gender groupings. Leo’s higher Z-score tells us that a larger percentage of his group finished faster than he did, compared with the percentage of Mary’s group that finished faster than she did.
Leo finished faster than \(1-P(Z\leq1.09)=0.14\), or \(14\%\), of his group.
Mary finished faster than \(1-P(Z\leq0.31)=0.38\), or \(38\%\), of her group.
pnorm(1.09)
## [1] 0.8621434
pnorm(0.31)
## [1] 0.6217195
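As a cross-check, the same upper-tail proportions can be obtained directly from the finishing times, without rounding the Z-scores first. This is only a sketch of an equivalent calculation: `pnorm()` accepts `mean` and `sd` arguments, and `lower.tail = FALSE` returns \(P(X > x)\). The variable names are introduced here for illustration.

```r
# Upper-tail probabilities computed directly from the finishing times.
# P(X > x) is the share of each group that finished slower than the runner.
leo_beat  <- pnorm(4948, mean = 4313, sd = 583, lower.tail = FALSE)
mary_beat <- pnorm(5513, mean = 5261, sd = 807, lower.tail = FALSE)

round(c(Leo = leo_beat, Mary = mary_beat), 2)
```

The results should agree with the 14% and 38% figures to two decimal places.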
\[ x_{men, 30-34} \sim N(\mu=4313,\sigma=583) \] The fastest \(5\%\) of finishers fall below the 5th percentile of finishing times. We find the Z-score that corresponds to this percentile, and then convert it to a finishing time:
\[ x_{5\%} = \sigma \times Z + \mu \]
In R
First, determine the appropriate z-score from the percentile.
z<-qnorm(0.05)
z
## [1] -1.644854
Now, convert this z-score to a finishing time.
mu<-4313
sigma<-583
x<- sigma*z+mu
x
## [1] 3354.05
x/60.
## [1] 55.90084
The cutoff time is 3354.05 seconds or \(\sim\) 56 minutes.
z<-qnorm(0.10)
z
## [1] -1.281552
mu<-5261
sigma<-807
x<- sigma*z+mu
x
## [1] 4226.788
x/60.
## [1] 70.44646
The cutoff time for the top \(10\%\) in the women’s group is 4226.8 seconds or \(\sim\) 70 minutes.
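The two-step calculation (percentile to Z-score, then Z-score to finishing time) can also be collapsed into a single call, since `qnorm()` accepts `mean` and `sd` arguments. A brief sketch, using variable names introduced here for illustration:

```r
# Cutoff finishing times (in seconds) computed in one step with qnorm().
cutoff_men   <- qnorm(0.05, mean = 4313, sd = 583)  # fastest 5% of men, 30-34
cutoff_women <- qnorm(0.10, mean = 5261, sd = 807)  # fastest 10% of women, 25-29

# Convert both cutoffs to minutes
c(men = cutoff_men, women = cutoff_women) / 60
```

These should match the values of `x` computed by hand in the code above.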
\[ x_{heights 10 yrs} \sim N(\mu=55,\sigma=6) \] a. We can find \(P(X < 48)\) by finding the percentile associated with a z-score for \(x=48\).
\[ \begin{align} Z_{48} &= \frac{48-55}{6} \\ &= -1.17 \end{align} \]
The percentile for this z-score is:
pnorm(-1.17)
## [1] 0.1210005
We see that \(P(X < 48) = 12\%\).
\[ \begin{align} Z_{60} &= \frac{60-55}{6} = 0.83 \\ Z_{65} &= \frac{65-55}{6} = 1.67 \end{align} \] Now we find the difference in their percentiles to get the probability.
pnorm(1.67)-pnorm(0.83)
## [1] 0.1558097
And we see that \(P(60 < X < 65) = 16\%\)
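The interval probability can also be computed directly from the heights, which avoids rounding the Z-scores. This is a sketch of an equivalent calculation; the variable name `p_between` is introduced here for illustration.

```r
# P(60 < X < 65) for X ~ N(mean = 55, sd = 6), using the heights directly.
p_between <- pnorm(65, mean = 55, sd = 6) - pnorm(60, mean = 55, sd = 6)
p_between
```

Because the Z-scores are not rounded to two decimals here, the result can differ slightly in the third decimal place from the value above.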
mu<-55
sigma<-6
qnorm(0.9)*sigma + mu
## [1] 62.68931
Thus, anyone over \(\sim 62.7\) inches is “very tall”.
\[ \begin{align} Z_{54} &= \frac{54-55}{6} \\ &= -0.17\\ \end{align} \] With a corresponding probability
pnorm(-0.17)
## [1] 0.4325051
So \(P(X < 54) = 43\%\). That’s a large portion of the 10-year-old population that is unable to experience Batman the Ride. Maybe next year.