Problem Set 6

Exercise 2.10

The question asks for hypotheses to test whether the clinic's ART success rate is significantly higher than the rate reported by the CDC. Here the hypotheses are \[ H_{0}: p = 0.31 \\ H_{A}: p > 0.31 \] I have heavily stressed two-tailed tests, so the following is also an acceptable alternative hypothesis: \[ H_{A}: p \neq 0.31 \]
We can simulate this in the following manner:
1. Take a stack of cards, say 100, and mark 31 of them as “success”
2. Shuffle the stack of cards, draw one and record the result. Then put the card back in the deck and shuffle again.
3. Repeat step two for a total of 30 draws. After 30 draws, record the proportion of successes.
4. Repeat steps two and three many times to build the null distribution.

The p-value can then be calculated by finding the proportion of simulation results that are equal to or more extreme than the test statistic ($\hat{p}=0.4$).
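The card procedure in steps 1 to 4 and the p-value calculation can be sketched in R as below. This is a minimal sketch assuming the one-sided alternative $H_A: p > 0.31$ and 10,000 repetitions (an arbitrary choice); the names `deck` and `null_dist` are illustrative only.

```r
set.seed(2010)  # arbitrary seed so the simulation is reproducible

n_patients <- 30     # draws per simulated sample (step 3)
n_sims     <- 10000  # number of simulated samples (step 4); arbitrary choice
observed   <- 0.4    # test statistic, p-hat

# Step 1: a deck of 100 cards, 31 of them marked "success" (TRUE)
deck <- c(rep(TRUE, 31), rep(FALSE, 69))

# Steps 2-4: draw 30 cards with replacement, record the proportion of
# successes, and repeat many times to build the null distribution
null_dist <- replicate(n_sims, mean(sample(deck, n_patients, replace = TRUE)))

# One-sided p-value: share of simulated proportions at least as large as 0.4.
# The small tolerance guards against floating-point error (e.g. 12/30 vs 0.4).
p_value <- mean(null_dist >= observed - 1e-9)
p_value
```

Calling `hist(null_dist)` gives a rough version of the null-distribution plot. For the two-sided alternative, one would also count simulated proportions at least as far below 0.31 as 0.4 is above it.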

By comparing the test statistic with a visualization of the null distribution, we can estimate the p-value. In this case the p-value is quite high, roughly 0.3 judging from the plot, so we fail to reject the null hypothesis.
This may seem harsh on the clinic, but the result does not provide convincing evidence for its claim.

Exercise 2.13

Each point is a different point estimate ($\hat{p}$). We can see from the figure that as $n$ increases, the spread of the distribution of sample proportions gets narrower, i.e. its variance gets smaller.
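The narrowing can also be checked numerically with a small simulation. The true proportion $p = 0.3$ and the sample sizes below are hypothetical choices, not the values behind the exercise's figure; the simulated standard deviations of $\hat{p}$ are compared with the formula $\sqrt{p(1-p)/n}$.

```r
set.seed(213)  # arbitrary seed for reproducibility

p     <- 0.3                   # hypothetical true proportion (assumption)
sizes <- c(10, 50, 250, 1000)  # hypothetical sample sizes

# For each n, simulate 5,000 sample proportions and measure their spread
sim_sd <- sapply(sizes, function(n) sd(rbinom(5000, size = n, prob = p) / n))

# Standard error of p-hat from the usual formula
theory_sd <- sqrt(p * (1 - p) / sizes)

round(rbind(n = sizes, simulated = sim_sd, theory = theory_sd), 4)
```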

Exercise 2.16

$Z > -1.13$

The proportion of a standard normal curve in this region is \[ \begin{align} P(Z > -1.13) &= 1 - P(Z \leq -1.13) \\ &= 1 - 0.13 \\ &= 0.87 \end{align} \]

1-pnorm(-1.13)

## [1] 0.8707619

library(ggplot2)

# Standard normal curve with P(Z > -1.13) shaded; the upper limit 4 matches the plotted x-range
p1 <- ggplot(data = data.frame(x = c(-3, 3)), aes(x)) +
  stat_function(fun = dnorm, n = 101, args = list(mean = 0, sd = 1)) + ylab("") +
  scale_y_continuous(breaks = NULL) + xlim(c(-4,4)) + stat_function(fun = dnorm, xlim = c(-1.13,4),geom = "area")
p1

$Z < 0.18$

\[ \begin{align} P(Z < 0.18) &= 0.57 \\ \end{align} \]

pnorm(0.18)

## [1] 0.5714237

# Standard normal curve with P(Z < 0.18) shaded; the lower limit -4 matches the plotted x-range
p1 <- ggplot(data = data.frame(x = c(-3, 3)), aes(x)) +
  stat_function(fun = dnorm, n = 101, args = list(mean = 0, sd = 1)) + ylab("") +
  scale_y_continuous(breaks = NULL) + xlim(c(-4,4)) + stat_function(fun = dnorm, xlim = c(-4,0.18),geom = "area")
p1

$Z > 8$

\[ \begin{align} P(Z > 8) &= 1 - P(Z \leq 8) \\ &\approx 6.22\times10^{-16} \\ &\approx 0 \end{align} \]

pnorm(8, lower.tail = FALSE)  # upper tail directly; 1 - pnorm(8) suffers from round-off this far out

## [1] 6.220961e-16

# The region Z > 8 lies outside the plotted range (-4, 4), so no visible area is shaded
p1 <- ggplot(data = data.frame(x = c(-3, 3)), aes(x)) +
  stat_function(fun = dnorm, n = 101, args = list(mean = 0, sd = 1)) + ylab("") +
  scale_y_continuous(breaks = NULL) + xlim(c(-4,4)) + stat_function(fun = dnorm, xlim = c(8,10),geom = "area")
p1

$|Z| < 0.5$

\[ \begin{align} P(|Z| < 0.5) &= P(Z \leq 0.5) - P(Z \leq -0.5) \\ &= 0.69 - 0.31 \\ &= 0.38. \end{align} \]

pnorm(-0.5)

## [1] 0.3085375

pnorm(0.5)

## [1] 0.6914625

p1 <- ggplot(data = data.frame(x = c(-3, 3)), aes(x)) +
  stat_function(fun = dnorm, n = 101, args = list(mean = 0, sd = 1)) + ylab("") +
  scale_y_continuous(breaks = NULL) + xlim(c(-4,4)) + stat_function(fun = dnorm, xlim = c(-0.5,0.5),geom = "area")
p1
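All four probabilities can be recomputed in one vector as a check. This sketch uses `lower.tail = FALSE` for upper-tail areas and the symmetry identity $P(|Z| < a) = 2P(Z < a) - 1$ for the last part; the vector name `probs` is just illustrative.

```r
# Exercise 2.16 probabilities for a standard normal Z
probs <- c(
  "Z > -1.13" = pnorm(-1.13, lower.tail = FALSE),  # upper tail directly
  "Z < 0.18"  = pnorm(0.18),
  "Z > 8"     = pnorm(8, lower.tail = FALSE),      # avoids 1 - pnorm() round-off
  "|Z| < 0.5" = 2 * pnorm(0.5) - 1                 # symmetry of the normal curve
)
signif(probs, 4)
```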

Exercise 2.18

The two distributions can be denoted

\[ \begin{align} X_{men, 30-34} &\sim N(\mu=4313,\sigma=583) \\ X_{women, 25-29} &\sim N(\mu=5261,\sigma=807) \end{align} \]

The Z-scores for Leo’s and Mary’s finishing times are

\[ \begin{align} Z_{Leo} &= \frac{x-\mu}{\sigma} \\ &= \frac{4948-4313}{583} \\ &= 1.09 \\ Z_{Mary} &= \frac{x-\mu}{\sigma} \\ &= \frac{5513-5261}{807} \\ &= 0.31 \end{align} \]

(4948-4313)/583

## [1] 1.089194

(5513-5261)/807

## [1] 0.3122677

These Z-scores tell us where in a standard normal distribution these times would be located.

It looks like Mary did better than Leo relative to their respective age/gender groups. Since a lower finishing time is better, Leo’s higher Z-score tells us that a larger share of his group finished faster than he did, compared with the share of Mary’s group that finished faster than she did.
Leo finished faster than $1-P(Z\leq1.09)=0.14$, i.e. $14\%$, of his group.
Mary finished faster than $1-P(Z\leq0.31)=0.38$, i.e. $38\%$, of her group.

pnorm(1.09)

## [1] 0.8621434

pnorm(0.31)

## [1] 0.6217195
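Using the unrounded Z-scores and `lower.tail = FALSE`, R returns the share of each group that finished slower in one step. The small differences from $14\%$ and $38\%$ come only from rounding the Z-scores; `z_leo`, `z_mary` and `slower` are illustrative names.

```r
z_leo  <- (4948 - 4313) / 583
z_mary <- (5513 - 5261) / 807

# Upper-tail area = share of each group with a slower (larger) finishing time
slower <- setNames(pnorm(c(z_leo, z_mary), lower.tail = FALSE), c("Leo", "Mary"))
round(100 * slower, 1)  # as percentages
```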

If these distributions are not nearly normal, then Z-scores would not be appropriate. Although we could calculate a Z-score in the same way as in part (b), converting it to a percentage with the normal distribution, as we did above, would give incorrect answers.

Exercise 2.20

We want the fastest $5\%$ of the men’s group from the last problem. Recall,

\[ X_{men, 30-34} \sim N(\mu=4313,\sigma=583) \] The fastest $5\%$ of finishers are those at or below the 5th percentile of finishing times. We first find the Z-score that corresponds to this percentile, and then convert it to a finishing time:

\[ x_{5\%} = \sigma \times Z + \mu \]

In R

First, determine the appropriate z-score from the percentile.

z<-qnorm(0.05)
z

## [1] -1.644854

Now, convert this z-score to a finishing time.

mu<-4313
sigma<-583

x<- sigma*z+mu
x

## [1] 3354.05

x/60.

## [1] 55.90084

The cutoff time is 3354.05 seconds or $\sim$ 56 minutes.

Now for the fastest $10\%$ of the women’s group.

z<-qnorm(0.10)
z

## [1] -1.281552

mu<-5261
sigma<-807

x<- sigma*z+mu
x

## [1] 4226.788

x/60.

## [1] 70.44646

The cutoff time for the top $10\%$ in the women’s group is 4226.79 seconds or $\sim$ 70 minutes.
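Since `qnorm()` accepts `mean` and `sd` arguments, the Z-score step and the conversion $x = \sigma Z + \mu$ can be combined into one call. This is a shortcut sketch giving the same cutoffs as above; `men_cutoff` and `women_cutoff` are illustrative names.

```r
# Percentile cutoffs on the original (seconds) scale
men_cutoff   <- qnorm(0.05, mean = 4313, sd = 583)  # fastest 5% of men, 30-34
women_cutoff <- qnorm(0.10, mean = 5261, sd = 807)  # fastest 10% of women, 25-29

c(men = men_cutoff, women = women_cutoff) / 60  # convert to minutes
```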

Exercise 2.24

\[ X_{\text{height, 10 yrs}} \sim N(\mu=55,\sigma=6) \]

a. We can find $P(X < 48)$ by finding the percentile associated with the Z-score for $x=48$.

\[ \begin{align} Z_{48} &= \frac{48-55}{6} \\ &= -1.17 \end{align} \]

The percentile for this z-score is:

pnorm(-1.17)

## [1] 0.1210005

We see that $P(X < 48) = 12\%$.

$P(60 < X < 65)$

\[ \begin{align} Z_{60} &= \frac{60-55}{6} \\ &= 0.83 \\ Z_{65} &= \frac{65-55}{6} \\ &= 1.67 \end{align} \]

Now we find the difference in their percentiles to get the probability.

pnorm(1.67)-pnorm(0.83)

## [1] 0.1558097

And we see that $P(60 < X < 65) = 16\%$

Children in the top $10\%$ of heights are considered “very tall”, so the cutoff is the 90th percentile:

mu<-55
sigma<-6

qnorm(0.9)*sigma + mu

## [1] 62.68931

Thus, anyone over $\sim 62.7$ inches is “very tall”.

$P(X < 54)$

\[ \begin{align} Z_{54} &= \frac{54-55}{6} \\ &= -0.17 \end{align} \] The corresponding probability is:

pnorm(-0.17)

## [1] 0.4325051

So $P(X < 54) = 43\%$. That’s a large portion of the 10 year old population that is unable to experience Batman the Ride. Maybe next year.
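Because `pnorm()` and `qnorm()` accept `mean` and `sd` arguments, all four parts of this exercise can be recomputed without rounding the Z-scores first. The results agree with the rounded answers above to within about a percentage point; the variable names are illustrative.

```r
mu    <- 55  # mean height of 10-year-olds (inches)
sigma <- 6   # standard deviation (inches)

p_below_48 <- pnorm(48, mean = mu, sd = sigma)
p_60_to_65 <- pnorm(65, mean = mu, sd = sigma) - pnorm(60, mean = mu, sd = sigma)
very_tall  <- qnorm(0.90, mean = mu, sd = sigma)  # cutoff for the top 10%
p_below_54 <- pnorm(54, mean = mu, sd = sigma)

round(c(p_below_48 = p_below_48, p_60_to_65 = p_60_to_65,
        very_tall = very_tall, p_below_54 = p_below_54), 4)
```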