# Label the 28 students 1 through 28, then draw 5 of them without replacement
students <- 1:28
samp_students <- sample(students, 5)
print(samp_students)
## [1] 13 16 24 1 26
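Because `sample()` draws at random, rerunning the code above will generally return a different set of five students. A minimal sketch, assuming a reproducible draw is wanted, sets the random seed first. The seed value `2024` and the names `first_draw` and `second_draw` are arbitrary choices for illustration, not part of the original.

```r
students <- 1:28

# Fixing the seed makes the "random" draw repeatable (2024 is an arbitrary value)
set.seed(2024)
first_draw <- sample(students, 5)

# Resetting the same seed reproduces exactly the same sample
set.seed(2024)
second_draw <- sample(students, 5)

identical(first_draw, second_draw)
```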
Gender | promoted | not promoted |
---|---|---|
Male | 21 | 3 |
Female | 14 | 10 |
\[ \begin{aligned} P(\textrm{promote}\,|\,M) &= 21/24 = 0.875 \\ P(\textrm{promote}\,|\,F) &= 14/24 \approx 0.583 \end{aligned} \]
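The same conditional proportions can be read off the table in R. This is a minimal sketch: the object names `counts`, `row_props`, `p_male`, and `p_female` are chosen here for illustration and do not appear in the original.

```r
# Observed counts from the table: rows are gender, columns are the decision
counts <- matrix(c(21, 3,
                   14, 10),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(gender = c("Male", "Female"),
                                 decision = c("promoted", "not promoted")))

# Divide each row by its total to condition on gender
row_props <- prop.table(counts, margin = 1)

p_male   <- row_props["Male", "promoted"]    # P(promote | M)
p_female <- row_props["Female", "promoted"]  # P(promote | F)

round(c(male = p_male, female = p_female, difference = p_male - p_female), 3)
```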
At first glance, does there appear to be a relationship between promotion and gender?
We saw a difference of almost 30 percentage points (29.2, to be exact) between the proportions of male and female files that were promoted. Based on this information, which of the following is true?
\(H_0\), Null Hypothesis: "There is nothing going on."
\(H_A\), Alternative Hypothesis: "There is something going on."
\(H_{0}\) : Defendant is not guilty vs. \(H_{A}\) : Defendant is guilty
The hypothesis test gives us:
\[ P(\textrm{data}\,|\,\textrm{H}_0) \]
It doesn't give us:
\[ P(\textrm{H}_0\,|\,\textrm{data}) \]
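The two quantities are linked by Bayes' rule, a standard identity that is not stated in the original. Moving from the first to the second would require a prior probability \(P(\textrm{H}_0)\), which a hypothesis test does not supply:

\[ P(\textrm{H}_0\,|\,\textrm{data}) = \frac{P(\textrm{data}\,|\,\textrm{H}_0)\,P(\textrm{H}_0)}{P(\textrm{data})} \]

This is why a small \(P(\textrm{data}\,|\,\textrm{H}_0)\) cannot be read directly as a small probability that the null hypothesis is true.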
What is the null hypothesis?
What is the alternative hypothesis?
What is our test statistic?
Gender | promoted | not promoted |
---|---|---|
Male | 21 | 3 |
Female | 14 | 10 |
We can compute our observed test statistic:
\[ \begin{aligned} d_{obs} &= \hat{p}_{M} - \hat{p}_{F} \\ &= 21/24 - 14/24 = 7/24 \approx 0.29 \end{aligned} \]
Face cards: not promoted
Number cards: promoted
\[ d = \hat{p}_{M} - \hat{p}_{F} \]
Repeat steps 1-3 many times, storing the value of \(d\) from each repetition.
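library(dplyr)    # provides the %>% pipe
library(ggplot2)  # provides ggplot() and the geoms

NN <- 10000  # number of shuffles
# 24 male and 24 female files; 35 promoted and 13 not promoted overall
gender <- rep(c("M", "F"), each = 24)
promote <- rep(c("Yes", "No"), c(35, 13))
d <- rep(NA, NN)
for (i in 1:NN) {
  newgen <- sample(gender)        # shuffle the gender labels
  tab <- table(newgen, promote)   # rows: F, M; columns: No, Yes
  d[i] <- diff(tab[, 2] / 24)     # proportion promoted: M minus F
}
obs_d_hat <- (21/24) - (14/24)
as.data.frame(d) %>%
  ggplot(aes(x = d)) +
  geom_histogram(binwidth = 0.05) +
  geom_vline(xintercept = obs_d_hat, color = 'blue')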
Do the results of the simulation you just ran provide convincing evidence of gender discrimination against women, i.e. dependence between gender and promotion decisions?
head(d)
## [1] 0.042 0.125 -0.375 -0.042 -0.042 0.042
head(d > .29)
## [1] FALSE FALSE FALSE FALSE FALSE FALSE
mean(d > .29) * 2
## [1] 0.052
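As a cross-check on the simulated value, the shuffling scheme has an exact counterpart. If the 35 promotions are dealt at random among 24 male and 24 female files, the number of promoted male files follows a hypergeometric distribution. The sketch below is not part of the original analysis; it uses base R's `phyper()`, and the variable names are chosen here for illustration. Doubling the one-sided tail mirrors the `* 2` in the code above.

```r
n_male     <- 24  # male files
n_female   <- 24  # female files
n_promoted <- 35  # total promotions to distribute

# d >= 7/24 happens exactly when at least 21 of the 24 male files are promoted,
# so the one-sided tail is P(X > 20) for X ~ Hypergeometric
p_one_sided <- phyper(20, n_male, n_female, n_promoted, lower.tail = FALSE)

# Doubled tail, comparable to mean(d > .29) * 2 from the simulation
p_two_sided <- 2 * p_one_sided

c(one_sided = p_one_sided, two_sided = p_two_sided)
```

The doubled exact tail should land close to the simulated figure, with small differences attributable to simulation noise.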