Comparison of two groups

The ASTA team

Response variable and explanatory variable

Dependent/independent samples

Comparison of two means (Independent samples)

Example: Comparing two means (independent samples)

We return to the Chile data and study the association between the variables sex and statusquo (a scale of support for the status quo). We perform a significance test for a difference in the mean of statusquo between males and females.

Chile <- read.delim("https://asta.math.aau.dk/datasets?file=Chile.txt")
library(mosaic)
fv <- favstats(statusquo ~ sex, data = Chile)
fv
##   sex   min     Q1 median    Q3  max    mean    sd    n missing
## 1   F -1.80 -0.975  0.121 1.033 2.02  0.0657 1.003 1368      11
## 2   M -1.74 -1.032 -0.216 0.861 2.05 -0.0684 0.993 1315       6
# Upper-tail probability of a standard normal beyond the observed statistic 3.4786 (one tail only)
1 - pdist("norm", q = 3.4786, xlim = c(-4, 4))

## [1] 0.0002520202
t.test(statusquo ~ sex, data = Chile)
## 
##  Welch Two Sample t-test
## 
## data:  statusquo by sex
## t = 3.4786, df = 2678.7, p-value = 0.0005121
## alternative hypothesis: true difference in means between group F and group M is not equal to 0
## 95 percent confidence interval:
##  0.05849179 0.20962982
## sample estimates:
## mean in group F mean in group M 
##      0.06570627     -0.06835453
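
The test statistic reported by t.test can be checked by hand from the summary values. The following is a minimal sketch in base R, assuming the unpooled (Welch) standard error and a normal approximation for the p-value; the variable names are illustrative, and the standard deviations are the rounded values printed by favstats, so the result agrees only to about three decimals.

```r
# Summary values copied from the favstats and t.test output above
mean_f <- 0.06570627   # mean of statusquo for females
mean_m <- -0.06835453  # mean of statusquo for males
sd_f <- 1.003          # rounded standard deviation, females
sd_m <- 0.993          # rounded standard deviation, males
n_f <- 1368
n_m <- 1315

# Standard error of the difference in means (no equal-variance assumption)
se <- sqrt(sd_f^2 / n_f + sd_m^2 / n_m)

# Observed test statistic: estimated difference divided by its standard error
t_obs <- (mean_f - mean_m) / se
t_obs  # about 3.48

# Two-sided p-value under the normal approximation: twice the upper tail
p_two_sided <- 2 * (1 - pnorm(abs(t_obs)))
p_two_sided
```

The two-sided value is twice the one-tail probability computed with pdist, and it is close to the p-value from t.test, which uses a t distribution with 2678.7 degrees of freedom instead of the normal.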

Comparison of two means: confidence interval (independent samples)
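
As an illustration, the 95% confidence interval for the difference in mean statusquo (females minus males) can be reconstructed from the summary values of the Chile example. This is a sketch in base R under the assumption that the normal quantile is an adequate stand-in for the t quantile at this sample size; the variable names are illustrative and the standard deviations are the rounded favstats values.

```r
# Summary values copied from the favstats and t.test output for the Chile data
mean_f <- 0.06570627
mean_m <- -0.06835453
sd_f <- 1.003   # rounded
sd_m <- 0.993   # rounded
n_f <- 1368
n_m <- 1315

# Estimated difference and its standard error
diff_means <- mean_f - mean_m
se <- sqrt(sd_f^2 / n_f + sd_m^2 / n_m)

# Estimate plus/minus the 97.5% normal quantile times the standard error
ci <- diff_means + c(-1, 1) * qnorm(0.975) * se
ci
```

The result agrees with the interval printed by t.test to about three decimals. The interval does not contain 0, which is consistent with rejecting the hypothesis of equal means at the 5% level.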

Comparison of two means: paired \(t\)-test (dependent samples)

Reaction time example

reaction <- read.delim("https://asta.math.aau.dk/datasets?file=reaction.txt")
head(reaction, n = 3)
##   student reaction_time phone
## 1       1           604    no
## 2       2           556    no
## 3       3           540    no

Instead of doing the calculations by hand, we let R perform the significance test, using t.test with paired = TRUE because the samples are paired (dependent):

yes <- subset(reaction, phone == "yes")
no  <- subset(reaction, phone == "no")
all(yes$student == no$student)
## [1] TRUE
reaction_paired <- data.frame(student = no$student, yes = yes$reaction_time, no = no$reaction_time)
t.test(reaction_paired$no, reaction_paired$yes, paired = TRUE)
## 
##  Paired t-test
## 
## data:  reaction_paired$no and reaction_paired$yes
## t = -5.4563, df = 31, p-value = 5.803e-06
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -69.54814 -31.70186
## sample estimates:
## mean difference 
##         -50.625
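# Equivalent approach: a one-sample t-test on the differences.
# Here diff is yes - no, so the sign is flipped relative to the paired test of no versus yes.
reaction_paired$diff <- reaction_paired$yes - reaction_paired$no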
head(reaction_paired)
##   student yes  no diff
## 1       1 636 604   32
## 2       2 623 556   67
## 3       3 615 540   75
## 4       4 672 522  150
## 5       5 601 459  142
## 6       6 600 544   56
t.test( ~ diff, data = reaction_paired)
## 
##  One Sample t-test
## 
## data:  diff
## t = 5.4563, df = 31, p-value = 5.803e-06
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  31.70186 69.54814
## sample estimates:
## mean of x 
##    50.625
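
The numbers in the one-sample output are tied together by the usual formulas, which can be verified without the raw data. The sketch below, in base R with illustrative variable names, recovers the standard error from the reported mean difference and t statistic and then rebuilds the confidence interval; it is a consistency check on the printed output, not a replacement for t.test.

```r
# Values copied from the one-sample t-test output above
mean_diff <- 50.625   # mean of yes - no
t_obs <- 5.4563       # observed t statistic
df <- 31              # degrees of freedom, i.e. n - 1 with n = 32 students

# Since t = mean_diff / se, the standard error is mean_diff / t
se <- mean_diff / t_obs

# 95% confidence interval: mean_diff plus/minus t quantile times se
ci <- mean_diff + c(-1, 1) * qt(0.975, df) * se
ci

# Two-sided p-value from the t distribution with 31 degrees of freedom
p_value <- 2 * pt(-abs(t_obs), df)
p_value
```

The interval matches the one printed above up to rounding of the inputs.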

Comparison of two proportions

Comparison of two proportions: Independent samples

Approximate test for comparing two proportions (independent samples)

WARNING: The approximation is only good when \(n_1\hat{\pi},\ n_1(1-\hat{\pi}),\ n_2\hat{\pi},\ n_2(1-\hat{\pi})\) are all greater than 5.
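
The rule of thumb is easy to check in code. The sketch below uses base R and the counts from the sex by voteNo table in the Chile example of this section. It assumes that \(\hat{\pi}\) denotes the pooled proportion across both groups; the variable names are illustrative.

```r
# Counts from the Chile example: voteNo TRUE / FALSE by sex
yes_f <- 363; no_f <- 946   # females
yes_m <- 526; no_m <- 697   # males

n1 <- yes_f + no_f   # number of females
n2 <- yes_m + no_m   # number of males

# Assumption: pi_hat is the pooled proportion of voteNo == TRUE
pi_hat <- (yes_f + yes_m) / (n1 + n2)

# The four expected counts that should all exceed 5
expected <- c(n1 * pi_hat, n1 * (1 - pi_hat), n2 * pi_hat, n2 * (1 - pi_hat))
expected
all(expected > 5)
```

With samples of this size the condition holds by a wide margin; it matters mainly for small groups or proportions close to 0 or 1.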

Example: Approximate confidence interval and test for comparing proportions

We return to the Chile dataset. We make a new binary variable indicating whether the person intends to vote no or something else (and we remember to tell R that it should think of this as a grouping variable, i.e. a factor):

Chile$voteNo <- relevel(factor(Chile$vote == "N"), ref = "TRUE")

We study the association between the variables sex and voteNo:

tab <- tally( ~ sex + voteNo, data = Chile, useNA = "no")
tab
##    voteNo
## sex TRUE FALSE
##   F  363   946
##   M  526   697

This gives us all the ingredients needed in the hypothesis test:

Example: Approximate confidence interval (cont.)
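
The approximate 95% confidence interval for the difference in proportions can be computed from the table counts. This is a minimal sketch in base R, assuming the usual unpooled standard error for the interval; the variable names are illustrative.

```r
# Counts from the sex by voteNo table
yes_f <- 363; n_f <- 363 + 946   # females: voteNo TRUE, group size
yes_m <- 526; n_m <- 526 + 697   # males: voteNo TRUE, group size

# Estimated proportions intending to vote no
p_f <- yes_f / n_f
p_m <- yes_m / n_m

# Standard error of the difference, estimated separately in each group
se <- sqrt(p_f * (1 - p_f) / n_f + p_m * (1 - p_m) / n_m)

# Difference (females minus males) plus/minus normal quantile times se
ci <- (p_f - p_m) + c(-1, 1) * qnorm(0.975) * se
ci
```

The whole interval lies below 0, so the proportion intending to vote no is estimated to be lower among females than among males.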

Example: \(p\)-value (cont.)
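
For the test of equal proportions, the standard error is computed under the null hypothesis, using a pooled estimate of the common proportion. The sketch below, in base R with illustrative variable names, computes the z statistic and the two-sided p-value from the table counts.

```r
# Counts from the sex by voteNo table
yes_f <- 363; n_f <- 363 + 946   # females
yes_m <- 526; n_m <- 526 + 697   # males

p_f <- yes_f / n_f
p_m <- yes_m / n_m

# Pooled estimate of the common proportion under the null hypothesis
p_pooled <- (yes_f + yes_m) / (n_f + n_m)

# Standard error of the difference under the null hypothesis
se0 <- sqrt(p_pooled * (1 - p_pooled) * (1 / n_f + 1 / n_m))

# Observed z statistic and two-sided p-value from the standard normal
z_obs <- (p_f - p_m) / se0
p_value <- 2 * pnorm(-abs(z_obs))

z_obs
z_obs^2   # equals the X-squared statistic of the test without continuity correction
p_value
```

Squaring the z statistic gives the chi-squared statistic with 1 degree of freedom, so this calculation and a two-sample test of proportions without continuity correction lead to the same p-value.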

Automatic calculation in R

Chile2 <- subset(Chile, !is.na(voteNo))
prop.test(voteNo ~ sex, data = Chile2, correct = FALSE)
## 
##  2-sample test for equality of proportions without continuity correction
## 
## data:  tally(voteNo ~ sex)
## X-squared = 64.777, df = 1, p-value = 8.389e-16
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.1896305 -0.1159275
## sample estimates:
##    prop 1    prop 2 
## 0.2773109 0.4300899

Fisher’s exact test

fisher.test(tab)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  tab
## p-value = 1.04e-15
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.4292768 0.6021525
## sample estimates:
## odds ratio 
##  0.5085996
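
The odds ratio reported by fisher.test is a conditional maximum likelihood estimate, so it differs slightly from the sample odds ratio computed directly from the table. A short base R sketch of the direct calculation, with illustrative variable names:

```r
# The 2 x 2 table of counts: rows are sex (F, M), columns are voteNo (TRUE, FALSE)
counts <- matrix(c(363, 946,
                   526, 697),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(sex = c("F", "M"), voteNo = c("TRUE", "FALSE")))

# Odds of voteNo == TRUE within each sex
odds_f <- counts["F", "TRUE"] / counts["F", "FALSE"]
odds_m <- counts["M", "TRUE"] / counts["M", "FALSE"]

# Sample odds ratio (females relative to males)
or_hat <- odds_f / odds_m
or_hat
```

Both estimates are close to 0.51, meaning the odds of intending to vote no among females are roughly half the odds among males.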

Agresti: Overview of comparison of two groups