Comparison of two groups

The ASTA team

Response variable and explanatory variable

Dependent/independent samples

Comparison of two means (Independent samples)

Example: Comparing two means (independent samples)

We return to the Chile data and study the association between the variables sex and statusquo (a scale of support for the status quo). We perform a significance test for a difference in the mean of statusquo between males and females.

Chile <- read.delim("https://asta.math.aau.dk/datasets?file=Chile.txt")
library(mosaic)
fv <- favstats(statusquo ~ sex, data = Chile)
fv
##   sex   min     Q1 median    Q3  max    mean    sd    n missing
## 1   F -1.80 -0.975  0.121 1.033 2.02  0.0657 1.003 1368      11
## 2   M -1.74 -1.032 -0.216 0.861 2.05 -0.0684 0.993 1315       6
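# Upper-tail probability of the observed statistic t = 3.4786 under the standard normal approximation;
# doubling it gives the two-sided p-value
1 - pdist("norm", q = 3.4786, xlim = c(-4, 4))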

## [1] 0.0002520202
t.test(statusquo ~ sex, data = Chile)
## 
##  Welch Two Sample t-test
## 
## data:  statusquo by sex
## t = 3.4786, df = 2678.7, p-value = 0.0005121
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.05849179 0.20962982
## sample estimates:
## mean in group F mean in group M 
##      0.06570627     -0.06835453
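
The test statistic in the output above can be reproduced by hand from the group summaries. The following is a minimal sketch that uses the rounded means, standard deviations and sample sizes from the favstats table, so it agrees with t.test only to about two decimals. The variable names are chosen for illustration, and the p-value uses the standard normal approximation rather than the t distribution.

```r
# Group summaries copied (rounded) from the favstats output
mean_f <- 0.0657;  sd_f <- 1.003; n_f <- 1368
mean_m <- -0.0684; sd_m <- 0.993; n_m <- 1315

# Standard error of the difference, not assuming equal variances (Welch)
se_diff <- sqrt(sd_f^2 / n_f + sd_m^2 / n_m)

# Observed test statistic for H0: no difference in means
t_obs <- (mean_f - mean_m) / se_diff

# Two-sided p-value from the standard normal approximation
p_norm <- 2 * (1 - pnorm(abs(t_obs)))

c(t_obs = t_obs, p_norm = p_norm)
```

With 2678.7 degrees of freedom the t distribution is nearly identical to the standard normal, which is why the normal approximation lands close to the p-value reported by t.test.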

Comparison of two means: confidence interval (independent samples)
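
For the Chile example, the 95% confidence interval reported by t.test can be reconstructed as estimate plus or minus a critical value times the standard error. This is a sketch under stated assumptions: the summaries are the rounded values from the favstats table, and the degrees of freedom are taken directly from the t.test output instead of being recomputed.

```r
# Group summaries copied (rounded) from the favstats output
mean_f <- 0.0657;  sd_f <- 1.003; n_f <- 1368
mean_m <- -0.0684; sd_m <- 0.993; n_m <- 1315

# Standard error of the difference in means (unequal variances)
se_diff <- sqrt(sd_f^2 / n_f + sd_m^2 / n_m)

# Welch degrees of freedom as reported by t.test (not recomputed here)
df_welch <- 2678.7

# Critical value for a 95% interval
t_crit <- qt(0.975, df = df_welch)

# Confidence interval for the difference in means (F minus M)
ci <- (mean_f - mean_m) + c(-1, 1) * t_crit * se_diff
ci
```

The interval does not contain 0, which agrees with rejecting the null hypothesis at the 5% level.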

Comparison of two means: paired \(t\)-test (dependent samples)

Netto store example

Netto <- read.delim("https://asta.math.aau.dk/datasets?file=Netto.txt")
head(Netto, n = 3)
##     before    after
## 1 3.730611 3.440214
## 2 2.623338 2.314733
## 3 3.795295 3.586334
t.test(Netto$before, Netto$after, paired = TRUE)
## 
##  Paired t-test
## 
## data:  Netto$before and Netto$after
## t = 5.7204, df = 9, p-value = 0.0002868
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.1122744 0.2591578
## sample estimates:
## mean of the differences 
##               0.1857161
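
A paired t-test is the same as a one-sample t-test on the within-pair differences. The sketch below illustrates this using only the three rows printed by head(Netto, n = 3), so its numbers differ from the output above, which is based on all 10 pairs (df = 9).

```r
# First three rows of the Netto data, as printed by head()
before <- c(3.730611, 2.623338, 3.795295)
after  <- c(3.440214, 2.314733, 3.586334)

# Paired test and the equivalent one-sample test on the differences
paired     <- t.test(before, after, paired = TRUE)
one_sample <- t.test(before - after, mu = 0)

# The same statistic by hand: mean difference divided by its standard error
d <- before - after
t_manual <- mean(d) / (sd(d) / sqrt(length(d)))

c(paired     = unname(paired$statistic),
  one_sample = unname(one_sample$statistic),
  manual     = t_manual)
```

All three values coincide, and the degrees of freedom are the number of pairs minus one.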

Comparison of two proportions

Comparison of two proportions: Independent samples

Approximate test for comparing two proportions (independent samples)

WARNING: The approximation is only good when \(n_1\hat{\pi},\ n_1(1-\hat{\pi}),\ n_2\hat{\pi},\ n_2(1-\hat{\pi})\) are all greater than 5.

Example: Approximate confidence interval and test for comparing proportions

We return to the Chile dataset and create a new binary variable indicating whether the person intends to vote no (TRUE) or something else (FALSE). We tell R to treat it as a grouping variable, i.e. a factor, with TRUE as the first level:

Chile$voteNo <- relevel(factor(Chile$vote == "N"), ref = "TRUE")

We study the association between the variables sex and voteNo:

tab <- tally( ~ sex + voteNo, data = Chile, useNA = "no")
tab
##    voteNo
## sex TRUE FALSE
##   F  363   946
##   M  526   697

This gives us all the ingredients needed in the hypothesis test:
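
As a minimal sketch, the ingredients can be computed from the counts in the table with base R. The variable names are illustrative, and \(\hat{\pi}\) is assumed here to be the pooled proportion of "no" voters under the null hypothesis of equal proportions.

```r
# Counts from the table of sex by voteNo
x_f <- 363; n_f <- 363 + 946   # women voting no, women in total
x_m <- 526; n_m <- 526 + 697   # men voting no, men in total

# Estimated proportions in each group
p_f <- x_f / n_f
p_m <- x_m / n_m

# Pooled proportion (assumed to be the estimate used under H0)
p_pooled <- (x_f + x_m) / (n_f + n_m)

# Rule-of-thumb check for the normal approximation: all expected counts above 5
expected <- c(n_f * p_pooled, n_f * (1 - p_pooled),
              n_m * p_pooled, n_m * (1 - p_pooled))

c(p_f = p_f, p_m = p_m, p_pooled = p_pooled)
all(expected > 5)
```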

Example: Approximate confidence interval (cont.)
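
An approximate 95% confidence interval for the difference in proportions can be sketched as follows. It uses the counts from the table of sex by voteNo and the unpooled standard error; the variable names are illustrative.

```r
# Counts from the table of sex by voteNo
x_f <- 363; n_f <- 363 + 946   # women voting no, women in total
x_m <- 526; n_m <- 526 + 697   # men voting no, men in total

p_f <- x_f / n_f
p_m <- x_m / n_m

# Unpooled standard error of the difference in proportions
se_diff <- sqrt(p_f * (1 - p_f) / n_f + p_m * (1 - p_m) / n_m)

# Approximate 95% confidence interval for the difference (F minus M)
ci <- (p_f - p_m) + c(-1, 1) * qnorm(0.975) * se_diff
ci
```

The whole interval lies below 0, so the estimated proportion voting no is lower among women than among men.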

Example: \(p\)-value (cont.)
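
The test statistic and two-sided \(p\)-value can be sketched in base R from the counts in the table of sex by voteNo. The standard error below uses the pooled proportion, which is assumed to be the estimate under the null hypothesis of equal proportions; the variable names are illustrative.

```r
# Counts from the table of sex by voteNo
x_f <- 363; n_f <- 363 + 946   # women voting no, women in total
x_m <- 526; n_m <- 526 + 697   # men voting no, men in total

p_f <- x_f / n_f
p_m <- x_m / n_m
p_pooled <- (x_f + x_m) / (n_f + n_m)

# Standard error of the difference under H0 (pooled proportion)
se0 <- sqrt(p_pooled * (1 - p_pooled) * (1 / n_f + 1 / n_m))

# Observed z statistic and two-sided p-value from the standard normal
z_obs <- (p_f - p_m) / se0
p_val <- 2 * pnorm(-abs(z_obs))

c(z_obs = z_obs, z_squared = z_obs^2, p_val = p_val)
```

The squared z statistic is the X-squared value that prop.test reports when the continuity correction is turned off.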

Automatic calculation in R

Chile2 <- subset(Chile, !is.na(voteNo))
prop.test(voteNo ~ sex, data = Chile2, correct = FALSE)
## 
##  2-sample test for equality of proportions without continuity
##  correction
## 
## data:  tally(voteNo ~ sex)
## X-squared = 64.777, df = 1, p-value = 8.389e-16
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.1896305 -0.1159275
## sample estimates:
##    prop 1    prop 2 
## 0.2773109 0.4300899

Fisher’s exact test

fisher.test(tab)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  tab
## p-value = 1.04e-15
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.4292768 0.6021525
## sample estimates:
## odds ratio 
##  0.5085996
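
For comparison, the plain sample odds ratio can be computed directly from the four counts in the table. This is a small illustrative sketch; fisher.test reports a conditional maximum likelihood estimate, so the two values are close but not identical.

```r
# Counts from the table of sex by voteNo
f_no <- 363; f_other <- 946   # women: vote no, something else
m_no <- 526; m_other <- 697   # men: vote no, something else

# Odds of voting no within each group
odds_f <- f_no / f_other
odds_m <- m_no / m_other

# Sample odds ratio (women relative to men)
or_sample <- odds_f / odds_m
or_sample
```

An odds ratio below 1 means that the odds of voting no are lower for women than for men.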

Agresti: Overview of comparison of two groups