Comparison of two or more groups

The ASTA team

Comparison of two populations

Two populations

Two samples

Dependent and independent samples

Comparison of two means (Independent samples)

Significance test (Independent samples)

Standard error (Independent samples, equal variances)

\[\sqrt{\tfrac{\sigma_1^2}{n_1}+\tfrac{\sigma^2_2}{n_2}}.\]

\[s_p^2=\tfrac{(n_1-1)s^2_1+(n_2-1)s_2^2}{n_1+n_2-2}.\]

\[se_d=\sqrt{\tfrac{s_p^2}{n_1}+\tfrac{s^2_p}{n_2}}=s_p\sqrt{\tfrac{1}{n_1}+\tfrac{1}{n_2}}.\]

Example: Comparing two means (independent samples, equal variances)

We return to the mtcars data. We study the association between the variables vs and mpg (engine type and fuel consumption). So, we will perform a significance test to test the null-hypothesis that there is no difference between the mean of fuel consumption for the two engine types.

library(mosaic)
fv <- favstats(mpg ~ vs, data = mtcars)
fv
##   vs  min   Q1 median   Q3  max mean   sd  n missing
## 1  0 10.4 14.8   15.7 19.1 26.0 16.6 3.86 18       0
## 2  1 17.8 21.4   22.8 29.6 33.9 24.6 5.38 14       0
2*pdist("t", q = -4.864, df=30, xlim = c(-5, 5))

## [1] 3.419648e-05

Standard error (Independent samples, unequal variances)

Example: Comparing two means (independent samples, unequal variances)

We return to the mtcars data. We study the association between the variables vs and mpg (engine type and fuel consumption). So, we will perform a significance test to test the null-hypothesis that there is no difference between the mean of fuel consumption for the two engine types.

library(mosaic)
fv <- favstats(mpg ~ vs, data = mtcars)
fv
##   vs  min   Q1 median   Q3  max mean   sd  n missing
## 1  0 10.4 14.8   15.7 19.1 26.0 16.6 3.86 18       0
## 2  1 17.8 21.4   22.8 29.6 33.9 24.6 5.38 14       0
2* pdist("t", q = -4.6671, df=22.716, xlim = c(-5, 5))

## [1] 0.0001098212

Example: Comparing two means (independent samples)

\[ d \pm t_{crit} se_d\]

qdist("t", p = 1-0.05/2, df=22.716, xlim = c(-3, 3))

## [1] 2.07009

\[[-7.94 - 2.07*1.70;-7.94 + 2.07*1.70]= [-11.5,-4.4].\]

T-test in R (Independent samples)

t.test(mpg ~ vs, data = mtcars,var.equal = FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  mpg by vs
## t = -4.6671, df = 22.716, p-value = 0.0001098
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -11.462508  -4.418445
## sample estimates:
## mean in group 0 mean in group 1 
##        16.61667        24.55714

Test for equal variances (Independent samples)

Example: Test for equal variances (Independent samples)

var(mpg~vs,data=mtcars) 
##        0        1 
## 14.90500 28.93341
 pdist("f", 0.516, df1=17, df2=13)

## [1] 0.1004094

Comparison of two means: paired \(t\)-test (dependent samples)

Reaction time: data example

reaction <- read.delim("https://asta.math.aau.dk/datasets?file=reaction.txt")
head(reaction, n = 3)
##   student reaction_time phone
## 1       1           604    no
## 2       2           556    no
## 3       3           540    no
yes <- subset(reaction, phone == "yes")
no  <- subset(reaction, phone == "no")
all(yes$student == no$student)
## [1] TRUE
reaction_paired <- data.frame(student = no$student, yes = yes$reaction_time, no = no$reaction_time)
reaction_paired$diff <- reaction_paired$yes - reaction_paired$no
head(reaction_paired)
##   student yes  no diff
## 1       1 636 604   32
## 2       2 623 556   67
## 3       3 615 540   75
## 4       4 672 522  150
## 5       5 601 459  142
## 6       6 600 544   56
t.test( ~ diff, data = reaction_paired)
## 
##  One Sample t-test
## 
## data:  diff
## t = 5.4563, df = 31, p-value = 5.803e-06
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  31.70186 69.54814
## sample estimates:
## mean of x 
##    50.625
t.test(reaction_paired$no, reaction_paired$yes, paired = TRUE)
## 
##  Paired t-test
## 
## data:  reaction_paired$no and reaction_paired$yes
## t = -5.4563, df = 31, p-value = 5.803e-06
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -69.54814 -31.70186
## sample estimates:
## mean difference 
##         -50.625

Response variable and explanatory variable

More than two groups (Analysis of variance)

More than two populations

Data example

library(mosaic)
gf_boxplot(weight ~ feed, data = chickwts)

Estimation of mean values

mean(weight ~ feed, data = chickwts)
##    casein horsebean   linseed  meatmeal   soybean sunflower 
##  323.5833  160.2000  218.7500  276.9091  246.4286  328.9167

Contrasts

Example: contrast estimates

model <- lm(weight ~ feed, data = chickwts)
summary(model)
## 
## Call:
## lm(formula = weight ~ feed, data = chickwts)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -123.909  -34.413    1.571   38.170  103.091 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    323.583     15.834  20.436  < 2e-16 ***
## feedhorsebean -163.383     23.485  -6.957 2.07e-09 ***
## feedlinseed   -104.833     22.393  -4.682 1.49e-05 ***
## feedmeatmeal   -46.674     22.896  -2.039 0.045567 *  
## feedsoybean    -77.155     21.578  -3.576 0.000665 ***
## feedsunflower    5.333     22.393   0.238 0.812495    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 54.85 on 65 degrees of freedom
## Multiple R-squared:  0.5417, Adjusted R-squared:  0.5064 
## F-statistic: 15.36 on 5 and 65 DF,  p-value: 5.936e-10

Overall test for effect

Test statistic

## Warning in geom_point(aes(x = red_dot), color = "red"): All aesthetics have length 1, but the data has 71 rows.
## ℹ Please consider using `annotate()` or provide this layer with data containing
##   a single row.

The \(F\)-test

1 - pdist("f", 15.36, df1=5, df2=65)

## [1] 5.967948e-10

Example

model <- lm(weight ~ feed, data = chickwts)
summary(model)
## 
## Call:
## lm(formula = weight ~ feed, data = chickwts)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -123.909  -34.413    1.571   38.170  103.091 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    323.583     15.834  20.436  < 2e-16 ***
## feedhorsebean -163.383     23.485  -6.957 2.07e-09 ***
## feedlinseed   -104.833     22.393  -4.682 1.49e-05 ***
## feedmeatmeal   -46.674     22.896  -2.039 0.045567 *  
## feedsoybean    -77.155     21.578  -3.576 0.000665 ***
## feedsunflower    5.333     22.393   0.238 0.812495    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 54.85 on 65 degrees of freedom
## Multiple R-squared:  0.5417, Adjusted R-squared:  0.5064 
## F-statistic: 15.36 on 5 and 65 DF,  p-value: 5.936e-10