The ASTA team
Consider two populations:
Population 1 has mean $\mu_1$ and standard deviation $\sigma_1$.
Population 2 has mean $\mu_2$ and standard deviation $\sigma_2$.
We want to compare the means by looking at the difference $\mu_1 - \mu_2$.
We now take a sample from each population.
The sample from Population 1 has sample mean $\bar{x}_1$, sample standard deviation $s_1$ and sample size $n_1$.
The sample from Population 2 has sample mean $\bar{x}_2$, sample standard deviation $s_2$ and sample size $n_2$.
We distinguish between two types of samples:
- Independent samples: the observations in one sample carry no information about the observations in the other sample.
- Paired samples: each observation in one sample is paired with an observation in the other, e.g. two measurements on the same subject.
Example: Suppose we consider the fuel consumption of cars.
If we compare two samples of cars with different engine types, then the two samples are independent, since each car can only have one of the two engine types.
If we compare the fuel consumption of cars at two different speed levels by testing each car at both speed levels, then the samples are paired.
We consider the situation where we have two independent samples of a quantitative variable.
We estimate the difference $\mu_1 - \mu_2$ by $d = \bar{x}_1 - \bar{x}_2$.
Assume that we can find the estimated standard error $se_d$ of the difference.
If the samples come from two normal distributions, or if both samples are large ($n_1, n_2 \geq 30$), then one can show that $$T_{obs} = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{se_d} \sim t(df),$$ where $t(df)$ is a $t$-distribution with $df$ degrees of freedom.
By the usual procedure, we can use this to construct a confidence interval for the unknown population difference of means $\mu_1 - \mu_2$ by $$(\bar{x}_1 - \bar{x}_2) \pm t_{crit} \cdot se_d,$$ where the critical $t$-score, $t_{crit}$, is determined by the confidence level and the $df$.
We may be interested in testing the null-hypothesis that the population means are the same, which we can formulate as $H_0: \mu_1 - \mu_2 = 0$ against $H_a: \mu_1 - \mu_2 \neq 0$.
If the null-hypothesis is true, then the test statistic $$T_{obs} = \frac{(\bar{X}_1 - \bar{X}_2) - 0}{se_d}$$ has a $t$-distribution with $df$ degrees of freedom.
The p-value is the probability of observing something further away from 0 than $t_{obs}$ in a $t(df)$-distribution.
It remains to find the estimated standard error $se_d$ and the degrees of freedom $df$. We distinguish between two cases:
The standard error of the difference is $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$.

If we assume that the two populations have equal variances, the common variance is estimated by the pooled variance $$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}.$$

Inserting $s_p^2$ for both $\sigma_1^2$ and $\sigma_2^2$ gives the estimated standard error $$se_d = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}} = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.$$
We return to the `mtcars` data. We study the association between the variables `vs` and `mpg` (engine type and fuel consumption). So, we will perform a significance test of the null-hypothesis that there is no difference between the mean fuel consumption for the two engine types.
## vs min Q1 median Q3 max mean sd n missing
## 1 0 10.4 14.8 15.7 19.1 26.0 16.6 3.86 18 0
## 2 1 17.8 21.4 22.8 29.6 33.9 24.6 5.38 14 0
Difference: $d = 16.6167 - 24.5571 = -7.9405$.
Sample sizes: $n_1 = 18$ and $n_2 = 14$.
Estimated standard deviations: $s_1 = 3.8607$ (V-shaped, `vs=0`) and $s_2 = 5.379$ (straight, `vs=1`).
Pooled variance: $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{17 \cdot 3.8607^2 + 13 \cdot 5.379^2}{18 + 14 - 2} = 20.984$.
Estimated standard error of difference: $se_d = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = \sqrt{20.984}\sqrt{\frac{1}{18} + \frac{1}{14}} = 1.6324$.
Observed $t$-score for $H_0: \mu_1 - \mu_2 = 0$: $t_{obs} = \frac{d - 0}{se_d} = \frac{-7.9405}{1.6324} = -4.864$.
The degrees of freedom are $df = n_1 + n_2 - 2 = 30$.
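These calculations can be reproduced in R (a sketch; the summary numbers are copied from the table above):

```r
# Summary statistics from the table above
n1 <- 18; n2 <- 14
s1 <- 3.8607; s2 <- 5.379
d  <- 16.6167 - 24.5571

sp2   <- ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2)  # pooled variance
se_d  <- sqrt(sp2) * sqrt(1 / n1 + 1 / n2)                    # standard error of d
t_obs <- d / se_d
c(sp2 = sp2, se_d = se_d, t_obs = t_obs)
```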
We find the p-value:
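The two-sided p-value is obtained from the $t(30)$-distribution with `pt`:

```r
# Two-sided p-value for t_obs = -4.864 with df = 30
p_value <- 2 * pt(-4.864, df = 30)
p_value
```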
## [1] 3.419648e-05
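Alternatively, `t.test` performs the whole pooled test directly; the `mtcars` data ships with R, and `var.equal = TRUE` requests the equal-variance version:

```r
# Equal-variance (pooled) two-sample t-test
res <- t.test(mpg ~ vs, data = mtcars, var.equal = TRUE)
res
```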
If the variances are unequal, then we simply insert the two estimates $s_1^2$ and $s_2^2$ for $\sigma_1^2$ and $\sigma_2^2$ in the formula for the standard error to obtain the estimated standard error $$se_d = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.$$
The degrees of freedom $df$ for $se_d$ can be estimated by a complicated formula, which we will not present here (see p. 365 in the book).
Note:
We return to the `mtcars` data. We study the association between the variables `vs` and `mpg` (engine type and fuel consumption). So, we will perform a significance test of the null-hypothesis that there is no difference between the mean fuel consumption for the two engine types.
## vs min Q1 median Q3 max mean sd n missing
## 1 0 10.4 14.8 15.7 19.1 26.0 16.6 3.86 18 0
## 2 1 17.8 21.4 22.8 29.6 33.9 24.6 5.38 14 0
Difference: $d = 16.6167 - 24.5571 = -7.9405$.
Sample sizes: $n_1 = 18$ and $n_2 = 14$.
Estimated standard deviations: $s_1 = 3.8607$ (V-shaped, `vs=0`) and $s_2 = 5.379$ (straight, `vs=1`).
Estimated standard error of difference: $se_d = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{3.8607^2}{18} + \frac{5.379^2}{14}} = 1.7014$.
Observed $t$-score for $H_0: \mu_1 - \mu_2 = 0$: $t_{obs} = \frac{d - 0}{se_d} = \frac{-7.9405}{1.7014} = -4.6671$.
The degrees of freedom can be found using R (see below) to be $df = 22.716$.
We find the p-value:
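Again the p-value comes from `pt`, now with the Welch degrees of freedom:

```r
# Two-sided p-value for t_obs = -4.6671 with df = 22.716
p_value <- 2 * pt(-4.6671, df = 22.716)
p_value
```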
## [1] 0.0001098212
A 95% confidence interval for $\mu_1 - \mu_2$ is given by $d \pm t_{crit} \cdot se_d$, where $t_{crit}$ is the critical value in the $t$-distribution with $df = 22.716$ degrees of freedom.
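In R the critical $t$-score for 95% confidence is found with `qt`:

```r
# 97.5% quantile of the t-distribution with the Welch degrees of freedom
t_crit <- qt(0.975, df = 22.716)
t_crit
```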
## [1] 2.07009
$[-7.94 - 2.07 \cdot 1.70;\ -7.94 + 2.07 \cdot 1.70] = [-11.5, -4.4]$.
In R, the whole analysis is carried out by `t.test`:
##
## Welch Two Sample t-test
##
## data: mpg by vs
## t = -4.6671, df = 22.716, p-value = 0.0001098
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -11.462508 -4.418445
## sample estimates:
## mean in group 0 mean in group 1
## 16.61667 24.55714
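The output above comes from a call of the form below (Welch, i.e. unequal variances, is R's default, so no `var.equal` argument is needed):

```r
# Welch two-sample t-test (unequal variances; the default in R)
res <- t.test(mpg ~ vs, data = mtcars)
res
```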
In order to decide whether to use the $t$-test with equal or unequal variances, we may test the hypothesis $H_0: \sigma_1^2 = \sigma_2^2$.
As test statistic we use $F_{obs} = \frac{s_1^2}{s_2^2}$.
If the null-hypothesis is true, we expect $F_{obs}$ to take values close to 1. Small and large values are critical for $H_0$.
Under $H_0$, $F_{obs}$ follows a so-called $F$-distribution with $df_1 = n_1 - 1$ and $df_2 = n_2 - 1$ degrees of freedom.
Returning to the `mtcars` example, we first compute the sample variances:
## 0 1
## 14.90500 28.93341
We compute $F_{obs} = \frac{s_1^2}{s_2^2} = \frac{14.9}{28.9} = 0.516$.
The probability of observing something smaller than $F_{obs}$ in an $F$-distribution with $df_1 = n_1 - 1 = 17$ and $df_2 = n_2 - 1 = 13$ degrees of freedom:
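This lower-tail probability is computed with `pf`:

```r
# P(F < F_obs) for F_obs = 14.905/28.933 in an F(17, 13)-distribution
p_lower <- pf(14.90500 / 28.93341, df1 = 17, df2 = 13)
p_lower
```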
## [1] 0.1004094
The p-value is $2 \cdot 0.1004 = 0.2008$. Here we multiply by two because the test is two-sided (large values would also have been critical).
We find no evidence against the null-hypothesis.
We now consider the case where we have two samples from two different populations but the observations in the two samples are paired.
Example: Suppose we make the following experiment: for each student we measure the average reaction time twice, once while speaking on the phone and once without.
So we have 2 samples corresponding to with/without phone. In this case we have paired samples, since we have two measurements for each student.
We use the following strategy for analysis:
- For each student, compute the difference between the two reaction times.
- Test whether the mean of these differences is zero using a one-sample $t$-test.
The data set contains the variables:

- `student` (integer; a simple id)
- `reaction_time` (numeric; average reaction time in milliseconds)
- `phone` (factor; `yes`/`no` indicating whether speaking on the phone)

## student reaction_time phone
## 1 1 604 no
## 2 2 556 no
## 3 3 540 no
yes <- subset(reaction, phone == "yes")  # rows measured while on the phone
no  <- subset(reaction, phone == "no")   # rows measured without the phone
reaction_diff <- data.frame(student = no$student, yes = yes$reaction_time, no = no$reaction_time)
reaction_diff$diff <- reaction_diff$yes - reaction_diff$no  # difference per student
head(reaction_diff)
## student yes no diff
## 1 1 636 604 32
## 2 2 623 556 67
## 3 3 615 540 75
## 4 4 672 522 150
## 5 5 601 459 142
## 6 6 600 544 56
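Following the strategy, we test whether the mean difference is zero with a one-sample $t$-test. As an illustration, here it is applied to just the six differences shown above; the full analysis of all 32 students gives the output that follows:

```r
# One-sample t-test on the first six differences from the table above
diffs <- c(32, 67, 75, 150, 142, 56)
res <- t.test(diffs)
res
```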
##
## One Sample t-test
##
## data: diff
## t = 5.4563, df = 31, p-value = 5.803e-06
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 31.70186 69.54814
## sample estimates:
## mean of x
## 50.625
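As a check, the reported confidence interval can be recovered from the mean and $t$-score in the output:

```r
mean_diff <- 50.625       # "mean of x" from the output
t_obs <- 5.4563           # t-statistic from the output
se <- mean_diff / t_obs   # implied standard error of the mean difference
ci <- mean_diff + c(-1, 1) * qt(0.975, df = 31) * se
ci
```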
With a p-value of 0.0000058 we reject the null-hypothesis that speaking on the phone has no influence on the reaction time.
We can avoid the manual calculations and let R perform the significance test by using `t.test` with `paired = TRUE`:
##
## Paired t-test
##
## data: reaction_time by phone
## t = -5.4563, df = 31, p-value = 5.803e-06
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -69.54814 -31.70186
## sample estimates:
## mean difference
## -50.625
The situation with two populations is an example where we have:

- A response variable (or outcome, dependent variable).
- An explanatory variable (or predictor, independent variable).
We are interested in the effect of the explanatory variable on the response variable.
In the `mtcars` data, `mpg` is the response variable and `vs` is the explanatory variable. In this lecture we consider the case with one discrete explanatory variable. Module 3 is concerned with the case of one or more continuous variables.
We are now going to consider a situation where we have $k$ populations with mean values $\mu_1, \ldots, \mu_k$.
We assume that each population follows a normal distribution and that the standard deviation is the same in all populations.
We are interested in the null-hypothesis that all $k$ populations have the same mean, i.e.
$H_0: \mu_1 = \cdots = \mu_k$ against $H_a$: not all of $\mu_1, \ldots, \mu_k$ are the same.
We take a sample from each population.
Example: the data set `chickwts` is available in `R` and on the course webpage. It contains two variables:

- `weight`: a numeric variable giving the chicken weight.
- `feed`: a factor giving the feed type.

We estimate the mean in each group by the sample mean within that group, i.e. $\hat{\mu}_i = \bar{x}_i$, $i = 1, \ldots, k$.
We use `mean` to find the mean for each group:
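In base R the group means can be computed with `tapply` (the slides may use another helper; this is an equivalent sketch, and `chickwts` ships with R):

```r
# Mean weight within each feed group
group_means <- tapply(chickwts$weight, chickwts$feed, mean)
group_means
```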
## casein horsebean linseed meatmeal soybean sunflower
## 323.5833 160.2000 218.7500 276.9091 246.4286 328.9167
So the estimated mean is 323.58 when `feed=casein`, but 160.2 when `feed=horsebean`. The same estimates appear when the model is fitted with `lm`:
##
## Call:
## lm(formula = weight ~ feed, data = chickwts)
##
## Residuals:
## Min 1Q Median 3Q Max
## -123.909 -34.413 1.571 38.170 103.091
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 323.583 15.834 20.436 < 2e-16 ***
## feedhorsebean -163.383 23.485 -6.957 2.07e-09 ***
## feedlinseed -104.833 22.393 -4.682 1.49e-05 ***
## feedmeatmeal -46.674 22.896 -2.039 0.045567 *
## feedsoybean -77.155 21.578 -3.576 0.000665 ***
## feedsunflower 5.333 22.393 0.238 0.812495
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 54.85 on 65 degrees of freedom
## Multiple R-squared: 0.5417, Adjusted R-squared: 0.5064
## F-statistic: 15.36 on 5 and 65 DF, p-value: 5.936e-10
There is a parameter for each of the different feeds. R chooses the lexicographically smallest, which is `casein`, to be the reference group.

- `(Intercept)` is the estimated mean $\hat{\mu}_{casein} = 323.583$ in the reference group.
- `feedhorsebean` estimates the contrast $\alpha_{horsebean}$ between the `casein` and `horsebean` groups to be $\hat{\alpha}_{horsebean} = -163.383$.
We are now interested in testing the null-hypothesis $$H_0: \mu_1 = \mu_2 = \cdots = \mu_k \quad \text{against} \quad H_a: \text{not all of the population means are the same.}$$
Alternatively, $$H_0: \alpha_2 = \alpha_3 = \cdots = \alpha_k = 0, \quad H_a: \text{at least one contrast is non-zero.}$$
Idea: Compare variation within groups and variation between groups.
We use the test statistic $$F_{obs} = \frac{(TSS - SSE)/(k - 1)}{SSE/(n - k)}.$$
If the $j$th observation in group $i$ is denoted $x_{ij}$, $j = 1, \ldots, n_i$, we have: $$TSS = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2, \qquad SSE = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2,$$ where $\bar{x}$ is the overall sample mean and $\bar{x}_i$ is the sample mean in group $i$.
Interpretation:
One can show that $SSE$ measures the variation of the observations around their group means, while $TSS - SSE$ measures the variation of the group means around the common mean.
Thus, $$F_{obs} = \frac{\text{variation between groups}}{\text{variation within groups}}.$$
A large variation between groups compared to the variation within groups points against H0.
Thus, large values are critical for the null-hypothesis.
Under the null-hypothesis, $F_{obs}$ follows an $F$-distribution with $df_1 = k - 1$ and $df_2 = n - k$ degrees of freedom.
A p-value for the null-hypothesis is the probability of observing something larger than $F_{obs}$ in an $F$-distribution with $df_1$ and $df_2$ degrees of freedom.
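For the `chickwts` data, the decomposition and the $F$-statistic can be computed directly (a sketch in base R):

```r
x <- chickwts$weight
g <- chickwts$feed
n <- length(x)               # 71 observations
k <- nlevels(g)              # 6 feed groups

TSS <- sum((x - mean(x))^2)  # total variation
SSE <- sum((x - ave(x, g))^2)  # variation within groups (ave repeats each group mean)
F_obs <- ((TSS - SSE) / (k - 1)) / (SSE / (n - k))
F_obs
```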
For instance, if $F_{obs} = 15.36$ with $df_1 = 5$ and $df_2 = 65$ degrees of freedom:
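This upper-tail probability is found with `pf`:

```r
# P(F > 15.36) in an F(5, 65)-distribution
p_value <- pf(15.36, df1 = 5, df2 = 65, lower.tail = FALSE)
p_value
```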
## [1] 5.967948e-10
## F-statistic: 15.36 on 5 and 65 DF, p-value: 5.936e-10
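The same test is summarized in the analysis-of-variance table:

```r
# ANOVA table for the one-way model
tab <- anova(lm(weight ~ feed, data = chickwts))
tab
```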
The p-value is practically zero, so we reject the null-hypothesis: the mean weight is not the same for all types of `feed`.