The ASTA team
Consider two populations:
Population 1 has mean \(\mu_1\) and standard deviation \(\sigma_1\).
Population 2 has mean \(\mu_2\) and standard deviation \(\sigma_2\).
We want to compare the means by looking at the difference \(\mu_1-\mu_2.\)
We now take a sample from each population.
The sample from Population 1 has sample mean \(\bar{x}_1\), sample standard deviation \(s_1\) and sample size \(n_1\).
The sample from Population 2 has sample mean \(\bar{x}_2\), sample standard deviation \(s_2\) and sample size \(n_2\).
We distinguish between two types of samples:
Example: Suppose we consider the fuel consumption of cars.
If we compare two samples of cars with different engine types, then the two samples are independent, since each car can only have one of the two engine types.
If we compare the fuel consumption of cars at two different speed levels by testing each car at both speed levels, then then samples are paired.
We consider the situation, where we have two independent samples of a quantitative variable.
We estimate the difference \(\mu_1-\mu_2\) by \[d=\bar{x}_1-\bar{x}_2.\]
Assume that we can find the estimated standard error \(se_d\) of the difference.
If the samples come from two normal distributions, or if both samples are large (\(n_1,n_2\geq 30\)), then one can show \[T_{obs}=\frac{(\bar{X}_1-\bar{X}_2)-(\mu_1-\mu_2)}{se_d}\sim \texttt{t}(df),\] where \(\texttt{t}(df)\) is a \(t\)-distribution with \(df\) degrees of freedom.
By the usual procedure, we can use this to construct a confidence interval for the unknown population difference of means \(\mu_1-\mu_2\) by \[ (\bar{x}_1-\bar{x}_2)\pm t_{crit}se_d, \] where the critical \(t\)-score, \(t_{crit}\), is determined by the confidence level and the \(df\).
We may be interested the testing the null-hypothesis that the population means are the same, which we can formulated as:
If the null hypothesis is true, then the test statistic: \[T_{obs} = \frac{(\bar{X}_1-\bar{X}_2) - 0}{se_d},\] has a \(t\)-distribution with \(df\) degrees of freedom.
The p-value is the probability of observing something further away from 0 than \(t_{obs}\) in a \(\texttt{t}(df)\) distribution.
It remains to find the estimated standard error \(se_d\) and the degrees of freedom \(df\). We distinguish between two cases:
\[\sqrt{\tfrac{\sigma_1^2}{n_1}+\tfrac{\sigma^2_2}{n_2}}.\]
\[s_p^2=\tfrac{(n_1-1)s^2_1+(n_2-1)s_2^2}{n_1+n_2-2}.\]
\[se_d=\sqrt{\tfrac{s_p^2}{n_1}+\tfrac{s^2_p}{n_2}}=s_p\sqrt{\tfrac{1}{n_1}+\tfrac{1}{n_2}}.\]
We return to the mtcars
data. We study the association between the variables vs
and mpg
(engine type and fuel consumption). So, we will perform a significance test to test the null-hypothesis that there is no difference between the mean of fuel consumption for the two engine types.
## vs min Q1 median Q3 max mean sd n missing
## 1 0 10.4 14.8 15.7 19.1 26.0 16.6 3.86 18 0
## 2 1 17.8 21.4 22.8 29.6 33.9 24.6 5.38 14 0
Difference: \(d = 16.6167 - (24.5571) = -7.9405\).
Sample sizes: \(n_1 = 18\) and \(n_2 = 14\).
Estimated standard deviations: \(s_1 = 3.8607\) (not v-shaped) and \(s_2 = 5.379\) (v-shaped).
Pooled variance: \[s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2} = \frac{17\cdot 3.8607^2 + 13 \cdot 5.379 ^2}{18 + 14 -2} = 20.984.\]
Estimated standard error of difference: \(se_d = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = \sqrt{20.984}\sqrt{\frac{1}{18} + \frac{1}{14}} = 1.6324\).
Observed \(t\)-score for \(H_0:\ \mu_1-\mu_2=0\) is: \(\quad t_{obs} = \frac{d-0}{se_d} = \frac{-7.9405}{1.6324} = -4.864\).
The degrees of freedom are \(df=n_1 + n_2-2 = 30\).
We find the \(p\)-value:
## [1] 3.419648e-05
If the variances are unequal, then we simply insert the two estimates \(s_1^2\) and \(s_2^2\) for \(\sigma_1^2\) and \(\sigma_2^2\) in the formula for the standard error to obtain the estimated standard error \[ se_d=\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}. \]
The degrees of freedom \(df\) for \(se_d\) can be estimated by a complicated formula, which we will not present here (see p.365 in the book).
Note:
We return to the mtcars
data. We study the association between the variables vs
and mpg
(engine type and fuel consumption). So, we will perform a significance test to test the null-hypothesis that there is no difference between the mean of fuel consumption for the two engine types.
## vs min Q1 median Q3 max mean sd n missing
## 1 0 10.4 14.8 15.7 19.1 26.0 16.6 3.86 18 0
## 2 1 17.8 21.4 22.8 29.6 33.9 24.6 5.38 14 0
Difference: \(d = 16.6167 - (24.5571) = -7.9405\).
Sample sizes: \(n_1 = 18\) and \(n_2 = 14\).
Estimated standard deviations: \(s_1 = 3.8607\) (not v-shaped) and \(s_2 = 5.379\) (v-shaped).
Estimated standard error of difference: \(se_d = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{3.8607^2}{18} + \frac{5.379^2}{14}} = 1.7014\).
Observed \(t\)-score for \(H_0:\ \mu_1-\mu_2=0\) is: \(\quad t_{obs} = \frac{d-0}{se_d} = \frac{-7.9405}{1.7014} = -4.6671\).
The degrees of freedom can be found using R (see below) to be \(df=22.716\).
We find the \(p\)-value:
## [1] 0.0001098212
\[ d \pm t_{crit} se_d\]
## [1] 2.07009
\[[-7.94 - 2.07*1.70;-7.94 + 2.07*1.70]= [-11.5,-4.4].\]
t.test
:##
## Welch Two Sample t-test
##
## data: mpg by vs
## t = -4.6671, df = 22.716, p-value = 0.0001098
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -11.462508 -4.418445
## sample estimates:
## mean in group 0 mean in group 1
## 16.61667 24.55714
In order to decide whether to use the t-test with equal or unequal variance, we may test the hypothesis \(H_0: \sigma_1^2 = \sigma_2^2\).
As test statistic we use \[F_{obs} = \frac{s_1^2}{s_2^2}.\]
If the null-hypothesis is true, we expect \(F_{obs}\) to take values close to 1. Small and large values are critical for \(H_0\).
Under \(H_0\), \(F_{obs}\) follows a so-called \(F\)-distribution with \(df_1=n_1-1\) and \(df_2=n_2-1\) degrees of freedom.
mtcars
example, we first compute the sample variances.## 0 1
## 14.90500 28.93341
We compute \(F_{obs} = \frac{s_1^2}{s_2^2} = \frac{14.9}{28.9} = 0.516\).
The probability of observing something smaller than \(F_{obs}\) in an \(F\)-distribution with \(df_1=n_1-1 = 17\) and \(df_2=n_2-1 = 13\):
## [1] 0.1004094
The p-value is \(2*0.1004= 0.2008\). Here we multiply by two because the test is two-sided (large values would also have been critical).
We find no evidence against the null-hypothesis.
We now consider the case where we have two samples from two different populations but the observations in the two samples are paired.
Example: Suppose we make the following experiment:
So we have 2 samples corresponding to with/without phone. In this case we have paired samples, since we have 2 measurement for each student.
We use the following strategy for analysis:
student
(integer – a simple id)reaction_time
(numeric – average reaction time in milliseconds)phone
(factor – yes
/no
indicating whether speaking on the phone)## student reaction_time phone
## 1 1 604 no
## 2 2 556 no
## 3 3 540 no
yes <- subset(reaction, phone == "yes")
no <- subset(reaction, phone == "no")
all(yes$student == no$student)
## [1] TRUE
reaction_paired <- data.frame(student = no$student, yes = yes$reaction_time, no = no$reaction_time)
reaction_paired$diff <- reaction_paired$yes - reaction_paired$no
head(reaction_paired)
## student yes no diff
## 1 1 636 604 32
## 2 2 623 556 67
## 3 3 615 540 75
## 4 4 672 522 150
## 5 5 601 459 142
## 6 6 600 544 56
##
## One Sample t-test
##
## data: diff
## t = 5.4563, df = 31, p-value = 5.803e-06
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 31.70186 69.54814
## sample estimates:
## mean of x
## 50.625
With a \(p\)-value of 0.0000058 we reject the null-hypothesis that speaking on the phone has no influence on the reaction time.
We can avoid the manual calculations and let R perform the significance test by using t.test
with paired = TRUE
:
##
## Paired t-test
##
## data: reaction_paired$no and reaction_paired$yes
## t = -5.4563, df = 31, p-value = 5.803e-06
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -69.54814 -31.70186
## sample estimates:
## mean difference
## -50.625
The situation with two populations is an example where we have: * A response variable (or outcome, dependent variable).
We are interested in the effect of the explanatory variable on the response variable.
mtcars
data, mpg
is the response variable and vs
is the explanatory variable.In this lecture we consider the case with one discrete explanatory variable. Module 3 is concerned with the case of one or more continuous variables.
We are now going to consider a situation where we have \(k\) populations with mean values \(\mu_1,\ldots,\mu_k\).
We assume that each population follows a normal distribution and that the standard deviation is the same in all populations.
We are interested in the null-hypothesis that all \(k\) populations have the same mean, i.e.
\[H_0: \mu_1 = \dotsm=\mu_k.\] \[H_a: \text{ not all } \mu_1,\ldots \mu_k \text{ are the same}.\]
We take out a sample from each population.
chickwts
is available in R
, and on the course webpage.weight
: a numeric variable giving the chicken weight.feed
: a factor giving the feed type.We estimate the mean in each group by the sample mean inside that group, i.e. \(\hat{\mu}_i = \bar{x}_i\), \(i=1,\ldots, k\).
We use mean
to find the mean, for each group:
## casein horsebean linseed meatmeal soybean sunflower
## 323.5833 160.2000 218.7500 276.9091 246.4286 328.9167
feed=casein
but \(160.2\), when feed=horsebean
.##
## Call:
## lm(formula = weight ~ feed, data = chickwts)
##
## Residuals:
## Min 1Q Median 3Q Max
## -123.909 -34.413 1.571 38.170 103.091
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 323.583 15.834 20.436 < 2e-16 ***
## feedhorsebean -163.383 23.485 -6.957 2.07e-09 ***
## feedlinseed -104.833 22.393 -4.682 1.49e-05 ***
## feedmeatmeal -46.674 22.896 -2.039 0.045567 *
## feedsoybean -77.155 21.578 -3.576 0.000665 ***
## feedsunflower 5.333 22.393 0.238 0.812495
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 54.85 on 65 degrees of freedom
## Multiple R-squared: 0.5417, Adjusted R-squared: 0.5064
## F-statistic: 15.36 on 5 and 65 DF, p-value: 5.936e-10
feeds
. R chooses the lexicographically smallest, which is casein
, to be the reference group.Intercept
is the estimated mean \(\hat{\mu}_{casein}=323.583\) in the reference group.
feedhorsebean
estimates the contrast \(\alpha_{horsebean}\) between the casein
and horsebean
group to be \(\hat{\alpha}_{horsebean}=-163.383\).
We are now interested in testing the null-hypothesis \[H_0: \mu_1 = \mu_2 = \dots=\mu_k \quad \mbox{against} \quad H_a: \mbox{ Not all of the population means are the same}\]
Alternatively \[H_0: \alpha_2 = \alpha_3 = \dots=\alpha_k =0, \quad H_a: \mbox{ At least one contrast is non-zero}.\]
Idea: Compare variation within groups and variation between groups.
We use the test statistic \[F_{obs} = \frac{(TSS-SSE)/(k-1)}{SSE/(n-k)}.\]
If observations from group \(i\) are called \(x_{ij}\), \(j=1,\ldots,k\), we have:
Interpretation:
One can show that TSS-SSE measures the variance of group means around common mean.
Interpretation is thus \[F_{obs} = \frac{\text{variance between groups}}{\text{variance within groups}}.\]
## Warning in geom_point(aes(x = red_dot), color = "red"): All aesthetics have length 1, but the data has 71 rows.
## ℹ Please consider using `annotate()` or provide this layer with data containing
## a single row.
A large variation between groups compared to the variation within groups points against \(H_0\).
Thus, large values are critical for the null-hypothesis.
Under the null-hypothesis, \(F_{obs}\) follows an \(F\)-distribution with \(df_1=k-1\) and \(df_2=n-k\) degrees of freedom.
A \(p\)-value for the null-hypothesis is the probability of observing something larger than \(F_{obs}\) in an \(F\)-distribution with \(df_1\) and \(df_2\) degrees of freedom.
For instance if \(F_{obs}=15.36\) with \(df_1=5\) and \(df_2=65\) degrees of freedom:
## [1] 5.967948e-10
##
## Call:
## lm(formula = weight ~ feed, data = chickwts)
##
## Residuals:
## Min 1Q Median 3Q Max
## -123.909 -34.413 1.571 38.170 103.091
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 323.583 15.834 20.436 < 2e-16 ***
## feedhorsebean -163.383 23.485 -6.957 2.07e-09 ***
## feedlinseed -104.833 22.393 -4.682 1.49e-05 ***
## feedmeatmeal -46.674 22.896 -2.039 0.045567 *
## feedsoybean -77.155 21.578 -3.576 0.000665 ***
## feedsunflower 5.333 22.393 0.238 0.812495
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 54.85 on 65 degrees of freedom
## Multiple R-squared: 0.5417, Adjusted R-squared: 0.5064
## F-statistic: 15.36 on 5 and 65 DF, p-value: 5.936e-10
feed
.