The ASTA team
company type, which is called the explanatory variable and divides the data into 2 groups.
profit ratio, which is called the response variable.
We return to the Chile data. We study the association between the variables sex and statusquo (scale of support for the status quo). We perform a significance test for a difference in the mean of statusquo between males and females.
Chile <- read.delim("https://asta.math.aau.dk/datasets?file=Chile.txt")
library(mosaic)
fv <- favstats(statusquo ~ sex, data = Chile)
fv
## sex min Q1 median Q3 max mean sd n missing
## 1 F -1.80 -0.975 0.121 1.033 2.02 0.0657 1.003 1368 11
## 2 M -1.74 -1.032 -0.216 0.861 2.05 -0.0684 0.993 1315 6
## [1] 0.0002520202
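The value above is consistent with the upper-tail probability of the observed test statistic in a standard normal distribution. As a minimal sketch, assuming that is how it was obtained, the calculation can be reproduced in base R from the rounded summary statistics printed by favstats (the variable names are chosen here for illustration, and the last digits may differ slightly because of rounding):

```r
# Summary statistics as printed by favstats (rounded)
mean_f <- 0.0657;  sd_f <- 1.003; n_f <- 1368
mean_m <- -0.0684; sd_m <- 0.993; n_m <- 1315

# Estimated standard error of the difference between the two sample means
se <- sqrt(sd_f^2 / n_f + sd_m^2 / n_m)

# Observed test statistic under H0: no difference in population means
t_obs <- (mean_f - mean_m) / se

# Upper-tail probability in the standard normal distribution
# (reasonable here because both samples are large)
p_upper <- 1 - pnorm(t_obs)

# Two-sided p-value: twice the tail probability
p_two_sided <- 2 * p_upper

c(t_obs = t_obs, p_upper = p_upper, p_two_sided = p_two_sided)
```

The test statistic is about 3.48, and doubling the tail probability gives a two-sided p-value of roughly 0.0005.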
We can also let R do the calculations using t.test:
t.test(statusquo ~ sex, data = Chile)
##
## Welch Two Sample t-test
##
## data: statusquo by sex
## t = 3.4786, df = 2678.7, p-value = 0.0005121
## alternative hypothesis: true difference in means between group F and group M is not equal to 0
## 95 percent confidence interval:
## 0.05849179 0.20962982
## sample estimates:
## mean in group F mean in group M
## 0.06570627 -0.06835453
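To see where the degrees of freedom and the confidence interval in this output come from, the following sketch recomputes them by hand. It assumes the usual Welch-Satterthwaite approximation for the degrees of freedom, and it uses the group means from the t.test output together with the rounded standard deviations from favstats, so small rounding differences are to be expected.

```r
# Group means from the t.test output; sd and n as printed by favstats (rounded)
mean_f <- 0.06570627;  sd_f <- 1.003; n_f <- 1368
mean_m <- -0.06835453; sd_m <- 0.993; n_m <- 1315

# Estimated variance of each sample mean
v_f <- sd_f^2 / n_f
v_m <- sd_m^2 / n_m

# Standard error of the difference in means
se <- sqrt(v_f + v_m)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v_f + v_m)^2 / (v_f^2 / (n_f - 1) + v_m^2 / (n_m - 1))

# 95% confidence interval for the difference in means (F minus M)
ci <- (mean_f - mean_m) + c(-1, 1) * qt(0.975, df_welch) * se

df_welch
ci
```

Since the interval does not contain 0, it agrees with the conclusion of the test at the 5% level.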
student (integer – a simple id)
reaction_time (numeric – average reaction time in milliseconds)
phone (factor – yes/no indicating whether speaking on the phone)
## student reaction_time phone
## 1 1 604 no
## 2 2 556 no
## 3 3 540 no
Instead of doing manual calculations we let R perform the significance test (using t.test with paired = TRUE as our samples are paired/dependent):
yes <- subset(reaction, phone == "yes")
no <- subset(reaction, phone == "no")
all(yes$student == no$student)
## [1] TRUE
reaction_paired <- data.frame(student = no$student, yes = yes$reaction_time, no = no$reaction_time)
t.test(reaction_paired$no, reaction_paired$yes, paired = TRUE)
##
## Paired t-test
##
## data: reaction_paired$no and reaction_paired$yes
## t = -5.4563, df = 31, p-value = 5.803e-06
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -69.54814 -31.70186
## sample estimates:
## mean difference
## -50.625
With a \(p\)-value of 0.0000058 we reject that speaking on the phone has no influence on the reaction time.
To understand what is going on, we can manually find the reaction time difference for each student and do a one sample t-test on this difference:
## student yes no diff
## 1 1 636 604 32
## 2 2 623 556 67
## 3 3 615 540 75
## 4 4 672 522 150
## 5 5 601 459 142
## 6 6 600 544 56
##
## One Sample t-test
##
## data: diff
## t = 5.4563, df = 31, p-value = 5.803e-06
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 31.70186 69.54814
## sample estimates:
## mean of x
## 50.625
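The equivalence between the paired test and the one sample test on the differences can be checked directly. The sketch below uses only the six students printed above, not the full data set of 32 students, so its numerical results differ from the output shown; the variable names are chosen here for illustration.

```r
# Reaction times for the first six students shown above
yes_time <- c(636, 623, 615, 672, 601, 600)  # speaking on the phone
no_time  <- c(604, 556, 540, 522, 459, 544)  # not speaking on the phone

# Difference for each student (yes minus no)
d <- yes_time - no_time

# Paired t-test on the two columns
paired <- t.test(yes_time, no_time, paired = TRUE)

# One sample t-test on the differences, testing mean difference = 0
one_sample <- t.test(d, mu = 0)

# Test statistic, degrees of freedom and p-value agree between the two tests
c(paired = unname(paired$statistic), one_sample = unname(one_sample$statistic))
c(paired = unname(paired$parameter), one_sample = unname(one_sample$parameter))
c(paired = paired$p.value, one_sample = one_sample$p.value)
```

Swapping the order of the two samples only changes the sign of the test statistic and of the confidence limits, which is why the two outputs above mirror each other.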
WARNING: The approximation is only good when \(n_1\hat{\pi},\ n_1(1-\hat{\pi}),\ n_2\hat{\pi},\ n_2(1-\hat{\pi})\) are all greater than 5.
We return to the Chile dataset. We make a new binary variable indicating whether the person intends to vote no or something else (and we remember to tell R that it should think of this as a grouping variable, i.e. a factor):
We study the association between the variables sex and voteNo:
## voteNo
## sex TRUE FALSE
## F 363 946
## M 526 697
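As an illustration of how such a binary factor and cross table could be built, here is a minimal sketch on a small made-up data frame. The data frame toy, the column name vote and the code "N" for a no vote are assumptions made for this example, and base R's table is used in place of the mosaic functions.

```r
# Hypothetical toy data; the column name `vote` and the value "N" are assumptions
toy <- data.frame(
  sex  = c("F", "F", "M", "M", "F", "M"),
  vote = c("N", "Y", "N", "U", "A", "N")
)

# Logical indicator turned into a factor, with "TRUE" as the first level
# so that the TRUE column comes first in the table
toy$voteNo <- relevel(factor(toy$vote == "N"), ref = "TRUE")

# Cross table of sex against voteNo
tab <- table(sex = toy$sex, voteNo = toy$voteNo)
tab
```

Putting TRUE first means that the reported proportions are the proportions voting no in each group.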
This gives us all the ingredients needed in the hypothesis test:
##
## 2-sample test for equality of proportions without continuity correction
##
## data: tally(voteNo ~ sex)
## X-squared = 64.777, df = 1, p-value = 8.389e-16
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.1896305 -0.1159275
## sample estimates:
## prop 1 prop 2
## 0.2773109 0.4300899
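The test statistic in this output can be reproduced by hand from the table of counts. The sketch below assumes the standard pooled z-test for two proportions, checks the rule of thumb for the normal approximation, and compares the squared z statistic with the X-squared value from prop.test; the variable names are chosen here for illustration.

```r
# Counts from the table above: number with voteNo = TRUE and group sizes
x <- c(F = 363, M = 526)
n <- c(F = 363 + 946, M = 526 + 697)

# Estimated proportions in each group and pooled proportion under H0
p_hat  <- x / n
p_pool <- sum(x) / sum(n)

# Rule of thumb: all four expected counts should be greater than 5
expected <- c(n * p_pool, n * (1 - p_pool))
approximation_ok <- all(expected > 5)

# Standard error under H0 and observed z statistic
se0   <- sqrt(p_pool * (1 - p_pool) * sum(1 / n))
z_obs <- unname((p_hat["F"] - p_hat["M"]) / se0)

# Two-sided p-value from the standard normal distribution
p_value <- 2 * pnorm(-abs(z_obs))

# Same test with prop.test; its X-squared statistic equals z_obs^2
pt <- prop.test(x, n, correct = FALSE)
c(z_squared = z_obs^2, X_squared = unname(pt$statistic))
```

The z statistic is negative because the proportion voting no is smaller among women than among men.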
##
## Fisher's Exact Test for Count Data
##
## data: tab
## p-value = 1.04e-15
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 0.4292768 0.6021525
## sample estimates:
## odds ratio
## 0.5085996
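For comparison, the sketch below rebuilds the table of counts by hand, computes the ordinary sample odds ratio and runs Fisher's exact test. It assumes that tab in the output above is the sex by voteNo table shown earlier. Note that fisher.test reports a conditional maximum likelihood estimate of the odds ratio, which is close to but not identical to the sample odds ratio.

```r
# Table of counts from above: rows are sex, columns are voteNo
tab <- matrix(c(363, 526, 946, 697), nrow = 2,
              dimnames = list(sex = c("F", "M"), voteNo = c("TRUE", "FALSE")))

# Sample odds ratio: odds of voting no for women divided by odds for men
odds_f <- tab["F", "TRUE"] / tab["F", "FALSE"]
odds_m <- tab["M", "TRUE"] / tab["M", "FALSE"]
sample_or <- odds_f / odds_m

# Fisher's exact test of H0: odds ratio equal to 1
ft <- fisher.test(tab)

c(sample_or = sample_or, fisher_or = unname(ft$estimate))
ft$p.value
```

An odds ratio of about 0.5 means that the odds of voting no are roughly half as large for women as for men.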
Both tests give a very small \(p\)-value, so we reject the null hypothesis of equal voteNo proportions for women and men.