The ASTA team
company type, which is called the explanatory variable and divides data in 2 groups.
profit ratio, which is called the response variable.
We return to the Chile data. We study the association between the variables sex and statusquo (scale of support for the status-quo). We will perform a significance test for a difference in the mean of statusquo between males and females.
Chile <- read.delim("https://asta.math.aau.dk/datasets?file=Chile.txt")
library(mosaic)
fv <- favstats(statusquo ~ sex, data = Chile)
fv
## sex min Q1 median Q3 max mean sd n missing
## 1 F -1.80 -0.975 0.121 1.033 2.02 0.0657 1.003 1368 11
## 2 M -1.74 -1.032 -0.216 0.861 2.05 -0.0684 0.993 1315 6
1 - pdist("norm", q = 3.4786, xlim = c(-4, 4))
## [1] 0.0002520202
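The observed test statistic 3.4786 passed to pdist above can be recovered from the summary statistics. The following is a minimal sketch in base R, assuming the group means printed by t.test and the rounded standard deviations and counts from the favstats output; the variable names are illustrative only. Note that pdist returns a one-sided tail probability, so the sketch doubles it to get the two-sided p-value under the normal approximation.

```r
# Summary statistics copied from the output (sd values are rounded)
mean_f <- 0.06570627;  sd_f <- 1.003; n_f <- 1368
mean_m <- -0.06835453; sd_m <- 0.993; n_m <- 1315

d <- mean_f - mean_m                      # estimated difference in means
se <- sqrt(sd_f^2 / n_f + sd_m^2 / n_m)   # standard error (unequal variances)
t_obs <- d / se                           # observed test statistic
p_two_sided <- 2 * (1 - pnorm(t_obs))     # two-sided p-value, normal approximation

round(c(d = d, se = se, t_obs = t_obs), 4)
p_two_sided
```

Because the standard deviations are rounded, the result should agree with 3.4786 only to about two or three decimals.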
R can carry out the whole test with t.test:
t.test(statusquo ~ sex, data = Chile)
##
## Welch Two Sample t-test
##
## data: statusquo by sex
## t = 3.4786, df = 2678.7, p-value = 0.0005121
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.05849179 0.20962982
## sample estimates:
## mean in group F mean in group M
## 0.06570627 -0.06835453
before and after, containing the average expedition time before and after installation of the new technology. Instead of doing manual calculations, we let R perform the significance test (using t.test with paired = TRUE as our samples are paired/dependent):
Netto <- read.delim("https://asta.math.aau.dk/datasets?file=Netto.txt")
head(Netto, n = 3)
## before after
## 1 3.730611 3.440214
## 2 2.623338 2.314733
## 3 3.795295 3.586334
t.test(Netto$before, Netto$after, paired = TRUE)
##
## Paired t-test
##
## data: Netto$before and Netto$after
## t = 5.7204, df = 9, p-value = 0.0002868
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.1122744 0.2591578
## sample estimates:
## mean of the differences
## 0.1857161
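As a sanity check on the output above, the confidence interval and p-value can be reproduced from the reported mean difference, t statistic and degrees of freedom. This is only a sketch: it works backwards from the rounded numbers in the printout rather than from the raw Netto data, and the variable names are illustrative.

```r
# Values read off the paired t-test output
mean_diff <- 0.1857161
t_obs <- 5.7204
df <- 9                      # n - 1, so there are n = 10 pairs

se <- mean_diff / t_obs      # standard error of the mean difference
t_crit <- qt(0.975, df)      # critical value for a 95% interval
ci <- mean_diff + c(-1, 1) * t_crit * se
p_value <- 2 * (1 - pt(t_obs, df))   # two-sided p-value

ci
p_value
```

With the data loaded, t.test(Netto$before - Netto$after) should give the same result, since a paired t-test is a one-sample t-test on the differences.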
WARNING: The approximation is only good when \(n_1\hat{\pi},\ n_1(1-\hat{\pi}),\ n_2\hat{\pi},\ n_2(1-\hat{\pi})\) are all greater than 5.
We return to the Chile dataset. We make a new binary variable indicating whether the person intends to vote no or something else (and we remember to tell R that it should think of this as a grouping variable, i.e. a factor):
Chile$voteNo <- relevel(factor(Chile$vote == "N"), ref = "TRUE")
We study the association between the variables sex and voteNo:
tab <- tally( ~ sex + voteNo, data = Chile, useNA = "no")
tab
## voteNo
## sex TRUE FALSE
## F 363 946
## M 526 697
This gives us all the ingredients needed in the hypothesis test:
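One way to compute these ingredients by hand is sketched below in base R, using only the counts from the table. The notation (pooled estimate, se0, z_obs) follows the usual two-sample test for proportions and is assumed here rather than taken from the text; the variable names are illustrative.

```r
# Counts from the table: number with voteNo TRUE and group sizes
no_f <- 363; n_f <- 363 + 946
no_m <- 526; n_m <- 526 + 697

p_f <- no_f / n_f                       # estimated proportion, women
p_m <- no_m / n_m                       # estimated proportion, men
p_pool <- (no_f + no_m) / (n_f + n_m)   # pooled estimate under H0

# Rule of thumb from the warning: all four expected counts should exceed 5
all(c(n_f, n_m) %o% c(p_pool, 1 - p_pool) > 5)

se0 <- sqrt(p_pool * (1 - p_pool) * (1 / n_f + 1 / n_m))  # standard error under H0
z_obs <- (p_f - p_m) / se0              # observed test statistic
p_value <- 2 * pnorm(-abs(z_obs))       # two-sided p-value

c(p_f = p_f, p_m = p_m, z_obs = z_obs, z_sq = z_obs^2)
p_value
```

The squared statistic z_obs^2 corresponds to the X-squared value that prop.test reports when correct = FALSE.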
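Further, we can let R carry out the test with prop.test, after removing the observations where voteNo is missing: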
Chile2 <- subset(Chile, !is.na(voteNo))
prop.test(voteNo ~ sex, data = Chile2, correct = FALSE)
##
## 2-sample test for equality of proportions without continuity
## correction
##
## data: tally(voteNo ~ sex)
## X-squared = 64.777, df = 1, p-value = 8.389e-16
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.1896305 -0.1159275
## sample estimates:
## prop 1 prop 2
## 0.2773109 0.4300899
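The confidence interval in this output can be checked by hand. The sketch below assumes the standard Wald interval for a difference of two proportions (each proportion gets its own variance term, with no pooling) and uses only the counts from the table; the variable names are illustrative.

```r
n_f <- 363 + 946; n_m <- 526 + 697
p_f <- 363 / n_f      # prop 1 in the output
p_m <- 526 / n_m      # prop 2 in the output

d <- p_f - p_m                                              # estimated difference
se <- sqrt(p_f * (1 - p_f) / n_f + p_m * (1 - p_m) / n_m)   # unpooled standard error
ci <- d + c(-1, 1) * qnorm(0.975) * se                      # 95% Wald interval
ci
```

The interval lies entirely below zero, which is consistent with the very small p-value of the test.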
fisher.test(tab)
##
## Fisher's Exact Test for Count Data
##
## data: tab
## p-value = 1.04e-15
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 0.4292768 0.6021525
## sample estimates:
## odds ratio
## 0.5085996
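For comparison, the sample odds ratio can be computed directly from the table. This is a sketch with illustrative variable names; fisher.test reports a conditional maximum likelihood estimate, so its value is close to, but not exactly equal to, the simple cross-product ratio.

```r
# 2x2 table of counts: rows = sex (F, M), columns = voteNo (TRUE, FALSE)
counts <- matrix(c(363, 946,
                   526, 697),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(sex = c("F", "M"), voteNo = c("TRUE", "FALSE")))

odds_f <- counts["F", "TRUE"] / counts["F", "FALSE"]  # odds of voteNo TRUE, women
odds_m <- counts["M", "TRUE"] / counts["M", "FALSE"]  # odds of voteNo TRUE, men
odds_ratio <- odds_f / odds_m                          # cross-product ratio
odds_ratio
```

An odds ratio below 1 means that, in this sample, the odds of voting no are lower for women than for men.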
Both tests give a very small p-value, so we reject the null hypothesis of equal voteNo proportions for women and men.