Simple random sampling (explained in Agresti section 2.2): each experimental unit (e.g. a person) has the same probability of being selected (e.g. for an interview).
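As a minimal sketch of this sampling scheme in base R, the function sample draws units without replacement so that every unit is equally likely to be selected. The population vector and the sample size of 50 are hypothetical and chosen only for illustration.

```r
# Hypothetical target population of 1000 persons (illustration only)
population <- paste0("person_", 1:1000)

set.seed(1)  # makes the draw reproducible
# Draw 50 persons without replacement; each person has the same
# probability of being selected
interviewees <- sample(population, size = 50)
head(interviewees)
```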
Other strategies for obtaining a random sample from the target population are explained in Agresti section 2.4.
Chile <- read.delim("https://asta.math.aau.dk/datasets?file=Chile.txt")
We consider the variable sex, i.e. the gender distribution in the sample.
library(mosaic)
tally( ~ sex, data = Chile)
## sex
##    F    M 
## 1379 1321
tally( ~ sex, data = Chile, format = "prop")
## sex
##         F         M 
## 0.5107407 0.4892593
prop.test( ~ sex, data = Chile, correct = FALSE)
## 
##  1-sample proportions test without continuity correction
## 
## data:  Chile$sex  [with success = F]
## X-squared = 1.2459, df = 1, p-value = 0.2643
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.4918835 0.5295675
## sample estimates:
##         p 
## 0.5107407
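The 95% interval can also be sketched by hand from the counts in the tally output, using the estimate plus or minus the z-score times the standard error. This sketch uses the base R function qnorm for the normal quantile; the variable names are chosen for illustration only. Note that prop.test computes its interval with a score-based method, so the two results agree only to roughly four decimals.

```r
# Counts taken from the tally output above
n_female <- 1379
n_male   <- 1321
n        <- n_female + n_male

p_hat <- n_female / n                    # sample proportion of females
se    <- sqrt(p_hat * (1 - p_hat) / n)   # estimated standard error
z     <- qnorm(1 - 0.05 / 2)             # z-score for 95% confidence

# Interval: estimate plus/minus z-score times standard error
ci_95 <- c(lower = p_hat - z * se, upper = p_hat + z * se)
ci_95
```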
The argument correct = FALSE is needed to make R use the “normal” formulas as on the slides and in the book. When correct = TRUE (the default), a mathematical correction which you have not learned about is applied and slightly different results are obtained.
Compute for the Chile data set the 99% and 95% confidence intervals for the probability that a person is female:
For a 99% confidence level the \(z\)-score is qdist("norm", 1 - 0.01/2) = 2.576.
For a 95% confidence level the \(z\)-score is qdist("norm", 1 - 0.05/2) = 1.96.
(Compare with the output of prop.test.)
The expression of the density function of the \(t\)-distribution is of slightly complicated form and will not be stated here. Instead, the \(t\)-distribution is plotted below for \(df = 1, 2, 10\) and \(\infty\).
qdist("t", p = 1 - 0.025, df = 4)
## [1] 2.776445
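The quantile above can be cross-checked in base R, as a small sketch: qt gives the same quantile, and pt returns the area to the left of a given value, which shows why p = 1 - 0.025 corresponds to a right tail probability of 2.5%. The last call illustrates how the t-score shrinks towards the normal z-score as the degrees of freedom grow. The variable names are chosen for illustration only.

```r
# 97.5% quantile of the t-distribution with 4 degrees of freedom
t_score <- qt(1 - 0.025, df = 4)
t_score

# Area to the left of the t-score; this is 0.975, which leaves
# a right tail probability of 0.025
pt(t_score, df = 4)

# t-scores for df = 1, 2, 10 and infinity; df = Inf gives the
# standard normal quantile
t_scores <- qt(1 - 0.025, df = c(1, 2, 10, Inf))
t_scores
```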
The t-score for a right tail probability of 2.5% is found by calling qdist with p = 1 - 0.025, since qdist looks at the area to the left-hand side.
We consider the data set Ericksen and want to construct a \(95\%\) confidence interval for the population mean \(\mu\) of the variable crime.
Ericksen <- read.delim("https://asta.math.aau.dk/datasets?file=Ericksen.txt")
stats <- favstats( ~ crime, data = Ericksen)
stats
##  min Q1 median Q3 max     mean       sd  n missing
##   25 48     55 73 143 63.06061 24.89107 66       0
qdist("t", 1 - 0.025, df = 66 - 1, plot = FALSE)
## [1] 1.997138
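With the summary statistics and the t-score in hand, the interval can be sketched by hand as the mean plus or minus the t-score times the standard error. The numbers below are copied from the favstats output, and qt from base R is used for the t-score; the variable names are chosen for illustration only. The result can be compared with the interval reported by t.test.

```r
# Summary statistics copied from the favstats output above
x_bar <- 63.06061   # sample mean of crime
s     <- 24.89107   # sample standard deviation
n     <- 66         # number of observations

se      <- s / sqrt(n)                 # estimated standard error of the mean
t_score <- qt(1 - 0.025, df = n - 1)   # t-score for 2.5% right tail probability

# Interval: mean plus/minus t-score times standard error
ci <- c(lower = x_bar - t_score * se, upper = x_bar + t_score * se)
ci
```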
t.test( ~ crime, data = Ericksen, conf.level = 0.95)
## 
##  One Sample t-test
## 
## data:  crime
## t = 20.582, df = 65, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  56.94162 69.17960
## sample estimates:
## mean of x 
##  63.06061
We shall look at a built-in R dataset chickwts.
?chickwts yields a page with the following information:
An experiment was conducted to measure and compare the effectiveness of various feed supplements on the growth rate of chickens. Newly hatched chicks were randomly allocated into six groups, and each group was given a different feed supplement. Their weights in grams after six weeks are given along with feed types.
chickwts is a data frame with 71 observations on 2 variables:
weight: a numeric variable giving the chick weight.
feed: a factor giving the feed type.
Calculate a confidence interval for the mean weight for each feed separately; the confidence interval is from lower to upper given by mean \(\pm\) tscore * se:
cwei <- favstats( weight ~ feed, data = chickwts)
se <- cwei$sd / sqrt(cwei$n) # Standard errors
tscore <- qdist("t", p = .975, df = cwei$n - 1, plot = FALSE) # t-scores for 2.5% right tail probability
cwei$lower <- cwei$mean - tscore * se
cwei$upper <- cwei$mean + tscore * se
cwei[, c("feed", "mean", "lower", "upper")]
##        feed     mean    lower    upper
## 1    casein 323.5833 282.6440 364.5226
## 2 horsebean 160.2000 132.5687 187.8313
## 3   linseed 218.7500 185.5610 251.9390
## 4  meatmeal 276.9091 233.3083 320.5099
## 5   soybean 246.4286 215.1754 277.6818
## 6 sunflower 328.9167 297.8875 359.9458
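As a cross-check of the table above, the same intervals can be sketched in base R by running t.test on the weights of each feed group separately. This is an alternative route, not the method used above, and the name ci_by_feed is chosen for illustration only.

```r
# Split the weights by feed type and extract the 95% confidence
# interval from a one-sample t-test in each group
ci_by_feed <- t(sapply(split(chickwts$weight, chickwts$feed),
                       function(w) t.test(w, conf.level = 0.95)$conf.int))
colnames(ci_by_feed) <- c("lower", "upper")

# One row per feed type, to be compared with the lower and upper columns above
round(ci_by_feed, 4)
```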
The confidence intervals can be plotted as horizontal error bars using gf_errorbarh:
gf_errorbarh(feed ~ lower + upper, data = cwei) %>% gf_point(feed ~ mean)