Estimation

The ASTA team

Estimation

Aim of statistics

Random sampling schemes

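Simple random sampling (explained in Agresti, section 2.2): every possible sample of \(n\) units has the same probability of being selected, so in particular each experimental unit (e.g. a person) has the same probability of being selected (e.g. for an interview).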
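A minimal sketch of simple random sampling in R. It assumes a hypothetical population of N = 20 units labelled 1 to 20 and a sample size of n = 5; these numbers and the seed are illustrative choices, not part of the course material. Base R's sample() draws without replacement by default. Repeating the draw many times shows that each unit ends up in roughly n/N of the samples.

```r
# Hypothetical population of N = 20 experimental units, labelled 1..20
N <- 20
n <- 5
set.seed(1)  # only for reproducibility

# One simple random sample of size n, drawn without replacement
one_sample <- sample(1:N, size = n)
one_sample

# Repeat the sampling many times and count how often each unit is included
reps <- 10000
included <- replicate(reps, tabulate(sample(1:N, size = n), nbins = N))
inclusion_freq <- rowSums(included) / reps
round(inclusion_freq, 3)  # every entry should be close to n / N = 0.25
```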

Other strategies for obtaining a random sample from the target population are explained in Agresti, section 2.4.

Point and interval estimates

Point estimators: Bias

Point estimators: Consistency

Point estimators: Efficiency

Notation

Confidence interval

Confidence interval for proportion

Example: Point and interval estimate for proportion

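Chile <- read.delim("https://asta.math.aau.dk/datasets?file=Chile.txt") # read the Chile survey data
library(mosaic) # provides tally(), favstats() and qdist()
tally( ~ sex, data = Chile) # counts of females (F) and males (M)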
## sex
##    F    M 
## 1379 1321
tally( ~ sex, data = Chile, format = "prop")
## sex
##         F         M 
## 0.5107407 0.4892593

Example: Confidence intervals for proportion in R

prop.test( ~ sex, data = Chile, correct = FALSE)
## 
##  1-sample proportions test without continuity correction
## 
## data:  Chile$sex  [with success = F]
## X-squared = 1.2459, df = 1, p-value = 0.2643
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.4918835 0.5295675
## sample estimates:
##         p 
## 0.5107407
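The point estimate is the sample proportion \(\hat{p} = 1379/2700 \approx 0.511\). As a sketch, the large-sample interval \(\hat{p} \pm z \cdot se\) with \(se = \sqrt{\hat{p}(1-\hat{p})/n}\) can be computed by hand from the counts shown by tally(). Note that prop.test() without continuity correction reports the Wilson score interval rather than this formula, so its limits differ slightly from the hand computation.

```r
# Counts taken from the tally() output above: 1379 females out of 2700 respondents
x <- 1379
n <- 1379 + 1321
p_hat <- x / n                        # point estimate of the proportion of females
se <- sqrt(p_hat * (1 - p_hat) / n)   # estimated standard error of p_hat
z <- qnorm(1 - 0.025)                 # z-score for 95% confidence (about 1.96)
wald <- p_hat + c(-1, 1) * z * se     # p_hat +/- z * se
wald

# prop.test() without continuity correction gives the Wilson score interval
wilson <- prop.test(x, n, correct = FALSE)$conf.int
wilson
```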

General confidence intervals for proportion

Example: Chile data

Compute, for the Chile data set, the 99% and 95% confidence intervals for the probability that a person is female.
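A sketch of this computation with the large-sample formula \(\hat{p} \pm z \cdot se\), reusing the counts 1379 and 2700 from the tally() output. Only the z-score changes with the confidence level: about 1.96 for 95% and 2.58 for 99%, so the 99% interval is wider. The last line shows the corresponding Wilson interval from prop.test() for comparison.

```r
x <- 1379                              # number of females in the sample
n <- 2700                              # sample size
p_hat <- x / n
se <- sqrt(p_hat * (1 - p_hat) / n)

conf <- c(0.95, 0.99)
z <- qnorm(1 - (1 - conf) / 2)         # z-scores 1.96 and 2.58
ci <- cbind(conf = conf, lower = p_hat - z * se, upper = p_hat + z * se)
ci

# Wilson score version of the 99% interval, as reported by prop.test()
prop.test(x, n, conf.level = 0.99, correct = FALSE)$conf.int
```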

Confidence interval for mean: normally distributed sample

\(t\)-distribution and \(t\)-score

The density function has a somewhat complicated form and is not stated here. Instead, the \(t\)-distribution is plotted below for \(df = 1, 2, 10\) and \(\infty\) (the case \(df = \infty\) is the standard normal distribution).
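A minimal base R sketch that draws such a plot, using dt() for the \(t\)-densities and dnorm() for the \(df = \infty\) limit. The grid from -4 to 4 and the colours are arbitrary choices. The plot shows the heavier tails of the \(t\)-distribution for small \(df\).

```r
x <- seq(-4, 4, length.out = 401)
dfs <- c(1, 2, 10)

# Columns: t densities for df = 1, 2, 10, then the standard normal (df = Inf)
dens <- cbind(sapply(dfs, function(d) dt(x, df = d)), dnorm(x))

matplot(x, dens, type = "l", lty = 1, col = 1:4, xlab = "t", ylab = "density")
legend("topright", legend = c("df = 1", "df = 2", "df = 10", "df = Inf"),
       lty = 1, col = 1:4)
```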

Calculation of \(t\)-score in R

qdist("t", p = 1 - 0.025, df = 4)

## [1] 2.776445
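The mosaic function qdist() also draws the distribution by default. Base R's qt() returns the same quantile without plotting. Evaluating it for increasing \(df\) shows the \(t\)-score decreasing towards the standard normal value 1.959964; qt() accepts df = Inf.

```r
t4 <- qt(p = 1 - 0.025, df = 4)   # same value as qdist() above
t4

# t-scores for 2.5% right tail probability and increasing degrees of freedom
tvals <- qt(0.975, df = c(1, 2, 4, 10, 30, 100, Inf))
round(tvals, 4)
```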

Example: Confidence interval for mean

Ericksen <- read.delim("https://asta.math.aau.dk/datasets?file=Ericksen.txt")
stats <- favstats( ~ crime, data = Ericksen)
stats
##  min Q1 median Q3 max     mean       sd  n missing
##   25 48     55 73 143 63.06061 24.89107 66       0
qdist("t", 1 - 0.025, df = 66 - 1, plot = FALSE)
## [1] 1.997138
t.test( ~ crime, data = Ericksen, conf.level = 0.95)
## 
##  One Sample t-test
## 
## data:  crime
## t = 20.582, df = 65, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  56.94162 69.17960
## sample estimates:
## mean of x 
##  63.06061
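The interval reported by t.test() is \(\bar{y} \pm t \cdot s/\sqrt{n}\) with \(df = n - 1\). As a check, the sketch below recomputes it from the rounded summary values printed by favstats() above. With the `stats` object available, `stats$mean`, `stats$sd` and `stats$n` could be used directly instead.

```r
m <- 63.06061    # sample mean of crime (from favstats above)
s <- 24.89107    # sample standard deviation
n <- 66          # number of observations

se <- s / sqrt(n)                      # estimated standard error of the mean
tscore <- qt(1 - 0.025, df = n - 1)    # 1.997138, as computed with qdist above
ci <- m + c(-1, 1) * tscore * se
ci
```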

Example: Plotting several confidence intervals in R

cwei <- favstats(weight ~ feed, data = chickwts) # summary statistics of weight for each feed type
se <- cwei$sd / sqrt(cwei$n) # Standard errors
tscore <- qt(p = .975, df = cwei$n - 1) # t-scores for 2.5% right tail probability
cwei$lower <- cwei$mean - tscore * se
cwei$upper <- cwei$mean + tscore * se
cwei[, c("feed", "mean", "lower", "upper")]
##        feed     mean    lower    upper
## 1    casein 323.5833 282.6440 364.5226
## 2 horsebean 160.2000 132.5687 187.8313
## 3   linseed 218.7500 185.5610 251.9390
## 4  meatmeal 276.9091 233.3083 320.5099
## 5   soybean 246.4286 215.1754 277.6818
## 6 sunflower 328.9167 297.8875 359.9458
gf_errorbar(feed ~ lower + upper, data = cwei) %>% 
  gf_point(feed ~ mean)
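The hand-computed limits can be cross-checked with base R only, because chickwts ships with R. The sketch splits the weights by feed type and takes the confidence interval from t.test() for each group. It should reproduce the `lower` and `upper` columns above, since t.test() uses the same mean \(\pm\) t-score \(\times\) standard error formula.

```r
# Weights grouped by feed type (chickwts is in the base R datasets package)
groups <- split(chickwts$weight, chickwts$feed)

# One 95% t-interval per group, one row per feed type
ci <- t(sapply(groups, function(w) t.test(w, conf.level = 0.95)$conf.int))
colnames(ci) <- c("lower", "upper")
round(ci, 4)
```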

Determining sample size

Sample size for proportion
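Requiring the margin of error \(M = z\sqrt{p(1-p)/n}\) to equal a chosen value and solving for \(n\) gives \(n = p(1-p)(z/M)^2\), rounded up. A minimal sketch, assuming hypothetical inputs: a guessed proportion \(p = 0.5\) (which maximises \(p(1-p)\) and is therefore the conservative choice), \(M = 0.03\) and 95% confidence.

```r
p <- 0.5                 # hypothetical guess of the population proportion
M <- 0.03                # hypothetical desired margin of error
z <- qnorm(1 - 0.025)    # z-score for 95% confidence

n_prop <- ceiling(p * (1 - p) * (z / M)^2)   # round up to a whole number of units
n_prop   # 1068
```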

Example

Sample size for mean
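For a mean, the same reasoning with margin of error \(M = z\,\sigma/\sqrt{n}\) gives \(n = \sigma^2 (z/M)^2\), rounded up, where \(\sigma\) is a guess of the population standard deviation. A sketch with hypothetical values \(\sigma = 25\), \(M = 5\) and 95% confidence; the z-score is used because \(n\), and hence the \(df\) of a \(t\)-score, is not yet known.

```r
sigma <- 25              # hypothetical guess of the population standard deviation
M <- 5                   # hypothetical desired margin of error (same units as the data)
z <- qnorm(1 - 0.025)    # z-score for 95% confidence

n_mean <- ceiling(sigma^2 * (z / M)^2)   # round up to a whole number of units
n_mean   # 97
```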