Estimation

The ASTA team

Point and interval estimates

Point estimators: Bias

Point estimators: Consistency

Point estimators: Efficiency

Notation

Confidence Interval

Confidence interval for proportion

Example: Point and interval estimate for proportion

import pandas as pd

Chile = pd.read_csv("https://asta.math.aau.dk/datasets?file=Chile.txt", sep = "\t")
tab_counts = Chile['sex'].value_counts()
tab_counts
## sex
## F    1379
## M    1321
## Name: count, dtype: int64
tab_prop = Chile['sex'].value_counts(normalize = True)
tab_prop
## sex
## F    0.510741
## M    0.489259
## Name: proportion, dtype: float64

Example: Confidence intervals for proportion in Python

from statsmodels.stats.proportion import proportions_ztest, proportion_confint

counts = Chile['sex'].value_counts()
successes = counts["F"]
nobs = counts.sum()

stat, p_value = proportions_ztest(count = successes, nobs = nobs, value = 0.5)

ci_low, ci_high = proportion_confint(successes, nobs, alpha = 0.05, method = 'normal')

print(f"95% CI: ({ci_low:.4f}, {ci_high:.4f})")
## 95% CI: (0.4919, 0.5296)
print(f"sample estimate: {successes/nobs:.4f}")
## sample estimate: 0.5107
print(f"p-value: {p_value:.4g}")
## p-value: 0.2642

General confidence intervals for proportion

Example: Chile data

For the Chile data set, compute the 95% and 99% confidence intervals for the probability that a person is female.
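
A minimal sketch of this computation, assuming the counts from the frequency table shown earlier (1379 females among 2700 respondents) and the normal-approximation interval \(\hat{\pi} \pm z\sqrt{\hat{\pi}(1-\hat{\pi})/n}\). The helper name `proportion_ci` is hypothetical and introduced only for illustration.

```python
from math import sqrt
from scipy.stats import norm

def proportion_ci(successes, nobs, level):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = successes / nobs
    z = norm.ppf(1 - (1 - level) / 2)        # z-score for the chosen level
    se = sqrt(p_hat * (1 - p_hat) / nobs)    # estimated standard error
    return p_hat - z * se, p_hat + z * se

# Counts taken from the frequency table of 'sex': 1379 F and 1321 M.
successes, nobs = 1379, 1379 + 1321

ci95 = proportion_ci(successes, nobs, 0.95)
ci99 = proportion_ci(successes, nobs, 0.99)
print(f"95% CI: ({ci95[0]:.4f}, {ci95[1]:.4f})")
print(f"99% CI: ({ci99[0]:.4f}, {ci99[1]:.4f})")
```

The 99% interval is wider than the 95% interval, because a higher confidence level requires a larger z-score.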

Confidence Interval for mean - normally distributed sample

\(t\)-distribution and \(t\)-score

The density function has a somewhat complicated expression and is not stated here. Instead, the \(t\)-distribution is plotted below for \(df = 1, 2, 10\) and \(\infty\).
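
As a rough numerical companion to the plot, the sketch below evaluates the \(t\) density with `scipy.stats.t.pdf` for \(df = 1, 2, 10\) and compares it with the standard normal density, which is the limiting case \(df = \infty\). The evaluation points are an arbitrary choice made for illustration.

```python
import numpy as np
from scipy.stats import norm, t

x = np.array([0.0, 1.0, 2.0, 3.0])   # illustrative evaluation points

# Density of the t-distribution for increasing degrees of freedom.
densities = {df: t.pdf(x, df=df) for df in (1, 2, 10)}
# The standard normal is the limit of the t-distribution as df -> infinity.
densities["inf"] = norm.pdf(x)

for df, dens in densities.items():
    print(f"df = {df}:", np.round(dens, 4))
```

Smaller degrees of freedom give a lower peak at zero and heavier tails; as \(df\) grows, the values approach those of the normal density.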

Calculation of \(t\)-score

from scipy.stats import t
t.ppf(1 - 0.025, df = 4)
## np.float64(2.7764451051977987)

Example: Confidence interval for mean

import numpy as np

Ericksen = pd.read_csv("https://asta.math.aau.dk/datasets?file=Ericksen.txt", sep = "\t")

stats = Ericksen['crime'].agg(
    mean = 'mean',
    std = 'std',
    n = 'count'
)
stats
## mean    63.060606
## std     24.891073
## n       66.000000
## Name: crime, dtype: float64
df = stats["n"] - 1
t_crit = t.ppf(1 - 0.025, df = df)
t_crit
## np.float64(1.9971379083920033)
import pingouin as pg
res = pg.ttest(x = Ericksen['crime'], y = 0, confidence = 0.95)
res["mean"] = Ericksen['crime'].mean()
res[["mean", "T", "dof", "alternative", "p-val", "CI95%"]]
##              mean          T  dof alternative         p-val           CI95%
## T-test  63.060606  20.581949   65   two-sided  3.564824e-30  [56.94, 69.18]
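
The interval reported by `pg.ttest` can be cross-checked by hand with the formula \(\bar{y} \pm t_{crit}\, s/\sqrt{n}\). The sketch below hard-codes the summary statistics printed above instead of re-reading the data, so it is only as accurate as those rounded values.

```python
from math import sqrt
from scipy.stats import t

# Summary statistics copied from the printed output for the 'crime' variable.
mean, std, n = 63.060606, 24.891073, 66

t_crit = t.ppf(1 - 0.025, df=n - 1)   # t-score for a 95% interval
se = std / sqrt(n)                    # estimated standard error of the mean
lower, upper = mean - t_crit * se, mean + t_crit * se
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

Up to rounding, this should agree with the `CI95%` column in the table above.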

Example: Plotting several confidence intervals

chickwts = pd.read_csv("https://asta.math.aau.dk/datasets?file=chickwts.txt", sep = "\t")
cwei = chickwts.groupby("feed")["weight"].agg(
    ['mean',
     'std',
     'count'
])
cwei["se"] = cwei["std"] / np.sqrt(cwei["count"])  # standard error of the mean
cwei["tscore"] = t.ppf(1 - 0.025, df = cwei["count"] - 1)
cwei["lower"] = cwei["mean"] - cwei["tscore"] * cwei["se"]
cwei["upper"] = cwei["mean"] + cwei["tscore"] * cwei["se"]
cwei[["mean", "lower", "upper"]]
##                  mean       lower       upper
## feed                                         
## casein     323.583333  282.644025  364.522642
## horsebean  160.200000  132.568738  187.831262
## linseed    218.750000  185.561021  251.938979
## meatmeal   276.909091  233.308259  320.509923
## soybean    246.428571  215.175378  277.681765
## sunflower  328.916667  297.887508  359.945825
import matplotlib.pyplot as plt

g = plt.errorbar(
    x = cwei.index,                # feed categories
    y = cwei["mean"],               # means
    yerr = cwei["mean"] - cwei["lower"],  # half-width of CI
    fmt = 'o',                      # point marker
    ecolor = 'black',               # error bar color
    capsize = 5                     # end cap length
)
plt.xlabel("Feed")
plt.ylabel("Weight")
plt.title("Mean weight with 95% CI by feed")
plt.show()  # needed to display the figure when run as a script

Determining sample size

Sample size for proportion
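
A minimal sketch of one standard approach, assuming the normal-approximation interval, so that the margin of error is \(M = z\sqrt{\pi(1-\pi)/n}\), which solves to \(n = \pi(1-\pi)(z/M)^2\). The function name `sample_size_proportion`, the default guess \(\pi = 0.5\) (the worst case, which maximizes \(\pi(1-\pi)\)) and the example margin of 0.03 are assumptions made for illustration.

```python
from math import ceil
from scipy.stats import norm

def sample_size_proportion(margin, level=0.95, pi_guess=0.5):
    """Smallest n giving at most the requested margin of error (hypothetical helper)."""
    z = norm.ppf(1 - (1 - level) / 2)                   # z-score for the level
    n = pi_guess * (1 - pi_guess) * (z / margin) ** 2   # solve M = z*sqrt(pi(1-pi)/n) for n
    return ceil(n)                                      # round up to a whole observation

n_needed = sample_size_proportion(margin=0.03)
print(n_needed)
```

If a plausible value of \(\pi\) is known in advance, passing it as `pi_guess` gives a smaller required sample than the worst-case default.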

Example

Sample size for mean
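
A minimal sketch of the analogous calculation for a mean, assuming the margin of error \(M = z\,\sigma/\sqrt{n}\), which solves to \(n = (z\sigma/M)^2\). The function name `sample_size_mean` is hypothetical, and the guessed standard deviation of 25 and margin of 5 are made-up illustration values, not taken from any data set in this text.

```python
from math import ceil
from scipy.stats import norm

def sample_size_mean(margin, sigma_guess, level=0.95):
    """Smallest n giving at most the requested margin of error (hypothetical helper)."""
    z = norm.ppf(1 - (1 - level) / 2)       # z-score for the level
    n = (z * sigma_guess / margin) ** 2     # solve M = z*sigma/sqrt(n) for n
    return ceil(n)                          # round up to a whole observation

n_needed_mean = sample_size_mean(margin=5, sigma_guess=25)
print(n_needed_mean)
```

The result depends directly on the guessed standard deviation, so an overly optimistic guess leads to a sample that is too small.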