Analysis of Variance

The ASTA team

One way analysis of variance

Example

import pandas as pd

chickwts = pd.read_csv("https://asta.math.aau.dk/datasets?file=chickwts.txt", sep='\t')
chickwts.head(3)
##    weight       feed
## 0     179  horsebean
## 1     160  horsebean
## 2     136  horsebean
import seaborn as sns
import matplotlib.pyplot as plt

p = sns.boxplot(x='feed', y='weight', data=chickwts)

The ANOVA Model

Estimation of mean values

Estimates

chickwts.groupby('feed')['weight'].mean()
## feed
## casein       323.583333
## horsebean    160.200000
## linseed      218.750000
## meatmeal     276.909091
## soybean      246.428571
## sunflower    328.916667
## Name: weight, dtype: float64

Contrast coding

Example

import statsmodels.formula.api as smf

model = smf.ols('weight ~ feed', data=chickwts).fit()
model.summary(slim = True)
OLS Regression Results
Dep. Variable: weight R-squared: 0.542
Model: OLS Adj. R-squared: 0.506
No. Observations: 71 F-statistic: 15.36
Covariance Type: nonrobust Prob (F-statistic): 5.94e-10
coef std err t P>|t| [0.025 0.975]
Intercept 323.5833 15.834 20.436 0.000 291.961 355.206
feed[T.horsebean] -163.3833 23.485 -6.957 0.000 -210.287 -116.480
feed[T.linseed] -104.8333 22.393 -4.682 0.000 -149.554 -60.112
feed[T.meatmeal] -46.6742 22.896 -2.039 0.046 -92.400 -0.948
feed[T.soybean] -77.1548 21.578 -3.576 0.001 -120.249 -34.061
feed[T.sunflower] 5.3333 22.393 0.238 0.812 -39.388 50.054


Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Overall test for effect

Graphical representation of models

Hypotheses and test statistic

Interpretation of \(F\) statistic - Variance between/within groups

Example

import statsmodels.formula.api as smf

model = smf.ols('weight ~ feed', data=chickwts).fit() # same as earlier
model.summary(slim = True)
OLS Regression Results
Dep. Variable: weight R-squared: 0.542
Model: OLS Adj. R-squared: 0.506
No. Observations: 71 F-statistic: 15.36
Covariance Type: nonrobust Prob (F-statistic): 5.94e-10
coef std err t P>|t| [0.025 0.975]
Intercept 323.5833 15.834 20.436 0.000 291.961 355.206
feed[T.horsebean] -163.3833 23.485 -6.957 0.000 -210.287 -116.480
feed[T.linseed] -104.8333 22.393 -4.682 0.000 -149.554 -60.112
feed[T.meatmeal] -46.6742 22.896 -2.039 0.046 -92.400 -0.948
feed[T.soybean] -77.1548 21.578 -3.576 0.001 -120.249 -34.061
feed[T.sunflower] 5.3333 22.393 0.238 0.812 -39.388 50.054


Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Two way analysis of variance

Additive effects

ToothGrowth = pd.read_csv("https://asta.math.aau.dk/datasets?file=ToothGrowth.txt", sep='\t')
ToothGrowth['dose'] = pd.Categorical(
    ToothGrowth['dose'].map({0.5: 'LO', 1: 'ME', 2: 'HI'}),
    categories=['LO', 'ME', 'HI'],
    ordered=True
)
ToothGrowth.head(3)
##     len supp dose
## 0   4.2   VC   LO
## 1  11.5   VC   LO
## 2   7.3   VC   LO

Dummy coding

\[ \begin{array}{cccc} & LO & ME & HI \\ OJ & \mu & \mu+\beta_2 & \mu+ \beta_3\\ VC & \mu +\beta_1 & \mu+\beta_1 + \beta_2 & \mu+ \beta_1 + \beta_3\\ \end{array} \]

Main effect model in R

MainEff = smf.ols('len ~ supp + dose', data=ToothGrowth).fit()
MainEff.summary(slim = True)
OLS Regression Results
Dep. Variable: len R-squared: 0.762
Model: OLS Adj. R-squared: 0.750
No. Observations: 60 F-statistic: 59.88
Covariance Type: nonrobust Prob (F-statistic): 1.78e-17
coef std err t P>|t| [0.025 0.975]
Intercept 12.4550 0.988 12.603 0.000 10.475 14.435
supp[T.VC] -3.7000 0.988 -3.744 0.000 -5.680 -1.720
dose[T.ME] 9.1300 1.210 7.543 0.000 6.705 11.555
dose[T.HI] 15.4950 1.210 12.802 0.000 13.070 17.920


Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Testing effect of supp

doseEff = smf.ols('len ~ dose', data=ToothGrowth).fit()
from statsmodels.stats.anova import anova_lm

anova_lm(doseEff, MainEff)
##    df_resid       ssr  df_diff  ss_diff          F    Pr(>F)
## 0      57.0  1025.775      0.0      NaN        NaN       NaN
## 1      56.0   820.425      1.0   205.35  14.016638  0.000429

Testing effect of dose

suppEff = smf.ols('len ~ supp', data=ToothGrowth).fit()
anova_lm(suppEff, MainEff)
##    df_resid          ssr  df_diff      ss_diff          F        Pr(>F)
## 0      58.0  3246.859333      0.0          NaN        NaN           NaN
## 1      56.0   820.425000      2.0  2426.434333  82.810935  1.871163e-17

Interaction

Example

means = ToothGrowth.groupby(['dose', 'supp'], observed=False)['len'].mean().unstack()
means.plot(marker='o')

Dummy coding

\[ \begin{array}{cccc} & LO & ME & HI \\ OJ & \mu & \mu+\beta_2 & \mu+ \beta_3\\ VC & \mu +\beta_1 & \mu+\beta_1 + \beta_2 +\beta_4 & \mu+ \beta_1 + \beta_3 + \beta_5\\ \end{array} \]

Example

Interaction = smf.ols('len ~ supp*dose', data=ToothGrowth).fit()
anova_lm(MainEff, Interaction)
##    df_resid      ssr  df_diff  ss_diff         F   Pr(>F)
## 0      56.0  820.425      0.0      NaN       NaN      NaN
## 1      54.0  712.106      2.0  108.319  4.106991  0.02186
Interaction.summary(slim = True)
OLS Regression Results
Dep. Variable: len R-squared: 0.794
Model: OLS Adj. R-squared: 0.775
No. Observations: 60 F-statistic: 41.56
Covariance Type: nonrobust Prob (F-statistic): 2.50e-17
coef std err t P>|t| [0.025 0.975]
Intercept 13.2300 1.148 11.521 0.000 10.928 15.532
supp[T.VC] -5.2500 1.624 -3.233 0.002 -8.506 -1.994
dose[T.ME] 9.4700 1.624 5.831 0.000 6.214 12.726
dose[T.HI] 12.8300 1.624 7.900 0.000 9.574 16.086
supp[T.VC]:dose[T.ME] -0.6800 2.297 -0.296 0.768 -5.285 3.925
supp[T.VC]:dose[T.HI] 5.3300 2.297 2.321 0.024 0.725 9.935


Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Hierarchical principle