Chi-square and ordinal tests

The ASTA team

Contingency tables

A contingency table

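# Read the PopularKids data from the ASTA dataset server
popKids <- read.delim("https://asta.math.aau.dk/datasets?file=PopularKids.txt")
library(mosaic)
# Cross-tabulate residence (rows) against goals (columns), adding row and column totals
tab <- tally(~Urban.Rural + Goals, data = popKids, margins = TRUE)
tab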
##            Goals
## Urban.Rural Grades Popular Sports Total
##    Rural        57      50     42   149
##    Suburban     87      42     22   151
##    Urban       103      49     26   178
##    Total       247     141     90   478

A conditional distribution

##            Goals
## Urban.Rural Grades Popular Sports   Sum
##    Rural     0.383   0.336  0.282 1.000
##    Suburban  0.576   0.278  0.146 1.000
##    Urban     0.579   0.275  0.146 1.000
##    Total     0.517   0.295  0.188 1.000
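The conditional distribution of Goals given Urban.Rural divides each row of counts by its row total. The sketch below is a minimal base-R version, assuming the counts are re-entered by hand from the contingency table above (the names `counts`, `cond` and `marg` are our own). It reproduces the proportions shown.

```r
# Observed counts from the PopularKids contingency table, entered row by row
counts <- matrix(c(57, 50, 42,
                   87, 42, 22,
                   103, 49, 26),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(Urban.Rural = c("Rural", "Suburban", "Urban"),
                                 Goals = c("Grades", "Popular", "Sports")))
# Divide each row by its row total: distribution of Goals within each residence group
cond <- prop.table(counts, margin = 1)
# Marginal distribution of Goals (the Total row), for comparison
marg <- colSums(counts) / sum(counts)
round(rbind(cond, Total = marg), 3)
```

Each row of `cond` sums to 1. Rural students stand out with a clearly lower share choosing Grades.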

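Independence

If Goals is independent of Urban.Rural, the conditional distribution of Goals is the same in every row. An illustrative example: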

##            Goals
## Urban.Rural Grades Popular Sports
##    Rural       0.5     0.3    0.2
##    Suburban    0.5     0.3    0.2
##    Urban       0.5     0.3    0.2

The Chi-squared test for independence

tab <- tally(~Urban.Rural + Goals, data = popKids)  # counts without margins
n <- margin.table(tab)                               # total sample size
pctGoals <- round(margin.table(tab, 2) / n, 3)      # marginal proportions of Goals
pctGoals
## Goals
##  Grades Popular  Sports 
##   0.517   0.295   0.188
##            Goals
## Urban.Rural Grades        Popular       Sports        Sum          
##    Rural     77.0 (0.517)  44.0 (0.295)  28.1 (0.188) 149.0 (1.000)
##    Suburban  78.0 (0.517)  44.5 (0.295)  28.4 (0.188) 151.0 (1.000)
##    Urban     92.0 (0.517)  52.5 (0.295)  33.5 (0.188) 178.0 (1.000)
##    Sum      247.0 (0.517) 141.0 (0.295)  90.0 (0.188) 478.0 (1.000)

Calculation of expected table

pctexptab
##            Goals
## Urban.Rural Grades        Popular       Sports        Sum          
##    Rural     77.0 (0.517)  44.0 (0.295)  28.1 (0.188) 149.0 (1.000)
##    Suburban  78.0 (0.517)  44.5 (0.295)  28.4 (0.188) 151.0 (1.000)
##    Urban     92.0 (0.517)  52.5 (0.295)  33.5 (0.188) 178.0 (1.000)
##    Sum      247.0 (0.517) 141.0 (0.295)  90.0 (0.188) 478.0 (1.000)
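Under independence, the expected count in a cell is (row total × column total) / n, which equals n times the product of the two marginal proportions. This is a minimal base-R sketch of that calculation, assuming the observed counts are re-entered by hand (the names `observed` and `expected` are our own):

```r
# Observed counts (rows: Urban.Rural, columns: Goals), entered row by row
observed <- matrix(c(57, 50, 42,
                     87, 42, 22,
                     103, 49, 26),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(c("Rural", "Suburban", "Urban"),
                                   c("Grades", "Popular", "Sports")))
n <- sum(observed)
# Expected count in cell (i, j): (row total i) * (column total j) / n
expected <- outer(rowSums(observed), colSums(observed)) / n
round(expected, 1)
```

The row and column totals of `expected` match those of `observed`. Only the way the counts are spread inside the table changes.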

Chi-squared (\(\chi^2\)) test statistic

tab
##            Goals
## Urban.Rural Grades Popular Sports
##    Rural        57      50     42
##    Suburban     87      42     22
##    Urban       103      49     26
##            Goals
## Urban.Rural Grades Popular Sports Sum  
##    Rural     77.0   44.0    28.1  149.0
##    Suburban  78.0   44.5    28.4  151.0
##    Urban     92.0   52.5    33.5  178.0
##    Sum      247.0  141.0    90.0  478.0
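The test statistic compares the two tables cell by cell, \(X^2=\sum \frac{(O-E)^2}{E}\), where the sum runs over all cells, \(O\) is the observed count and \(E\) the expected count. The sketch below is a minimal base-R version, assuming the counts are re-entered by hand (object names are our own). It should agree with the value 18.828 reported by `chisq.test`.

```r
observed <- matrix(c(57, 50, 42,
                     87, 42, 22,
                     103, 49, 26),
                   nrow = 3, byrow = TRUE)
# Expected counts under independence: row total * column total / n
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
# Pearson's chi-squared statistic: sum over all cells of (O - E)^2 / E
X2 <- sum((observed - expected)^2 / expected)
X2
```

Large values of \(X^2\) mean that the observed table is far from what independence predicts.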

\(\chi^2\)-test template

# p-value: probability that a chi-squared variable with df = 4 exceeds the observed 18.8
1 - pdist("chisq", 18.8, df = 4)

## [1] 0.0008603303
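The same tail probability can be computed with base R's `pchisq`, without mosaic. For an \(r \times c\) table the degrees of freedom are \((r-1)(c-1)\), which gives 4 here. This is a small sketch. The variable names are our own, and 18.8 is the rounded statistic used in the template.

```r
# Degrees of freedom for an r x c contingency table: (r - 1) * (c - 1)
n_rows <- 3  # levels of Urban.Rural
n_cols <- 3  # levels of Goals
dof <- (n_rows - 1) * (n_cols - 1)
# Upper-tail probability beyond the observed statistic;
# lower.tail = FALSE avoids subtracting a number close to 1 from 1
p_value <- pchisq(18.8, df = dof, lower.tail = FALSE)
p_value
```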

The function chisq.test

tab <- tally(~ Urban.Rural + Goals, data = popKids)
# correct = FALSE: no continuity correction (R only applies it to 2 x 2 tables anyway)
testStat <- chisq.test(tab, correct = FALSE)
testStat
## 
##  Pearson's Chi-squared test
## 
## data:  tab
## X-squared = 18.828, df = 4, p-value = 0.0008497
testStat$expected
##            Goals
## Urban.Rural   Grades  Popular   Sports
##    Rural    76.99372 43.95188 28.05439
##    Suburban 78.02720 44.54184 28.43096
##    Urban    91.97908 52.50628 33.51464
# Alternatively, enter the counts by hand; matrix() fills column by column by default
counts <- c(57, 87, 103, 50, 42, 49, 42, 22, 26)
tab <- matrix(counts, nrow = 3, ncol = 3)
row.names(tab) <- c("Rural", "Suburban", "Urban")
colnames(tab) <- c("Grades", "Popular", "Sports")
tab
##          Grades Popular Sports
## Rural        57      50     42
## Suburban     87      42     22
## Urban       103      49     26
chisq.test(tab)
## 
##  Pearson's Chi-squared test
## 
## data:  tab
## X-squared = 18.828, df = 4, p-value = 0.0008497

The \(\chi^2\)-distribution
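A \(\chi^2\)-distribution with df degrees of freedom takes only non-negative values and is skewed to the right. Its mean is df and its standard deviation is \(\sqrt{2\,\mathrm{df}}\), so its mass moves to the right as df grows. The small base-R sketch below shows 5% critical values and checks the mean numerically. The object names are our own.

```r
# 5% critical values (95% quantiles) for 1 to 4 degrees of freedom
crit <- qchisq(0.95, df = 1:4)
round(crit, 3)  # 3.841 5.991 7.815 9.488
# The mean equals the degrees of freedom: integrate x * density(x) for df = 4
mean_df4 <- integrate(function(x) x * dchisq(x, df = 4), lower = 0, upper = Inf)$value
mean_df4
```

With df = 4, the observed statistic 18.83 in the PopularKids example lies far above the 5% critical value of about 9.49. This is consistent with the small p-value.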

Summary

Residual analysis

Residual analysis in R

tab <- tally(~ Urban.Rural + Goals, data = popKids)
testStat <- chisq.test(tab, correct = FALSE)
testStat$stdres
##            Goals
## Urban.Rural     Grades    Popular     Sports
##    Rural    -3.9508449  1.3096235  3.5225004
##    Suburban  1.7666608 -0.5484075 -1.6185210
##    Urban     2.0865780 -0.7274327 -1.8186224
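Each standardized residual divides \(O-E\) by its estimated standard deviation \(\sqrt{E\,(1-\text{row proportion})(1-\text{column proportion})}\). Under independence it is approximately standard normal, so values well beyond ±2 are unusual. Here Rural students choose Grades less often, and Sports more often, than independence predicts. This is a base-R sketch with the counts re-entered by hand (object names are our own):

```r
observed <- matrix(c(57, 50, 42,
                     87, 42, 22,
                     103, 49, 26),
                   nrow = 3, byrow = TRUE)
n <- sum(observed)
row_prop <- rowSums(observed) / n
col_prop <- colSums(observed) / n
expected <- n * outer(row_prop, col_prop)
# Standardized residual: (O - E) / sqrt(E * (1 - row proportion) * (1 - column proportion))
std_res <- (observed - expected) / sqrt(expected * outer(1 - row_prop, 1 - col_prop))
round(std_res, 3)
```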

Cramér’s V

library(DescTools)
# Cramér's V with a 95% confidence interval
CramerV(tab, conf = 0.95, type = "perc")
##   Cramer V     lwr.ci     upr.ci 
## 0.14033592 0.06014641 0.19419139
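Cramér's V rescales the chi-squared statistic to lie between 0 (no association) and 1: \(V=\sqrt{X^2/(n\,(\min(r,c)-1))}\), where \(r\) and \(c\) are the numbers of rows and columns. This is a base-R sketch of the point estimate only, not the confidence interval, with the counts re-entered by hand (object names are our own):

```r
observed <- matrix(c(57, 50, 42,
                     87, 42, 22,
                     103, 49, 26),
                   nrow = 3, byrow = TRUE)
n <- sum(observed)
# Pearson's chi-squared statistic without continuity correction
X2 <- unname(chisq.test(observed, correct = FALSE)$statistic)
# Cramér's V: divide by the largest possible value of X^2 for this table size
V <- sqrt(X2 / (n * (min(dim(observed)) - 1)))
round(V, 4)
```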

Ordinal variables

Association between ordinal variables

          satisfaction
income    VeryD  LittleD  ModerateS  VeryS
< 15k         1        3         10      6
15-25k        2        3         10      7
25-40k        1        6         14     12
> 40k         0        1          9     11

Gamma coefficient

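\(C=\) the number of concordant pairs in our sample: pairs of observations where the observation ranking higher on one variable also ranks higher on the other.

\(D=\) the number of discordant pairs in our sample: pairs of observations where the observation ranking higher on one variable ranks lower on the other. Pairs tied on either variable count as neither.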
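Combining the two counts gives the sample gamma coefficient. It is the difference between the proportions of concordant and discordant pairs among the untied pairs:

\[\hat\gamma=\frac{C-D}{C+D}=\frac{C}{C+D}-\frac{D}{C+D}\]

It lies between \(-1\) and \(1\). \(\hat\gamma=1\) when all untied pairs are concordant, and \(\hat\gamma=-1\) when all are discordant.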

Gamma coefficient

Example

library(vcdExtra)
JobSat
##         satisfaction
## income   VeryD LittleD ModerateS VeryS
##   < 15k      1       3        10     6
##   15-25k     2       3        10     7
##   25-40k     1       6        14    12
##   > 40k      0       1         9    11
GKgamma(JobSat, level = 0.90)
## gamma        : 0.221 
## std. error   : 0.117 
## CI           : 0.028 0.414
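The estimate 0.221 can be reproduced by counting pairs directly. For each cell, the concordant partners lie below and to the right (higher income and higher satisfaction), and the discordant partners lie below and to the left. Ties contribute to neither. This is a base-R sketch with the JobSat counts re-entered by hand. The function `count_pairs` is our own, not part of vcdExtra.

```r
jobsat <- matrix(c(1, 3, 10, 6,
                   2, 3, 10, 7,
                   1, 6, 14, 12,
                   0, 1, 9, 11),
                 nrow = 4, byrow = TRUE)

# Count concordant (C) and discordant (D) pairs in a table with ordered rows and columns
count_pairs <- function(tab) {
  r <- nrow(tab)
  k <- ncol(tab)
  C <- 0
  D <- 0
  for (i in seq_len(r - 1)) {                      # only rows that have rows below them
    lower <- tab[(i + 1):r, , drop = FALSE]        # all cells in higher rows
    for (j in seq_len(k)) {
      right <- if (j < k) sum(lower[, (j + 1):k]) else 0  # higher on both variables
      left  <- if (j > 1) sum(lower[, 1:(j - 1)]) else 0  # higher row, lower column
      C <- C + tab[i, j] * right
      D <- D + tab[i, j] * left
    }
  }
  c(C = C, D = D)
}

pairs <- count_pairs(jobsat)
gamma_hat <- (pairs[["C"]] - pairs[["D"]]) / (pairs[["C"]] + pairs[["D"]])
round(gamma_hat, 3)  # 0.221
```

Here C = 1331 and D = 849, so \(\hat\gamma = 482/2180\). The positive value indicates that higher income tends to go with higher job satisfaction in this sample.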

Validation of data

Goodness of fit test

Example

Goodness of fit test

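\[X^2=\sum_{i=1}^k\frac{(O_i-E_i)^2}{E_i}\]

where \(O_i\) is the observed count in category \(i\), \(E_i=n\pi_i\) is the expected count under the hypothesized probabilities \(\pi_1,\dots,\pi_k\), and \(n\) is the total count. Under the null hypothesis, \(X^2\) is approximately \(\chi^2\)-distributed with \(k-1\) degrees of freedom.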

Example

k <- 4
pi_vector <- c(0.3, 0.2, 0.25, 0.25)
O_vector <- c(74, 72, 40, 61) 
n <- sum(O_vector)
E_vector <- n * pi_vector
E_vector
## [1] 74.10 49.40 61.75 61.75
Xsq <- sum((O_vector - E_vector)^2 / E_vector)
Xsq
## [1] 18.00945
p_value <- 1 - pchisq(Xsq, df = k - 1)
p_value
## [1] 0.0004378808

Test in R

Xsq_test <- chisq.test(O_vector, p = pi_vector)
Xsq_test
## 
##  Chi-squared test for given probabilities
## 
## data:  O_vector
## X-squared = 18.009, df = 3, p-value = 0.0004379
Xsq_test$stdres
## [1] -0.01388487  3.59500891 -3.19602486 -0.11020775
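In the goodness-of-fit case, the standardized residual for category \(i\) is \((O_i-E_i)/\sqrt{n\pi_i(1-\pi_i)}\). Category 2 has clearly more observations than hypothesized and category 3 clearly fewer. This is a base-R sketch reusing the vectors from the example (the name `std_res` is our own):

```r
pi_vector <- c(0.3, 0.2, 0.25, 0.25)
O_vector <- c(74, 72, 40, 61)
n <- sum(O_vector)
E_vector <- n * pi_vector
# Standardized residual: (O_i - E_i) / sqrt(n * pi_i * (1 - pi_i))
std_res <- (O_vector - E_vector) / sqrt(n * pi_vector * (1 - pi_vector))
round(std_res, 3)
```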