Data collection and wrangling

The ASTA team

Data collection

Ronald Fisher (1890-1962):

To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of.

Said about Fisher:

Population and sample

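Sample 3 of size \(n = 30\):

shape  color  n_sample  p_sample  p_pop  p_diff
baby   black         2      0.07   0.04   -0.03
baby   blue          1      0.03   0.04    0.01
baby   red           0      0.00   0.01    0.01
man    black         5      0.17   0.12   -0.05
man    blue          8      0.27   0.22   -0.05
man    red           3      0.10   0.08   -0.02
woman  black         3      0.10   0.23    0.13
woman  blue          8      0.27   0.22   -0.05
woman  red           0      0.00   0.02    0.02

Here p_sample = n_sample/n is the proportion in the sample, p_pop is the proportion in the population, and p_diff = p_pop - p_sample (all rounded to two decimals).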
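
The columns can be recomputed from the counts. The Python sketch below copies n_sample and p_pop from the table, recomputes p_sample and p_diff, and then draws a new random sample of size 30 to show that another sample gives different proportions. The helper names summarize and draw_sample are hypothetical. Drawing with replacement via random.choices is an assumption that approximates sampling from a large population. The rounded p_pop values sum to 0.98, which random.choices tolerates because it treats them as relative weights.

```python
import random
from collections import Counter

# Population proportions (p_pop) and the counts of Sample 3, copied from the table.
P_POP = {
    ("baby", "black"): 0.04, ("baby", "blue"): 0.04, ("baby", "red"): 0.01,
    ("man", "black"): 0.12, ("man", "blue"): 0.22, ("man", "red"): 0.08,
    ("woman", "black"): 0.23, ("woman", "blue"): 0.22, ("woman", "red"): 0.02,
}
SAMPLE_3 = {
    ("baby", "black"): 2, ("baby", "blue"): 1, ("baby", "red"): 0,
    ("man", "black"): 5, ("man", "blue"): 8, ("man", "red"): 3,
    ("woman", "black"): 3, ("woman", "blue"): 8, ("woman", "red"): 0,
}

def summarize(counts, p_pop):
    """Return {group: (n_sample, p_sample, p_pop, p_diff)}, rounded to 2 decimals."""
    n = sum(counts.values())
    rows = {}
    for group, p in p_pop.items():
        k = counts.get(group, 0)           # groups absent from the sample count as 0
        p_sample = round(k / n, 2)
        rows[group] = (k, p_sample, p, round(p - p_sample, 2))
    return rows

def draw_sample(p_pop, n, rng):
    """Hypothetical helper: draw n units with replacement, p_pop used as weights."""
    groups = list(p_pop)
    draws = rng.choices(groups, weights=[p_pop[g] for g in groups], k=n)
    return Counter(draws)

table = summarize(SAMPLE_3, P_POP)
for (shape, color), row in table.items():
    print(shape, color, *row)

# A fresh random sample of the same size gives different p_sample values.
new_counts = draw_sample(P_POP, 30, random.Random(1))
new_table = summarize(new_counts, P_POP)
```

Rerunning draw_sample with other seeds plays the role of drawing Sample 1, Sample 2, and so on: p_pop stays fixed while p_sample and p_diff vary from sample to sample.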

Sampling bias and nonresponse bias

Example: United States presidential election, 1936

(Based on Agresti, this and this.)

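
Nonresponse bias can be illustrated with a small simulation. The sketch below uses purely hypothetical numbers, not the 1936 figures. It assumes that 60% of the population supports candidate A, but supporters of A answer the poll with probability 0.2 and opponents with probability 0.4. Among respondents, the expected share supporting A is then 0.6 * 0.2 / (0.6 * 0.2 + 0.4 * 0.4) = 0.12 / 0.28, about 0.43. The poll therefore points to the wrong winner, and collecting more responses does not remove the bias. The function names are hypothetical.

```python
import random

def simulate_poll(n_contacted, p_support, p_respond_support, p_respond_oppose, seed=0):
    """Simulate a poll where the chance of answering depends on the opinion.

    Returns (share of respondents supporting A, number of respondents).
    All probabilities are hypothetical inputs.
    """
    rng = random.Random(seed)
    n_yes = n_responded = 0
    for _ in range(n_contacted):
        supports_a = rng.random() < p_support
        p_respond = p_respond_support if supports_a else p_respond_oppose
        if rng.random() < p_respond:  # this person answers the poll
            n_responded += 1
            if supports_a:
                n_yes += 1
    return n_yes / n_responded, n_responded

def expected_respondent_share(p_support, p_respond_support, p_respond_oppose):
    """Share supporting A among respondents, in expectation."""
    a = p_support * p_respond_support
    b = (1 - p_support) * p_respond_oppose
    return a / (a + b)

share, n = simulate_poll(100_000, 0.6, 0.2, 0.4)
print(f"true support 0.60, poll estimate {share:.3f} from {n} respondents")
print(f"expected estimate {expected_respondent_share(0.6, 0.2, 0.4):.3f}")
```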

Survivership bias

Example: Bullet holes of honor

(Based on this.)

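
A small simulation shows why counting holes only on returning planes is misleading. The sketch rests on hypothetical assumptions. Each hit lands uniformly on one of four sections. A hit on the engine downs the plane with probability 0.6, and a hit elsewhere downs it with probability 0.1. Among all hits, about a quarter are on the engine. Among hits on planes that return, the engine share is much lower, about 0.1 / 0.775 (roughly 0.13) with three hits per plane, because planes hit in the engine tend not to come back. The section names and probabilities are made up for illustration.

```python
import random
from collections import Counter

SECTIONS = ["engine", "fuselage", "wings", "tail"]
# Hypothetical probability that a single hit in a section brings the plane down.
P_DOWNED = {"engine": 0.6, "fuselage": 0.1, "wings": 0.1, "tail": 0.1}

def fly_missions(n_planes, hits_per_plane, seed=0):
    """Return hit counts per section for all planes and for returning planes only."""
    rng = random.Random(seed)
    all_hits, returned_hits = Counter(), Counter()
    for _ in range(n_planes):
        hits = [rng.choice(SECTIONS) for _ in range(hits_per_plane)]
        all_hits.update(hits)
        # The plane returns only if none of its hits brought it down.
        if all(rng.random() >= P_DOWNED[s] for s in hits):
            returned_hits.update(hits)
    return all_hits, returned_hits

def share(counts, section):
    """Fraction of all recorded hits that landed on the given section."""
    return counts[section] / sum(counts.values())

all_hits, returned_hits = fly_missions(10_000, 3)
print(f"engine share, all planes:       {share(all_hits, 'engine'):.2f}")
print(f"engine share, returning planes: {share(returned_hits, 'engine'):.2f}")
```

Judged only by the returning planes, the engine looks like the safest place to be hit. In this setup that conclusion is exactly backwards.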

Response bias

Example: New York Times/CBS News poll on attitudes towards increased fuel taxes

Example: Order of questions matters

A US study during the Cold War asked two questions:

1. “Do you think that the US should let Russian newspaper reporters come here and send back whatever they want?”

2. “Do you think that Russia should let American newspaper reporters come in and send back whatever they want?”

The percentage answering yes to question 1 was 36% when it was asked first and 73% when it was asked last.
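
A common way to measure order effects like this is a split-ballot design. Respondents are randomly assigned to questionnaire versions that differ only in question order, and the answers to the same question are then compared across versions. The sketch below is a hypothetical illustration of that idea and does not describe how the original study was run. The function names and the four example answers are made up.

```python
import random
from collections import Counter

def assign_versions(respondent_ids, seed=0):
    """Split respondents at random into two (near) equal halves.

    Version "A" asks question 1 first; version "B" asks question 2 first.
    """
    ids = list(respondent_ids)
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    return {rid: ("A" if i < half else "B") for i, rid in enumerate(ids)}

def yes_share_by_version(answers, versions):
    """answers maps respondent id -> True/False for question 1."""
    yes, total = Counter(), Counter()
    for rid, said_yes in answers.items():
        v = versions[rid]
        total[v] += 1
        yes[v] += int(said_yes)
    return {v: yes[v] / total[v] for v in total}

versions = assign_versions(range(1, 11))
# Hypothetical answers from four respondents with known versions:
print(yes_share_by_version({1: True, 2: False, 3: True, 4: True},
                           {1: "A", 2: "A", 3: "B", 4: "B"}))  # {'A': 0.5, 'B': 1.0}
```

Because the version is assigned at random, a clear difference between the yes-shares of the two versions can be attributed to question order rather than to differences between the groups of respondents.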

Theory: Biases / sampling

Biases

Agresti section 2.3:

Sampling

Agresti section 2.4:

Data wrangling

This will be illustrated with two specific cases.

The material is on Moodle.