Probability

The ASTA team

Probability of events

The concept of probability

Actual experiment

y_canteen <- c(2, 5, 1, 6, 1, 1, 1, 1, 3, 4, 1, 2, 1, 2, 2, 2, 4, 2, 2, 5, 20, 2, 1, 1, 1, 1)
x_canteen <- ifelse(y_canteen > 2, 1, 0)
x_canteen
##  [1] 0 1 0 1 0 0 0 0 1 1 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0
p_canteen <- sum(x_canteen) / length(x_canteen)
p_canteen
## [1] 0.2692308
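
The same empirical probability can be obtained in one step, because the mean of a 0/1 indicator is the relative frequency of the event. A minimal sketch, reusing the canteen data from above (the name `p_direct` is introduced here only for illustration):

```r
# Canteen data copied from the example above
y_canteen <- c(2, 5, 1, 6, 1, 1, 1, 1, 3, 4, 1, 2, 1, 2, 2, 2, 4, 2, 2, 5, 20, 2, 1, 1, 1, 1)

# The mean of a logical vector is the proportion of TRUE values,
# i.e. the relative frequency of the event "y > 2"
p_direct <- mean(y_canteen > 2)
p_direct
```

This should agree with `p_canteen` computed above.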

Another experiment

head(x, 10)
##  [1] 0 0 0 1 1 1 0 1 0 0

(The horizontal axis is on a log scale.)

Definitions

We conduct the experiment \(n\) times. Let \(\#(A)\) denote how many times we observe the event \(A\).
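
To see how the relative frequency \(\#(A)/n\) behaves as \(n\) grows, the experiment can be simulated. This is only a sketch under stated assumptions: the true probability `p_true = 0.3`, the seed and the name `x_sim` are hypothetical choices made for illustration, not values from the course data.

```r
set.seed(1)          # arbitrary seed, only for reproducibility
p_true <- 0.3        # hypothetical true probability of the event A
n <- 10000           # number of repetitions of the experiment

# x_sim[i] is 1 if A occurred in the i-th experiment and 0 otherwise
x_sim <- rbinom(n, size = 1, prob = p_true)

# Running relative frequency #(A) / n after 1, 2, ..., n experiments
p_n <- cumsum(x_sim) / seq_len(n)

# Relative frequency after 10, 100, 1000 and 10000 experiments
p_n[c(10, 100, 1000, 10000)]
```

For small \(n\) the relative frequency can be far from `p_true`; for large \(n\) it typically settles close to it.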

Theoretical probabilities of two events

Conditional probability

Example with magazine data:

magAds <- read.delim("https://asta.math.aau.dk/datasets?file=magazineAds.txt")

# Create two new factors 'words' and 'education':
magAds$words <- cut(magAds$WDS, breaks = c(31, 72, 146, 230), include.lowest = TRUE)
magAds$education <- factor(magAds$GROUP, levels = c(1, 2, 3), labels = c("high", "medium", "low"))

library(mosaic)
tab <- tally( ~ words + education, data = magAds)
tab
##            education
## words       high medium low
##   [31,72]      4      6   5
##   (72,146]     5      6   8
##   (146,230]    9      6   5

\[ p_n(A \mid B) = \frac{9}{4+5+9} = \frac{9}{18} = 0.5 = 50\%. \]

\[ \begin{aligned} P(A \mid B) &= P(\text{words} =(146,230] \mid \text{education = high}) \\[0.5em] &= \frac{P(\text{words} =(146,230] \text{ and } \text{education = high})}{P(\text{education = high})}, \end{aligned} \] which, translated to empirical probabilities (replacing \(P\) with \(p_n\)), gives

\[ \begin{aligned} p_n(A \mid B) &= \frac{p_n(\text{words} =(146,230] \text{ and } \text{education = high})}{p_n(\text{education = high})} \\ &= \frac{\frac{9}{54}}{\frac{4+5+9}{54}} \\ &= \frac{9}{4+5+9} \\[0.5em] &= 50\% \end{aligned} \] as calculated above.
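
The calculation can be reproduced without downloading the data by typing in the counts from the table. A minimal sketch; the object `counts` and the result names are introduced here for illustration, and the numbers are copied by hand from the printed table:

```r
# Counts copied from the table above (rows: words, columns: education)
counts <- matrix(c(4, 6, 5,
                   5, 6, 8,
                   9, 6, 5),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(words = c("[31,72]", "(72,146]", "(146,230]"),
                                 education = c("high", "medium", "low")))

# p_n(A | B): look only at the 'high' column and take the share of (146,230]
p_A_given_B <- counts["(146,230]", "high"] / sum(counts[, "high"])

# Same number via the definition p_n(A and B) / p_n(B)
n <- sum(counts)
p_def <- (counts["(146,230]", "high"] / n) / (sum(counts[, "high"]) / n)

c(p_A_given_B, p_def)
```

Both routes give 0.5, since the factor \(1/54\) cancels.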

Conditional probability and independence

Magazine data revisited
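
Assuming, as in the example above, that \(A\) is the event words = (146,230] and \(B\) is the event education = high, independence would require \(P(A \mid B) = P(A)\). A rough empirical check with the counts from the table:

\[ p_n(A) = \frac{9+6+5}{54} = \frac{20}{54} \approx 37\%, \qquad p_n(A \mid B) = \frac{9}{4+5+9} = 50\%. \]

The two empirical probabilities differ, which suggests (but, with only 54 ads, does not prove) that \(A\) and \(B\) are not independent.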

Discrete distribution

Example: Magazine data

# Table with the percentage of ads in each combination of the levels of 'words' and 'education'
tab <- tally( ~ words + education, data = magAds, format = "percent")
round(tab, 2) # Round digits
##            education
## words        high medium   low
##   [31,72]    7.41  11.11  9.26
##   (72,146]   9.26  11.11 14.81
##   (146,230] 16.67  11.11  9.26
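
The percentages are the counts divided by the total number of ads, and the marginal distributions are obtained by summing over rows or columns. A sketch in base R, with the counts typed in by hand from the earlier table (the names `counts` and `pct` are illustrative):

```r
# Counts copied from the earlier table (rows: words, columns: education)
counts <- matrix(c(4, 6, 5,
                   5, 6, 8,
                   9, 6, 5),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(words = c("[31,72]", "(72,146]", "(146,230]"),
                                 education = c("high", "medium", "low")))

# Joint distribution in percent: each count divided by the total of 54 ads
pct <- 100 * prop.table(counts)
round(pct, 2)

# Marginal distributions in percent
round(rowSums(pct), 2)   # distribution of 'words'
round(colSums(pct), 2)   # distribution of 'education'

# The percentages of a distribution sum to 100
sum(pct)
```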

General discrete distribution

Example: 3 coin tosses

Number of heads, \(Y\)    0      1      2      3
Probability              1/8    3/8    3/8    1/8
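
These probabilities can be checked by listing all outcomes. A minimal sketch, assuming a fair coin so that the 8 sequences are equally likely (the object names are illustrative):

```r
# All 2^3 = 8 equally likely outcomes of three tosses (1 = head, 0 = tail)
outcomes <- expand.grid(toss1 = 0:1, toss2 = 0:1, toss3 = 0:1)

# Number of heads in each outcome
heads <- rowSums(outcomes)

# Probability of each value of Y: number of outcomes with that value, divided by 8
p_heads <- table(heads) / nrow(outcomes)
p_heads
```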

Distribution of general random variables

Probability distribution

Sample

We conduct an experiment \(n\) times, where the outcome of the \(i\)th experiment corresponds to a measurement of a random variable \(Y_i\), where we assume

Population parameters

                      Population    Sample
Mean                  \(\mu\)       \(\overline{y}\)
Standard deviation    \(\sigma\)    \(s\)

Distribution of a discrete random variable

Expected value (mean) for a discrete distribution

Example: number of heads in 3 coin flips

\[ \mu = 0\cdot\frac{1}{8}+1\cdot\frac{3}{8}+2\cdot\frac{3}{8}+3\cdot\frac{1}{8}=1.5. \]

Note that the expected value, 1.5, is not itself a possible outcome of the experiment.
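
The expected value is a probability-weighted sum of the possible values, which is a one-liner in R. A small sketch (the names `y`, `p` and `mu` are chosen here for illustration):

```r
y <- 0:3                  # possible numbers of heads
p <- c(1, 3, 3, 1) / 8    # their probabilities, from the coin toss table
mu <- sum(y * p)          # expected value: sum of value times probability
mu
```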

Variance and standard deviation for a discrete distribution

Example: number of heads in 3 coin flips

The distribution of the random variable ‘number of heads in 3 coin flips’ has variance \[ \sigma^2 = (0-1.5)^2\frac{1}{8} + (1-1.5)^2\frac{3}{8} + (2-1.5)^2 \frac{3}{8} + (3-1.5)^2 \frac{1}{8} = 0.75 \]

and standard deviation \[ \sigma = \sqrt{\sigma^2} = \sqrt{0.75} \approx 0.866. \]
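
The variance is the probability-weighted average of squared deviations from the mean. A sketch of the same calculation in R, with illustrative object names:

```r
y <- 0:3                          # possible numbers of heads
p <- c(1, 3, 3, 1) / 8            # their probabilities
mu <- sum(y * p)                  # expected value
sigma2 <- sum((y - mu)^2 * p)     # variance
sigma <- sqrt(sigma2)             # standard deviation
c(variance = sigma2, sd = sigma)

# Cross-check with the binomial variance formula n * pi * (1 - pi) for n = 3, pi = 0.5
3 * 0.5 * (1 - 0.5)
```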

The binomial distribution

# The binomial distribution with n = 10 and pi = 0.35:
plotDist("binom", size = 10, prob = 0.35, 
         ylab = "Probability", xlab = "Number of successes", main = "binom(n = 10, prob = 0.35)")
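
The probabilities behind this plot can also be listed with base R's `dbinom`. A short sketch (the names `k` and `p_k` are illustrative):

```r
k <- 0:10                                   # possible numbers of successes
p_k <- dbinom(k, size = 10, prob = 0.35)    # P(Y = k) for each k
round(p_k, 4)

sum(p_k)        # the probabilities sum to 1
sum(k * p_k)    # expected value, equal to n * pi = 10 * 0.35
```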

Distribution of a continuous random variable

Density function

Increasing number of observations

Density shapes

Normal distribution

Reach of the normal distribution

Interpretation of standard deviation:
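
A standard way to read the standard deviation of a normal distribution is through the probability of falling within \(k\) standard deviations of the mean. The following base R sketch computes these probabilities for \(k = 1, 2, 3\); it is offered as an illustration, with `prob_within` a name chosen here:

```r
k <- 1:3

# P(mu - k*sigma < Y < mu + k*sigma) is the same for every normal distribution,
# so it can be computed from the standard normal
prob_within <- pnorm(k) - pnorm(-k)
round(prob_within, 4)
```

Roughly 68%, 95% and 99.7% of the probability lies within 1, 2 and 3 standard deviations of the mean.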

Normal \(z\)-score

Calculating probabilities in the standard normal distribution

# For a standard normal distribution the probability of getting a value less than 1 is:
left_prob <- pdist("norm", q = 1, mean = 0, sd = 1)

left_prob
## [1] 0.8413447
right_prob <- 1 - left_prob
right_prob
## [1] 0.1586553

Calculating \(z\)-values (quantiles) in the standard normal distribution

left_z <- qdist("norm", p = 0.005, mean = 0, sd = 1, xlim = c(-4, 4))

left_z
## [1] -2.575829
right_z <- qdist("norm", p = 1-0.005, mean = 0, sd = 1, xlim = c(-4, 4))

right_z
## [1] 2.575829
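
If the mosaic package is not available, the same numbers can be obtained with base R's `pnorm` and `qnorm` (without the accompanying plots). A minimal sketch; the `_base` names are illustrative:

```r
# Probabilities in the standard normal distribution
left_prob_base <- pnorm(1, mean = 0, sd = 1)    # P(Z < 1)
right_prob_base <- 1 - left_prob_base           # P(Z > 1)

# Quantiles (z-values) with 0.5% probability in each tail
left_z_base <- qnorm(0.005, mean = 0, sd = 1)
right_z_base <- qnorm(1 - 0.005, mean = 0, sd = 1)

c(left_prob_base, right_prob_base, left_z_base, right_z_base)
```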

Example

The Stanford-Binet Intelligence Scale is calibrated to be approximately normal with mean 100 and standard deviation 16.

What is the 99-percentile of IQ scores?
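
One way to approach this question, sketched in base R: find the \(z\)-value with 99% of the probability to its left and convert it back to the IQ scale via \(y = \mu + z\sigma\). The object names are chosen for illustration.

```r
iq_mean <- 100
iq_sd <- 16

# 99th percentile computed directly on the IQ scale
q99 <- qnorm(0.99, mean = iq_mean, sd = iq_sd)
q99

# The same value via the standard normal z-value: y = mu + z * sigma
z99 <- qnorm(0.99)
iq_mean + z99 * iq_sd
```

Both calculations give a value of about 137.2.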

Distribution of sample statistic

Estimates and their variability

We are given a sample \(y_1,y_2,\ldots,y_n\).

These statistics vary from sample to sample, so we are interested in describing their distribution.

Distribution of sample mean

Central limit theorem

Illustration of CLT

The central limit theorem is illustrated in Agresti.
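
A small simulation can complement that illustration. The sketch below is built on assumptions made only for this example: a skewed exponential population with mean 1 and standard deviation 1, a sample size of 30, 5000 repetitions and an arbitrary seed.

```r
set.seed(1)      # arbitrary seed, only for reproducibility
n <- 30          # sample size (hypothetical choice)
reps <- 5000     # number of simulated samples (hypothetical choice)

# Draw 'reps' samples of size n from an exponential population (mu = 1, sigma = 1)
# and store the mean of each sample
sample_means <- replicate(reps, mean(rexp(n, rate = 1)))

mean(sample_means)    # should be close to mu = 1
sd(sample_means)      # should be close to sigma / sqrt(n)
1 / sqrt(n)           # theoretical standard deviation of the sample mean
```

A histogram of `sample_means` (for instance via `hist(sample_means)`) should look roughly bell-shaped even though the population itself is strongly skewed.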

Example

1 - pdist("norm", mean = 0, sd = 1, q = 2.92, xlim = c(-4, 4)) 

## [1] 0.001750157