The ASTA team
y_canteen <- c(2, 5, 1, 6, 1, 1, 1, 1, 3, 4, 1, 2, 1, 2, 2, 2, 4, 2, 2, 5, 20, 2, 1, 1, 1, 1)
x_canteen <- ifelse(y_canteen > 2, 1, 0)
x_canteen
## [1] 0 1 0 1 0 0 0 0 1 1 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0
mean(x_canteen)
## [1] 0.2692308
x is a vector with the first 2,000 outcomes of John Kerrich’s experiment (0 = tail, 1 = head):
## [1] 0 0 0 1 1 1 0 1 0 0
(The horizontal axis is on a log scale).
We conduct the experiment \(n\) times. Let \(\#(A)\) denote how many times we observe the event \(A\).
Empirical probability of the event \(A\): \[ p_n(A)=\frac{\#(A)}{n}. \]
Theoretical probability of the event \(A\): \[ P(A)=\lim_{n\to\infty}p_n(A) \]
We always have \(0\leq P(A)\leq 1\).
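As an illustration of how the empirical probability approaches the theoretical one, we can simulate coin tosses in base R and track the running relative frequency of heads (a sketch; the seed and number of tosses are arbitrary choices):

```r
# Simulate n tosses of a fair coin (1 = head, 0 = tail)
set.seed(1)
n <- 10000
tosses <- sample(c(0, 1), size = n, replace = TRUE)

# Empirical probability p_n(A) of the event A = "head" after each toss
p_n <- cumsum(tosses) / seq_len(n)

# p_n(A) approaches the theoretical probability P(A) = 1/2 as n grows
p_n[c(10, 100, 1000, 10000)]
```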
Examples:
Example 1 Tossing a coin once. The sample space is S = {H, T}. E = {H} is an event.
Example 2 Tossing a die. The sample space is S = {1, 2, 3, 4, 5, 6}. E = {2, 4, 6} is an event, which can be described in words as the number is even.
Example 3 Tossing a coin twice. The sample space is S = {HH, HT, TH, TT}. E = {HH, HT} is an event, which can be described in words as the first toss results in Heads.
Example 4 Tossing a die twice. The sample space is S = {(i, j) : i, j = 1, 2, …, 6}, which contains 36 outcomes. E = {(4, 6), (5, 5), (6, 4)} is an event, which can be described in words as the sum of the results of the two tosses is equal to 10.
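The sample space of Example 4 can be enumerated directly in R, which also gives the probability of the event "the sum equals 10" when all 36 outcomes are equally likely (a small illustrative sketch):

```r
# Sample space for two tosses of a die: all 36 ordered pairs (i, j)
S <- expand.grid(i = 1:6, j = 1:6)
nrow(S)  # 36 outcomes

# The event "the sum of the two tosses equals 10"
E <- S[S$i + S$j == 10, ]
E  # the outcomes (4,6), (5,5), (6,4)

# With equally likely outcomes the probability is 3/36 = 1/12
nrow(E) / nrow(S)
```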
magAds <- read.delim("https://asta.math.aau.dk/datasets?file=magazineAds.txt")
# Create two new factors 'words' and 'education':
magAds$words <- cut(magAds$WDS, breaks = c(31, 72, 146, 230), include.lowest = TRUE)
magAds$education <- factor(magAds$GROUP, levels = c(1, 2, 3), labels = c("high", "medium", "low"))
library(mosaic)
tab <- tally( ~ words + education, data = magAds)
tab
## education
## words high medium low
## [31,72] 4 6 5
## (72,146] 5 6 8
## (146,230] 9 6 5
The event \(A\)=\(\{\)words=(146,230]\(\}\) (the ad is a “difficult” text) has empirical probability \[ p_n(A) = \frac{9 + 6 + 5}{54} = \frac{20}{54} \approx 37 \%.\]
Say we are only interested in the probability of a “difficult” text (event \(A\)) for high education magazines, i.e. conditioning on the event \(B=\{\)education=high\(\}\). Then the empirical conditional probability can be calculated from the table:
\[ p_n(A \mid B) = \frac{9}{4+5+9} = \frac{9}{18} = 0.5 = 50\%. \]
The conditional probability of \(A\) given \(B\) may theoretically be expressed as
\[ \begin{aligned} P(A \mid B) &= P(\text{words} =(146,230] \mid \text{education = high}) \\[0.5em] &= \frac{P(\text{words} =(146,230] \text{ and } \text{education = high})}{P(\text{education = high})}, \\ \end{aligned} \] which translated to empirical probabilities (substituting \(P\) with \(p_n\)) will give
\[ \begin{aligned} p_n(A \mid B) &= \frac{p_n(\text{words} =(146,230] \text{ and } \text{education = high})}{p_n(\text{education = high})} \\ &= \frac{\frac{9}{54}}{\frac{4+5+9}{54}} \\ &= \frac{9}{4+5+9} \\[0.5em] &= 50\% \end{aligned} \] as calculated above.
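The same calculation can be done in R. To keep the snippet self-contained (so it runs without downloading the data file), the counts from the table above are entered directly into a matrix named `counts` (a name chosen here, not from the original code):

```r
# Counts from the words/education table, entered directly
counts <- matrix(c(4, 5, 9,
                   6, 6, 6,
                   5, 8, 5),
                 nrow = 3,
                 dimnames = list(words = c("[31,72]", "(72,146]", "(146,230]"),
                                 education = c("high", "medium", "low")))

# p_n(A): empirical probability of a "difficult" text
p_A <- sum(counts["(146,230]", ]) / sum(counts)  # 20/54

# p_n(A | B): "difficult" text given education = high
p_A_given_B <- counts["(146,230]", "high"] / sum(counts[, "high"])
p_A_given_B  # 9/18 = 0.5
```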
# Table with the percentage of ads in each combination of the levels of 'words' and 'education'
tab <- tally( ~ words + education, data = magAds, format = "percent")
round(tab, 2) # Round digits
## education
## words high medium low
## [31,72] 7.41 11.11 9.26
## (72,146] 9.26 11.11 14.81
## (146,230] 16.67 11.11 9.26
The events (one for each combination of the levels of words and education) make up the whole sample space for the two variables. The empirical probability of each event is given in the table.

Random/stochastic variable: A function \(Y\) that translates an outcome of the experiment into a number.
Possible outcomes in an experiment with 3 coin tosses: TTT, TTH, THT, HTT, THH, HTH, HHT, HHH.
The above events are disjoint and make up the whole sample space.
Let \(Y\) be the number of heads in the experiment: \(Y(TTT) = 0, Y(HTT) = 1, \ldots\)
Assume that each outcome is equally likely, i.e. probability 1/8 for each event. Then \(P(Y = 0) = P(\{TTT\}) = 1/8\), \(P(Y = 1) = P(\{HTT, THT, TTH\}) = 3/8\), \(P(Y = 2) = P(\{HHT, HTH, THH\}) = 3/8\), and \(P(Y = 3) = P(\{HHH\}) = 1/8\).
So, the distribution of \(Y\) is
Number of heads, \(Y\) | 0 | 1 | 2 | 3 |
---|---|---|---|---|
Probability | 1/8 | 3/8 | 3/8 | 1/8 |
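This distribution can be verified in R by enumerating all 8 equally likely outcomes and counting heads in each (a small sketch):

```r
# All 8 equally likely outcomes of 3 coin tosses (H = 1, T = 0)
outcomes <- expand.grid(toss1 = 0:1, toss2 = 0:1, toss3 = 0:1)

# Y = number of heads in each outcome
Y <- rowSums(outcomes)

# Distribution of Y: each outcome has probability 1/8
table(Y) / nrow(outcomes)  # 1/8, 3/8, 3/8, 1/8
```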
We conduct an experiment \(n\) times, where the outcome of the \(i\)th experiment corresponds to a measurement of a random variable \(Y_i\), where we assume that \(Y_1, Y_2, \ldots, Y_n\) are independent and identically distributed.
Population | Sample |
---|---|
\(\mu\) | \(\overline{y}\) |
\(\sigma\) | \(s\) |
y (number of heads) | 0 | 1 | 2 | 3 |
---|---|---|---|---|
\(P(Y = y)\) | 1/8 | 3/8 | 3/8 | 1/8 |
Then the expected value is
\[ \mu = 0\frac{1}{8}+1\frac{3}{8}+2\frac{3}{8}+3\frac{1}{8}=1.5. \]
Note that the expected value is not a possible outcome of the experiment itself.
The distribution of the random variable ‘number of heads in 3 coin flips’ has variance \[ \sigma^2 = (0-1.5)^2\frac{1}{8} + (1-1.5)^2\frac{3}{8} + (2-1.5)^2 \frac{3}{8} + (3-1.5)^2 \frac{1}{8} = 0.75 \]
and standard deviation \[ \sigma = \sqrt{\sigma^2} = \sqrt{0.75} = 0.866. \]
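The calculations of \(\mu\), \(\sigma^2\), and \(\sigma\) above can be reproduced directly from the distribution table (a sketch in base R):

```r
# Distribution of Y = number of heads in 3 coin flips
y <- 0:3
p <- c(1, 3, 3, 1) / 8

mu <- sum(y * p)               # expected value: 1.5
sigma2 <- sum((y - mu)^2 * p)  # variance: 0.75
sigma <- sqrt(sigma2)          # standard deviation: 0.866
c(mu, sigma2, sigma)
```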
Interpretation of standard deviation:
pdist always outputs the area to the left of the \(z\)-value (quantile/percentile) we give as input (variable q in the function), i.e. it outputs the probability of getting a value less than \(z\). The first argument of pdist denotes the distribution we are considering.
# For a standard normal distribution the probability of getting a value less than 1 is:
left_prob <- pdist("norm", q = 1, mean = 0, sd = 1)
## [1] 0.8413447
So q = 1 corresponds to the 0.841-percentile/quantile for the standard normal distribution.
# The probability of getting a value greater than 1:
1 - left_prob
## [1] 0.1586553
# The 0.5%- and 99.5%-quantiles of the standard normal distribution:
qdist("norm", p = 0.005, mean = 0, sd = 1)
## [1] -2.575829
qdist("norm", p = 0.995, mean = 0, sd = 1)
## [1] 2.575829
The Stanford-Binet Intelligence Scale is calibrated to be approximately normal with mean 100 and standard deviation 16.
What is the 99-percentile of IQ scores?
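The answer can be computed directly; here is a sketch using base R's qnorm (mosaic's qdist with the same mean and sd arguments returns the same number):

```r
# 99-percentile of a normal distribution with mean 100 and sd 16
qnorm(0.99, mean = 100, sd = 16)
## [1] 137.2216
```

So roughly 1% of people score above 137 on this scale.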
(The percentile can be found with the qdist function.)

We are given a sample \(y_1,y_2,\ldots,y_n\).
The sample mean \(\bar{y}\) is the most common estimate of the population mean \(\mu\).
The sample standard deviation, \(s\), is the most common estimate of the population standard deviation \(\sigma\).
We notice that there is uncertainty (from sample to sample) connected to these statistics, and therefore we are interested in describing their distribution.
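This sample-to-sample variation can be illustrated by simulation. As a sketch, assume (purely for illustration) a normal population with \(\mu = 100\) and \(\sigma = 16\) as in the IQ example, draw many samples of size \(n = 25\), and look at the resulting sample means:

```r
# Simulate the sampling distribution of the sample mean
set.seed(1)
means <- replicate(1000, mean(rnorm(25, mean = 100, sd = 16)))

mean(means)  # close to the population mean mu = 100
sd(means)    # close to sigma / sqrt(n) = 16 / 5 = 3.2
```

The spread of the sample means (about 3.2) is much smaller than the population standard deviation 16, which previews the role of the standard error of \(\bar{y}\).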
## [1] 0.001750157