Monte Carlo simulations
The ASTA team
Basic principle
Monte Carlo (MC) estimation of the probability of event \(A\):
- Run an experiment \(N\) times, and count how many times \(A\) occurred, \(N_A\), say
\[
P(A) \approx \frac{N_A}{N}
\]
(This is the frequentist interpretation of probability.)
Many, many simulations
Letting the number of experiments approach infinity, \(\approx\) becomes \(=\):
\[
P(A) = \lim_{N \rightarrow \infty} \frac{N_A}{N}
\]
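As a minimal illustration (a hypothetical example, not from the slides): estimate the probability of getting at least one six when rolling two dice.
set.seed(1)
N <- 100000
# roll two dice N times each
die1 <- sample(1:6, N, replace = TRUE)
die2 <- sample(1:6, N, replace = TRUE)
# MC estimate of P(at least one six); the exact value is 1 - (5/6)^2 = 11/36 = 0.306
mean(die1 == 6 | die2 == 6)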
Example 1
Recall (lecture 1.2):
John Kerrich, a South African mathematician, was visiting Copenhagen when World War II broke out. Two days before he was scheduled to fly to England, the Germans invaded Denmark. Kerrich spent the rest of the war interned at a camp in Hald Ege near Viborg, Jutland. To pass the time he carried out a series of experiments in probability theory. In one, he tossed a coin 10,000 times.
The first 25 observations were (0 = tail, 1 = head): \[ 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1,
0, 1, 0, 0, 0, 1, 1, 0, 1, 0,\ldots\]
Plot of the empirical probability \(\hat{p}\) of getting a head against the number of tosses \(n\), with the horizontal axis on a log scale.
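A sketch of how such a curve can be computed, here only for the 25 observations listed above (the full 10,000-toss series is not reproduced):
kerrich25 <- c(0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1,
               0, 1, 0, 0, 0, 1, 1, 0, 1, 0)
n <- seq_along(kerrich25)
p_hat <- cumsum(kerrich25) / n  # empirical probability of head after n tosses
plot(n, p_hat, type = "l", log = "x")
abline(h = 0.5, lty = 2)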
(Pseudo) random number generation
Without setting a seed, repeated calls give different results:
sample(1:10, 2)
## [1] 9 2
sample(1:10, 2)
## [1] 10 1
Setting the seed makes the result reproducible:
set.seed(48)
sample(1:10, 2)
## [1] 5 3
set.seed(48)
sample(1:10, 2)
## [1] 5 3
Example 1 - cont’d
Let the computer toss the coin (MC simulation):
set.seed(48)
x2 <- sample(c(0, 1), 10000, replace = TRUE, prob = c(0.5, 0.5))
head(x2)
## [1] 0 1 1 1 1 1
Plot of the empirical probability of a head against the number of tosses (black: John Kerrich's data, red: MC simulation).
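A sketch of how the red MC curve could be produced from the simulated tosses x2 (the Kerrich series would be overlaid in black in the same way):
p_hat_mc <- cumsum(x2) / seq_along(x2)
plot(seq_along(x2), p_hat_mc, type = "l", log = "x", col = "red",
     xlab = "n", ylab = "empirical probability of head")
abline(h = 0.5, lty = 2)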
Example 2
Recall (lecture 2.1):
- A company produces cylindrical components for automobiles. It is important that the mean component diameter is \(\mu=5\)mm. The standard deviation is \(\sigma=0.1\)mm.
- An engineer takes a random sample of \(n=100\) components. These have an average diameter of \(\bar{x}=5.027\)mm. Is it reasonable to think \(\mu=5\)mm?
- If the population of components has the correct mean, then
\[\bar{X} \approx \texttt{norm}\left(\mu,\frac{\sigma}{\sqrt{n}}\right)=\texttt{norm}\left(5,\frac{0.1}{\sqrt{100}}\right)=\texttt{norm}(5,0.01).\]
- For the actual sample this gives the observed \(z\)-score \[z_{obs}=\frac{\bar{x}-5}{0.01}=2.7\] which should come from an approximate standard normal distribution.
- The probability of getting a higher \(z\)-score is:
prob <- 1 - pdist("norm", mean = 0, sd = 1, q = 2.7, xlim = c(-4, 4))
prob
## [1] 0.0035
- Thus, it is highly unlikely that a random sample has such a high \(z\)-score. A better explanation might be that the produced components have a population mean larger than 5mm.
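For reference, the same tail probability can be computed with base R's pnorm (an added check, not part of the slides):
z_obs <- (5.027 - 5) / (0.1 / sqrt(100))  # observed z-score, 2.7
1 - pnorm(z_obs)                          # P(Z > 2.7), about 0.0035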
Use MC simulation:
set.seed(48)
N <- 10000
result <- rnorm(n = N, mean = 5, sd = 0.1/sqrt(100))
sprintf("%0.5f", head(result))
## [1] "5.00200" "4.97220" "4.99304" "5.02075" "5.00790" "5.00490"
Count how many simulated sample means are at least as extreme as the observed 5.027:
sum(result >= 5.027)
## [1] 34
sum(result >= 5.027)/N
## [1] 0.0034
prob1 <- mean(result >= 5.027)
prob1
## [1] 0.0034
Comparison with the theoretical value:
prob1
## [1] 0.0034
prob
## [1] 0.0035
prob1 - prob
## [1] -6.7e-05
(prob1 - prob)/prob
## [1] -0.019
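The MC estimate is itself random. A rough standard error for it can be obtained from the binomial variance (an added sketch, using prob1 and N from above):
se_mc <- sqrt(prob1 * (1 - prob1) / N)  # approximate standard error of the MC estimate
se_mc                                   # about 0.0006, so the deviation from theory is well within MC error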
More complicated simulations
Useful building blocks for simulations are sample(), rnorm() and replicate(). For example:
set.seed(48)
x2 <- sample(c(0, 1), 10000, replace = TRUE, prob = c(0.5, 0.5))
mean(x2)
## [1] 0.49
set.seed(48)
x2 <- replicate(N, {
  sample(c(0, 1), 1, prob = c(0.5, 0.5))
})
mean(x2)
## [1] 0.49
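replicate() pays off when each repetition is a whole block of code rather than a single draw. A hypothetical sketch (not from the slides), where each repetition computes the proportion of heads in 100 tosses:
set.seed(48)
props <- replicate(1000, {
  tosses <- sample(c(0, 1), 100, replace = TRUE)
  mean(tosses)  # proportion of heads in this repetition
})
mean(props)  # close to 0.5
sd(props)    # close to sqrt(0.25/100) = 0.05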
Variance is important
Averages are not always sufficient:
- The average of 45M and 55M is 50M
- The average of 1M and 99M is also 50M
Think of budgets:
- On average the profit will be 50M
- But what is the risk that it ends up below e.g. 20M?
- …or even below 0?
- That is, what is the probability of a deficit (profit below 0)?
Example 3: Bivariate reliability problem
Limit state function:
- defines failure criterion
- function of all random variables
- failure when \(g(\ldots) \leq 0\)
Model
\[
g(B, C) = B - C \leq 0
\]
- \(B \sim \mathcal{N}(100, 10)\): budget of a given project (in $1,000)
- \(C \sim \mathcal{N}(50, 10)\): expected costs (in $1,000)
Assess
\[
P(g(B, C) < 0)
\]
Simulations
set.seed(48)
N <- 100000
Bdata <- rnorm(N, mean = 100, sd = 10)
Cdata <- rnorm(N, mean = 50, sd = 10)
gdata <- Bdata - Cdata
sum(gdata < 0)
## [1] 21
prob1 <- mean(gdata < 0)
prob1
## [1] 0.00021
Results
plot(Bdata, Cdata)
abline(a = 0, b = 1, col = "red", lwd = 3)
points(Bdata[gdata < 0], Cdata[gdata < 0], col = "red")
Theoretical answer
For this particular case we can exploit that the difference of two independent normally distributed variables is again normally distributed, and that, due to independence, the variance of the difference is the sum of the two variances:
g_mean <- 100 - 50
g_var <- 10^2 + 10^2
prob <- pdist("norm", q = 0,
              mean = g_mean,
              sd = sqrt(g_var), plot = FALSE)
prob
## [1] 2e-04
Comparison with the MC estimate:
prob - prob1
## [1] -6.5e-06
(prob - prob1)/prob
## [1] -0.032
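Again the discrepancy can be judged against the MC standard error (an added sketch, using prob1 and N from the simulation above):
sqrt(prob1 * (1 - prob1) / N)  # about 4.6e-05, so the MC estimate and the theoretical value agree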
Sensitivity
- \(B \sim \mathcal{N}(100, 10)\): budget of a given project (in $1,000)
- \(C \sim \mathcal{N}(50, 10)\): expected costs (in $1,000)
- \(g(B, C) = B - C\)
How does \[
P(g(B, C) < 0)
\] depend on the variance/standard deviations?
Illustrating \(g(B, C) < 0\)
The overlap between the red (\(C\)) and blue (\(B\)) densities illustrates \(P(g(B, C) < 0)\): the more the two distributions overlap, the more likely it is that the costs exceed the budget:
plotDist("norm", mean = 100, sd = 10,
         xlim = c(10, 175), col = "blue")
plotDist("norm", mean = 50, sd = 10,
         add = TRUE, col = "red")

plotDist("norm", mean = 100, sd = 50,
         xlim = c(10, 175), col = "blue")
plotDist("norm", mean = 50, sd = 50,
         add = TRUE, col = "red")
sapply, lapply, for loops
Using sapply (other possibilities: lapply, for loops):
set.seed(48)
N <- 100000
sigmas <- seq(5, 50, length.out = 50)
probs <- sapply(sigmas, function(sigma) {
  Bdata <- rnorm(N, mean = 100, sd = sigma)
  Cdata <- rnorm(N, mean = 50, sd = sigma)
  gdata <- Bdata - Cdata
  mean(gdata < 0)
})
plot(sigmas, probs, type = "l")
abline(h = prob1)
abline(v = 10)
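As an added check (not from the slides), the theoretical curve can be overlaid: since \(B - C \sim \texttt{norm}(50, \sqrt{2}\,\sigma)\), the failure probability is pnorm evaluated at 0:
# theoretical P(g < 0) as a function of the common standard deviation sigma
lines(sigmas, pnorm(0, mean = 50, sd = sqrt(2) * sigmas), col = "red", lty = 2)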
Example 4: Wholesale offer
- Newly established business with an innovative product
- Decided to apply for a patent
No patent
Sales for next year are expected to be between $1 and $9 million, with $3 million being the most likely value.
Triangular distribution:
plotDist("tri", params = list(min = 1, max = 9, mode = 3))
With patent
Sales go up by 25%-75%, with 50% being the most likely value.
Triangular distribution:
plotDist("tri", params = list(min = 0.25, max = 0.75, mode = 0.5))
Simulating sales
set.seed(48)
N <- 10000
got_patent <- sample(x = c(0, 1), prob = c(0.6, 0.4), size = N, replace = TRUE)
table(got_patent)
## got_patent
## 0 1
## 5935 4065
sales_base <- rtri(N, min = 1, max = 9, mode = 3)
patent_markup <- rtri(N, min = 0.25, max = 0.75, mode = 0.5)
sales <- sales_base + got_patent*patent_markup*sales_base
sales_avg <- mean(sales)
sales_avg
## [1] 5.2
quantile(sales, probs = c(0.025, 0.975))
## 2.5% 97.5%
## 1.8 10.9
Wholesale offer
A wholesaler offers to buy your entire production for the next year for $6 million.
Take the deal? Many aspects to consider:
- Willing to take a risk?
- Future plans
Initially: how likely is it that we can sell the production for more than $6 million ourselves?
prob <- mean(sales > 6)
prob
## [1] 0.33
This does not answer the question on its own, but it helps, e.g. in assessing the risk.
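To assess the risk a bit further, one might also look at the simulated gain or loss relative to the offer (an added sketch based on the simulated sales above):
gain <- sales - 6  # gain (or loss) from declining the $6 million offer
mean(gain)         # about 5.2 - 6 = -0.8, i.e. on average the offer looks good
quantile(gain, probs = c(0.025, 0.975))  # but the spread around this average is large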