---
title: "Experiments with random numbers"
author: "" 
date: ""
output:
  html_document:
    fig_height: 3
    fig_width: 5
  pdf_document:
    fig_height: 3
    fig_width: 5
  word_document:
    fig_height: 3
    fig_width: 5
---

```{r, setup, include=FALSE}
require(mosaic)   # Load additional packages here 
# Some customization.  You can alter or delete as desired (if you know what you are doing).
# set.seed(1000)   # uncomment to set a seed if you want reproducible results
#trellis.par.set(theme=theme.mosaic()) # change default color scheme for lattice
knitr::opts_chunk$set(
  tidy=FALSE,     # display code as typed
  size="small")   # slightly smaller font for code
```

## Coin flip

Make a variable `coin` corresponding to 1000 fair coin flips using `rbinom` by the following command:
```{r}
coin <- rbinom(1000, 1, 0.5)
```

The command `cumsum(coin)` successively sums up the values in the vector `coin`.  Create a variable `cumsumcoin` by `cumsumcoin <- cumsum(coin)` and inspect the first 10 entries in `coin` and `cumsumcoin` by the commands `coin[1:10]` and `cumsumcoin[1:10]`.
```{r }
# Delete this line and add the correct code yourself
```
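
To see what `cumsum` does on a small scale, here is a minimal sketch using a short hand-written vector. The names `toy` and `toy_freq` are hypothetical and separate from the exercise data.

```{r}
# Hypothetical toy data: five "flips" written by hand
toy <- c(1, 0, 0, 1, 1)

# Running total of ones after each flip
cumsum(toy)                       # 1 1 1 2 3

# Running proportion of ones: total so far divided by number of flips so far
toy_freq <- cumsum(toy) / seq_along(toy)
toy_freq                          # the last entry is 3/5 = 0.6
```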

The command `x <- 1:1000` generates a vector of integers from 1
to 1000. Therefore `y <- cumsumcoin/x` is the running relative
frequency of ones in `coin`: entry `i` is the proportion of ones among the first `i` flips. Plot
`y` and add a horizontal red line at 0.5 on the
y-axis (hint: use `gf_point(y~x) %>% gf_hline(yintercept = ~0.5, col  = "red")`).
Discuss the look of the curve compared to your expectations.
```{r }
# Delete this line and add the correct code yourself
```

## Uniform random numbers

Make a variable with 1000 random numbers drawn in the interval from 0 to 1:
```{r}
rand1 <- runif(1000, 0, 1)
```

Make a histogram of the variable. Try to change the arguments `bins`, `fill` and `color` in the `gf_histogram` command.
```{r }
# Delete this line and add the correct code yourself
```

The histogram probably doesn't look like a normal distribution at all. 
Convince yourselves that the theoretical frequency curve - i.e. the density function - is a horizontal line.

Convince yourselves that the population mean (expected value) is 1/2.

It can be shown that the population standard deviation is approximately 0.289.
How do these theoretical values fit with your empirical quantities? (Use the commands `sd()` and `mean()`.)
```{r }
# Delete this line and add the correct code yourself
```
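
The value 0.289 quoted above can be checked by hand. A sketch of the calculation for a variable $X$ that is uniform on the interval from 0 to 1 (the symbol $X$ is introduced here only for illustration):

$$
E(X) = \int_0^1 x \, dx = \frac{1}{2}, \qquad
E(X^2) = \int_0^1 x^2 \, dx = \frac{1}{3},
$$

$$
\mathrm{Var}(X) = E(X^2) - E(X)^2 = \frac{1}{3} - \frac{1}{4} = \frac{1}{12}, \qquad
\mathrm{SD}(X) = \frac{1}{\sqrt{12}} \approx 0.289.
$$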

Make two extra random variables `rand2` and `rand3` like `rand1` above.
```{r }
# Delete this line and add the correct code yourself
```

Make a new variable `mean12` with the elementwise average of `rand1` and `rand2`:
```{r}
# Use code like this: mean12 <- (rand1 + rand2) / 2
```

Make a histogram for this variable.
```{r }
# Delete this line and add the correct code yourself
```

The histogram probably looks more like a normal distribution curve.
It can be shown that the theoretical frequency curve - i.e. the density function - is a triangle.
Convince yourselves that the population mean (expected value) is 1/2.
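Convince yourselves that the population standard deviation is approximately 0.289 divided by the square root of 2 (remember that the standard deviation of an average of $n$ independent variables is the individual standard deviation divided by $\sqrt{n}$).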
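
The factor $\sqrt{2}$ follows from the rules for variances of independent variables. A sketch, writing $X_1$ and $X_2$ for the two uniform variables and $\sigma \approx 0.289$ for their common standard deviation (notation assumed here, not used elsewhere in the exercise):

$$
\mathrm{Var}\left(\frac{X_1 + X_2}{2}\right) = \frac{\mathrm{Var}(X_1) + \mathrm{Var}(X_2)}{4} = \frac{\sigma^2}{2},
\qquad
\mathrm{SD}\left(\frac{X_1 + X_2}{2}\right) = \frac{\sigma}{\sqrt{2}} \approx 0.204.
$$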

How does this fit with your empirical quantities?
```{r }
# Delete this line and add the correct code yourself
```
Make a new variable `mean123` with the average of the three random variables. Draw the histogram. 
```{r }
# Delete this line and add the correct code yourself
```

Hopefully this illustrates that averages of independent variables tend towards a normal distribution as more variables are averaged (the central limit theorem).
This is one of the reasons that the normal distribution is by far the most used distribution to describe measurement data.
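
The same experiment can be pushed further by averaging more than three variables. The following is a minimal sketch, assuming an arbitrary choice of `k = 12` uniform variables per average; the names `k`, `draws` and `mean_k` are hypothetical and not part of the exercise.

```{r}
library(ggformula)  # already attached by mosaic in the setup chunk; repeated so the chunk runs on its own

# Assumption for illustration: average k = 12 uniform variables instead of 3
k <- 12

# 1000 rows, each holding k independent uniform numbers between 0 and 1
draws <- matrix(runif(1000 * k, 0, 1), nrow = 1000, ncol = k)

# One average per row, giving 1000 averages in total
mean_k <- rowMeans(draws)

gf_histogram(~mean_k, bins = 30)

# Compare the empirical standard deviation with 0.289 / sqrt(k)
sd(mean_k)
0.289 / sqrt(k)
```

With larger `k` the histogram should become narrower and more bell-shaped, in line with the standard deviation shrinking by a factor of the square root of `k`.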

## Quantile comparison plots

Another way of comparing a sample to the normal distribution is by a quantile comparison plot (QQ-plot).
A QQ-plot for the `rand1` variable is made like this (try the arguments `col`, `alpha` and `lwd`):
```{r}
gf_qq(~rand1) %>% gf_qqline()
```

For a normally distributed sample this plot should look approximately linear. 
Try this for `mean12` and `mean123`.
```{r }
# Delete this line and add the correct code yourself
```
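
To make clear what a QQ-plot compares, the points can also be computed by hand. This is a rough sketch of the idea rather than the exact internals of `gf_qq`; the names `demo`, `n_demo`, `theoretical` and `empirical` are hypothetical.

```{r}
library(ggformula)  # already attached by mosaic in the setup chunk; repeated so the chunk runs on its own

# Illustrative sample, separate from the exercise variables
demo <- runif(200, 0, 1)
n_demo <- length(demo)

# Standard normal quantiles at n_demo evenly spread probabilities
theoretical <- qnorm(ppoints(n_demo))

# The sample values sorted from smallest to largest
empirical <- sort(demo)

# Each point pairs the i-th smallest observation with the i-th normal quantile
gf_point(empirical ~ theoretical)
```

If the sample were normally distributed, the ordered observations would grow in step with the normal quantiles, which is why the points would then fall close to a straight line.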

Finally, make a variable which is a sample from a normal distribution with mean 0 and standard deviation 1:
```{r}
x <- rnorm(1000, mean = 0, sd = 1)
```

Make the QQ-plot.
```{r }
# Delete this line and add the correct code yourself
```
