---
title: "Rmarkdown intro"
author: "The ASTA Team"
output:
  html_document:
    fig_height: 3
    fig_width: 5
    theme: cerulean
    highlight: tango
  pdf_document:
    fig_height: 3
    fig_width: 5
---


# Introduction Rmarkdown
First and foremost: forget about everything above this line for now!

This is a brief introduction to how Rmarkdown works, using a few simple examples (for more details on using R Markdown, see <http://rmarkdown.rstudio.com>).

Rmarkdown files consist of:

- A header: title, author, output format etc.
- Plain text like this (LaTeX users can write common LaTeX formulas such as $\sqrt{9} = 3$).
    + Headings are created with hashtags (`#`); a single hashtag gives the largest heading level.
- Code chunks, where we execute R commands. A code chunk begins with the line `` ```{r} `` and ends with the line `` ``` ``. Everything in between is R code.
- When you are done working on your project, you `knit` the entire Rmarkdown file (the knit button is blue and located at the top) to either a PDF (requires LaTeX to be installed on your computer) or an HTML file.

A first peek at R chunks:
```{r}
1+1 
```
The above chunk can be executed by clicking the green "play" button to the right of the chunk, or by placing the cursor on the line with `1+1` and pressing Ctrl+Enter (Cmd+Enter on macOS). The result is shown below the chunk or in the "Console", depending on your RStudio settings.

### Exercise:
  - Knit this Rmarkdown by clicking the "knit" button. This will run all R chunks consecutively (one after another).
  - Insert a new chunk either by using Ctrl+Alt+I (Cmd+Option+I on macOS) or by clicking the green Insert button at the top:
REMOVE THIS LINE AND INSERT A CHUNK
  - Once the chunk is inserted, calculate 1+2+3+4+5

More keyboard shortcuts can be found in the Help menu.

# R 

## R as a Calculator

- **R** can be used like a simple calculator:

```{r}
3-5
3*9
6/5
sqrt(9)
pi
3^2
log(100)
```
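
Note that `log()` computes the natural logarithm (base $e$), so `log(100)` is about 4.605, not 2. The chunk below is a small sketch of a few related base R functions (`log10()`, the `base` argument of `log()`, `exp()` and `abs()`); they are not used elsewhere in this document and are shown only for illustration:

```{r}
log(100)             # natural logarithm (base e), about 4.605
log10(100)           # base-10 logarithm: 2
log(100, base = 10)  # same as log10(100)
exp(1)               # Euler's number e, about 2.718
abs(-7)              # absolute value: 7
```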

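We can also include inline R code in the text by enclosing it in backticks, with the letter `r` right after the first backtick. For example, 1+4 equals `r 1+4` and $\sqrt{5}+3\cdot 7$ equals `r sqrt(5)+3*7`; compare the knitted document with the .Rmd source to see how this was written.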

## Variable Assignment

- Assigning values to variables can be done in several ways:
```{r}
a <- 99
b = 1
1.45 -> c
```
- If you hit the green button in the chunk above, you will see that the variables `a`, `b` and `c` are created in the Environment pane (to the right). We can now work with these variables in subsequent chunks:
```{r}
a+b+c
```

- Some people use `=` for assignment, but the traditional and most common way is `<-`. You will most likely never need `->`.

- There are some restrictions on variable names. For example, a name cannot start with a number.
For longer names, it is good practice to separate words with underscores or capital letters, like `mean_of_men_height` or `meanOfMenHeight`. However, such a long name for a mean value is probably overkill.
    + You may benefit from reading a short guide on code style like http://r-pkgs.had.co.nz/style.html
    + Meaningful names make your code easier for others to read, but also for yourself if you have to read your code again next month.

- Be careful when reusing variable names: assigning a new value to an existing name silently overwrites the old value, and all later chunks will use the new one.

- To print the value of a variable, just write the name followed by "Enter" if in a console or "Ctrl+Enter" if in the editor:

```{r}
a
b
c # This will print c in the final output
```

- Inside R chunks, everything after a hashtag `#` on a line is a comment, which R ignores. Comments are useful for reminding yourself (and others) what the **R** code does.

- Finally, we can also use variables in inline calculations: `a+b-50` evaluates to `r a+b-50`.
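
The naming rules and the warning about reusing names can be tried out directly. A minimal sketch; the variable names and the value 180 are made up for illustration and are not used elsewhere:

```{r}
# Syntactically valid names use letters, digits, "." and "_",
# and start with a letter (or a dot not followed by a digit)
mean_of_men_height <- 180  # made-up value
meanOfMenHeight <- 180     # same value, different naming style
# 2nd_value <- 5           # would give an error: a name cannot start with a digit

# Reusing a name silently overwrites the old value
my_number <- 10
my_number <- 20
my_number                  # prints 20; the value 10 is gone
```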

### Exercise:
- Create the variables `x` and `y`, assign the value 23 to `x` and 11 to `y`, and add them together by writing `x + y`.
```{r}
# x = ...
# y = ... 
# x + y
```

- Take the square root of y
```{r}
# sqrt(...)
```

- Divide x by y
```{r}
# ... / ...
```


# Data Structures - Vectors
## Types of Vectors
The basic data structure in R is the vector, which can be created using the `c()` function (short for concatenate).

```{r}
my_vector = c(1.3, 5, 3.5, 7)
my_vector
```
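
Besides numbers, vectors can hold other types of values. A short sketch of the three most common types, using made-up values; `class()` reports the type of a vector and `length()` its number of elements:

```{r}
num_vec  <- c(1.3, 5, 3.5)          # numeric
char_vec <- c("female", "male")     # character (text)
logi_vec <- c(TRUE, FALSE, TRUE)    # logical
class(num_vec)                      # "numeric"
class(char_vec)                     # "character"
class(logi_vec)                     # "logical"
length(num_vec)                     # number of elements: 3
mixed <- c(1, "a")                  # combining a number and text converts everything to text
mixed                               # "1" "a"
```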


## Subsetting (indexing) Vectors

- We can access the elements of a vector using square brackets `[ ]` (indexing starts at 1, so `v[3]` is the third element):

```{r}
v = c(2.1, 4.2, 3.3, 5.4, 7)
v[3]
```
- We can also select several elements at once by indexing with a vector of
    + Positive integers: extract the specified elements
    + Negative integers: exclude the specified elements

```{r}
v[c(1,3,5)]         # Positive
v[-c(1,3,5)]        # Negative
```
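
A third option is indexing with a logical vector (`TRUE`/`FALSE`): only the elements where the condition is `TRUE` are kept. A sketch using a vector `w` with the same values as `v` (the name `w` is just for illustration):

```{r}
w <- c(2.1, 4.2, 3.3, 5.4, 7)
w > 4          # logical vector: FALSE TRUE FALSE TRUE TRUE
w[w > 4]       # keeps the elements where the condition is TRUE: 4.2 5.4 7.0
w[length(w)]   # the last element: 7
```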

## Assignment

- We can change elements in a vector using subsetting:
```{r}
v2 = c(1, 2, 3)
v2[1] = 0
v2
```
- Another example:
```{r}
v3 = c(3,5,7,9,11)
v3[c(2,3)] = 0
v3
```
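
Conditions can also be used when assigning, and assigning to a position just past the end extends the vector. A sketch with a made-up vector `u`:

```{r}
u <- c(3, 5, 7, 9, 11)
u[u > 6] <- 0   # set every element larger than 6 to 0
u               # 3 5 0 0 0
u[6] <- 13      # assigning to position 6 extends the vector by one element
u               # 3 5 0 0 0 13
```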

## Math Operations on Vectors
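Many operations in R are vectorized, meaning that they act on a whole vector at once (for example element-wise addition, multiplication and logarithms), while functions like `sum()` combine all elements into a single number: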
```{r}
sum(v2)  # Sum of all elements in v2
sqrt(v2) # Square root of v2
v2^2     # Squaring all elements in v2
v2 + v2  # Add v2 to itself
```
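
Vectorized operations also work between two vectors of the same length (element by element) and between a vector and a single number. A sketch with two made-up vectors `x1` and `x2`:

```{r}
x1 <- c(1, 2, 3)
x2 <- c(4, 5, 6)
x1 + x2     # element by element: 5 7 9
x1 * x2     # element by element: 4 10 18
x1 * 10     # the single number is applied to every element: 10 20 30
mean(x1)    # 2
```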

### Exercise

- Use R to calculate the sum of the numbers $1,2,3,4,5,6,7,8,9,10$
- Extract the second and third element of `v4 = c(4,3,9,1,0)` (you need to create a new chunk below).

# Reading Data
We shall now consider a specific dataset called `BrainSize`.
We load it into R with the command:
```{r}
BrainSize = read.delim("https://asta.math.aau.dk/datasets?file=BrainSize.txt")
```

The `BrainSize` dataset is now stored in R as a *data frame*; you can think of it as a spreadsheet (like an Excel sheet) with one variable in each column and one observation in each row.
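
`read.delim()` reads tab-separated text (its default separator is a tab, `sep = "\t"`), so we assume the `BrainSize` file is tab-separated; comma-separated files are usually read with `read.csv()` instead. A self-contained sketch that reads a tiny, made-up tab-separated table from a string via the `text` argument (passed on to `read.table()`) instead of from a URL:

```{r}
toy_text <- "Gender\tHeight\nFemale\t64.5\nMale\t72\nFemale\t66"  # made-up data
toy <- read.delim(text = toy_text)  # first line is used as column names
toy
```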

## Dataframes
To get an overview (the first six rows) of a data frame you can use the `head()` function:
```{r}
head(BrainSize)
```
We see that the `BrainSize` dataset consists of 8 columns:

- `Gender`
- `FSIQ`
- `VIQ`
- `PIQ`
- `Weight`
- `Height`
- `MRI_Count`
- `HeightIntervals`: The Height variable divided into 5 intervals
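
Besides `head()`, a few base R functions give a quick overview of a data frame; for example, `dim(BrainSize)` or `names(BrainSize)` can be used to check the columns listed above. A self-contained sketch on a small made-up data frame:

```{r}
toy_df <- data.frame(
  Gender = c("Female", "Male", "Female", "Male"),
  Height = c(64.5, 72, 66, 70)
)  # made-up data, for illustration only
dim(toy_df)     # number of rows and columns: 4 2
names(toy_df)   # column names
str(toy_df)     # compact overview: type and first values of each column
```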

We can extract columns (vectors) from a data frame with the `$` operator:
```{r}
height = BrainSize$Height # Extracting the 'Height' column from data frame 'BrainSize' and naming it 'height'
mean(height)
```
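
A column can also be extracted with double square brackets, e.g. `BrainSize[["Height"]]`. If a column contains missing values (`NA`), `mean()` returns `NA` unless you add `na.rm = TRUE`. A sketch on made-up data with one missing value:

```{r}
toy_heights <- data.frame(Height = c(64.5, 72, NA, 70))  # made-up data with one NA
toy_h <- toy_heights$Height    # same result as toy_heights[["Height"]]
mean(toy_h)                    # NA, because one value is missing
mean(toy_h, na.rm = TRUE)      # mean of the three non-missing values
```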

# Using Add-on Packages (Mosaic)
R comes with a rich set of pre-installed functions like `mean`, `sum`, `plot` and many more.
We can also install new "packages" with additional functionalities.
Throughout this course, we shall rely heavily on the package `mosaic`.
It can be installed by typing `install.packages("mosaic")`; you only have to do this the very first time.
Before we can use the functions in a package, we need to load it.
We only need to do this once in each Rmarkdown document, usually at the very beginning of the document.
The command is `library(mosaic)`.
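
Since installing is needed only once while loading is needed in every document, it can be handy to check whether a package is already installed. A sketch using base R's `installed.packages()`; this is one common idiom, not something the course requires:

```{r}
# TRUE if the mosaic package is installed on this computer
# (nothing is installed or loaded by this chunk)
mosaic_installed <- "mosaic" %in% rownames(installed.packages())
mosaic_installed
# If this prints FALSE, run install.packages("mosaic") once in the Console
```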

In the rest of this tutorial, we shall develop an understanding of the usage of the `mosaic` package.
You can look up important `mosaic` functions in the cheat-sheet or refer to this document.

Many functions from `mosaic` have the form `goal(y ~ x | z, data = mydata, ...)`.
For plots:

- `y`: is the y-axis variable
- `x`: is the x-axis variable
- `z`: conditioning variable (separate panels)

For other functions, `y ~ x | z` can usually be read as "`y` is modeled by (or depends on) `x`, separately for each value of `z`".
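
Expressions like `y ~ x | z` are ordinary R *formula* objects: R stores them without evaluating the variables, and functions such as `tally()` then look the variables up in the data frame given by `data`. A small base R sketch (the name `my_formula` is just for illustration):

```{r}
my_formula <- y ~ x | z   # stored as is; y, x and z do not need to exist
class(my_formula)         # "formula"
all.vars(my_formula)      # the variable names used: "y" "x" "z"
```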

## Tabulate
Recall the `BrainSize` data which we have already loaded into R. We can use the `tally` function from the `mosaic` package to summarize the Gender variable:
```{r, message=FALSE}
# The option "message = FALSE" will prevent R from printing information about the package.
library(mosaic) # The functionalities in the mosaic package are now available 
tally( ~ Gender, data = BrainSize)
```

We see that in this data set, $20$ observations are females and $18$ are males. We can also make a cross tabulation of `Gender` and `HeightIntervals` (remember that `HeightIntervals` is a categorical variable with five levels):
```{r}
tally( ~ Gender + HeightIntervals, data = BrainSize)
```
With this command, we simply "model" (count) `Gender` and `HeightIntervals` together. We see that, in this data set, males are generally taller than females.

- To swap rows and columns, swap the order of the variables in the formula.
- To get relative frequencies (as percentages of all observations), add `format = "percent"`:

```{r}
tally( ~ HeightIntervals + Gender, data = BrainSize, format = "percent")
```

There is also an option to add totals in the "margins" of the table:

```{r}
tally( ~ HeightIntervals + Gender, data = BrainSize, format = "percent", margins = TRUE)
```


To get relative frequencies within each gender (so that the percentages in each column sum to 100), specify that you want `HeightIntervals` "modeled" (counted) by `Gender`:

```{r}
tally(HeightIntervals ~ Gender, data = BrainSize, format = "percent")
```
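
For comparison, similar tables can be made with base R, without `mosaic`: `table()` counts, `prop.table()` turns counts into proportions (`margin = 2` means within each column), and `addmargins()` adds totals. A sketch on a small made-up data set rather than `BrainSize`; the output layout differs somewhat from `tally()`:

```{r}
# Made-up data, for illustration only
toy_tab_data <- data.frame(
  Gender = c("Female", "Male", "Female", "Female", "Male"),
  Size   = c("small", "large", "small", "large", "large")
)
toy_table <- table(toy_tab_data$Size, toy_tab_data$Gender)  # Size in rows, Gender in columns
toy_table                                # counts
100 * prop.table(toy_table)              # percent of all observations
addmargins(100 * prop.table(toy_table))  # the same, with row and column totals
100 * prop.table(toy_table, margin = 2)  # percent within each column (gender)
```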


## Numerical summaries
We can also summarize the `Height` by `Gender` numerically with the `favstats` function:

```{r}
favstats(Height ~ Gender, data = BrainSize)
```

As we already guessed from the tables above, the mean height of men is larger than the mean height of women.

We can also use the `mean` function to extract the means directly:

```{r}
mean(Height ~ Gender, data = BrainSize)
```
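
For comparison, group means can also be computed with base R's `tapply()` or `aggregate()`. A sketch on made-up data (the mosaic version above is what this course uses):

```{r}
# Made-up data, for illustration only
toy_hw <- data.frame(
  Gender = c("Female", "Male", "Female", "Male"),
  Height = c(64, 72, 66, 70)
)
tapply(toy_hw$Height, toy_hw$Gender, mean)             # Female 65, Male 71
aggregate(Height ~ Gender, data = toy_hw, FUN = mean)  # same means, as a data frame
```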

### Exercises

Consider the BrainSize data.

- What is the mean value of the Weight variable?
- What is the mean value of the Weight variable for each group of the Gender variable?
- How many females weigh $146$?

## Visualizing Data

### Boxplots

```{r}
gf_boxplot(Height ~ Gender, data = BrainSize)
```

```{r}
gf_boxplot(Weight ~ HeightIntervals | Gender, data = BrainSize)
```
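
A boxplot summarizes a numeric variable by its quartiles: the box spans the first to the third quartile, the line inside it marks the median, and (with the default settings of `gf_boxplot`, which builds on ggplot2) points more than 1.5 times the interquartile range beyond the box are drawn individually as possible outliers. A sketch computing these numbers with base R for made-up heights; plotting functions may compute the quartiles slightly differently:

```{r}
toy_height <- c(61, 63, 64, 65, 66, 68, 70, 78)  # made-up data, for illustration only
quantile(toy_height)   # min, first quartile, median, third quartile, max
IQR(toy_height)        # interquartile range: third minus first quartile (height of the box)
quartiles <- quantile(toy_height, c(0.25, 0.75))
# values above the upper whisker limit (third quartile + 1.5 * IQR)
toy_height[toy_height > quartiles[2] + 1.5 * IQR(toy_height)]
```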


### Exercises

Consider the BrainSize data.

- Use the function `gf_point` (from the `ggformula` package, which is loaded together with `mosaic`) to plot `Weight` against `Height` with a different color for each `Gender` (fill in the missing code).

```{r}
# gf_point( ... ~ ..., col = ~Gender, data = BrainSize)
```

- Do the same, but with separate plots for males and females (fill in the missing code).

```{r}
# gf_point( ... ~ ... | ..., data = BrainSize)
```

- Do you see a different picture for males and females?