---
title: "Rmarkdown intro"
author: "The ASTA Team"
output:
  html_document:
    fig_height: 3
    fig_width: 5
    theme: cerulean
    highlight: tango
  pdf_document:
    fig_height: 3
    fig_width: 5
---

# Data 


## Reading Data
We shall now consider a specific dataset called `BrainSize`.
We load it into R with the command:
```{r}
BrainSize = read.delim("https://asta.math.aau.dk/datasets?file=BrainSize.txt")
```

The `BrainSize` dataset is now stored as what R calls a data frame; you can think of it as an Excel sheet with one variable per column.

## Data frames
To get an overview (the first six rows) of a data frame you can use the `head()` function:
```{r}
head(BrainSize)
```
We see that the `BrainSize` dataset consists of 8 columns:

- `Gender`
- `FSIQ`
- `VIQ`
- `PIQ`
- `Weight`
- `Height`
- `MRI_Count`
- `HeightIntervals`: The Height variable divided into 5 intervals
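
Besides `head()`, a few base R functions give a quick structural overview of a data frame. The chunk below is a minimal sketch; it assumes `BrainSize` has already been loaded by the `read.delim()` chunk above.

```{r}
# Assumes BrainSize was loaded by the read.delim() chunk above.
dim(BrainSize)    # number of rows and number of columns
names(BrainSize)  # the column names
str(BrainSize)    # type and first few values of each column
```

The number of names returned by `names()` should match the length of the column list above.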


# Using Add-on Packages (Mosaic)
R comes with a rich set of pre-installed functions like `mean`, `sum`, `plot` and many more.
We can also install new "packages" with additional functionalities.
Throughout this course, we shall rely heavily on the package `mosaic`.
It can be installed by typing `install.packages("mosaic")`; you only have to do this the very first time.
Whenever we want to use a function from a package, we first need to load the package.
We only do this once in the Rmarkdown document, usually at the very beginning.
The command is `library(mosaic)`.
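
One possible way to combine the two steps is sketched below: install the package only if it is missing, then load it. The chunk option `eval=FALSE` is an assumption made here so that knitting the document never triggers an installation; you would run the lines once by hand in the console instead.

```{r, eval=FALSE}
# Install mosaic only if it is not already available on this computer.
if (!requireNamespace("mosaic", quietly = TRUE)) {
  install.packages("mosaic")
}
# Load the package so its functions can be used.
library(mosaic)
```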

In the rest of this tutorial, we shall develop an understanding of the usage of the `mosaic` package.
You can look up important `mosaic` functions in the cheat-sheet or refer to this document.

Many functions from `mosaic` have the form `goal(y ~ x | z, data = mydata, ...)`.
For plots:

- `y`: the y-axis variable
- `x`: the x-axis variable
- `z`: the conditioning variable (one panel per value)

For other things, `y ~ x | z` can usually be read as "`y` is modeled by (or depends on) `x`, differently for each `z`".
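
As a small illustration of the template, the sketch below uses the formula interface once for a plot and once for a summary. It assumes `BrainSize` has been loaded as above and that `mosaic` is installed; the choice of `Weight`, `Height` and `Gender` is only an example.

```{r, message=FALSE}
library(mosaic)  # also makes the gf_ plotting functions available

# Plot: Weight (y) against Height (x), one panel per Gender (z).
gf_point(Weight ~ Height | Gender, data = BrainSize)

# Summary: mean Weight for each Gender.
# na.rm = TRUE skips missing values, in case there are any.
mean(Weight ~ Gender, data = BrainSize, na.rm = TRUE)
```

In both calls the variable on the left of `~` is the one being described, and the variables on the right say how to split it up.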

# Descriptive statistics of qualitative variables 

## Tabulation
Recall the `BrainSize` data which we have already loaded into R. We can use the `tally` function from the `mosaic` package to summarize the Gender variable:
```{r, message=FALSE}
# The option "message = FALSE" will prevent R from printing information about the package.
library(mosaic) # The functionalities in the mosaic package are now available 
tally( ~ Gender, data = BrainSize)
```

We see that in this dataset $20$ observations are females and $18$ observations are males. We can also make a cross tabulation of `Gender` and `HeightIntervals` (remember that `HeightIntervals` is a categorical variable with five levels):
```{r}
tally( ~ Gender + HeightIntervals, data = BrainSize)
```
With this command we simply "model" (count) `Gender` and `HeightIntervals` together. We see that males are in general taller than females (in this dataset).

- To swap rows and columns, swap the variables in the formula.
- To get the relative frequencies (as percentages of all observations), add `format = "percent"`:

```{r}
tally( ~ HeightIntervals + Gender, data = BrainSize, format = "percent")
```

There is also an option to add totals in the "margins" of the table:

```{r}
tally( ~ HeightIntervals + Gender, data = BrainSize, format = "percent", margins = TRUE)
```


To get relative frequencies within each gender (so that each `Gender` column sums to 100%), specify that you want `HeightIntervals` "modeled" (counted) by `Gender`:

```{r}
tally(HeightIntervals ~ Gender, data = BrainSize, format = "percent")
```
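
To check how such a table is normalised, one option is to store it and add up its columns. This is a minimal sketch; the name `tab` is just an illustrative choice, and the chunk assumes `mosaic` and `BrainSize` are loaded as above.

```{r}
# Store the conditional table: HeightIntervals within each Gender.
tab = tally(HeightIntervals ~ Gender, data = BrainSize, format = "percent")

# Each Gender column should add up to 100 (percent).
colSums(tab)
```

Using `format = "proportion"` instead of `"percent"` gives the same table on a 0 to 1 scale.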

## Bar graphs

To make a bar graph for the gender distribution in the sample:

```{r}
gf_bar( ~ Gender, data = BrainSize)
```
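
If percentages are preferred over counts on the y-axis, `gf_percents()` can be used with the same formula. This is a sketch that assumes `mosaic` (which loads the `gf_` functions) and `BrainSize` are available as above.

```{r}
# Same bar graph as above, but with percentages instead of counts.
gf_percents( ~ Gender, data = BrainSize)
```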

## Exercises

- Make a table of the height distribution in the sample.
- Make a bar graph of the variable `HeightIntervals`.
- Make a bar graph of the variable `HeightIntervals` for each gender by editing the following R chunk:

```{r}
# gf_bar( ~ ...| ..., data = BrainSize)
```