Chapter 4 Creating Reports

Every serious scholar dreams of recording their knowledge in a leather-bound tome that will lie forgotten and dusty on the shelves of an out-of-the-way library at a small college with a dubious reputation until someone naïve enough to believe that they can control forces beyond mortal ken stumbles upon it and unleashes ravenous horrors to prey upon the sanity of the innocent. Sadly, in these diminished times we must settle for dry expositions of trivia typeset in two columns using a demure font and sequentially-numbered citations.

But there is yet hope. One of R’s greatest strengths is a package called knitr that translates documents written in a format called R Markdown into HTML, PDF, and e-books. R Markdown files are a kind of programmable document; authors can interleave prose with chunks of R (or other languages) and knitr will run that code as it processes the document to create tables and diagrams. To aid this, the RStudio IDE includes tools to create new documents, insert and run code chunks, preview documents’ structure and output, and much more.

That’s the good news. The bad news is that Markdown isn’t a standard: it’s more like a set of ad hoc implementations flying in loose formation. While basic elements like headings and links are more or less the same in R Markdown as they are in (for example) GitHub Flavored Markdown, people who have used other dialects may trip over small differences.

This chapter starts by introducing Markdown and publishing workflow, then shows how to embed code and customize reports. We close by showing how to publish reports on GitHub and Netlify, which both offer free hosting for small websites.

4.1 Learning Objectives

  • Create a Markdown file that includes headings, links, and external images.
  • Compile that file to produce an HTML page.
  • Customize the page via its YAML header.
  • Add R code chunks to a page.
  • Format tabular output programmatically.
  • Share common startup code between pages.
  • Create and customize parameterized documents.
  • Publish pages on GitHub by putting generated HTML in the docs directory.
  • Publish pages by copying them to Netlify.

4.2 How can I create and preview a simple page?

To begin, create a file called first.Rmd and add the following text to it:

``` –>

This shows several key features of Markdown:

  • A level-1 heading (h1 in HTML) is put on a line starting with #. A level-2 heading (h2) uses two of these and so on.
  • Putting {-} immediately after the heading title suppresses numbering. Without this, our examples would all be included in this book’s table of contents, which isn’t what we want. (This is one of R Markdown’s extensions to Markdown.)
  • Paragraphs are separated by blank lines.
  • Text can be put in single asterisks for italics or double asterisk for bold.
  • To create a link, put the visible text inside square brackets and the URL inside parentheses immediately after it. This is the reverse of HTML’s order, which puts the URL inside the opening a tag before the contained text that is displayed.
  • Links can also be written by putting text inside square brackets and an identifier immediately after it, also in square brackets. Those identifiers can then be associated with links in a table at the bottom of the page.
  • Comments are written as they are in HTML.

Here’s what the HTML corresponding to our simple Markdown document looks like:

–>

To preview it, go to the mini-toolbar at the top of your document in the RStudio IDE and click “knit”: to call the appropriate function from knitr (or a function from one of the libraries built on top of it for generating books or slides).

The 'knit' Button

Figure 4.1: The ‘knit’ Button

4.3 How can I run code and include its output in a page?

If this is all R Markdown could do, it would be nothing more than an idiosyncratic way to create HTML pages. What makes it powerful is the ability to include code chunks that are evaluated as the document is knit, and whose output is included in the final page. Put this in a file called second.Rmd:

Displaying the colors:

```{r}
colors <- c('red', 'green', 'blue')
colors
```

The triple back-quotes mark the start and end of a block of code; putting {r} immediately after the back-quotes at the start tells knitr to run the code and include its output in the generated page, which therefore looks like this:

Displaying the colors:

colors <- c('red', 'green', 'blue')
colors
## [1] "red"   "green" "blue"

We can put any code we want inside code blocks. We don’t have to execute it all at once: the Code pulldown in RStudio’s main menu offers a variety of ways to run regions of code. The IDE also gives us a keyboard shortcut to insert a new code chunk, so there really is no excuse for not making notes as we go along.

We can control execution and formatting by putting options inside the curly braces at the start of the code block:

  • {r label} gives the chunk a label that we can cross-reference. Labels must be unique within documents, just like the id attributes of HTML elements.
  • {r include=FALSE} tells knitr to run the code but not to include either the code or its output in the finished document. While the option name is confusing—the code is actually included in processing—this is handy when we have setup code that loads libraries or does other things that our readers probably don’t care about.
  • {r eval=FALSE} displays the code but doesn’t run it, and is often used for tutorials like this one.
  • {r echo=FALSE} hides the code but includes the output. This is most often used for displaying static images as we will see below.

These options can be combined by separating them with commas. In particular, it’s good style to give every chunk a unique label, so a document might look like this:

# My Thesis {-}

```{r setup, include=FALSE}
# Load tidyverse but don't display messages.
library(tidyverse)
```

```{r read-data, message=FALSE}
earthquakes <- read_csv('earthquakes.csv')
```

A profound quotation to set the scene.
And then some analysis:

```{r calculate-depth-by-magnitude}
depth_by_magnitude <- earthquakes %>%
  mutate(round_mag = round(Magnitude)) %>%
  group_by(round_mag) %>%
  summarize(depth = mean(Depth_Km))
depth_by_magnitude
```

Now let's visualize that:

```{r plot-depth-by-magnitude}
depth_by_magnitude %>%
  ggplot() +
  geom_point(mapping = aes(x = round_mag, y = depth))
```

In order:

  • The document title is a level-1 header with suppressed numbering.
  • The first code chunk is called setup and neither it nor its output are included in the output page.
  • The second chunk is called read-data. It is shown in the output, but its output is not.
  • There is then a (very) short paragraph.
  • The third code chunk calculates the mean depth by rounded magnitude. Both the code and its output are included; the output is just R’s textual display of the depth_by_magnitude table.
  • After an even shorter paragraph, there is another named chunk whose output is a plot rather than text. knitr runs ggplot2 to create the plot and includes it in the page.

When this page is knit, the result is:

My Thesis

earthquakes <- read_csv('earthquakes.csv')

A profound quotation to set the scene. And then some analysis:

depth_by_magnitude <- earthquakes %>%
  mutate(round_mag = round(Magnitude)) %>%
  group_by(round_mag) %>%
  summarize(depth = mean(Depth_Km))
depth_by_magnitude
## # A tibble: 5 x 2
##   round_mag depth
##       <dbl> <dbl>
## 1         2  9.85
## 2         3  9.15
## 3         4  8.64
## 4         5  8   
## 5         6  8.1

Now let’s visualize that:

depth_by_magnitude %>%
  ggplot() +
  geom_point(mapping = aes(x = round_mag, y = depth))

Note that if we want to include a static image (such as a screenshot) in a report, we can use Markdown’s own syntax:

![Summoning Ritual](figures/summoning-ritual.jpg)

or create an R code chunk with a call to knitr::include_graphics:

```{r summoning-ritual, echo=FALSE, fig.cap="Summoning Ritual"}
knitr::include_graphics("figures/summoning-ritual.jpg")
```

The options for the latter give the code chunk an ID, prevent it from being echoed in the final document, and most importantly, give the figure a caption. While it requires a bit of extra typing, it produces more predictable results across different output formats.

4.4 How can I format tables in a page?

Tables are the undemonstrative yet reliable foundation on which data science is built. While they are not as showy as their graphical counterparts, they permit closer scrutiny, and are accessible both to people with visual challenges and to the machines whose inevitable triumph over us shall usher in an agorithmic age free of superstition and mercy.

The simplest way to format tables is to use knitr::kable:

earthquakes %>%
  head(5) %>%
  kable()
Time Latitude Longitude Depth_Km Magnitude
2016-08-24 03:36:32 42.6983 13.2335 8.1 6.0
2016-08-24 03:37:26 42.7123 13.2533 9.0 4.5
2016-08-24 03:40:46 42.7647 13.1723 9.7 3.8
2016-08-24 03:41:38 42.7803 13.1683 9.7 3.9
2016-08-24 03:42:07 42.7798 13.1575 9.7 3.6

Our output is more attractive if we install and load the kableExtra package and use it to style the table. We must call its functions after we call kable(), just as we call the styling functions for plots after ggplot(). Below, we select four columns from our earthquake data and format them as a narrow table with two decimal places for latitude and longitude, one for magnitude and depth, and some multi-column headers:

earthquakes %>%
  select(lat = Latitude, long = Longitude,
         mag = Magnitude, depth = Depth_Km) %>%
  head(5) %>%
  kable(digits = c(2, 2, 1, 1)) %>%
  kable_styling(full_width = FALSE) %>%
  add_header_above(c('Location' = 2, 'Details' = 2))
Location
Details
lat long mag depth
42.70 13.23 6.0 8.1
42.71 13.25 4.5 9.0
42.76 13.17 3.8 9.7
42.78 13.17 3.9 9.7
42.78 13.16 3.6 9.7

4.5 How can I share code between pages?

If you are working on several related reports, you may want to share some code between them. The best way to do this with R Markdown is to put that code in a separate .R file and then load that at the start of each document using the source function. For example, all of the chapters in this book begin with:

```{r setup, include=FALSE}
source('common.R')
```

The chunk is named setup, and neither it nor its output are displayed. All the chunk does is load and run common.R, which contains the following lines:

library(tidyverse)
library(reticulate)
library(rlang)
library(knitr)

knitr::opts_knit$set(width = 69)

The first few load libraries that various chapters depend on; the last one tells knitr to set the line width option to 69 characters.

4.6 How can I parameterize documents?

knitr has many other options besides line width, and the tools built on top of it, like Blogdown and Bookdown, have many (many) more. Rather than calling a function to set them, you can and should add a header to each document. If we use File...New File...RMarkdown to create a new R Markdown file, its header looks like this:

---
title: "fourth"
author: "Dhavide Aruliah"
date: "18/09/2019"
output: html_document
---
  1. The header starts with exactly three dashes on a line of their own and ends the same way. A common mistake is to forget the closing dashes; another is to use too many or too few, or to include whitespace in the line.
  2. The content of the header is formatted using YAML, which stands for “Yet Another Markup Language”. In its simplest form it contains key-value pairs: the keys are words, the values can be numbers, quoted strings, or a variety of other things, and the two are separated by a comma.

This header tells knitr what the document’s title is, who its author is, when it was created (which really ought to be written as an ISO-formatted date, but worse sins await us), and what output format we want by default. When we knit the document, knitr reads the header but does not include it in the output. Instead, its values control knitr’s operation (e.g., select HTML as the output format) or are inserted into the document itself (e.g., the title).

Let’s edit the YAML header so that it looks like this:

---
title: "fourth"
author: "Dhavide Aruliah"
date: "2019-09-18"
output:
  html_document:
    theme: united
    toc: true
---
  1. The date is now in an unambiguous, sortable format. This doesn’t impact our document, but makes us feel better.
  2. We have added two sub-keys under html_document (which we have made a sub-key of output so that we can nest things beneath it). The first tells knitr to use the united theme, which gives us a different set of fonts and margins. The second tells it to create a table of contents at the start of the document with links to all of the section headers.

YAML can be quite complicated to understand. Luckily, a package called ymlthis is being developed to create and check files’ headers. Its documentation and capabilities are both steadily growing, it’s a great way to experiment with new or obscure options.

But YAML can do more than control the way knitr processes the document: we can also use it to create parameterized reports.

---
title: "Fifth Report"
params:
  country: Canada
---

This report looks at defenstration rates in `r params$country`.

```{r load-data}
data <- read_csv(here::here('data', glue(params$country, '.csv')))
```

This document’s YAML header contains the key params, under which is a sub-key for each parameter we want to create. When the document is knit, these parameters are put in a named list called params and can be referred to like any other variable. If we want to display it inline, we use a back-ticked code fragment that starts with the letter ‘r’; if we want to use it in a fenced code block, it’s no different from any other variable.

Parameters don’t have to be single values: they can, for example, be lists of mysterious ailments whose inexorable spread you are vainly trying to halt. Parameters can also be provided on the command line:

Rscript -e "rmarkdown::render('fifth.Rmd', params=list(country='Lesotho'))"

will create a page called fifth.html that reports defenestration rates in Lesotho.

4.7 How can I publish pages on GitHub?

Many programmers use GitHub Pages to publish websites for project documentation, personal blogs, and desperate entreaties to other-worldly forces (Iä! Iä! Git rebase fhtagn!). In its original incarnation, GitHub Pages worked used as follows:

  1. Authors created Markdown and HTML files with content they want to publish. They also created a configuration file called _config.yml with settings for their site (such as its title).
  2. All of this was put on a Git branch called gh-pages. Whenever that branch was updated on GitHub, a static site generator called Jekyll would find and transform those files using templates and inclusions taken from the _layouts and _includes directories respectively.

Keeping the gh-pages branch synchronized with other work proved to be a minor headache, so GitHub now offers two other options (which can be configured from a project’s settings in the GitHub browser interface):

  1. Publish directly from the root directory of the project’s master branch.
  2. Publish from the docs folder in the master branch.

One other piece of background is how Jekyll determines what to publish:

  • It ignores files and directories whose names start with an underscore, or that it is specifically told to exclude in _config.yml.
  • If an HTML or Markdown file starts with a YAML header, Jekyll translates it.
  • If the file does not include such a header, Jekyll simply copies it as-is.

All of this gives us a simple way to publish an R Markdown website:

  1. Make sure the project is configured to publish from the root directory of the master branch.
  2. Compile our R Markdown files locally to create HTML in the root directory of our local copy.
  3. Commit that HTML to the master branch.
  4. Push.

This works because the HTML files generated by knitr don’t contain YAML headers, so they are copied as-is. If we want to style those pages with our own CSS or add some JavaScript, we can tell knitr to include files of our choice during translation:

---
title: "Defenestration By Country"
output:
  html_document:
    includes:
      in_header: extra-header.html
      after_body: extra-footer.html
---
...content...

Behind the scenes, knitr translates our R Markdown into plain Markdown, which is then turned into HTML by yet another tool called Pandoc. If we are brave, we can create an entirely new Pandoc HTML template and format our files with that:

---
title: "Defenestration by Season"
output:
  html_document:
    template: seasonal-report.html
---
...content...

This works, but having the generated HTML in our root directory is messy. Given that we can configure GitHub Pages to publish from the docs folder, why don’t we put our HTML there? After all, knitr::knit has an output parameter with which we can specify a location for the output file.

The answer is that knitr becomes rather vexed when the output directory is not the same as the current working directory. Programmers being programmers, there are several ways around this:

  1. Put up with it.
  2. Write a small function that changes the current wording directory to docs, knits ../report.Rmd, then changes the working directory back to the project root.
  3. Use something like Make to build everything on the command line and then move all the generated files into docs.

None of this is made any less frustrating by the fact that other tools in the knitr family, such as Bookdown, do allow users to specify the output directory through a configuration parameter.

4.8 Key Points

  • FIXME: key points for R Markdown