18.5 Load packages and import data

Load the following packages, including palmerpenguins and kableExtra:

library(tidyverse)
library(knitr)
library(naniar)
library(skimr)
library(palmerpenguins)
library(kableExtra)

We’ll use the “circadian.csv” dataset, which is also used in the “comparing more than 2 means tutorial”. These data are associated with Example 15.1 in the text.

The circadian data describe melatonin production in 22 people randomly assigned to one of three light treatments.

circadian <- read_csv("https://raw.githubusercontent.com/ubco-biology/BIOL202/main/data/circadian.csv")
## Rows: 22 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): treatment
## dbl (1): shift
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

18.5.1 Formatting output from the skimr package

It is always good practice to get an overview of a dataset before proceeding with analyses.

For this we have been using the skim_without_charts function from the skimr package.

You may have noticed that when you knit assignments to PDF, any output from the skim_without_charts function tends to run off the edge of the page. Here we’ll learn how to avoid this.

The key is to assign the output to an object first, as such:

skim.out <- circadian %>%
  skim_without_charts()

We will also “transpose” (rotate) the table using the t base function:, so that it’ll fit on a page:

skim.out.transposed <- t(skim.out)

Now we’ll use the kbl function with various arguments to ensure the output looks good and includes a table caption:

NOTE: This table will look different when knitted to PDF (rather than HTML)

IMPORTANT: Notice that, unlike figure captions, which must be provided in the chunk header, the caption for a table is provided as an argument to the kbl function within the actual code!

kbl(skim.out.transposed, caption = "Overview of the 'circadian' dataset",
    booktabs = TRUE) %>%
  kable_styling(latex_options = "hold_position")

In the preceding code we:

  • use the kbl function and provide 3 arguments:
    • provide the name of the table-like object we wish to format for output
    • provide a table caption
    • provide an argument “booktabs = TRUE” (this provides nice formatting)

Then we include pipes (“%>%), and follow with:

  • the kable_styling function with a few arguments:
    • “latex_options = ‘hold_position’” which forces the table to appear where the code chunk comes

For lots of great examples of how the kableExtra package can be used, see this vignette for PDF output, and this one for HTML output.

IMPORTANT: If you attempt to use the skim_without_charts function without wrapping the about in a kbl function, you will likely get an error when you attempt to knit to PDF.

Table 18.1: Overview of the ‘circadian’ dataset
1 2
skim_type character numeric
skim_variable treatment shift
n_missing 0 0
complete_rate 1 1
character.min 4 NA
character.max 7 NA
character.empty 0 NA
character.n_unique 3 NA
character.whitespace 0 NA
numeric.mean NA -0.7127273
numeric.sd NA 0.8901534
numeric.p0 NA -2.83
numeric.p25 NA -1.33
numeric.p50 NA -0.66
numeric.p75 NA -0.05
numeric.p100 NA 0.73

Let’s check to see if this approach works on a larger dataset (one with more variables).

Let’s try it on the “penguins” dataset:

skim.penguins <- penguins %>%
  skim_without_charts()

Transpose:

skim.penguins.transposed <- t(skim.penguins)

Now let’s try the kbl ouptut:

NOTE: This table will look different when knitted to PDF (rather than HTML). Specifically, it will print off page…

kbl(skim.penguins.transposed, caption = "Overview of the 'penguins' dataset",
    booktabs = TRUE) %>%
  kable_styling(latex_options = "hold_position")
Table 18.2: Overview of the ‘penguins’ dataset
1 2 3 4 5 6 7 8
skim_type factor factor factor numeric numeric numeric numeric numeric
skim_variable species island sex bill_length_mm bill_depth_mm flipper_length_mm body_mass_g year
n_missing 0 0 11 2 2 2 2 0
complete_rate 1.0000000 1.0000000 0.9680233 0.9941860 0.9941860 0.9941860 0.9941860 1.0000000
factor.ordered FALSE FALSE FALSE NA NA NA NA NA
factor.n_unique 3 3 2 NA NA NA NA NA
factor.top_counts Ade: 152, Gen: 124, Chi: 68 Bis: 168, Dre: 124, Tor: 52 mal: 168, fem: 165 NA NA NA NA NA
numeric.mean NA NA NA 43.92193 17.15117 200.91520 4201.75439 2008.02907
numeric.sd NA NA NA 5.4595837 1.9747932 14.0617137 801.9545357 0.8183559
numeric.p0 NA NA NA 32.1 13.1 172.0 2700.0 2007.0
numeric.p25 NA NA NA 39.225 15.600 190.000 3550.000 2007.000
numeric.p50 NA NA NA 44.45 17.30 197.00 4050.00 2008.00
numeric.p75 NA NA NA 48.5 18.7 213.0 4750.0 2009.0
numeric.p100 NA NA NA 59.6 21.5 231.0 6300.0 2009.0

Argh, our output went off the page!

If this happens, then we add another argument to the kable_styling function, the “scale_down” option:

kbl(skim.penguins.transposed, caption = "Overview of the 'penguins' dataset",
    booktabs = TRUE) %>%
  kable_styling(latex_options = c("scale_down", "hold_position"))
Table 18.3: Overview of the ‘penguins’ dataset
1 2 3 4 5 6 7 8
skim_type factor factor factor numeric numeric numeric numeric numeric
skim_variable species island sex bill_length_mm bill_depth_mm flipper_length_mm body_mass_g year
n_missing 0 0 11 2 2 2 2 0
complete_rate 1.0000000 1.0000000 0.9680233 0.9941860 0.9941860 0.9941860 0.9941860 1.0000000
factor.ordered FALSE FALSE FALSE NA NA NA NA NA
factor.n_unique 3 3 2 NA NA NA NA NA
factor.top_counts Ade: 152, Gen: 124, Chi: 68 Bis: 168, Dre: 124, Tor: 52 mal: 168, fem: 165 NA NA NA NA NA
numeric.mean NA NA NA 43.92193 17.15117 200.91520 4201.75439 2008.02907
numeric.sd NA NA NA 5.4595837 1.9747932 14.0617137 801.9545357 0.8183559
numeric.p0 NA NA NA 32.1 13.1 172.0 2700.0 2007.0
numeric.p25 NA NA NA 39.225 15.600 190.000 3550.000 2007.000
numeric.p50 NA NA NA 44.45 17.30 197.00 4050.00 2008.00
numeric.p75 NA NA NA 48.5 18.7 213.0 4750.0 2009.0
numeric.p100 NA NA NA 59.6 21.5 231.0 6300.0 2009.0

NOTE: On this HTML page the table above still goes off the page, but the PDF version will work!

Here’s an image of the PDF output:

For very large datasets you may find this approach causes the font to be too small.

18.5.2 A nicely formatted table of descriptive statistics

Here is the code (which you’ve already learned) to create a good table of descriptive statistics for a numeric response variable grouped by categories in a categorical explanatory variable.

We’ll use the “penguins” dataset again, and calculate descriptive statistics for bill lengths of male penguins, grouped by species:

penguins.stats <- penguins %>%
  filter(sex == "male") %>%
  group_by(species) %>%
  summarise(
    Count = n() - naniar::n_miss(bill_length_mm),
    Count_NA = naniar::n_miss(bill_length_mm), 
    Mean = mean(bill_length_mm, na.rm = TRUE),
    SD = sd(bill_length_mm, na.rm = TRUE),
    SEM = SD/sqrt(Count),
    Low_95_CL = t.test(bill_length_mm, conf.level = 0.95)$conf.int[1],
    Up_95_CL = t.test(bill_length_mm, conf.level = 0.95)$conf.int[2]
  )

Here’s what the raw table looks like:

penguins.stats
## # A tibble: 3 × 8
##   species   Count Count_NA  Mean    SD    SEM Low_95_CL Up_95_CL
##   <fct>     <int>    <int> <dbl> <dbl>  <dbl>     <dbl>    <dbl>
## 1 Adelie       73        0 40.39 2.277 0.2665     39.86    40.92
## 2 Chinstrap    34        0 51.09 1.565 0.2683     50.55    51.64
## 3 Gentoo       61        0 49.47 2.721 0.3483     48.78    50.17

Now let’s format it for PDF output:

kbl(penguins.stats, caption = "Descriptive statistics for bill length among male penguins.", 
    booktabs = TRUE, digits = c(0, 0, 0, 2, 3, 3, 3, 3)) %>% 
  kable_styling(latex_options = c("scale_down", "hold_position"), position = "center")
Table 18.4: Descriptive statistics for bill length among male penguins.
species Count Count_NA Mean SD SEM Low_95_CL Up_95_CL
Adelie 73 0 40.39 2.277 0.267 39.859 40.922
Chinstrap 34 0 51.09 1.565 0.268 50.548 51.640
Gentoo 61 0 49.47 2.721 0.348 48.777 50.171

The key difference from the previous example done with the skim_without_charts output is that here we specify the number of decimal places we want each descriptive statistic to be reported to.

Specifically, the “digits” argument accepts a vector of numbers, whose length is equal to the number of columns being reported in the table, and these numbers indicate the number of decimal places to include for that specific variable.

At this point it would be a good idea to revisit the Biology department’s guidelines for reporting descriptive statistics.

For example, we can see that the first three numbers in the “digits” argument are zeroes, and these correspond to the first three columns of the table: “species”, “Count”, “Count_NA”. These are columns whose values don’t require decimal places.

For the “Mean” column we report the values to 1 more decimal place than was used in the measurement (which you find out by looking at the raw data in the “penguins” object), so here, 2 decimal places.

For measures of spread (like the standard deviation) and measures of uncertainty (including SEM and confidence limits), report the numbers to 2 more decimal places than was used in the measurement, so here, 3 decimal places.

You now know how to produce nicely formatted tables in your knitted PDF output!

IMPORTANT: Be sure to try knitting to PDF as soon as you’ve used any of the kable or kableExtra package functions in a code chunk, as this will help you trouble-shoot if you encounter problems.