Data summaries with the “gtsummary” package

This tutorial introduces an alternative to the skimr package for getting overviews of datasets.

The skimr package and its skim_without_charts function can cause issues when knitting to PDF.

The gtsummary package appears to have fewer such issues.

If you haven’t already, install the gtsummary package by typing this in your console (do this only once):

install.packages("gtsummary")

Let’s load the packages we need:

library(tidyverse)
library(palmerpenguins)
library(gtsummary)
library(knitr)

The key function in the gtsummary package is the tbl_summary function:

?tbl_summary

Take note of the default settings for the “statistic” argument: by default, the function returns the median and interquartile range (shown as the first and third quartiles) for numeric variables, and the count and percentage for categorical variables.

For details on this function, along with a tutorial, see this webpage.

We’ll get an overview of the data using the tbl_summary function.

penguins %>%
  tbl_summary()

Characteristic	N = 344¹
species
    Adelie	152 (44%)
    Chinstrap	68 (20%)
    Gentoo	124 (36%)
island
    Biscoe	168 (49%)
    Dream	124 (36%)
    Torgersen	52 (15%)
bill_length_mm	44.5 (39.2, 48.5)
    Unknown	2
bill_depth_mm	17.30 (15.60, 18.70)
    Unknown	2
flipper_length_mm	197 (190, 213)
    Unknown	2
body_mass_g	4,050 (3,550, 4,750)
    Unknown	2
sex
    female	165 (50%)
    male	168 (50%)
    Unknown	11
year
    2007	110 (32%)
    2008	114 (33%)
    2009	120 (35%)
¹ n (%); Median (Q1, Q3)
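
To see where these numbers come from, the default statistics can be recomputed by hand for one numeric and one categorical variable. This is only a rough check, not how tbl_summary works internally: it assumes R's default quantile method, and the object names bill_check and species_check are made up for this illustration. The results should agree with the table above up to rounding.

```r
library(dplyr)
library(palmerpenguins)

# Numeric variable: median, first and third quartiles, and the number
# of missing values (reported as "Unknown" in the table).
# Assumption: quantile() is used with its default method.
bill_check <- penguins %>%
  summarise(
    median    = median(bill_length_mm, na.rm = TRUE),
    q1        = quantile(bill_length_mm, 0.25, na.rm = TRUE, names = FALSE),
    q3        = quantile(bill_length_mm, 0.75, na.rm = TRUE, names = FALSE),
    n_unknown = sum(is.na(bill_length_mm))
  )

bill_check

# Categorical variable: count and percentage for each level.
species_check <- penguins %>%
  count(species) %>%
  mutate(percent = 100 * n / sum(n))

species_check
```

Note that the percentages for a variable with missing values, such as sex, are calculated among the non-missing observations, which is why female and male are each shown as 50% even though 11 values are unknown.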

We can also select only some of the numeric variables and ask for the mean and standard deviation instead. Note the syntax for the “statistic” argument: we have to provide a “list” of formulas, as follows:

penguins %>%
  select(bill_length_mm, bill_depth_mm) %>%
  tbl_summary(statistic = list(all_continuous() ~ "{mean} ({sd})"))

Characteristic	N = 344¹
bill_length_mm	43.9 (5.5)
    Unknown	2
bill_depth_mm	17.15 (1.97)
    Unknown	2
¹ Mean (SD)
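
The “list” can hold more than one formula, so numeric and categorical variables can each get their own display in a single call. The sketch below is one possible variation rather than a recommended setup: the choice of variables, the "{n} / {N} ({p}%)" pattern, the "Missing" label, and the object name summary_tbl are all picked just for illustration.

```r
library(dplyr)
library(palmerpenguins)
library(gtsummary)

summary_tbl <- penguins %>%
  select(species, bill_length_mm, bill_depth_mm) %>%
  tbl_summary(
    statistic = list(
      # numeric variables: mean and standard deviation
      all_continuous()  ~ "{mean} ({sd})",
      # categorical variables: count, total, and percentage
      all_categorical() ~ "{n} / {N} ({p}%)"
    ),
    # relabel the rows that count missing values (default is "Unknown")
    missing_text = "Missing"
  )

summary_tbl
```

Each formula has a selector on the left (which variables it applies to) and a template on the right, where the names in curly braces are replaced by the computed statistics.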

So, if you find yourself running into issues with the skimr package, feel free to use the gtsummary package instead!