Data summaries with the “gtsummary” package
This tutorial introduces an alternative to the skimr package for getting overviews of datasets.
The skimr package and its skim_without_charts function can cause issues when knitting to PDF.
The gtsummary package appears to have fewer such issues.
If you haven’t already, install the gtsummary package by typing this in your console (do this only once):
install.packages("gtsummary")
Let’s load packages …
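The loading code itself is not shown here. A minimal sketch, assuming the `penguins` data come from the palmerpenguins package and that dplyr supplies the pipe and `select()` used later (both are assumptions based on the tables and code in this tutorial):

```r
# Assumed package set; the original tutorial may load others (e.g. tidyverse)
library(dplyr)          # provides %>% and select()
library(palmerpenguins) # assumed source of the penguins data frame
library(gtsummary)      # provides tbl_summary()
```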
The key function in the gtsummary package is the tbl_summary function:
?tbl_summary
Take note of the default settings for the “statistic” argument: by default, the function returns the median and interquartile range (shown as the first and third quartiles) for numeric variables, and the count and relative frequency (expressed as a percentage) for categorical variables.
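To make those defaults concrete, the sketch below spells them out by hand. It assumes the `penguins` data frame from the palmerpenguins package. The two template strings are what the gtsummary documentation lists as defaults in recent versions, so check `?tbl_summary` for your installed version.

```r
library(dplyr)
library(palmerpenguins) # assumed data source
library(gtsummary)

# One categorical and one continuous variable keep the output small
two_vars <- penguins %>%
  select(species, body_mass_g)

# Relying on the defaults
default_tbl <- two_vars %>%
  tbl_summary()

# The same request with the statistics written out explicitly
explicit_tbl <- two_vars %>%
  tbl_summary(
    statistic = list(
      all_continuous()  ~ "{median} ({p25}, {p75})", # median (Q1, Q3)
      all_categorical() ~ "{n} ({p}%)"               # count (percentage)
    )
  )
```

Printing `default_tbl` and `explicit_tbl` should give the same table, which is a quick way to confirm what the defaults are.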
For details on this function, along with a tutorial, see this webpage.
We’ll get an overview of the data using the tbl_summary function.
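The call that produced the overview is not shown. A minimal sketch, assuming the data are the `penguins` data frame from the palmerpenguins package (the same object used in the later example):

```r
library(dplyr)
library(palmerpenguins) # assumed data source
library(gtsummary)

# Summarise every column using the default statistics
overview_tbl <- penguins %>%
  tbl_summary()

overview_tbl
```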
| Characteristic | N = 344¹ |
|---|---|
| species | |
| Adelie | 152 (44%) |
| Chinstrap | 68 (20%) |
| Gentoo | 124 (36%) |
| island | |
| Biscoe | 168 (49%) |
| Dream | 124 (36%) |
| Torgersen | 52 (15%) |
| bill_length_mm | 44.5 (39.2, 48.5) |
| Unknown | 2 |
| bill_depth_mm | 17.30 (15.60, 18.70) |
| Unknown | 2 |
| flipper_length_mm | 197 (190, 213) |
| Unknown | 2 |
| body_mass_g | 4,050 (3,550, 4,750) |
| Unknown | 2 |
| sex | |
| female | 165 (50%) |
| male | 168 (50%) |
| Unknown | 11 |
| year | |
| 2007 | 110 (32%) |
| 2008 | 114 (33%) |
| 2009 | 120 (35%) |
| ¹ n (%); Median (Q1, Q3) | |
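We could select just some numeric variables and ask for the mean and standard deviation instead. Note the syntax for the “statistic” argument: we have to provide a list of formulas, with a selector such as all_continuous() on the left and a template naming the statistics in curly braces on the right, as follows: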
penguins %>%
  select(bill_length_mm, bill_depth_mm) %>%
  tbl_summary(statistic = list(all_continuous() ~ "{mean} ({sd})"))

| Characteristic | N = 344¹ |
|---|---|
| bill_length_mm | 43.9 (5.5) |
| Unknown | 2 |
| bill_depth_mm | 17.15 (1.97) |
| Unknown | 2 |
| ¹ Mean (SD) | |
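As a cross-check, the same numbers can be computed with base R. This sketch again assumes `penguins` comes from the palmerpenguins package. Missing values have to be dropped explicitly with `na.rm = TRUE`, whereas tbl_summary leaves them out of the statistics and reports them in the “Unknown” row.

```r
library(palmerpenguins) # assumed data source

# Mean and standard deviation of bill length, ignoring missing values
bill_length_mean <- mean(penguins$bill_length_mm, na.rm = TRUE)
bill_length_sd   <- sd(penguins$bill_length_mm, na.rm = TRUE)

# Number of missing values, reported by tbl_summary as "Unknown"
n_missing <- sum(is.na(penguins$bill_length_mm))

round(bill_length_mean, 1)
round(bill_length_sd, 1)
n_missing
```

The rounded values should agree with the bill_length_mm row of the table above.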
So, if you find yourself running into issues with the skimr package, feel free to use the gtsummary package instead!