12.1 Load packages and import data

Load the tidyverse, skimr, naniar, knitr, ggmosaic, and janitor packages:

library(tidyverse)
library(skimr)
library(naniar)
library(knitr)
library(ggmosaic)
library(janitor)

We’ll also need a new package called epitools, so install that now if you haven’t done so.
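If you are not sure whether epitools is already installed, a check like the following avoids reinstalling it every time the script runs. This is a minimal sketch: `install_if_missing()` is a hypothetical helper written for illustration, not a function from any package, and it assumes a CRAN mirror is reachable when an install is needed.

```r
# Hypothetical helper (not part of any package): install a package only if
# it is not already available in the current R library.
# Returns FALSE invisibly when nothing needed installing, TRUE otherwise.
install_if_missing <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(FALSE))
  }
  install.packages(pkg)
  invisible(TRUE)
}
```

Running `install_if_missing("epitools")` once, before the `library(epitools)` call, would then have the same effect as installing the package by hand.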

library(epitools)
## 
## Attaching package: 'epitools'
## The following objects are masked from 'package:binom':
## 
##     binom.exact, binom.wilson

We’ll use two datasets described in the Whitlock & Schluter text:

  • the “cancer.csv” dataset (described in Example 9.2 in the text, page 238)
  • the “worm.csv” dataset (described in Example 9.4 in the text, page 246)

cancer <- read_csv("https://raw.githubusercontent.com/ubco-biology/BIOL202/main/data/cancer.csv")
## Rows: 39876 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): aspirinTreatment, response
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

worm <- read_csv("https://raw.githubusercontent.com/ubco-biology/BIOL202/main/data/worm.csv")
## Rows: 141 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): infection, fate
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
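The messages above suggest two ways to quiet the column specification output: state the column types yourself, or set `show_col_types = FALSE`. The sketch below shows the first option on a small inline CSV, so it runs without a network connection. The two data rows are made up for illustration and are not taken from “worm.csv”; only the column names come from the output above. It assumes readr 2.0 or later, where `I()` marks a string as literal data.

```r
library(readr)

# Illustrative stand-in for a two-column CSV file; these rows are made up
csv_text <- I("infection,fate\nuninfected,eaten\nhighly,not eaten\n")

# "cc" declares both columns as character, so read_csv() does not need to
# guess the types and does not print the column specification message
worm_demo <- read_csv(csv_text, col_types = "cc")

worm_demo
```

The same `col_types` argument (or `show_col_types = FALSE`) could be added to the `read_csv()` calls that import the real datasets.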

Take a look at the cancer dataset:

cancer %>%
 skim_without_charts()
Data summary

Name                      Piped data
Number of rows            39876
Number of columns         2
Column type frequency:
  character               2
Group variables           None

Variable type: character

skim_variable     n_missing  complete_rate  min  max  empty  n_unique  whitespace
aspirinTreatment          0              1    7    7      0         2           0
response                  0              1    6    9      0         2           0

And the worm dataset:

worm %>%
 skim_without_charts()
Data summary

Name                      Piped data
Number of rows            141
Number of columns         2
Column type frequency:
  character               2
Group variables           None

Variable type: character

skim_variable  n_missing  complete_rate  min  max  empty  n_unique  whitespace
infection              0              1    6   10      0         3           0
fate                   0              1    5    9      0         2           0

Both datasets are in “tidy” format: each row is one observation and each column is one variable. For a refresher, review the chapter on tidy data in the Biology Procedures and Guidelines document.
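As a quick illustration of what tidy format means for two categorical variables, the sketch below builds a tiny made-up data frame with one row per individual and one column per variable, then tallies each combination. The rows are hypothetical and are not drawn from either dataset; only the column names mirror the worm data.

```r
library(tibble)
library(dplyr)

# Made-up rows in tidy format: one row per individual, one column per variable
tidy_demo <- tribble(
  ~infection,   ~fate,
  "uninfected", "eaten",
  "uninfected", "not eaten",
  "highly",     "eaten",
  "highly",     "eaten"
)

# Tally how many individuals fall into each combination of categories
combo_counts <- tidy_demo %>%
  count(infection, fate)

combo_counts
```

Because the data are tidy, the same `count()` call should work unchanged on `cancer` or `worm` once the relevant column names are substituted.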