12.1 Load packages and import data

Load the tidyverse, skimr, naniar, knitr, ggmosaic, and janitor packages:

library(tidyverse)
library(skimr)
library(naniar)
library(knitr)
library(ggmosaic)
library(janitor)

We’ll also need a new package called epitools. Install it now if you haven’t already.
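One way to install the package only when it is missing is sketched below. The CRAN mirror URL is an assumption; any mirror you normally use will work.

```r
# Install epitools only if it is not already available on this machine.
# The `repos` value is an assumed CRAN mirror; substitute your own if needed.
if (!requireNamespace("epitools", quietly = TRUE)) {
  install.packages("epitools", repos = "https://cloud.r-project.org")
}
```

You only need to install a package once per R installation, but you must load it with library() in every new session.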

library(epitools)

## 
## Attaching package: 'epitools'

## The following objects are masked from 'package:binom':
## 
##     binom.exact, binom.wilson

We’ll use two datasets described in the Whitlock & Schluter text:

- the “cancer.csv” dataset (described in Example 9.2 in the text, page 238)
- the “worm.csv” dataset (described in Example 9.4 in the text, page 246)

cancer <- read_csv("https://raw.githubusercontent.com/ubco-biology/BIOL202/main/data/cancer.csv")

## Rows: 39876 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): aspirinTreatment, response
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

worm <- read_csv("https://raw.githubusercontent.com/ubco-biology/BIOL202/main/data/worm.csv")

## Rows: 141 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): infection, fate
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
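The messages printed by read_csv() are informational, and the output itself suggests how to quiet them. A minimal sketch of that option, assuming readr version 2.0 or later (where the show_col_types argument is available):

```r
library(readr)

# Same imports as above, with the column-specification message suppressed
cancer <- read_csv(
  "https://raw.githubusercontent.com/ubco-biology/BIOL202/main/data/cancer.csv",
  show_col_types = FALSE
)

worm <- read_csv(
  "https://raw.githubusercontent.com/ubco-biology/BIOL202/main/data/worm.csv",
  show_col_types = FALSE
)
```

The imported data are identical either way; only the console messages change.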

Take a look at the cancer dataset:

cancer %>%
  skim_without_charts()

Data summary
Name	Piped data
Number of rows	39876
Number of columns	2
_______________________
Column type frequency:
character	2
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique	whitespace
aspirinTreatment	0	1	7	7	0	2	0
response	0	1	6	9	0	2	0

And the worm dataset:

worm %>%
  skim_without_charts()

Data summary
Name	Piped data
Number of rows	141
Number of columns	2
_______________________
Column type frequency:
character	2
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique	whitespace
infection	0	1	6	10	0	3	0
fate	0	1	5	9	0	2	0
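The skim output reports how many unique values each character variable has (n_unique), but not what those values are. One way to list the categories and their frequencies is with count() from dplyr. This is a sketch rather than a required step; the data are re-imported here only so that the block runs on its own.

```r
library(readr)
library(dplyr)

# Re-import so this block is self-contained (skip if the objects already exist)
cancer <- read_csv(
  "https://raw.githubusercontent.com/ubco-biology/BIOL202/main/data/cancer.csv",
  show_col_types = FALSE
)
worm <- read_csv(
  "https://raw.githubusercontent.com/ubco-biology/BIOL202/main/data/worm.csv",
  show_col_types = FALSE
)

# One row per category, with the number of observations in column `n`
cancer_treatments <- cancer %>% count(aspirinTreatment)
worm_infections <- worm %>% count(infection)

cancer_treatments
worm_infections
```

The number of rows in each result should match the n_unique value reported by skim_without_charts() for that variable.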

Both datasets are in “tidy” format: each row is a single observation and each column is a single variable. For a refresher on this, review the chapter on tidy data in the Biology Procedures and Guidelines document.
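To make the idea of tidy format concrete, here is a small sketch using made-up data. The object name toy, its variable names, and its values are hypothetical and are not taken from either dataset.

```r
library(tibble)
library(dplyr)

# Hypothetical toy data: one row per individual, one column per variable
toy <- tibble(
  treatment = c("A", "A", "B", "B", "B"),
  outcome   = c("yes", "no", "yes", "yes", "no")
)

# Because the data are tidy, counting each combination of categories
# is a single step
toy_counts <- toy %>% count(treatment, outcome)
toy_counts
```

Here the five rows of toy are five individuals, and toy_counts summarises them as one row per treatment and outcome combination.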