17.1 Load packages and import data

Load the tidyverse, skimr, broom, knitr, and janitor packages:

library(tidyverse)
library(skimr)
library(broom)
library(knitr)
library(janitor)

We’ll use the “wolf.csv” and “trick.csv” datasets (discussed in examples 16.2 and 16.5 in the text, respectively).

wolf <- read_csv("https://raw.githubusercontent.com/ubco-biology/BIOL202/main/data/wolf.csv")
## Rows: 24 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (2): inbreedCoef, nPups
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
trick <- read_csv("https://raw.githubusercontent.com/ubco-biology/BIOL202/main/data/trick.csv")
## Rows: 21 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (2): years, impressivenessScore
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
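
As the messages above suggest, the column-specification output can be silenced with `show_col_types = FALSE`. The following is a minimal sketch: it reads a small inline CSV (hypothetical toy values, not the real wolf data) so that it runs without a network connection. The object name `toy` is illustrative only, and passing literal text with `I()` assumes readr 2.0 or later.

```r
library(readr)

# Hypothetical two-row CSV supplied as literal text via I(); not the real wolf data
toy <- read_csv(I("inbreedCoef,nPups\n0.00,6\n0.25,3\n"), show_col_types = FALSE)

# The column specification is still available on request
spec(toy)
```

The same argument can be added to the `read_csv()` calls above if the messages are not wanted.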

The wolf dataset includes the inbreeding coefficient for each wolf pair, along with the number of that pair’s pups that survived their first winter.

Explore the data:

wolf %>% skim_without_charts()
Data summary

Name                    Piped data
Number of rows          24
Number of columns       2
Column type frequency:
  numeric               2
Group variables         None

Variable type: numeric

skim_variable  n_missing  complete_rate  mean    sd  p0   p25   p50   p75  p100
inbreedCoef            0              1  0.23  0.10   0  0.19  0.24  0.30   0.4
nPups                  0              1  3.96  1.88   1  3.00  3.00  5.25   8.0

We see that there are 24 observations for each of the two variables, and no missing values. Had there been missing values, you would need to report the sample size actually used in the analysis (the number of complete observations), not the total number of rows.
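
To illustrate the point about sample size, here is a minimal base R sketch that counts complete observations. The data frame `toy` and its values are hypothetical, invented for illustration (the real wolf data has no missing values), and `n_total` and `n_complete` are illustrative names.

```r
# Hypothetical toy data with one missing value; not the real wolf data
toy <- data.frame(
  inbreedCoef = c(0.00, 0.25, NA, 0.30),
  nPups       = c(6, 3, 4, 2)
)

# Total rows versus rows with no missing value in either variable
n_total    <- nrow(toy)
n_complete <- sum(complete.cases(toy))

n_total
n_complete
```

Here `n_complete` is smaller than `n_total`, and it is `n_complete` that should be reported as the sample size for an analysis that uses both variables.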

Now let’s explore the trick dataset:

trick %>% skim_without_charts()
Data summary

Name                    Piped data
Number of rows          21
Number of columns       2
Column type frequency:
  numeric               2
Group variables         None

Variable type: numeric

skim_variable        n_missing  complete_rate   mean     sd  p0  p25  p50  p75  p100
years                        0              1  27.29  15.21   2   17   28   39    50
impressivenessScore          0              1   3.43   1.36   1    2    4    4     5

It includes 21 observations, no missing values, and two numeric variables that appear to hold whole-number values: “years” and “impressivenessScore”. Reading example 16.5 from the text, we see that the latter variable is a form of ranking variable.
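
Whether a numeric column holds only whole numbers, and how its values are distributed, can be checked directly. This is a minimal sketch on a hypothetical vector `score` (invented values, not the real trick data); in practice the same calls could be applied to a column such as `trick$impressivenessScore`.

```r
# Hypothetical impressiveness-style scores; not the real trick data
score <- c(1, 2, 4, 4, 5)

# Stored as double, which is also how read_csv() parses numeric columns by default
is.double(score)

# Every value is nevertheless a whole number
all(score %% 1 == 0)

# Frequency of each score, showing the small set of discrete, ordered values
table(score)
```

A variable with only a handful of ordered, whole-number values like this behaves as a ranking rather than as a continuous measurement.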