9.1 Load packages and import data
Let’s load some familiar packages first:
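The code chunk for this step is not shown here. Below is a minimal sketch. It assumes the "familiar packages" include at least tidyverse, which provides read_csv() and the %>% pipe, and skimr, which provides skim_without_charts(). Both are used later in this section. Your course setup may load other packages too.

```r
# Assumption: these are the familiar packages this section relies on
library(tidyverse)  # read_csv(), summarise(), and the %>% pipe
library(skimr)      # skim_without_charts()
```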
We will also need a new package called infer, so install that package using the procedure you previously learned, then load it:
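Here is one way to do this in code, as a sketch. The requireNamespace() guard is an addition that skips the download when infer is already installed. Installing through the RStudio Packages pane works just as well.

```r
# Install infer only if it is not already available
if (!requireNamespace("infer", quietly = TRUE)) {
  install.packages("infer")
}

# Load (attach) the package for this session
library(infer)
```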
Import data
For this tutorial we’ll use the human gene length dataset from Chapter 4 of the Whitlock & Schluter text, where it is described in Example 4.1.
Let’s import it:
genelengths <- read_csv("https://raw.githubusercontent.com/ubco-biology/BIOL202/main/data/humangenelength.csv")
## Rows: 22385 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): gene, name, description
## dbl (1): size
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Get an overview of the data
We’ll use the skim_without_charts function from the skimr package to get an overview:
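The code chunk that produced the summary below is not shown. The summary reports the data name as “Piped data”, so it was presumably generated by piping genelengths into the function. The sketch below repeats the package loading and the import so that it runs on its own. It sets show_col_types = FALSE, the option suggested in the import message, only to silence that message. The object name gene_overview is a hypothetical choice.

```r
library(tidyverse)
library(skimr)

# Import the gene length data from the same URL used above
genelengths <- read_csv(
  "https://raw.githubusercontent.com/ubco-biology/BIOL202/main/data/humangenelength.csv",
  show_col_types = FALSE
)

# Pipe the tibble into skim_without_charts(); piping with %>% is what
# makes skimr label the data as "Piped data"
gene_overview <- genelengths %>%
  skim_without_charts()

gene_overview
```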
Name | Piped data |
---|---|
Number of rows | 22385 |
Number of columns | 4 |
Column type frequency: | |
character | 3 |
numeric | 1 |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
gene | 0 | 1.00 | 17 | 18 | 0 | 22385 | 0 |
name | 0 | 1.00 | 2 | 15 | 0 | 19906 | 0 |
description | 432 | 0.98 | 1 | 51 | 0 | 4183 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 |
---|---|---|---|---|---|---|---|---|---|
size | 0 | 1 | 3511.46 | 2833.29 | 69 | 1684 | 2744 | 4511 | 109224 |
The “size” variable is the key one: it contains the length (number of nucleotides) of each of the 22385 genes in the dataset.
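To check the figures for size on its own, you can use dplyr’s summarise(). This is a sketch, and the output column names (n_genes, mean_size, and so on) are hypothetical choices. The import is repeated so the snippet is self-contained. The sample size, minimum and maximum should agree with the skim output above.

```r
library(tidyverse)

genelengths <- read_csv(
  "https://raw.githubusercontent.com/ubco-biology/BIOL202/main/data/humangenelength.csv",
  show_col_types = FALSE
)

# Summarise the key variable: gene length in nucleotides
size_summary <- genelengths %>%
  summarise(
    n_genes     = n(),
    mean_size   = mean(size),
    median_size = median(size),
    min_size    = min(size),
    max_size    = max(size)
  )

size_summary
```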