4.6 Example question / answer

Below is an example of how to answer a question. You haven’t yet learned some of the functions we use here, but follow along for now - it’s just an example!

There are almost always multiple coding approaches to get the right answer, some better than others. As long as your code and answer are accurate and make sense, you’ll get the marks!

Question 1. What are the minimum and maximum heights (variable name “height_cm”) of students in the “students” dataset? The dataset is available at this URL:

https://raw.githubusercontent.com/ubco-biology/BIOL202/main/data/students.csv

As we learned in the [fictitious] “importing and exploring data” tutorial, I can use the read_csv function from the readr package (loaded with tidyverse) to download and import the dataset. It creates a “tibble” object, which here I name “students”:

library(tidyverse)  # loads readr (read_csv) and dplyr (select, %>%)
students <- read_csv("https://raw.githubusercontent.com/ubco-biology/BIOL202/main/data/students.csv")

I also learned that it’s a good idea to get an overview of the dataset as a first step after importing data. To do this, I can use the skim_without_charts function from the skimr package. I need to load that package first:

library(skimr)

Now skim:

skim_without_charts(students)
Data summary

Name                     students
Number of rows           154
Number of columns        6
Column type frequency:
  character              3
  numeric                3
Group variables          None

Variable type: character

skim_variable   n_missing   complete_rate   min   max   empty   n_unique   whitespace
Dominant_hand   0           1               1     1     0       2          0
Dominant_foot   0           1               1     1     0       2          0
Dominant_eye    0           1               1     1     0       2          0

Variable type: numeric

skim_variable        n_missing   complete_rate   mean     sd      p0    p25      p50      p75     p100
height_cm            0           1               171.97   10.03   150   165.00   171.48   180.0   210.8
head_circum_cm       0           1               56.04    6.41    21    55.52    57.00    58.5    63.0
Number_of_siblings   0           1               1.71     1.05    0     1.00     2.00     2.0     6.0

This shows we have three character and three numeric variables, with 154 rows (observations) and 6 columns (variables) total.
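The row and column counts reported by the skim output can also be cross-checked with base R. The sketch below is only an illustration: it uses a small made-up data frame (`toy_students`, a hypothetical stand-in with invented values, not the real data) so that it runs without downloading anything. With the imported tibble, the same calls would be applied to `students` instead.

```r
# Made-up stand-in for the real `students` tibble (values are illustrative only)
toy_students <- data.frame(
  height_cm = c(150.0, 165.0, 171.5, 180.0, 210.8),
  Dominant_hand = c("r", "l", "r", "r", "l"),
  stringsAsFactors = FALSE
)

toy_dims <- dim(toy_students)             # number of rows, then number of columns
toy_types <- sapply(toy_students, class)  # class of each column

toy_dims
table(toy_types)                          # how many columns there are of each type
```

Comparing these numbers against the skim summary is a quick way to catch a miscounted row or column before writing up an answer.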

We can use the summary function to get basic descriptive statistics, including the minimum and maximum of numeric variables. The summary function is part of base R, so no additional packages need to be loaded.

We also use the select function from the dplyr package (which is loaded with tidyverse) to select which variable in the students tibble we wish to summarize.

The use of the “%>%” syntax is described in a later tutorial.
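Until that tutorial, it may help to see that a piped chain is just another way of writing nested function calls. The sketch below is a minimal illustration on a made-up data frame (`toy_students` is a hypothetical stand-in with invented values, not the real dataset); it assumes the dplyr package is installed.

```r
library(dplyr)  # provides select() and the %>% pipe

# Made-up stand-in for the real `students` tibble (values are illustrative only)
toy_students <- data.frame(height_cm = c(150.0, 165.0, 171.5, 180.0, 210.8))

# Piped form: read left to right, "take the data, then select, then summarize"
piped <- toy_students %>%
  select(height_cm) %>%
  summary()

# Nested form: the same steps, read from the inside out
nested <- summary(select(toy_students, height_cm))

identical(piped, nested)  # the two forms should give the same result
```

The piped form tends to be easier to read once more than two steps are chained together.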

summary.height <- students %>% 
  select(height_cm) %>%
  summary
summary.height
##    height_cm    
##  Min.   :150.0  
##  1st Qu.:165.0  
##  Median :171.5  
##  Mean   :172.0  
##  3rd Qu.:180.0  
##  Max.   :210.8

As shown in the output above, the minimum student height was 150.0 cm and the maximum was 210.8 cm.
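As noted at the start of this example, there is usually more than one way to reach the same answer. The sketch below shows two alternatives, again on a made-up data frame (`toy_students` is a hypothetical stand-in with invented values, and the column names `min_height` and `max_height` are chosen here purely for illustration). With the real data, `toy_students` would be replaced by `students`.

```r
library(dplyr)  # provides summarise() and the %>% pipe

# Made-up stand-in for the real `students` tibble (values are illustrative only)
toy_students <- data.frame(height_cm = c(150.0, 165.0, 171.5, 180.0, 210.8))

# Alternative 1: base R range() returns the minimum and maximum as a vector
height_range <- range(toy_students$height_cm)
height_range

# Alternative 2: dplyr summarise() returns a one-row table with named columns
height_min_max <- toy_students %>%
  summarise(min_height = min(height_cm), max_height = max(height_cm))
height_min_max
```

If the variable contained missing values, both `min` and `max` (and `range`) would need `na.rm = TRUE`; the skim output above shows no missing heights in the real dataset, so that is not required here.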

TIP: You’ll note that function and package names above are highlighted in grey. When writing in markdown, it’s good practice to enclose function names and package names in single backticks, e.g. `tidyverse`. The backtick typically shares a key with the tilde (“~”) on your keyboard.