4.6 Example question / answer
Below is an example of how to answer a question. You haven’t yet learned some of the functions we use here, but follow along for now - it’s just an example!
There are almost always multiple coding approaches to get the right answer, some better than others. As long as your code and answer are accurate and make sense, you’ll get the marks!
Question 1. What are the minimum and maximum heights (variable name is “height_cm”) of students in the “students” dataset, which is available at this URL:
https://raw.githubusercontent.com/ubco-biology/BIOL202/main/data/students.csv
As we learned in the [fictitious] “importing and exploring data” tutorial, I can use the `read_csv` function from the `readr` package (loaded with `tidyverse`) to download and import the dataset. It creates a “tibble” object, which here I name “students”:

students <- read_csv("https://raw.githubusercontent.com/ubco-biology/BIOL202/main/data/students.csv")

I also learned that it’s a good idea to get an overview of the dataset as a first step after importing data. To do this, I use the `skim_without_charts` function from the `skimr` package. I need to load that package first:
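A minimal sketch of that loading step, assuming `skimr` has already been installed (for example with `install.packages("skimr")`):

```r
# Load skimr so that skim_without_charts() becomes available.
# Assumption: the package is already installed on this machine.
library(skimr)
```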
Now skim:
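A minimal sketch of the call, assuming `tidyverse` and `skimr` are installed. The import step is repeated here so the chunk runs on its own:

```r
library(tidyverse)  # provides read_csv()
library(skimr)      # provides skim_without_charts()

# Import the dataset as a tibble named "students"
students <- read_csv("https://raw.githubusercontent.com/ubco-biology/BIOL202/main/data/students.csv")

# Overview of every variable, without the inline histograms
skim_without_charts(students)
```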
| | |
|---|---|
| Name | students |
| Number of rows | 154 |
| Number of columns | 6 |
| Column type frequency: | |
| character | 3 |
| numeric | 3 |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| Dominant_hand | 0 | 1 | 1 | 1 | 0 | 2 | 0 |
| Dominant_foot | 0 | 1 | 1 | 1 | 0 | 2 | 0 |
| Dominant_eye | 0 | 1 | 1 | 1 | 0 | 2 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 |
|---|---|---|---|---|---|---|---|---|---|
| height_cm | 0 | 1 | 171.97 | 10.03 | 150 | 165.00 | 171.48 | 180.0 | 210.8 |
| head_circum_cm | 0 | 1 | 56.04 | 6.41 | 21 | 55.52 | 57.00 | 58.5 | 63.0 |
| Number_of_siblings | 0 | 1 | 1.71 | 1.05 | 0 | 1.00 | 2.00 | 2.0 | 6.0 |
This shows we have three character and three numeric variables, with 154 rows (observations) and 6 columns (variables) total.
We can use the `summary` function to get some basic descriptive statistics, including the minimum and maximum of numeric variables. The `summary` function is part of base R, so no additional packages need to be loaded.
We also use the `select` function from the `dplyr` package (which is loaded with `tidyverse`) to select which variable in the `students` tibble we wish to summarize.
The use of the `%>%` syntax is described in a later tutorial.
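Putting those pieces together, here is a sketch of a call that would give output of the form shown below. It assumes `tidyverse` is installed and repeats the import step so the chunk runs on its own:

```r
library(tidyverse)  # provides read_csv(), select() and %>%

# Import the dataset as a tibble named "students"
students <- read_csv("https://raw.githubusercontent.com/ubco-biology/BIOL202/main/data/students.csv")

# Keep only the height column, then pass it to summary()
students %>%
  select(height_cm) %>%
  summary()
```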
## height_cm
## Min. :150.0
## 1st Qu.:165.0
## Median :171.5
## Mean :172.0
## 3rd Qu.:180.0
## Max. :210.8
As shown in the output above, the minimum student height is 150.0 cm and the maximum is 210.8 cm.
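As noted at the start of this section, there is usually more than one way to reach the answer. One alternative sketch uses the base R functions `min`, `max`, and `range` directly on the `height_cm` column. This is not the approach used above, just another option, and it relies on the column having no missing values (as the overview indicates):

```r
library(tidyverse)  # provides read_csv()

# Import the dataset as a tibble named "students"
students <- read_csv("https://raw.githubusercontent.com/ubco-biology/BIOL202/main/data/students.csv")

# Smallest and largest heights, computed separately
min(students$height_cm)
max(students$height_cm)

# Or both at once, returned as a vector of length two: c(minimum, maximum)
range(students$height_cm)
```

If the column did contain missing values, each of these functions would need the argument `na.rm = TRUE` to ignore them.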
TIP: You’ll note that function and package names above are highlighted in grey. When writing in markdown, it’s good practice to enclose function names and package names in single backticks, e.g. `tidyverse`. The backtick is typically located on the same key as the tilde (“~”) on your keyboard.