4.6 Example question / answer
Below is an example of how to answer a question. You haven’t yet learned some of the functions we use here, but follow along for now - it’s just an example!
There are almost always multiple coding approaches to get the right answer, some better than others. As long as your code and answer are accurate and make sense, you’ll get the marks!
Question 1. What are the minimum and maximum heights (variable name is “height_cm”) of students in the “students” dataset, which is available at this URL:
https://raw.githubusercontent.com/ubco-biology/BIOL202/main/data/students.csv
As we learned in the [fictitious] “importing and exploring data” tutorial, I can use the `read_csv` function from the `readr` package (loaded with `tidyverse`) to download and import the dataset. It creates a “tibble” object, which here I name “students”:
students <- read_csv("https://raw.githubusercontent.com/ubco-biology/BIOL202/main/data/students.csv")
I also learned that it’s a good idea to get an overview of the dataset as a first step after importing data. To do this, use the `skim_without_charts` function from the `skimr` package. I need to load that package first:
Now skim:
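The two steps just described might look like the following minimal sketch. It assumes the `tidyverse` and `skimr` packages are already installed, and it repeats the import so that the chunk runs on its own:

```r
library(tidyverse)  # provides read_csv (readr)
library(skimr)      # provides skim_without_charts

# Same import as above, repeated so this chunk is self-contained
students <- read_csv("https://raw.githubusercontent.com/ubco-biology/BIOL202/main/data/students.csv")

# Print an overview of every variable, without the inline histograms
skim_without_charts(students)
```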
Summary item | Value |
---|---|
Name | students |
Number of rows | 154 |
Number of columns | 6 |
Column type frequency: | |
character | 3 |
numeric | 3 |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
Dominant_hand | 0 | 1 | 1 | 1 | 0 | 2 | 0 |
Dominant_foot | 0 | 1 | 1 | 1 | 0 | 2 | 0 |
Dominant_eye | 0 | 1 | 1 | 1 | 0 | 2 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 |
---|---|---|---|---|---|---|---|---|---|
height_cm | 0 | 1 | 171.97 | 10.03 | 150 | 165.00 | 171.48 | 180.0 | 210.8 |
head_circum_cm | 0 | 1 | 56.04 | 6.41 | 21 | 55.52 | 57.00 | 58.5 | 63.0 |
Number_of_siblings | 0 | 1 | 1.71 | 1.05 | 0 | 1.00 | 2.00 | 2.0 | 6.0 |
This shows we have three character and three numeric variables, with 154 rows (observations) and 6 columns (variables) in total.
We can use the `summary` function to get some basic descriptive statistics, including the minimum and maximum of numeric variables. The `summary` function is part of base R, so no additional packages need to be loaded.
We also use the `select` function from the `dplyr` package (which is loaded with `tidyverse`) to select which variable in the `students` tibble we wish to summarize.
The use of the “%>%” syntax is described in a later tutorial.
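A sketch of the call that produces the output below, assuming the `tidyverse` package is installed. The import is repeated here only so that the chunk runs on its own:

```r
library(tidyverse)  # provides read_csv, select, and the %>% pipe

# Same import as above, repeated so this chunk is self-contained
students <- read_csv("https://raw.githubusercontent.com/ubco-biology/BIOL202/main/data/students.csv")

# Keep only the height_cm column, then summarize it with base R's summary()
students %>%
  select(height_cm) %>%
  summary()
```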
## height_cm
## Min. :150.0
## 1st Qu.:165.0
## Median :171.5
## Mean :172.0
## 3rd Qu.:180.0
## Max. :210.8
As shown in the output above, the minimum student height was 150.0 cm and the maximum student height was 210.8 cm.
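As noted at the start of this example, there is usually more than one way to reach the same answer. One alternative, sketched here under the same assumptions (the `tidyverse` package is installed, and the import is repeated so the chunk runs on its own), is to compute the two values directly with `summarise` from `dplyr`. The names `height_range`, `min_height`, and `max_height` are chosen purely for illustration:

```r
library(tidyverse)  # provides read_csv, summarise, and the %>% pipe

# Same import as above, repeated so this chunk is self-contained
students <- read_csv("https://raw.githubusercontent.com/ubco-biology/BIOL202/main/data/students.csv")

# Compute the minimum and maximum height in a one-row tibble
height_range <- students %>%
  summarise(
    min_height = min(height_cm),
    max_height = max(height_cm)
  )

height_range
```

This returns only the two requested values, whereas `summary` also reports the quartiles, median, and mean.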
TIP: You’ll note that function and package names above are highlighted in grey. When writing in markdown, it’s good practice to enclose function names and package names in single backticks, e.g. `tidyverse`. The backtick typically shares a key with the tilde (“~”) on your keyboard.