5.2 Import a CSV file from a website

In previous tutorials we’ve already seen how to import CSV files from the web. As always, the first step is to load the tidyverse library, because it includes many packages and functions that are handy for both data import and data wrangling.

library(tidyverse)

One package that is loaded with the tidyverse package is readr, which includes the handy read_csv function.

You can view the help file for the read_csv function by typing this into your command console pane (bottom left) in RStudio:

?read_csv

You’ll see that the function has many optional “arguments”, but in general we can use the default values for these.
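If you'd rather see the arguments and their defaults in the console than scroll through the help file, base R's formals function can list them. This is only a quick sketch: it assumes the readr package is installed, and the exact list you see depends on your readr version.

```r
library(readr)  # attached automatically by library(tidyverse)

# List the names of all arguments that read_csv accepts
arg_names <- names(formals(read_csv))
arg_names

# Show the default value of a single argument, here col_names
# (whether the first row of the file holds the column names)
formals(read_csv)$col_names
```

Any argument that you don't set yourself keeps the default shown here.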

If the file you wish to import is located on the web, you need to provide its “URL” (the web address) to the read_csv function. For example, in a previous tutorial you imported the “students.csv” dataset from the course GitHub website, as follows (note that the URL is enclosed in double quotation marks):

students <- read_csv("https://raw.githubusercontent.com/ubco-biology/BIOL202/main/data/students.csv")
## Rows: 154 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Dominant_hand, Dominant_foot, Dominant_eye
## dbl (3): height_cm, head_circum_cm, Number_of_siblings
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

PAUSE: Did you get an error like this?

Error in read_csv("https://raw.githubusercontent.com/ubco-biology/BIOL202/main/data/students.csv") : could not find function "read_csv"

This happens if you forgot to load the tidyverse library!

By default, the read_csv function prints the type of variable stored in each column of the dataset. For example, when importing the students dataset, the printout showed three “character” variables (denoted chr), which is how R stores categorical variables recorded as text, and three “double” variables (denoted dbl), which are numeric variables stored in double-precision floating-point format.

  1. Take a minute to check out this good overview of how R handles numeric variables.

You can suppress this printout by including the argument show_col_types = FALSE in the read_csv call, like so:

students <- read_csv("https://raw.githubusercontent.com/ubco-biology/BIOL202/main/data/students.csv",
                     show_col_types = FALSE)
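Suppressing the message does not discard the column types; you can still ask for them afterwards. The sketch below uses a tiny made-up CSV written directly in the code (the two column names are borrowed from the students dataset, but the values are invented for illustration) so that it runs without an internet connection. It assumes readr version 2.0.0 or later, which is needed for the I() wrapper.

```r
library(readr)  # attached automatically by library(tidyverse)

# Toy two-row CSV written inline; the values are invented for illustration.
# I() tells read_csv to treat the string as literal data rather than a path.
toy_csv <- I("Dominant_hand,height_cm\nR,165.5\nL,172.0\n")

# Import quietly: no column specification message is printed
toy <- read_csv(toy_csv, show_col_types = FALSE)

# spec() retrieves the column specification that read_csv guessed
spec(toy)

# Base R alternative: the class of each column
col_classes <- sapply(toy, class)
col_classes
```

The same spec() call should work on the students object once you have imported it.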

We have now imported the data and stored it in a local object called “students”. The object is called a “tibble”, which you can think of as a special kind of spreadsheet. More information on “tibbles” can be found here.

Unless otherwise indicated, all CSV data files that we use in this course are stored at the same web location, specifically: “https://raw.githubusercontent.com/ubco-biology/BIOL202/main/data/”. Thus, to import any CSV file you just need to copy that path, then append the appropriate file name to the end of it. For example, the full path to a CSV file called birds.csv would be “https://raw.githubusercontent.com/ubco-biology/BIOL202/main/data/birds.csv”.
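Rather than copying and pasting the path each time, you could store it once and let R do the appending. The following is a minimal sketch: the names data_path and course_data_url are made up for this example, and birds.csv is simply the file name used in the paragraph above.

```r
# Base path shared by the course data files (taken from the text above)
data_path <- "https://raw.githubusercontent.com/ubco-biology/BIOL202/main/data/"

# Hypothetical helper: append a file name to the base path
course_data_url <- function(file_name) {
  paste0(data_path, file_name)
}

birds_url <- course_data_url("birds.csv")
birds_url

# The result can be passed straight to read_csv. This needs the tidyverse
# loaded and an internet connection, and assumes the file exists there:
# birds <- read_csv(birds_url)
```

The paste0 function joins its arguments with no separator, which is why the base path must end with a forward slash.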

Often you’ll need to import data from a locally stored CSV file, rather than from the web. You’ll learn how to do this shortly. First: how does one create a CSV file?