File Formats

Most of the data that you'll encounter or produce will be rectangular in nature; that is, it will be organized into columns and rows. Rectangular data can be stored in many file formats; the one you're probably most familiar with is xlsx, the Excel file format. To be fair, we could also store this data in a table within a Word document. But if we want to work with the data programmatically - that is, use a computer program to help interpret the data, perform calculations, make inferences, and build visualizations - the data needs to be in a file format that allows for this. A Word document does not. An Excel file does, within Excel. But we can do better.

A common file format for storing rectangular data is the csv file, or comma-separated values file. The first row usually holds the column names. Each row after that contains data for a single observation, and each piece of data - each variable's value - is separated by a comma. If we were to look at the palmerpenguins data from Lab 7 saved as a csv, it would look something like this:

"species","island","bill_length_mm","bill_depth_mm","flipper_length_mm","body_mass_g","sex","year"
"Adelie","Torgersen",39.1,18.7,181,3750,"male",2007
"Adelie","Torgersen",39.5,17.4,186,3800,"female",2007
"Adelie","Torgersen",40.3,18,195,3250,"female",2007
"Adelie","Torgersen",NA,NA,NA,NA,NA,2007
"Adelie","Torgersen",36.7,19.3,193,3450,"female",2007
"Adelie","Torgersen",39.3,20.6,190,3650,"male",2007
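
To make the row-and-comma structure concrete, here is a minimal sketch of how a program might read a few of the rows above. It uses Python's built-in csv module purely for illustration; the function name read_penguins and the choice to turn NA into a missing value are assumptions made for this example, not part of the lab.

```python
import csv
import io

# A few rows copied from the sample above, held in a string so the
# sketch runs without needing a file on disk.
SAMPLE = '''"species","island","bill_length_mm","bill_depth_mm","flipper_length_mm","body_mass_g","sex","year"
"Adelie","Torgersen",39.1,18.7,181,3750,"male",2007
"Adelie","Torgersen",39.5,17.4,186,3800,"female",2007
"Adelie","Torgersen",NA,NA,NA,NA,NA,2007
'''

def read_penguins(text):
    """Hypothetical helper: parse csv text into one dict per observation."""
    # DictReader uses the first row as column names and splits every
    # later row on commas, removing the surrounding double quotes.
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for row in reader:
        # Assumption for this sketch: the text NA marks a missing value.
        rows.append({name: (None if value == "NA" else value)
                     for name, value in row.items()})
    return rows

rows = read_penguins(SAMPLE)
print(len(rows))                                    # 3 observations
print(rows[0]["species"], rows[0]["bill_length_mm"])
print(rows[2]["bill_length_mm"])                    # missing value
```

Note that every value comes back as text (for example "39.1"); converting columns to numbers is a separate step that the csv format itself does not do for you.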

csv & Open Science

csv files are particularly important when we think about transparency, reproducibility, and accessibility. csv files are plain text files, which means they can be read by virtually any operating system and any text editor. A csv file produced in 1975 can still be opened on the computer you're using right now.
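
As a small illustration of what "plain text" means here, the sketch below writes a tiny csv file and then reads it back as ordinary text, with no spreadsheet software involved. The file name penguins.csv and the three columns are made up for the example.

```python
import csv
import os
import tempfile

# A made-up, three-column table: one header row and one observation.
rows = [
    ["species", "island", "year"],
    ["Adelie", "Torgersen", 2007],
]

# Write to a temporary folder so the sketch does not touch your own files.
path = os.path.join(tempfile.mkdtemp(), "penguins.csv")

# newline="" lets the csv module control the line endings itself.
with open(path, "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)

# Read the file back as plain text, exactly as a text editor would see it.
with open(path, encoding="utf-8") as f:
    raw = f.read()

print(raw)
```

What gets printed is just the two comma-separated lines. Nothing in the file is specific to the program that wrote it, which is why so many different tools can open it.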

Conversely, your prof probably has a mountain of early spreadsheet data somewhere on a floppy disk from about 15 years ago, stored in proprietary file formats that predate Excel, that they can no longer read because those formats are no longer supported.

So, we want to ensure that our data is saved as a csv. If you are currently working in Excel or a similar spreadsheet program like LibreOffice Calc, csv should be available as a Save As... option.

As you go to save your data as a csv, you’ll notice there are a few different CSV formats available through Excel, such as:

  • CSV UTF-8 (Comma delimited)
  • CSV (Comma delimited)
  • CSV (Macintosh)
  • CSV (MS-DOS)

Each CSV format encodes characters from your data in a slightly different way. You don’t need to worry about the details of how these CSV formats encode characters. For our purposes we will be using the CSV UTF-8 (Comma delimited) format.
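
If you are curious what "encodes characters in a slightly different way" looks like, the sketch below stores the same word as bytes under three encodings. The word "Adélie" is a made-up example value. Pairing cp1252 with the plain CSV (Comma delimited) option and mac_roman with the CSV (Macintosh) option is an assumption about typical Western-language setups, not something this lab depends on.

```python
# A made-up value containing one non-English character.
text = "Adélie"

# The same six characters stored as bytes under three encodings.
utf8 = text.encode("utf-8")        # é becomes two bytes
cp1252 = text.encode("cp1252")     # é becomes one byte
mac = text.encode("mac_roman")     # é becomes a different single byte

print(utf8)
print(cp1252)
print(mac)

# Reading UTF-8 bytes with the wrong encoding garbles the accented letter.
garbled = utf8.decode("cp1252")
print(garbled)

# Reading them back with the matching encoding recovers the original text.
restored = utf8.decode("utf-8")
print(restored == text)
```

Plain English letters, digits, and commas come out identical in all three, so the differences only show up for accented or other special characters. Choosing one encoding for everyone, here UTF-8, avoids the garbling shown in the middle of the sketch.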

Screenshot of Saving Data in CSV UTF-8 Format

Now that your data is tidy and saved in the CSV UTF-8 format, you’re ready to upload to the Shiny App!