5 Tidy data

Last updated 2023-02-10

We’ve talked about file naming, directory structures, and documentation to ensure accessible, interpretable, and transparent data. Now it’s time to talk about organizing individual variables within a given file. When properly organized, data values can be effectively analyzed, summarized, and visualized. When they are not, they are onerous to work with and easy to misinterpret.

In general, your data files should adhere to the principles of "tidy data". Tidy data is governed by the following three rules (see note 1):

  • Each variable must have its own column.
  • Each observation must have its own row.
  • Each value must have its own cell.

It’s easy to veer from these rules, because data collection tools often make it more convenient to record data in a layout that violates them. When this is the case, we need to know how to re-organize our data to make it "tidy".
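To make the idea of re-organizing concrete, here is a minimal sketch in Python using only the standard library. The data, the column names (`site`, `year`, `count`), and the helper `to_tidy` are all hypothetical and exist only for illustration. The starting layout is a common untidy one, where each year has its own column, so the variable "year" is hidden in the column headers rather than stored in a column of its own.

```python
# Hypothetical "wide" records: one row per site, one column per year.
# The year is a variable, but here it lives in the column names.
wide_rows = [
    {"site": "A", "2021": 12, "2022": 15},
    {"site": "B", "2021": 7, "2022": 9},
]


def to_tidy(rows, id_key, var_name, value_name):
    """Reshape wide rows so that each output row is one observation.

    Every column other than `id_key` is treated as a level of the
    variable `var_name`, and its cell becomes the `value_name` value.
    """
    tidy = []
    for row in rows:
        for key, value in row.items():
            if key == id_key:
                continue  # the identifier is carried along, not reshaped
            tidy.append({id_key: row[id_key], var_name: key, value_name: value})
    return tidy


# One row per site-year combination, with columns site, year, and count.
tidy_rows = to_tidy(wide_rows, id_key="site", var_name="year", value_name="count")

for r in tidy_rows:
    print(r)
```

After reshaping, each variable (site, year, count) has its own column, each site-year observation has its own row, and each cell holds a single value. Note that the year values remain strings here because they started out as column names. In practice, a data-frame library would usually handle this kind of reshaping for us, but the logic is the same.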

  1. See: Wickham, H. & Grolemund, G. (2017). Tidy Data. In R for Data Science.