5.2 Tidy Data

In the previous example, our data were organized so that day and quantity caught shared common columns. That is, not every variable had a dedicated column and, consequently, not every variable had its own cell values: day never appeared as a value in any cell.

Tidy data breaks this down and reserves one column per variable and one row per observation. Remember, we have three variables: site, day, and quantity caught. So let’s transform this…

First, working with a collection tool where we have one table per day:

Site Day Trout_Caught
Mabel-lake 1 1
Postill-lake 1 3
Ellison-lake 1 0

Site Day Trout_Caught
Mabel-lake 2 3
Postill-lake 2 4
Ellison-lake 2 5

Site Day Trout_Caught
Mabel-lake 3 3
Postill-lake 3 5
Ellison-lake 3 1

And second, gathering this data into a single dataset, sorted by site:

Site Day Trout_Caught
Mabel-lake 1 1
Mabel-lake 2 3
Mabel-lake 3 3
Postill-lake 1 3
Postill-lake 2 4
Postill-lake 3 5
Ellison-lake 1 0
Ellison-lake 2 5
Ellison-lake 3 1

Now that’s tidy data!
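
The gathering step can also be done programmatically. The following is a minimal sketch in plain Python, not a prescribed method: it assumes each per-day table has already been read in as a list of (site, day, trout caught) tuples. The names day_1, day_2, day_3, and gather are hypothetical and chosen only for illustration. Sites are kept in order of first appearance so that the result matches the single dataset shown above.

```python
# Per-day collection tables, copied from the tables in this section.
# Each tuple is one observation: (site, day, trout caught).
day_1 = [("Mabel-lake", 1, 1), ("Postill-lake", 1, 3), ("Ellison-lake", 1, 0)]
day_2 = [("Mabel-lake", 2, 3), ("Postill-lake", 2, 4), ("Ellison-lake", 2, 5)]
day_3 = [("Mabel-lake", 3, 3), ("Postill-lake", 3, 5), ("Ellison-lake", 3, 1)]


def gather(*tables):
    """Stack per-day tables into one list, grouped by site, then ordered by day."""
    rows = [row for table in tables for row in table]

    # Rank sites by first appearance (an illustrative choice, not alphabetical).
    site_order = {}
    for site, _day, _caught in rows:
        site_order.setdefault(site, len(site_order))

    return sorted(rows, key=lambda row: (site_order[row[0]], row[1]))


tidy = gather(day_1, day_2, day_3)

# Print the combined table: one column per variable, one row per observation.
print("Site Day Trout_Caught")
for site, day, caught in tidy:
    print(site, day, caught)
```

Because every row already carries its own site, day, and quantity, combining the tables only requires stacking and sorting; no values have to be moved between columns.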