5.2 Tidy Data
In the previous example, day and quantity caught shared the same columns. That is, not every variable had its own dedicated column, and consequently not every variable had its own cell values: day had no cell values at all.
Tidy data fixes this by reserving one column per variable and one row per observation. Remember, we have three variables: site, day, and quantity caught. So let’s transform this…
First, suppose our collection tool produces one table per day:
Site | Day | Trout_Caught |
---|---|---|
Mabel-lake | 1 | 1 |
Postill-lake | 1 | 3 |
Ellison-lake | 1 | 0 |

Site | Day | Trout_Caught |
---|---|---|
Mabel-lake | 2 | 3 |
Postill-lake | 2 | 4 |
Ellison-lake | 2 | 5 |

Site | Day | Trout_Caught |
---|---|---|
Mabel-lake | 3 | 3 |
Postill-lake | 3 | 5 |
Ellison-lake | 3 | 1 |
Second, we gather these tables into a single dataset, sorted by site:
Site | Day | Trout_Caught |
---|---|---|
Mabel-lake | 1 | 1 |
Mabel-lake | 2 | 3 |
Mabel-lake | 3 | 3 |
Postill-lake | 1 | 3 |
Postill-lake | 2 | 4 |
Postill-lake | 3 | 5 |
Ellison-lake | 1 | 0 |
Ellison-lake | 2 | 5 |
Ellison-lake | 3 | 1 |
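The gathering step can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a prescribed workflow. The variable names (`daily_tables`, `combined`, `site_order`, `tidy`) are made up for this example, and the values are copied from the tables above. Note that "sorted by site" here keeps the sites in the order they first appear rather than alphabetical order, which is an assumption based on the table shown.

```python
# One table per day, as collected: (Site, Day, Trout_Caught).
daily_tables = [
    [("Mabel-lake", 1, 1), ("Postill-lake", 1, 3), ("Ellison-lake", 1, 0)],
    [("Mabel-lake", 2, 3), ("Postill-lake", 2, 4), ("Ellison-lake", 2, 5)],
    [("Mabel-lake", 3, 3), ("Postill-lake", 3, 5), ("Ellison-lake", 3, 1)],
]

# Stack the daily tables: the columns stay the same, only rows are added.
combined = [
    {"Site": site, "Day": day, "Trout_Caught": caught}
    for table in daily_tables
    for site, day, caught in table
]

# Remember each site's first-appearance position so the sort keeps
# Mabel, Postill, Ellison in that order (assumption: not alphabetical).
site_order = {}
for row in combined:
    site_order.setdefault(row["Site"], len(site_order))

# Sort by site first, then by day within each site.
tidy = sorted(combined, key=lambda r: (site_order[r["Site"]], r["Day"]))

for row in tidy:
    print(row["Site"], row["Day"], row["Trout_Caught"], sep=" | ")
```

Because every daily table already has one column per variable, combining them is just a matter of appending rows. No columns need to be renamed or split.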
Now that’s tidy data!
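For comparison, here is a rough sketch of how a table with one column per day could be reshaped into the same tidy layout. The wide layout and its column names (`Day_1`, `Day_2`, `Day_3`) are assumptions made for illustration, as is the helper `to_tidy`. The counts are the same ones shown in the tidy table.

```python
# Hypothetical wide layout: day lives in the column names, counts in the cells.
wide = [
    {"Site": "Mabel-lake", "Day_1": 1, "Day_2": 3, "Day_3": 3},
    {"Site": "Postill-lake", "Day_1": 3, "Day_2": 4, "Day_3": 5},
    {"Site": "Ellison-lake", "Day_1": 0, "Day_2": 5, "Day_3": 1},
]


def to_tidy(rows, id_column="Site", prefix="Day_"):
    """Turn each (site, day-column) cell into its own row."""
    tidy_rows = []
    for row in rows:
        for column, value in row.items():
            if column.startswith(prefix):
                tidy_rows.append({
                    id_column: row[id_column],
                    # The day number is recovered from the column name.
                    "Day": int(column[len(prefix):]),
                    "Trout_Caught": value,
                })
    return tidy_rows


reshaped = to_tidy(wide)
for row in reshaped:
    print(row["Site"], row["Day"], row["Trout_Caught"], sep=" | ")
```

After reshaping, day becomes an ordinary column with its own cell values, and each row holds exactly one observation: one site on one day.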