2.2 What’s in a name
File names need to achieve two primary goals: they need to make sense to a human reading them, and they need to be constructed in a way that allows a computer to parse or process them. That is, file names should be human interpretable and machine readable. How do we achieve this?
Human interpretable
To be human interpretable, a file name needs to be meaningful: it needs to convey some basic information to a person reading it. We do this by integrating metadata into the file name. The metadata elements we include are:
- Who created the file
- The date on which it was created
- The project to which it is connected
- The nature of the contents of the file
- Whether it has been modified
- The type or format of the file
That is, we should be able to look at a file name and tell who created it, when it was created, what it is related to, what is inside it, whether it has been updated, and which application we should expect to open it with. As we’ll see shortly, we don’t always include a date, and we don’t always include information about modifying a file.
All said, that’s a fair bit of information to hold in a file name!
Machine readable
What does it mean for a file name to be machine readable or machine interpretable? First, it means building our file names so that when an application sorts them, they end up in an order that makes sense to us. Second, it means building our names according to set patterns, which can then be parsed along known delimiters. Lastly, it means building our names so that if we move them from one computer, application, or operating system to another, the files remain interpretable in exactly the same way.
How do we do this? We avoid special characters and follow conventions.
Special characters
Special characters are all characters except:
- Letters of the English alphabet
- Numbers from 0 to 9
- Dashes (-)
- Underscores (_)
This means that a space (" ") is a special character, so your file names should not contain spaces.
When operating in a multi-lingual or non-English environment, this can prove problematic, but it is an unfortunate legacy of the development of computer standards that has yet to be fully resolved.
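As a minimal sketch, this rule can be checked in Python with a regular expression. The function name `is_safe_name` is hypothetical, and the sketch assumes that the only period allowed is the single one separating the file type from the rest of the name, as described under Conventions.

```python
import re

# Letters A-Z and a-z, digits 0-9, dashes and underscores only.
# Explicit ranges keep this ASCII-only, so accented letters such as "ü" fail.
ALLOWED = re.compile(r"[A-Za-z0-9_-]+")

def is_safe_name(filename: str) -> bool:
    """Return True if filename contains no special characters.

    Assumption: one period is permitted, and only as the separator
    before the file type (e.g. ".csv").
    """
    parts = filename.split(".")
    if len(parts) > 2:  # more than one period
        return False
    # Each part must be non-empty and built only from allowed characters.
    return all(ALLOWED.fullmatch(part) for part in parts)

print(is_safe_name("Smith_20200123_Coral-Survey_Raw-Data.csv"))  # True
print(is_safe_name("Smith 2020 survey.csv"))                     # False (spaces)
```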
Conventions
Convention has file naming proceed in the following order, with each element separated by an underscore (_) and words within an element joined with a dash (-). The file type is generally added after a period (.) and is usually generated automatically when an application creates a file.
Element-1 | _Element-2 | _Element-3 | _Element-4 | _Element-5 | .Element-6 |
---|---|---|---|---|---|
Last-Name | _Date | _Project | _File-Contents | _Version | .File-type |
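Because every element sits in a fixed position and is separated by a known delimiter, a program can both assemble and take apart names that follow this convention. The Python sketch below is illustrative only: `FileName`, `build_name` and `parse_name` are hypothetical names, and it assumes that a date is recognized as exactly 8 digits in the second position, that a version is recognized as `V` followed by digits in the last position, and that the project and contents elements are always present.

```python
import re
from typing import NamedTuple, Optional

class FileName(NamedTuple):
    last_name: str
    project: str
    contents: str
    file_type: str
    date: Optional[str] = None     # yyyymmdd; omitted for some files
    version: Optional[int] = None  # n in _Vn; omitted for unversioned files

def build_name(f: FileName) -> str:
    """Join the elements in the conventional order:
    Last-Name_Date_Project_File-Contents_Version.File-type"""
    elements = [f.last_name]
    if f.date is not None:
        elements.append(f.date)
    elements += [f.project, f.contents]
    if f.version is not None:
        elements.append(f"V{f.version}")
    return "_".join(elements) + "." + f.file_type

def parse_name(filename: str) -> FileName:
    """Split a conventional file name back into its elements."""
    stem, _, file_type = filename.rpartition(".")  # file type follows the last period
    elements = stem.split("_")                      # elements are underscore-separated
    last_name = elements.pop(0)
    date = None
    if elements and re.fullmatch(r"\d{8}", elements[0]):
        date = elements.pop(0)
    version = None
    if elements and re.fullmatch(r"V\d+", elements[-1]):
        version = int(elements.pop()[1:])
    if len(elements) != 2:
        raise ValueError(f"cannot find project and contents in {filename!r}")
    project, contents = elements
    return FileName(last_name, project, contents, file_type, date, version)

raw = FileName("Smith", "Coral-Survey", "Raw-Data", "csv", date="20200123")
print(build_name(raw))  # Smith_20200123_Coral-Survey_Raw-Data.csv
print(parse_name("Smith_Coral-Survey_Depth-Plot_V2.png"))
```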
Dates
Dates should be written in the format yyyymmdd. They should contain no spaces, no dashes, and no words, just 8 digits. Months and days from 1 to 9 should be written with a leading 0. For example, January 23, 2020, should be written 20200123. When dates are written this way, your computer will sort them from the earliest date to the latest.
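In Python, the standard library's `datetime.date.strftime` produces this format directly: `%Y` is the four-digit year, and `%m` and `%d` are the zero-padded month and day. The helper name `date_element` is hypothetical; the sketch also shows why putting the year first matters for sorting.

```python
from datetime import date

def date_element(d: date) -> str:
    """Format a date as the 8-digit yyyymmdd element, zero-padding month and day."""
    return d.strftime("%Y%m%d")

collected = [date(2020, 11, 5), date(2020, 1, 23), date(2019, 12, 31)]

# yyyymmdd strings sort alphabetically in chronological order.
print(sorted(date_element(d) for d in collected))
# ['20191231', '20200123', '20201105']

# For contrast, mmddyyyy strings do not: the 2020 dates sort before 2019.
print(sorted(d.strftime("%m%d%Y") for d in collected))
# ['01232020', '11052020', '12312019']
```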
Keeping track of dates is especially important for data because the date on which your data was collected is directly relevant to the data itself. Dates are less important for things like figures because figures are derived from data that is already dated.
Versions
Version tracking is achieved in file naming by adding _Vn, where n is the version number. With each major change, we increase n by 1, so version 1 would read _V1 and, once updated, it would read _V2.
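A minimal Python sketch of incrementing the version element, assuming (as in the convention table) that _Vn is the last element before the file type. The function name `bump_version` is hypothetical.

```python
import re

# "_V", one or more digits, then the file-type extension at the end of the name.
VERSION = re.compile(r"_V(\d+)(\.[^.]+)$")

def bump_version(filename: str) -> str:
    """Return filename with n in its _Vn element increased by 1."""
    match = VERSION.search(filename)
    if match is None:
        raise ValueError(f"no _Vn element before the file type in {filename!r}")
    n = int(match.group(1)) + 1
    # Keep everything before "_Vn", then the new version and the original extension.
    return filename[:match.start()] + f"_V{n}" + match.group(2)

print(bump_version("Smith_Coral-Survey_Manuscript_V1.docx"))
# Smith_Coral-Survey_Manuscript_V2.docx
```

One consequence worth keeping in mind: names are compared character by character, so an alphabetical sort places _V10 before _V2. Once a file passes nine versions, sorting on the parsed number is likely more reliable than sorting on the name alone.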
Versions are very important for things like manuscripts and interpretations of data, such as figures and other visualizations, which we will continue to change and modify throughout a project. Data, however, has a collection date but should not be modified, and so should not be versioned.