I’m sitting in the couch te-reading The Pragmatic Programmer, and couldn’t but share this quote.

…we believe that the best format for storing knowledge persistently is plain text. With plain text, we give ourselves the ability to manipulate knowledge, both manually and programmatically, using virtually every tool at our disposal.

The problem with most binary formats is that the context necessary to understand the data is separate from the data itself. You are artificially divorcing the data from its meaning. The data might as well be encrypted; it is absolutely meaningless without the application logic to parse it.

[…]

Human-readable forms of data, and self-describing data, will outlive all other forms of data and the applications that created them. Period. As long as the data survives, you will have a chance to be able to use it—potentially long after the original application that wrote it is defunct.

The Pragmatic Programmer — Chapter 3, pages 74-75

The reasoning is solid: you can write an application to parse plain-text from 50 years ago (see RFC 561 which describes Email headers, as an example), and arguably you could do it without ever reading the specification. Something like that would be really hard to do with a binary file format.

Plain-text allows for easy interoperability: HTML, FTP and Email were the foundation of the web. XML and JSON made it possible to write programmatic interfaces without having to share application logic.

Some examples of application document formats migrated from binary files to plain-text include Microsoft Office, and Xcode, both project and Interface Builder files.

Using plain-text seems a good approach to keep in mind when writing applications that save documents, data or state to disk.


This article was written as an issue on my Blog repository on GitHub (see Issue #15)