What’s Old is New Again

Portion of Plate b41215 from the DASCH collection, showing Halley's comet, taken on April 21, 1910, from Arequipa, Peru, with the 8-inch Bache Doublet, Voigtlander. Courtesy of the DASCH project.

by Paul Edmon, November 4, 2019

As Solomon said, "there is nothing new under the sun," and so it is with archival technology.  Hollywood, it seems, has stumbled on the trick of etching glass to store data for the long term (i.e., centuries).  The fun thing is that this is a return to earlier methods of storage.

A general rule about data: the more usable data is, the more likely it is to be destroyed or corrupted.  As such, archiving data for the future has become a bit of a headache for librarians, data scientists, and research computing staff.  After all, digital data is by far the most useful type of data, at least in our modern epoch.  It can be read and manipulated by computers.  It can be easily shared across the globe and has an almost universal language to it.  However, with that fungibility come problems of durability and longevity.  Digital data can be corrupted by a bad disk, an electrical short, or even cosmic rays.  Those producing the data usually do not have data provenance and archiving in mind, so the data is stored in non-ideal ways and is commonly lost.  Beyond that, since the Second Law of Thermodynamics still applies, you have to contend with the fact that everything eventually decays and dies.  The enemy, entropy, is the final end for all meaningful data.

So, given all this, what formats have stood the test of time?  How can we ensure that data will outlive us and remain useful?  Clearly, simply running spinning disks, tape, or solid-state devices will be insufficient.  After all, the materials that make up tapes decay over time, hard drives get corrupted, and even solid-state devices succumb to entropy.  Not to mention that as formats change, the previous format may become unreadable except to the select few who have the legacy machine and the knowledge of its format.  How much data has already been lost on cassette tapes, 8-tracks, and ancient diskettes (ancient here being 40 years old)?  Beyond that, what about the crazy amount of email that people send, which will one day go into the great digital trash bin?  Historians would love to have it, and who knows how long any given email host, or the email format itself, will last.

The answer is, of course, in our past.  Over several thousand years of history, humanity has, inadvertently or on purpose, devised ways to store information for millennia.  For instance, think about the cuneiform tablets that encode the earliest written languages of man.  They have lasted for nigh on 5,000 years and, assuming you know Sumerian, are still readable.  As it turns out, ceramic is pretty durable stuff, so these tablets have survived the test of time.

Naturally, though, while cuneiform tablets are highly useful long-term storage devices, they are not very fungible.  Thus humanity developed paper, scrolls, and books.  These can last for thousands of years, as shown by the numerous ancient texts (ancient here being thousands of years old) that we still have access to.  They are also more fungible, as one can easily write comments in the margins or correct errors made by copyists.  Heck, if you can figure out how to build a machine to make text production efficient, you could even start a Reformation!  Sadly, though, their durability does not hold a candle to that of ceramic tablets.

Even closer to our own day, one can encode data on glass plates.  Not movies, as Hollywood wishes to do, but photographs of the stars.  These pictures can be very useful for enterprising young women who want to better understand how stars work.  Even better, they hold up well over time, lasting well over a century (at least, that's as long a baseline as we have thus far).  Sadly, inspecting plates is rough business, even for graduate students.  Thus projects like DASCH are doing the reverse of what Hollywood is doing: moving the massive collection of data on plates to a digital format for ease of use.  But again, now the data has become fungible and more liable to destruction.  For the long term, the plates are actually the better means of storage.

So what to do?  Do you print out all your correspondence on dot-matrix thermal paper?  Do you try to inscribe it all on a golden plate and send it into deep space?  Do you get yourself a kiln, grab a stylus, and go old school?  Frankly, nobody really knows, but it is one of the major concerns with modern data: just how do you store all that information for posterity yet keep it useful for modernity?  Perhaps the old maxim will prove true once again, and the old ways can tell us something about how to live in the bright, data-rich future.