David Giaretta with his book.
Inevitably, things do get technical in the course of the book; after all, if we did not have technical problems, we would not have a digital preservation problem. But the not-too-technical reader is always warned in good time that a section or chapter can perhaps be skipped. Yet the essence of Giaretta's theory is worth noting for everybody. In his view, migration and emulation, our best-known preservation strategies, are perhaps good enough for simple objects (PDFs, TIFFs, JPEGs), but are inadequate for many of the complex objects found among research data, Giaretta's main focus (hence the title Advanced Digital Preservation). No one will doubt that scientific research often generates objects that are very difficult to preserve - they are complex, dynamic, often non-renderable, and so forth.
If you do not preserve research data, this book is still important for you, because other sectors (cultural heritage, archives) that started out with simple objects will increasingly be faced with more complex varieties, as content producers are discovering the extra possibilities and putting them to good use.
The 'Droste' effect
To tackle the problems of more complex objects, Giaretta and the CASPAR project team developed a theory around the Representation Information Network. Simply put: a (or rather: any) data object is nothing but ones and zeros; it must be accompanied by representation information in the metadata that gives you what you need to 'independently interpret, understand and use' (in OAIS language) the data object. The data object can be a single file or multiple files, and the representation information can be anything from a scribbled handwritten note to a complex machine-readable formal description (pp. 17 ff.). In Giaretta's more accessible advocacy language: you have something that is unfamiliar (ones and zeros) and the representation information gives you what you need to make it familiar. However, representation information is not a straightforward thing: it is more like a set of Russian nesting dolls (in Dutch we would refer to the 'Droste effect', after the cocoa nurse who serves from a cocoa tin that has her own image on it, who serves from a cocoa tin that ...). A Word document cannot be understood with Microsoft Office software alone; you will need the operating system, and the programming language, and so forth and so forth. You will need every dictionary, every definition, every standard, every specification that is used somewhere along the line - until you connect with the knowledge base of your designated community, that is, with what your designated community has at its disposal in terms of software, hardware and knowledge to work with those.
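For the more technically inclined reader, the nesting can be pictured as a small dependency walk. The sketch below is only my illustration of the idea, not the CASPAR implementation: the `REP_INFO` map, its entries and the function name `missing_rep_info` are all hypothetical. It follows each piece of representation information to the next and stops wherever the designated community's knowledge base already covers it.

```python
# Hypothetical dependency map: each item lists the representation
# information needed to interpret it. All names are illustrative only.
REP_INFO = {
    "report.doc": ["Word file format"],
    "Word file format": ["word processor software"],
    "word processor software": ["operating system"],
    "operating system": ["hardware architecture"],
}


def missing_rep_info(item, knowledge_base, seen=None):
    """List the representation information that must be kept with `item`.

    The walk stops at anything already in `knowledge_base`, i.e. what the
    designated community is assumed to have at its disposal.
    """
    if seen is None:
        seen = set()
    needed = []
    for dep in REP_INFO.get(item, []):
        # Skip what the community already knows, or what we already listed.
        if dep in knowledge_base or dep in seen:
            continue
        seen.add(dep)
        needed.append(dep)
        # Each piece of representation information may need its own.
        needed.extend(missing_rep_info(dep, knowledge_base, seen))
    return needed


if __name__ == "__main__":
    # A community that still has a suitable operating system at hand.
    print(missing_rep_info("report.doc", {"operating system"}))
```

The smaller the knowledge base passed in, the longer the returned list: a community that has nothing at its disposal needs the whole chain, down to the hardware.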
Giaretta and his CASPAR team argue that this is the only method that will work for all digital objects, no matter how simple or complicated. The trick will of course be to build an automated process that will keep our digital objects "fresh".
More research is needed to turn this theory into something practical. Meanwhile there is this book to enjoy and learn from, including excursions into non-technical territory: repository audits, preservation chains, business models, stakeholder analysis, and more. Giaretta's fluid style of writing, the many cross-references, summaries, and warning signs have enabled me to delve deeper into the technical level than I thought possible. And I am still learning.
What I would like to see next, however, is more interaction between what Giaretta is developing and what the Open Planets Foundation, led by Bram van der Werf, (and the related SCAPE project) is working on. What would be really great for the community is their joint view on what works and what does not, and in which circumstances, and on the direction R&D should take. How about it, gentlemen?
David Giaretta [et al.], Advanced Digital Preservation (Springer, 2011, ISBN 978-3-642-16808-6, €99.95).