Saturday 22 October 2011

An encyclopedia of file format information? (#DPS2011) (3)

Ingest being as complicated as it is (see previous post), let alone what comes after …, would it not be a terrific help if there were an encyclopedia where the best knowledge of the community was continually being assembled, structured and made accessible for all of us to work with? An encyclopedia telling us, e.g., what .pdf is, what hardware/software combination we need to render it, how the computer can recognize it, what versions and variations are known, what complications are known and how to deal with those, and, last but not least, what our colleagues’ experiences are with migrating or emulating this file format? In short, an answer to the question: what is the best preservation strategy for this file format?

Would not that be heaven on digital preservation earth?

[Photo: Format registry ‘ecosystem’ workshop at Goportis DPS2011.]

Well, in fact, the community has been working on such an encyclopedia (more commonly known as a ‘file format registry’) for quite a while now. PRONOM of the UK National Archives was considered a great start, but the upkeep of the system was just too much work for one institution. Then came initiatives such as the Global Digital Format Registry (GDFR) and the Unified Digital Format Registry (UDFR) – which is, as yet, in the planning stage.

The Goportis Digital Preservation Summit organized a workshop on this issue, led by Bill Roberts of the Dutch National Archives. In his introduction he listed the issues involved in setting up and maintaining such a registry. If the work is too much for one or a couple of organizations and we go for a decentralized, Wikipedia-style approach, whom do we trust? How do we know that the information in the registry is reliable?

[Photo: Workshop leader Bill Roberts: ‘The most common preservation strategy I know of is “stick it in the archive for now”.’]

The workshop discussions clearly demonstrated that there is a lot that is difficult to agree upon across the (broad) community. The question ‘Do we even have a standard for a file format?’ prompted laughter from the workshop attendees.

But some undaunted souls can always be found to give a good idea a new chance. Bill Roberts reported on such an idea, which has arisen within the Open Planets Foundation. The idea is to set up a ‘Registry Ecosystem’. The objectives are:

[Photo of slide: objectives of the Registry Ecosystem]

And the approach is:

[Photo of slide: approach of the Registry Ecosystem]

The workshop attendees supported the idea and agreed on a number of issues:

  • if possible find and mobilize the talent of ‘lonely geeks’
  • include as much prior knowledge as possible (although vendors such as Microsoft will be extremely frustrating partners to work with)
  • there should be common core data models
  • a test corpus is very much needed
  • the scope of the ecosystem should be broad and include such things as preservation policy procedures and software tools.
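To make the ‘common core data model’ point a little more concrete, here is a minimal sketch in Python of what a single registry entry might hold, loosely following the questions at the top of this post (what the format is, how a computer can recognize it, which versions are known, what tools render it). All class and field names are hypothetical and are not taken from PRONOM, GDFR or UDFR. The recognition step simply compares the leading bytes of a file with a stored signature, which is only one of several possible identification techniques.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FormatRecord:
    """One hypothetical registry entry; field names are illustrative only."""
    name: str
    extensions: list[str]
    signature: bytes  # leading bytes used to recognize the format
    known_versions: list[str] = field(default_factory=list)
    rendering_tools: list[str] = field(default_factory=list)
    preservation_notes: list[str] = field(default_factory=list)


def identify(data: bytes, registry: list[FormatRecord]) -> Optional[FormatRecord]:
    """Return the first record whose signature matches the start of data."""
    for record in registry:
        if data.startswith(record.signature):
            return record
    return None


# A tiny example registry with two well-known signatures.
REGISTRY = [
    FormatRecord(
        name="Portable Document Format",
        extensions=[".pdf"],
        signature=b"%PDF-",
        known_versions=["1.4", "1.7"],
    ),
    FormatRecord(
        name="Portable Network Graphics",
        extensions=[".png"],
        signature=b"\x89PNG\r\n\x1a\n",
    ),
]

match = identify(b"%PDF-1.4\n...", REGISTRY)
print(match.name if match else "unknown")
```

A shared model along these lines would let different institutions contribute entries independently, while the question of whom to trust would still need its own answer, for instance by recording who supplied each entry.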

On the spot a highly qualified working group was set up to work on the registry ecosystem (including Bill Roberts, Leo Konstantelos of Glasgow, David Anderson and Janet Delve from Portsmouth, Adrian Brown of the Parliamentary Archives and Michelle Lindlar of Goportis; some other names were mentioned, but have yet to be approached). The working group is to meet virtually and deliver results within three months.

And, to quote Bill’s last slide: ‘The most important thing: Lots of people out there have pieces of the puzzle; we must encourage and enable them to share.’

PS: Leo Konstantelos suggested dropping the term ‘file format registry’, because the scope of what is described here is much larger. He suggested ‘technical registry’.

See also Bill Roberts’s own blog post at the Open Planets Foundation website.

1 comment:

Yvonne said

How cool, I should have gone there :-)