Thursday 27 October 2011

Emulation is becoming a viable option (#KEEP) (1)

Just about everybody in our community has heard of emulation as an alternative preservation strategy to migration. But emulation is mostly talked about rather than practised, and in terms that are not very encouraging: emulation is rumoured to be very difficult to do, to require lots of expert knowledge and to be really expensive. On 26-27 October the road show of the European KEEP project came to the KB in The Hague, and some 45 attendees were given the opportunity to learn more about emulation and test the tools developed by KEEP.

The resulting workshop (led by Jeffrey van der Hoeven, KB, at right) was very much worthwhile and enjoyable, with almost the entire KEEP project team sharing its knowledge. For all those interested in the deep technical details, I refer you to the KEEP website where all the slides will appear soon. For those of you who would like a broader overview, here are my experiences from the last few days (with a selection of slides from the presenters; thanks, KEEP team, for allowing me to use those).

The audience included representatives from libraries, archives, a technical university, two criminal courts based in The Hague, a criminal information service, a film museum and a museum for media art.

What emulation is

In my general introduction to the workshop (what is the problem, etc.), I gave a definition for dummies that was not contested by the experts in the room, so I dare repeat it here. Our problem is that digital objects really only work on the hardware/software combination on which they were made. But these platforms change quickly. Migration is a strategy whereby you change the digital object to make it run on a newer platform. In the long term this is risky, because migrations are never 100% faithful: there are always little changes, until one day the object is no longer what it is supposed to be. Plus: it only works for a limited number of relatively simple file formats (text, images).

An alternative is emulation. You leave the digital object as it is, but you build software to make the new hardware/software combination pretend it is the old combination.

keep1

One can imagine that this requires a lot of technical expertise about both the old platform and the new one, and that for every possible file format, of which there are thousands, plus all the dependencies within more complex objects such as websites.

keep2

Hence, it does not seem very practical. But when it works, you get a more authentic look and feel than with migration. Two participants in the project come from the computer game industry (Computer Spielemuseum and the European Games Developer Federation, EGDF), where look and feel is everything. Quite a few single emulator programmes have been developed already and new ones are developed all the time. But these are just pieces of the puzzle.

Emulation according to KEEP in four basic steps

Bram Lohman of Tessella summed up emulation according to KEEP in four basic steps, and I will let his slides speak for themselves:

keep6

keep7

EF_slide3

EF_slide4

 

The KEEP approach

The KEEP project wants to make emulation more practical by building a number of black boxes that take care of the most technical aspects of emulation; in this slide they are the green boxes.

keep3 

  • The Transfer Tools Framework is your assistant during the ingest phase. It helps you get content from different media (floppy disk, cd-rom, cassette, etc.) and store it into your digital archive.
  • At the access stage, there is the Emulation Framework, which analyses the digital object, finds the proper emulator software and delivers everything you need to run the object in a new computer environment.
  • The last building block is the KEEP Virtual Machine. This is the most daring goal of the project, and you can see in the slide that it is only pencilled in for now. Emulators themselves are pieces of software which become obsolete as hardware platforms change. The KEEP Virtual Machine is a piece of software that is so basic and universal that it will ‘port’ to many hardware platforms and thus make emulator software much less prone to technology changes. This is the ‘portability’ part of the KEEP acronym.

A special component of the Emulation Framework is TOTEM, which helps users find technical information about the digital object to be emulated in various available databases such as PRONOM (see earlier post):

keep4
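
To make the division of labour concrete, here is a minimal sketch in Python of the kind of lookup chain the Emulation Framework automates: characterise the object, find the platform it needs, pick an emulator for that platform. It is an illustration only, not KEEP’s actual code or API. The function `plan_rendering`, the two lookup tables and the use of a file extension as a stand-in for real characterisation are all assumptions made for this example.

```python
from dataclasses import dataclass
from pathlib import Path

# Illustrative lookup tables (assumptions, not KEEP's real data).
# Step 1 -> 2: which platform does a given kind of object need?
FORMAT_TO_PLATFORM = {
    ".d64": "Commodore 64",      # C64 disk image
    ".adf": "Commodore Amiga",   # Amiga disk file
}

# Step 2 -> 3: which emulator can pretend to be that platform?
PLATFORM_TO_EMULATOR = {
    "Commodore 64": "VICE",
    "Commodore Amiga": "UAE",
}


@dataclass(frozen=True)
class RenderPlan:
    """Everything needed to hand the object over to an emulator."""
    object_name: str
    platform: str
    emulator: str


def plan_rendering(object_name: str) -> RenderPlan:
    """Characterise the object (here: by extension only) and pick an emulator."""
    suffix = Path(object_name).suffix.lower()
    platform = FORMAT_TO_PLATFORM.get(suffix)
    if platform is None:
        # No entry in the knowledge base: a human expert has to step in.
        raise LookupError(f"no known platform for {suffix!r}")
    return RenderPlan(object_name, platform, PLATFORM_TO_EMULATOR[platform])
```

Calling `plan_rendering("GAME.D64")` yields a plan that names VICE as the emulator; an unknown extension raises `LookupError`. A real framework would characterise the object far more thoroughly than by its file name, which is where technical registries come in.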

What emerged from the workshop

The attendees at the workshop were impressed by the work of the KEEP project. Jacob Taekema of the Amsterdam Municipal Archives (previously Rotterdam) concluded at the end of the two-day event that he could now see emulation developing into a really viable option. Aminata Kpewa and her colleagues of the Special Court for Sierra Leone were really happy to discover a network of people and a tool framework that could finally help them sort out the mess of obsolete databases in their archives.

The important accomplishment of the KEEP approach is that it takes care of a lot of the most technical, expert work (finding the best emulator, configuring the original environment), thus making emulation more accessible for less technically expert staff. The good news is also that the framework can be integrated into any existing preservation solution; Safety Deposit Box producer Tessella is one of the partners in the project. In addition, all of the work of KEEP is available as open source.

aa_DSC9252

Representatives of the KEEP team discussing archival problems with representatives of the Special Court for Sierra Leone – and making plans for on-site help.

But the KEEP team itself readily admits that much, much work still needs to be done:

  • The framework now works for a limited number of platforms and file formats (slide at right), and, obviously, many more should be added to really make the framework practicable. As yet the framework cannot handle complex objects, although the emulation strategy is considered the most promising for those. It is a matter of finding the resources to put in a lot more R&D work.
  • At the ingest point (the Transfer Tools Framework), the approach still depends on the availability of all sorts of old hardware to read obsolescent media carriers (floppy disks, tapes, etc.). Such devices are being collected by private individuals and sometimes by computer museums, but it was suggested that perhaps we need a more structured approach to safeguarding these essential machines and the knowledge about how to use them (nationally, perhaps, such as within the Netherlands Coalition for Digital Preservation?).
  • Whoever wants to work with an emulated object will have to know how that programme worked, what the commands were, etc. Thus we need to collect and keep as many manuals and specifications as possible. KEEP is building a knowledge base.
  • The KEEP project itself will end in February 2012. But much work still needs to be done. The KEEP team are doing everything they can to transfer the project to an organization such as the Open Planets Foundation (OPF), which can continue to develop the approach. Let us hope that this transfer is successful and more resources will become available to develop emulation as a strategy, e.g. from the European Community.

Digital preservation stumbling block: copyright law

A major stumbling block for the KEEP project, for emulation, and, indeed, for all of digital preservation, is copyright legislation. David Anderson of the University of Portsmouth took upon himself the thankless task of finding out about possible legal constraints, and, unfortunately, he found (too) many. In fact, the deliverables of the KEEP project itself will be handed over to the European Commission in loose components because of copyright restrictions. Which is, when you come to think of it, pretty ridiculous. I will write a separate blog post about this important subject and David’s work soon.

More KEEP workshops planned

If this post has whetted your appetite for emulation, you are most welcome to attend one of the forthcoming KEEP national workshops: 10 November Zagreb, 29 November Rome, 24 January Cardiff. Details on the website.

aa_DSC9026

Alexander Fernandez on behalf of the European Games Developer Federation EGDF: ‘Involve game developers in preserving their work and building the emulation knowledge base.’

(For those who question the need to preserve games: they are part of our cultural heritage and they play a growing role in education. Plus: there is a keen and creative community out there developing those games, and they may well help projects such as KEEP to develop means to preserve other complex objects.)

PS: Not to complicate matters too much, and also because it was not mentioned during the workshop, I did not speak, in this blog post, of a third preservation strategy emerging from the CASPAR project, the ‘representation information network’; everything about that in an earlier post.

Monday 24 October 2011

Are we battling risks or adding value? (#DPS2011) (4)

Despite all our efforts, the digital preservation community is still having a hard time “selling” its activities to funders and to society at large. As Bohdana Stoklasova of the Czech National Library said at the Goportis Digital Preservation Summit: our work is unglamorous and undervalued.

Bohdana Stoklasova making sure that the funders of the Czech National Library’s digital projects get their credit’s worth, as specifically required by funding deals.

Recently, the argument has been raised that we should stop describing our work in terms of ‘risk’ and ‘loss’ and ‘threats’, but rather speak in terms of ‘adding value’. During the panel discussion and coffee breaks at the Goportis Digital Preservation Summit, however, the consensus was that bad news sells better than good news. By suggesting that we organize a little data disaster I may have gone a little too far, but to my mind risk management is still the best approach to thinking about digital preservation.

At the conference, Angela Dappert (previously British Library, now at the UK Digital Preservation Coalition) made the case for the risk management approach. In her view, risk management is central to all digital preservation activities. ‘Risk’, Angela began, ‘is uncertainty of outcome. It is something that might happen, and if it happens, it will become an issue’ (a problem). Risk management is about preventing risks from becoming issues. In that sense preservation is proactive whereas conservation is usually reactive (something has happened).

Below are some slides from Angela’s presentation (thanks for these, Angela); if you are interested in the subject be sure to check out the full presentation at http://www.digitalpreservationsummit.de/presentations.html.

20111019DappertGoportis_final

20111019DappertGoportis_final2

It is important to note that not all risks need to be addressed immediately. It is a matter of balancing the risks against the available means to combat them. In some cases one may wish to accept a risk (e.g., when the collection is not essential for the organization, or when it concerns masters of digitized collections for which the analogue original is still available):

20111019DappertGoportis_final8

20111019DappertGoportis_final5
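
The balancing act described above can be sketched as a tiny risk register. This is an illustration only and not Angela’s method or that of any particular framework: scoring a risk as likelihood times impact is a common convention, and the 1-to-5 scales, the `accept_below` threshold, the budget units and the example risks are all assumptions made up for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    likelihood: int       # assumed scale: 1 (rare) .. 5 (almost certain)
    impact: int           # assumed scale: 1 (negligible) .. 5 (severe)
    mitigation_cost: int  # arbitrary budget units

    @property
    def score(self) -> int:
        # Common convention: exposure = likelihood x impact.
        return self.likelihood * self.impact


def triage(risks, budget, accept_below=6):
    """Split risks into those we treat now and those we accept for now.

    Risks are handled from highest to lowest score. A risk is accepted
    when its score is below the threshold or when the remaining budget
    cannot cover its mitigation.
    """
    treat, accept = [], []
    for risk in sorted(risks, key=lambda r: r.score, reverse=True):
        if risk.score >= accept_below and risk.mitigation_cost <= budget:
            treat.append(risk.name)
            budget -= risk.mitigation_cost
        else:
            accept.append(risk.name)
    return treat, accept


# Hypothetical register, loosely following the examples in the text.
register = [
    Risk("obsolete format in core collection", 4, 5, 5),
    Risk("digitization masters, analogue original still available", 3, 1, 4),
    Risk("unreadable tape in non-essential collection", 2, 2, 3),
]
treat_now, accept_for_now = triage(register, budget=8)
```

With these made-up numbers only the obsolete format in the core collection is treated; the other two risks fall below the threshold and are accepted, which mirrors the two cases mentioned above.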

Angela gave an overview of available tools for risk management, including the best-known framework, DRAMBORA, and a new kid on the block in which Angela is involved, the Timbus project.

20111019DappertGoportis_final6

20111019DappertGoportis_final9

_DSC880b1

Just try and manage the risk of losing cultural heritage in a city like Hamburg …

Saturday 22 October 2011

An encyclopedia of file format information? (#DPS2011) (3)

Ingest being as complicated as it is (see previous post), let alone what comes after …, would it not be a terrific help if there were an encyclopedia where the best knowledge of the community was continually being assembled, structured and made accessible for all of us to work with? An encyclopedia telling us, e.g., what .pdf is, what hardware/software combination we need to render it, how the computer can recognize it, what versions and variations are known, what complications are known and how to deal with those, and, last but not least, what our colleagues’ experiences are with migrating or emulating this file format? Answers to the question: what is the best preservation strategy for this file format?

Would not that be heaven on digital preservation earth?
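
As a small illustration of the ‘how the computer can recognize it’ part: many formats announce themselves in their first few bytes, and an identification tool matches those bytes against a table of known signatures. The sketch below is a toy version with a handful of well-known signatures; the table layout and the `identify` function are assumptions for this example and not the data model of PRONOM or any other registry.

```python
# A few well-known file signatures ("magic bytes") and the format they indicate.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"PK\x03\x04", "ZIP container"),
]


def identify(data: bytes) -> str:
    """Return the format whose signature matches the start of the data."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"
```

For example, `identify(b"%PDF-1.4 ...")` returns `"PDF"`, while arbitrary text comes back as `"unknown"`. Note how little this tells us: a ZIP signature alone does not say whether the file is an office document, an e-book or a plain archive, which is exactly the kind of knowledge about versions and variations a shared registry would have to hold.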

Format registry ‘ecosystem’ workshop at Goportis DPS2011.

Well, in fact, the community has been working on such an encyclopedia (more commonly known as a ‘file format registry’) for quite a while now. PRONOM of the UK National Archives was considered a great start, but the upkeep of the system was just too much work for one institution. Then came initiatives such as the Global Digital Format Registry (GDFR) and the Unified Digital Format Registry (UDFR) – which is, as yet, in the planning stage.

The Goportis Digital Preservation Summit organized a workshop on this issue, led by Bill Roberts of the Dutch National Archives. In his introduction he listed the issues involved in setting up and maintaining such a registry. If the work is too much for one or a couple of organizations and we go for a decentralized approach (Wikipedia-style), whom do we trust? How do we know that the information in the registry is reliable?

Workshop leader Bill Roberts: ‘The most common preservation strategy I know of is “stick it in the archive for now”.’

The workshop discussions clearly demonstrated that there is a lot that is difficult to agree upon across the (broad) community. The question ‘Do we even have a standard for a file format?’ prompted laughter from the workshop attendees.

But some undaunted souls can always be found to give a good idea a new chance. Bill Roberts reported on such an idea, which arose within the Open Planets Foundation. The idea is to set up a ‘Registry Ecosystem’. The objectives are:

_DSC8468

And the approach is:

_DSC8469

The workshop attendees supported the idea and agreed on a number of issues:

  • if possible find and mobilize the talent of ‘lonely geeks’
  • include as much prior knowledge as possible (although vendors such as Microsoft will be extremely frustrating partners to work with)
  • there should be common core data models
  • a test corpus is very much needed
  • the scope of the ecosystem should be broad and include such things as preservation policy procedures and software tools.

On the spot a highly qualified working group was set up to work on the registry ecosystem (including Bill Roberts, Leo Konstantelos of Glasgow, David Anderson and Janet Delve from Portsmouth, Adrian Brown of the Parliamentary Archives and Michelle Lindlar of Goportis; some other names were mentioned, but have yet to be approached). The working group is to meet virtually and deliver results within three months.

And, to quote Bill’s last slide: ‘The most important thing: Lots of people out there have pieces of the puzzle; we must encourage and enable them to share.’

PS: Leo Konstantelos suggested dropping the term ‘file format registry’, because the scope of what is described here is much larger. He suggested ‘technical registry’.

See also Bill Roberts’s own blog post at the Open Planets Foundation website.

Friday 21 October 2011

On ingest, or: “receiving” is a complex word (#DPS2011) (2)

The second day of the Goportis Digital Preservation Summit was all about ingest, or 'receiving' content in your repository or archive. In his keynote, Seamus Ross, formerly of the digital preservation taskforce at Glasgow and now at the University of Toronto, was quick to dispel any illusions that 'receiving' is an easy thing to do.

_DSC8488

Interoperability troubles, from OpenOffice to PowerPoint, causing stress before Ross’s presentation (Nina Stoffers, left, Ross right).

Ross’s presentation was a complete Ingest 101 course, so I will let his slides tell most of the story.

Ingest is about “receiving” content from producers:

image

Ideally, we would want to create a work flow that is consistent, error-free, well-documented, in accordance with our organization’s policies:

image

Preferably, you know who the producers of your content are and you start negotiating with them so that they deliver the best possible quality. However, keep in mind that whatever makes our lives easier is most likely to make the producers’ lives more difficult. That is where the bargaining begins. Ideally, you get this:

image

But in practice, this is most likely what you get most of the time:

image

Seamus Ross: ‘Most of the work we do during ingest is about fixing all these errors, is about compensating for the communication failures between producers and archives.’ So, what do we do?

image

How do we do all this? Ross: “You are a craftsman. You must accept that your tools are blunt.” Present tools for identification and validation are far from perfect. They still require a lot of manual work and the people who work with them must be very knowledgeable. Also, “You may be sure that producers will deliver error-laden stuff, no matter how well you train them.”

Ross stressed that policies are an essential part of the equation:

image

But even policies cannot guarantee smooth sailing:

image

Having said that, Ross did have a list of useful reference material for the audience, including an instructive case study at http://artefactual.com/wiki/index.php. Check out his slides when the complete set becomes available via the event website. He also mentioned the useful NDIIPP tools and services directory at http://www.digitalpreservation.gov/partners/resources/tools/index.html and the Cairo Tools Survey. But remember Ross’s warning that working with these tools requires quite a bit of prior knowledge.

Seamus Ross: “But do not worry too much – digital archaeology will play a role in the future.”

During the Q&A Adam Farquhar of the British Library offered his more optimistic view of the state of digital preservation (see yesterday’s post). Ross’s reply: ‘But that concerns only a narrow range of object types.’ Databases, for instance, are still a very real problem to deal with.

Goportis co-organizer Yvonne Friese checking #DPS2011 tweets.

More good stuff from this densely packed conference in the next few days. About whether OAIS is still helpful, about tools, about file format registries. And about thinking before you act, the New Zealand version.

Wednesday 19 October 2011

Digital Preservation Summit (1): how far have we come?


We have gathered in cold, rainy, windy Hamburg today and tomorrow for the Goportis Digital Preservation Summit (#DPS2011). So the sunny note on which Adam Farquhar of the British Library started off the conference was quite welcome - except that some of us (including the undersigned) saw a few more clouds in the sky than Adam. A matter of the cup being half full or half empty?

Adam Farquhar (left) with Angela Dappert (DPC)
Farquhar said our progress to date is 'pretty encouraging'. 'Digital preservation has become business as usual,' he said, 'for large memory institutions.' Now that I reread my notes, the addition about large memory institutions is probably crucial to Adam's argument, but what stuck in my mind, and also in the mind of Steve Knight from the National Library of New Zealand (both of us presented during the day), is that in our opinion, digital preservation is still quite a long way off from being business as usual for most of the stakeholders - including data producers, funders, and all but the very largest memory institutions.
 
Adam also said that we are doing digital preservation 'at a substantial scale', citing recent BL projects involving the migration of millions of objects - whereas in my community (the Netherlands Coalition for Digital Preservation) I hear much grumbling about the (lack of) scalability of the tools we have at our disposal at present.
 

Fortunately (I mean in terms of agreeing on issues), Adam also saw a number of challenges:
  • changes in digital materials (Flash, social media with short URLs)
  • content in context: when a publication is commented upon over the years, it changes
  • dynamic content: complex objects such as 3D interactive views of crystals; HTML5 (which incorporates JavaScript elements)
  • a lack of skills in memory institutions, which is getting worse because of the budget cuts.
And I agree with all those (and surmise that Steve Knight will too).

At the end of his keynote, Adam Farquhar said two things:
  • Do not wait until we know everything to get it right, but do whatever you can now.
  • Within our community, we need to become more honest about what works and what does not. That is the only path to true learning.
To which I can only say: hear! hear!

There is much more good stuff to report from this conference (including instructive disagreements between presenters), but this time live or even semi-live blogging is difficult because I am presenting and moderating myself - plus: this conference is very well organised and the audience does not get any (boring) time off for blogging. Also, it is a 9 to 6 programme, and your blogger needs time to eat and sleep. So, dear readers, I must ask for a little patience. But I assure you: ALL shall be revealed  ... and in a matter of days even, because then we have the KEEP workshop coming up, and iPRES ...

On a more practical note: PowerPoint and laptops are wonderful inventions, but can somebody PLEASE come up with a solution whereby every presenter is visible to the audience?
(Thanks to Natalie Walters for allowing me to use this image)

Saturday 15 October 2011

Is emulation something for you (2)

If you tried to register for the KEEP workshop in The Hague (26-27 October, see last post) but were told the workshop was sold out, you may want to try again. Twenty more seats have been made available.

Saturday 8 October 2011

Is emulation something for you?

In their efforts to come up with catchy acronyms, project managers sometimes think of wonderful-sounding names that, however, tell you too little about what is really going on. The European KEEP project is a case in point to me: KEEP stands for ‘Keeping Emulation Environments Portable’. Perhaps it is just my slow brain and/or the often fuzzy official project language, but for some reason I kept getting visions of shopping bags and briefcases …

On the occasion of the KEEP road show, which is coming to The Hague on 26-27 October (and to Zagreb, 9-11 November, and Rome, 29-30 November), I asked KB colleague and KEEP participant Jeffrey van der Hoeven (at left, emulating KEEP user satisfaction) to explain it to me. Here is my version of what he told me:

The best-known method to deal with software and hardware obsolescence is migration: you change the bits and the bytes of a digital object to make them work on a new platform. However, migration turns out to be not at all as risk-free as we would hope. Plus: it does not work for complex objects such as video games, websites, etc. An alternative is emulation: you do not change the bits, but write software to make a new computer function as if it were an (old) computer. This means writing emulators for every possible combination (which is a lot of expensive R&D work), but if it works, there are fewer risks involved than with migration.

However, working with emulators is not for dummies. It is technically challenging work for specialists. The KEEP project developed an ‘emulation framework’ that takes care of that. It automatically selects the right emulator and configures the software required to render the object. That sounds quite handy.

Now, what about the ‘portability’? Emulators themselves are pieces of software that become obsolete over time. Therefore, KEEP is developing a ‘virtual machine’ that will allow for execution of any software on any platform at any time.

Does this sound too good to be true? Come to The Hague (or Zagreb or Rome) and find out for yourself. Be sure to bring some old obsolete floppies with you to test the systems hands-on.