Inge Angevaare's blog on sustainable access to digital information in the Netherlands and beyond
Blog of the coordinator of the Dutch Digital Preservation Coalition NCDD
Saturday 17 December 2011
This blog has moved
As of 15 December 2011, this blog, archive and all, has moved to the NCDD website, new url: http://www.ncdd.nl
or directly to the weblog, new url: http://www.ncdd.nl/blog/
Thursday 8 December 2011
DISH2011 wrap-up: the digital shift and us (DISH 4)
‘In many ways, digital surrogates are more useful, more accessible and more robust than physical objects. That is deeply upsetting for people who have dedicated their entire lives to collecting and maintaining physical objects.’
‘There are many more opportunities now for users to engage and to participate. Sometimes user impact is quite trivial, but it can also be very profound. For a lot of content, there is somebody out there who knows much more about it than we do and he is able to get in touch with us. Just think of the vast volumes of audiovisual content from our living memory. But user generated content does raise issues of trust: to what extent will we, memory organizations, be able or willing to vouch for this content?’ And there is more, ‘These participants may want to contribute more than just tags, they may bring us their own archives, expecting that there should be a place for the memories of all of us.’
So the question becomes: are we adapting to this new environment? I attended a workshop session on 'national infrastructures' and heard Marco de Niet of the DEN Foundation say: 'We should have done this ten years ago.' He was commenting on Dutch plans to use the Europeana structure and tools to aggregate content from a variety of Dutch institutions on one discovery platform. They call it the 'Netherlands Cultural Heritage Collection' - but really, it is metadata only and, if we are lucky, we will get some thumbnails. A workshop attendee asked the critical question: "Will our users be satisfied with just metadata?" Joyce Ray of the US IMLS figured that no one would be able to find the money to aggregate the content as well.
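For the technically curious: aggregation platforms in the Europeana mould typically harvest metadata (not content) from institutions over the OAI-PMH protocol. Here is a minimal sketch in Python of what such metadata-only aggregation boils down to - the endpoint URL is hypothetical, and any repository exposing simple Dublin Core would do:

from urllib.parse import quote
from urllib.request import urlopen
from xml.etree import ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def harvest(base_url, prefix="oai_dc"):
    """Yield (title, identifier) pairs, following resumption tokens."""
    url = f"{base_url}?verb=ListRecords&metadataPrefix={prefix}"
    while url:
        tree = ET.parse(urlopen(url))
        for record in tree.iter(f"{OAI}record"):
            title = record.find(f".//{DC}title")
            ident = record.find(f".//{DC}identifier")
            yield (title.text if title is not None else "(no title)",
                   ident.text if ident is not None else "(no identifier)")
        # An OAI-PMH server pages its output with resumption tokens.
        token = tree.find(f".//{OAI}resumptionToken")
        url = (f"{base_url}?verb=ListRecords&resumptionToken={quote(token.text)}"
               if token is not None and token.text else None)

# Hypothetical usage:
# for title, ident in harvest("http://example.org/oai"):
#     print(title, "->", ident)

Note what the sketch makes tangible: all the aggregator ever sees is titles and identifiers pointing back to the institutions - which is exactly why the "just metadata" question matters.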
But should such practicalities stop us from making bold moves?
In order to give us a sense that all of this is doable, the conference organizers had contracted strategist Michael Edson of the US Smithsonian Institution to give us a final pep talk the American way. His advice: stop thinking and talking in terms of ‘the future’. The pace of innovation is so quick now that we simply cannot spend months or even years talking about strategy. Because if we do, we will fail to recognize the things about digital culture that we can bank on now. In other words: ‘It is all a matter of going boldly into the present.’ Strategy should do work; it is a tool. (The text of his entire speech is on Slideshare, user edsonm.)
Michael Edson
This is what Edson offered for us to take into the office this Monday morning:
I would say: good luck to all of us!
Grabbing digital preservation by the roots - #DISH2011, 3
Karin van der Heiden (right) with Job Meihuizen of Premsela.
If you think that this is perhaps too basic a level, just remember this: more and more digital content is being produced outside the sphere of influence of heritage institutions. Can you see the boxes of digital junk coming your way in 10 or 20 years' time, and the trouble and expense they will cause? Educating everybody is therefore important to all of us. Karin's mission is to make basic preservation measures doable - to enable designers, artists, researchers and everybody else to easily integrate basic measures into their workflow:
Great stuff. I'll let you know when the US edition becomes available.
Playing the 'digital lifecycle game' #DISH2011, 2
Rony Vissers (Packed, Belgium) searching for answers
Shawn Day, Digital Humanities, seems to feel the threat
Wednesday 7 December 2011
Are heritage institutions 'living the digital shift'? #DISH2011, 1
The conference covers many angles of the digital shift, but obviously I will be on the lookout for sessions and papers dealing with long-term access. Having that focus makes it easier to make choices at this conference, which boasts three blocks of no fewer than fifteen (!) simultaneous parallel sessions - which means you always miss 14/15ths of what's on offer. That's a lot to miss, and somebody tweeted: let's hope the three plenary keynote presentations make up for the 'sacrifice'.
Amber Case, 'living the digital age', despite her admittedly 'analogue' upbringing. 'In my own back yard, I understood the limits of my mental and physical capabilities.' |
Chair Chris Batt with a breakdown of the audience of more than 300 attendees, 75% (my estimate) from the Netherlands
Monday 5 December 2011
Een "infrastructuur": wat is dat en hoe bouw je het?
Hoe vlieg je zoiets groots aan? De ene methode is de deltaplanmethode: grootschalig, hoog-boven-over. In polderland Nederland zie je zoiets maar zelden. Dan moet het water ons écht aan de lippen staan, zoals in 1953 letterlijk gebeurde. Voor de duurzame bruikbaarheid van onze digitale bestanden is zo een beweging (nog) niet tot stand gekomen. Wij (zeg maar, informatieprofessionals) weten wel van de tijdbom die onder onze digitale informatie tikt, maar die urgentie wordt nog lang niet door alle bestuurders als dringend ervaren.
- Opslag (het betrouwbaar en vooral zo efficiënt mogelijk opslaan van de bits en de bytes, inclusief netwerkverbindingen)
- Preservering (wat moeten we nu precies doen om die duurzaamheid te waarborgen - monitoren van de ontwikkelingen (preservation watch), plannen van preserveringsacties, de softwaretools die je daarvoor nodig hebt, R&D, en vooral veel kennis)
- Afstemming collectiebeleid (digitale informatie laat zich lastig vangen in de traditionele taakverdeling tussen instellingen, daar moet je nieuwe afspraken over maken)
- Kwaliteitszorg en certificering (wanneer is een archief een 'trustworthy digital repository'? Hoe bewijs je dat?)
Aan mij de eer om al dit werk te ondersteunen vanuit de NCDD, en ik kan je zeggen, de werkgroepen hebben het niet gemakkelijk. Duurzame toegankelijkheid is een jong vak met heel veel onzekerheden. Wie kan voorspellen wat voor computers we over 10 of 20 jaar zullen hebben? Wie durft te voorspellen hoe snel het web blijft groeien? Wie durft te selecteren wat we wel en niet moeten bewaren? Wie durft er vandaag definitief te zeggen wat de beste duurzaamheidsstrategie is? En hoe zit het met alle bestuurlijke en juridische complicaties?
Ga er maar aan staan. Niettemin zijn we vol goede moed aan het werk gegaan. Er wordt hersenkrakend nagedacht en geschreven. Ideeën worden geopperd en soms weer van tafel geveegd. Om soms later opnieuw op te duiken als andere alternatieven niet haalbaar zijn gebleken.
Maar we hebben hulp nodig. Van jullie. Daarom organiseren we:
NCDD symposium "Bouw een huis voor ons digitaal geheugen",
24 januari 2012, KB, Den Haag, 10.30 u tot 16.30 u, toegang gratis, wél even aanmelden
Programma en aanmelden op http://ncddsymposium.eventbrite.com.
Wednesday 30 November 2011
Digital preservation basics in four online seminars
If you are new to digital preservation, you may want to check out four ‘webinars’ organized by the California State Library and the California Preservation Program. The one-hour webinars promise to give you a basic understanding of what digital preservation is all about, and will be of particular interest to librarians and archivists involved in developing digital projects.
The first webinar is scheduled for December 8, 12 PM Pacific time (which is 21.00 hrs in Holland). Topics include: ‘storing digital objects, choosing and understanding risks in file formats, planning for migration and emulation, and the roles of metadata in digital preservation.’ See http://infopeople.org/training/digital-preservation-fundamentals.
Wednesday 23 November 2011
‘Mind the Gap’ and Archive-it – on web archiving (iPRES2011, 9)
At a reception the other day, I heard a rumour: because preserving websites is so difficult, the Internet Archive was said to be considering printing all of its content. I will not disclose the informant’s name – he would have no future in the digital library where he works (OK, it was a guy, a young guy, and he works for a Dutch library). Needless to say, it could not be done even if the Internet Archive wanted to. Lori Donovan told the iPRES audience that a single snapshot of the web nowadays runs to 3 billion pages [for the Dutch: 3 miljard pagina’s].
Mind-boggling numbers, especially if you think of the Internet Archive’s shoestring budget.
Anyway, iPRES2011 is over, but I still have some worthwhile stories waiting to be told. One of the issues tabled at iPRES was whether we can (and/or should) safely leave web archiving to the Internet Archive and national libraries.
Logistics put the panel members much further apart than their viewpoints would warrant: they agreed that web archiving is important, and not just for national libraries. From the left: Geoff Harder, University of Alberta, Tessa Fellon, Columbia University, and Lori Donovan, the Internet Archive.
No, said Geoff Harder of the University of Alberta and Tessa Fellon of Columbia University. There are compelling reasons for research libraries to get involved as well. Harder: “This is just another tool in collection building; we should not treat it any differently. You begin with a collection policy and an expanded view of what constitutes a research collection: build on existing collections; find collections where research is happening or will happen.”
I would say that there are perhaps even more compelling reasons to collect web content than, e.g., printed books, because web content is extremely fleeting. Harder told his audience: “Too much online (western) Canadian content is disappearing; this creates a research gap for future scholars and a hole in our collective memory.” He encouraged research libraries to “Mind the Gap – Own the Problem”.
The University of Alberta’s involvement in web archiving started with a rescue operation: a non-profit foundation which created some 80+ websites, including the Alberta Online Encyclopedia, went out of business. This was extremely valuable content, and it needed to be rescued fast.
When a time bomb is ticking …
The University of Alberta decided to use Archive-It, a service developed by the Internet Archive. It is a lightweight tool that is easy to get up and running immediately. Plus, said Harder, there is a well-established toolkit including a dashboard and workflows, you become part of an instant community of users, and your collection becomes part of a larger, global web archive. That last point is in fact a precondition for working with Archive-It: by default, everything that is harvested becomes publicly available worldwide. Harder: “It is an economical tool for saving orphaned and at-risk web content … where we know a time bomb is ticking.”
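For readers curious about what a harvest boils down to technically: Archive-It itself runs the Heritrix crawler and stores its captures in standard WARC files; the following is only a toy sketch of the capture idea, in Python, with a hypothetical seed list:

# Toy sketch: fetch seed URLs and store each capture with a timestamp
# and checksum. Real crawlers follow links, respect robots.txt, and
# write WARC files; none of that is attempted here.
import hashlib
import urllib.request
from datetime import datetime, timezone
from pathlib import Path

SEEDS = ["http://example.org/"]  # hypothetical seed list

def harvest(seeds, out_dir="harvest"):
    Path(out_dir).mkdir(exist_ok=True)
    for url in seeds:
        with urllib.request.urlopen(url, timeout=30) as resp:
            body = resp.read()
        stamp = datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S")
        digest = hashlib.sha256(body).hexdigest()
        name = f"{stamp}-{digest[:12]}.capture"
        (Path(out_dir) / name).write_bytes(body)
        print(url, "->", name, len(body), "bytes")

harvest(SEEDS)

The timestamp-plus-checksum naming hints at why repeated harvests are valuable: each run captures the site as it was at that moment, which is exactly the fleeting quality Harder was worried about.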
Have a look at the collections built with Archive-It, I would say to research libraries’ subject specialists. You can include anything that is interesting in your field, such as important blogs, for as long as they are relevant.
Yunhyong Kim of HATII, Glasgow, takes blogging very seriously and is doing research into the dynamics of the blogosphere.
Q&A
Is Archive-It durable enough? asked Yunhyong Kim of Glasgow (HATII). Donovan appeared confident that the Internet Archive would be able to continue developing the tool. And I would repeat Harder: when a time bomb is ticking, you have got to go with what is available.
What about preventing redundancy? was another question. Should we not keep a register somewhere of what is being archived? Fellon thought that was a good idea, but perhaps it was too early for that: “There are many different reasons for web archiving, different frequencies.” Sorting out what overlaps exactly and what does not is perhaps more work than just accepting some “collateral damage”.
If you want to know more about Archive-It, you can sign up for one of their live online demos. There is one scheduled for November 29 and one for December 6. See the website.
Monday 21 November 2011
PDF/A-2: what it is, what it can do, what it cannot do, and what to expect in the future
There is a new PDF ISO standard, 19005-2, or PDF/A-2, and therefore the Benelux PDF/A Competence Center decided to organize a seminar. When one of the organizers, Dominique Hermans of DO Consultancy, asked me to do the warming-up presentation, I readily agreed, because I had been hearing some bad things about PDF these last few months, and was eager to find out more. While preparing my own talk (slides at the end of this post) I decided to quote those very criticisms (see LIBER2011 blog post), just to get the ball rolling and challenge the experts to comment:
This slide of mine is a mash-up of three slides by Alma Swan at the LIBER 2011 conference, ‘Open Access, repositories and H.G. Wells’
These criticisms come from people who want machines to analyse large quantities of data in a semantic-web/Linked Data-type environment. Are the criticisms justified? For those of you who, like me, are sometimes confused about what is and what is not possible, I will summarize what the experts told the seminar.
The key one-liner came from Carsten Heinemann of LuraTech:
“PDF was designed as electronic paper”
‘It was designed to reproduce a visual image across different platforms (PC, Mac, operating systems), and for a limited period of time.’ As such, PDF was a really good product, because it was compact and complete and it allowed for random access. But there were also many issues, and Adobe has been working on fixing those ever since. This has resulted in an entire family of PDF formats with different functionalities.
PDF/A is the file format most suited for archiving purposes. The new standard, PDF/A-2, is not a new version of PDF/A-1 in the sense that one would need to migrate from 1 to 2, but rather a new member of the PDF family tree that has improved functionality over PDF/A-1. In other words: migrating from PDF/A-1 to PDF/A-2 is senseless, but if you are creating new PDF documents you may want to consider PDF/A-2 because of the new functionality to incorporate more features from the original document (e.g., JPEG2000 compression, the possibility to embed one file into another, larger page sizes, support for transparency effects and layers).
To make matters more complicated, PDF/A-2 comes in two varieties: compliance level 2a and compliance level 2b. Level 2a allows for better access by machines, such as the search engines used in semantic-web techniques, because it requires that files not only provide a visual image, but are also structured and tagged and include Unicode character maps.
Heinemann concluded: XML is for transporting data; PDF is for transporting visual representations. To which I might add: XML is for use by machines, PDF is for use by humans.
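As an aside for readers who want to experiment: one widely used route to PDF/A is Ghostscript's pdfwrite device. A hedged sketch follows - the -dPDFA=2 flag requires a reasonably recent Ghostscript build (older builds only produce PDF/A-1), the file names are hypothetical, and this was not a tool discussed at the seminar:

# Hedged sketch: converting a PDF to PDF/A-2b with Ghostscript.
# Assumes a Ghostscript build that accepts -dPDFA=2; hypothetical file names.
import subprocess

cmd = [
    "gs",
    "-dPDFA=2",                      # target PDF/A-2
    "-dPDFACompatibilityPolicy=1",   # drop constructs PDF/A forbids
    "-dBATCH", "-dNOPAUSE",
    "-sDEVICE=pdfwrite",
    "-sColorConversionStrategy=UseDeviceIndependentColor",
    "-sOutputFile=out_pdfa2b.pdf",
    "in.pdf",
]
subprocess.run(cmd, check=True)

Note that a conversion route like this yields level 2b (visual) conformance; the tagging and Unicode maps that level 2a demands have to come from the authoring workflow itself - which is exactly the point the next speakers made.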
Misuse of PDF is easy
Raph de Rooij of Logius (Ministry of the Interior) told his audience that one should not be too quick to say that something is “impossible” with PDF. A lot is possible, but you have to use the tools the right way – and that is where things often go wrong.
Raph demonstrated that most PDFs put online by government agencies do not meet the government’s own requirements for web usability – including access for those who are, e.g., visually impaired. “The many nuances of the PDF discussion often get lost in translation,” he said. The trick is to pay a lot of attention to organizing the workflow that ends in PDFs.
PDF is no silver bullet
Ingmar Koch, a well-known (blogging) Dutch public records inspector, has seen many examples of PDF misuse. “Public officials tend to think of PDF as a silver bullet that solves all of their archiving problems.” But PDF was never designed to include anything that is not static (Excel sheets with formulas, movies, interactive communications, etc.).
From the left: Caroline vd Meulen, Ingmar Koch, Bas from Krimpen a/d IJssel and Robert Gillesse of the DEN Foundation.
From a preservation point of view, I heard some shocking case studies from public offices. An official will type the minutes of a council meeting in Word, make a print-out, have the print-out signed physically, then OCR the document and convert it to PDF for archiving. I dare not imagine how much information gets lost in the process. But then again, we all know that data producers’ interests are often different from archives’ interests. Public offices just want to make a “quick PDF” and not be bothered by all the nuances.
How about validation?
There is a lot of talk about “validating” PDF documents. First of all, PDFs are created by all sorts of software, and what that software produces often does not conform to the ISO standards and is thus rejected by validators. Things get more confusing when different validators return different verdicts. Heinemann explained: “That’s because some validators only check 30%, whereas others check 80%. The latter may find something the former did not see.”
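For what it is worth, one open-source tool that performs such checks is JHOVE, which also turns up elsewhere in this blog, in the iPRES posts. A hedged sketch of invoking its PDF module from Python - assuming jhove is installed and on your PATH, with a hypothetical file name:

# Hedged sketch: asking JHOVE to assess a PDF with its PDF module.
import subprocess

result = subprocess.run(
    ["jhove", "-m", "PDF-hul", "-h", "XML", "suspect.pdf"],
    capture_output=True, text=True, check=True,
)
# JHOVE reports, among other things, whether the file is well-formed and
# valid against the PDF specification. As Heinemann noted, two validators
# may still disagree, because they check different subsets of the standard.
print(result.stdout)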
At the end of the day …
It seems that, indeed, there are millions and millions of PDFs out there that can only provide a visual representation and are no good when it comes to Linked Data and the Semantic Web. But PDF is catching up, adding new features all the time. I understand that we may even expect a PDF/A-3, which supports embedding the original source file within the archived digital object. Ingmar Koch did not seem too happy about such functionality; it would make his life as a public records inspector even harder. But from a preservation point of view, that just might be as close to a silver bullet for archiving as we will ever get.
Meanwhile, if you want to use PDF in your workflow, some expert advice about which type of PDF is appropriate in your case is called for!
Comments by Adobe
Adobe itself was very quick to respond to this blog post in an e-mail I found this morning. Leonard Rosenthol, PDF Architect, was not very pleased with the picture painted by the above workshop – as a matter of fact, he used the word “appalled”. He asserted that PDF and XML/Linked Data go very well together and that various countries and government agencies have already adopted a scenario that ‘presents a best of two worlds’. Here is his link to a recent blog post by James C. King that describes how it is done: http://blogs.adobe.com/insidepdf/2011/10/my-pdf-hammer-revision.html.
That blog post is an interesting addition to the workshop results (confirming Raph de Rooij’s assertion that “nothing is impossible”), but it does not take away the fact that PDF is often misused. I would guess that is because it is complicated stuff. “Making a quick PDF” just does not do it. The recommendation to seek expert advice, therefore, stands!
Lastly, here is my own presentation: a broad overview of developments in the digital information arena to start off the day – in Dutch:
For the Dutch fans: Ingmar Koch has blogged about this event here, and the slides will become available here. Thanks also to KB colleague Wouter Kool for helping me understand PDF.
Sunday 20 November 2011
‘Bewaar als …’ (‘Save as …’): crystal-clear advice on digital archiving
Together with Premsela (the Netherlands Institute for Design and Fashion), Karin van der Heiden has developed a crystal-clear leporello (fold-out brochure) that offers designers practical guidance for organizing and storing their information properly – and that is where all long-term access begins. Important not just for designers, but for everyone who creates digital documents and wants to keep them safe!
Congratulations, Karin, on this production!
Have a look at the accompanying website, http://bewaarals.nl/, and spread the word!
PS: Below are the full sheets, in .jpeg.
An English edition will be made available in the US in a few months. I will keep you posted.
Tuesday 8 November 2011
Aligning with most of the world (iPRES2011, 8)
iPRES is organized alternately in Europe, North America and Asia in order to include people and discussions from all continents – Africa and South America are still on the Steering Committee’s wish list. However, when you looked at the list of presenters at iPRES2011, the usual suspects dominated: Europe, North America, Australia/New Zealand. I asked a Programme Committee member about that, and he told me that some papers had been submitted from Asia, but they were deemed not good enough to make it into the programme.
To my mind, there is a bit of a contradiction in this. Of course we want high-quality papers at iPRES, but it is risky to take our (western) stage of development as a yardstick for what constitutes “quality”. As Cal Lee phrased it: “Digital preservation tends to be quite regionally myopic.” I would suggest that the next iPRES organize a special track or workshop day for those who are just beginning to think about digital preservation, or who work in a very different context from a “western” one, and focus on their specific circumstances and challenges.
Fortunately, there was one workshop that expressly invited members from “other” countries. It was the workshop “Aligning national approaches to digital preservation”, a follow-up from last May’s Tallinn conference (see my blog posts), put together by Cal Lee from the University of North Carolina. Yes, there were usual suspects presenting as well (including yours truly), but in this post I shall mostly ignore them in favour of new input:
Özgür Külcü, from Hacettepe University, Ankara, Turkey, described Turkish participation in the AccessIT project, in which an online education module with practical information about digitisation issues and the protection of cultural heritage was developed. And in the context of the InterPARES 3 project, the Turkish team is helping translate digital preservation theory into concrete action plans for organizations with limited resources. But many issues remain:
Masaki Shibata from Japan revealed the results of a DRAMBORA 2.0 test audit carried out at the National Diet Library in Japan:
Shibata admitted that, unfortunately, the risks mentioned in the final report largely remain unresolved. ‘We were caught up in an illusion that there was an ideal solution to ensure long-term digital preservation,’ he said. ‘We tried to address the risks only by means of systems development.’ Also, specific Japanese and NDL circumstances played a role, such as the rigidity of the fiscal, budget, employment and personnel systems; language difficulties and geographical constraints; a lack of digital conservators; and the cultural context of preservation. Shibata concluded that an international alliance for digital preservation ‘would become a boost/tailwind for national policymaking in Japan.’
Daisy Selematsela of the National Research Foundation of the Republic of South Africa described the outcomes of ‘An audit of South African digitisation initiatives’ before focusing on “Managing Digital Collections: a collaborative initiative on the South African Framework”, a report published earlier this year which is meant to provide data producers with high-level principles for managing data throughout the digital collection life cycle, and on the train-the-trainer programme:
As for international alignment, Selematsela concluded:
Raju Buddharaju of the National Library of Singapore (photo right) suggested that we first need a better understanding of what we mean by “alignment” and what we mean by digital preservation (what do we include, what do we exclude) before we can try to come to workable initiatives.
The workshop was originally designed as a one-day event, but in the end the conference organizers only gave us 3 hours on Friday afternoon. The good news was that despite the time of day and conference fatigue, more than forty participants showed up, and they conducted animated discussions on such topics as costs; public policy and society; and preservation & access.
But it was difficult to reach any concrete conclusions. There are many good intentions, but it continues to be difficult to find the common ground that leads to practical results. Steve Knight of the National Library of New Zealand (photo left) questioned whether there is any real will to collaborate, e.g., on putting together a much-needed international format (technical) registry. Talking about education, finally, Andi Rauber suggested that because there is no well-defined body of knowledge, we might prefer a range of “friendly competing curricula” rather than an aligned body – for the time being.
Which only goes to show that, like Singapore itself, alignment comes in many shapes and sizes.
Disaster planning and enabling smaller institutions (iPRES2011, 7)
As this iPRES was moved from Tsukuba, Japan, to Singapore because of the earthquake and tsunami in Japan in March this year, it was only fitting that iPRES2011 should include a panel session on disaster planning. Neil Grindley (JISC) asked whether digital preservation does not implicitly include disaster planning, but Angela Dappert (DPC) argued that when an entire infrastructure goes down, the problems will be massively larger. Plus, as Arif Shaon (STFC) observed, ‘Grade A preservation should include it, but we have not reached that stage yet.’
Shigeo Sugimoto of Tsukuba, who would have been iPRES’s host in Japan, took a forward-looking view at disaster planning. Many physical artefacts were lost during the earthquake, and having lots of digital copies at different locations can certainly help rescue cultural heritage, provided the metadata are kept at different locations as well.
Shigeo Sugimoto (right) with José Barateiro of Portugal during the disaster planning session.
There is one catch, though: many smaller institutions do not have the means (money, staff) to build digital archives. Therefore, in Japan the idea has been put forward to design a robust and easy-to-use cloud-based service for small institutions:
In the Netherlands, I am involved in two Dutch Digital Preservation Coalition (NCDD) working groups that are looking at the same problem: how to enable smaller institutions to preserve their digital objects. Professor Sugimoto and I have agreed to stay in touch and exchange information and experiences.
Sunday 6 November 2011
‘At scale, storage is the dominant hardware cost’ (iPRES2011, 6)
It is not uncommon for conferences to be ‘interrupted’ by sponsor presentations. When I say ‘interrupted’, I do not necessarily mean that such talks are unwelcome. Conference days tend to be packed from early morning to late at night, and such sponsor interventions can be quite pleasant – a moment to doze off or to check your e-mail. Robert Sharpe (photo) of Tessella (vendors of the Safety Deposit Box or SDB system) gave us no such respite. In an entertaining presentation he shared some scalability experiences with us.
The case study was Family Search, which ingests no fewer than 20 terabytes of images a day. That was quite a scalability test for the Tessella Safety Deposit Box system, and it tested some of Sharpe’s own assumptions:
- Tessella expected that they would need faster, more efficient tools, but it turned out that existing tools (DROID, Jhove, etc.) were easily fast enough.
- Tessella expected reading and writing of content to be fast compared to processing, but it turned out that reading and writing were not fast enough; the process required parallel reads and parallel writes. Thus the hardware cost is dominated by non-processing costs.
- Tessella (and most of us) expected storage to be cheap, but at scale it turned out to be the dominant hardware cost. Reading and writing hardware came to about GBP 80,000. Storage came to GBP 100 per terabyte of content (3 copies), which amounted to GBP 730,000 a year, every year, excluding refreshment costs (see the quick arithmetic below).
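Just to make the scale tangible, here is that arithmetic spelled out; the figures are Sharpe's as reported above, the calculation itself is mine:

# Reconstructing Sharpe's storage figure from the numbers reported above.
INGEST_TB_PER_DAY = 20        # Family Search ingest rate
COST_GBP_PER_TB = 100         # per terabyte of content, 3 copies included

annual_tb = INGEST_TB_PER_DAY * 365            # 7,300 TB of new content a year
annual_cost_gbp = annual_tb * COST_GBP_PER_TB  # GBP 730,000 - and that recurs
print(f"{annual_tb:,} TB/year -> GBP {annual_cost_gbp:,} per year")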
Sharpe concluded that we do not need faster tools – but we do need better and more comprehensive tools. We need systems engineering, not just software engineering. And we need enterprise solutions: automation, multi-threading, efficient workflow management and automated issue handling.
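To illustrate the multi-threading point in the small: a minimal sketch, with hypothetical paths, of overlapping reads and writes instead of processing files one by one - which is what Sharpe's parallel-I/O finding amounts to. This is my own illustration, not SDB's actual architecture:

# Minimal sketch of I/O-bound parallel ingest; hypothetical paths.
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def ingest_one(src: Path, store: Path) -> str:
    """Copy one file into the store; a real pipeline would also
    characterise (DROID/JHOVE), checksum and catalogue it."""
    shutil.copy2(src, store / src.name)
    return src.name

def ingest_parallel(sources, store: Path, workers: int = 8):
    """Overlap the copies so the disks, not the CPU, set the pace."""
    store.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for name in pool.map(lambda src: ingest_one(src, store), sources):
            print("ingested", name)

# Hypothetical usage:
# ingest_parallel(Path("staging").glob("*.tif"), Path("store"))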
All of which, of course, Rob will be happy to talk to you about.
PS: In response to this blog post, Rob wrote to me: ‘A further point I was trying to make in the rest of my talk is you don't need especially powerful application servers to do this: you can do it fairly cheaply (certainly when compared to other costs at such scale).’
Scale Singapore-style: the Marina Bay Sands Hotel. The ship-like contraption on top of the three towers holds lush tropical gardens, a 150-metre swimming pool, restaurants, and a bar.