At a reception the other day, I heard a rumour. Because preserving web sites is so difficult, the Internet Archive was rumoured to be considering printing all of its content. I will not disclose the informant’s name – he would not have a future in the digital library where he works. (OK, it was a guy, a young guy, and he works for a Dutch library.) Needless to say, it could not be done even if the Internet Archive wanted to do it. Lori Donovan told the iPRES audience that a single snapshot of the www nowadays results in 3 billion pages [for the Dutch: 3 miljard pagina’s].
Mind-boggling numbers, especially if you think of the Internet Archive’s shoestring budget.
Anyway, iPRES2011 is over, but I still have some worthwhile stories waiting to be told. One of the issues tabled at iPRES was whether we can (and/or should) safely leave web archiving to the Internet Archive and national libraries.
Logistics put the Internet panel members much further apart than their viewpoints would warrant: they agreed that web archiving is important, not just for national libraries. From the left: Geoff Harder, University of Alberta; Tessa Fellon, Columbia University; and Lori Donovan, the Internet Archive.
No, said Geoff Harder of the University of Alberta and Tessa Fellon of Columbia University. There are compelling reasons for research libraries to get involved as well. Harder: “This is just another tool in collection building; we should not treat it any differently. You begin with a collection policy and an expanded view of what constitutes a research collection: build on existing collections; find collections where research is happening or will happen.”
I would say that perhaps there are even more compelling reasons to collect web content than, e.g., printed books, because web content is extremely fleeting. Harder told his audience: “Too much online (western) Canadian [content] is disappearing; this creates a research gap for future scholars and a hole in our collective memory.” He encouraged research libraries to “Mind the Gap – Own the Problem”.
The University of Alberta’s involvement in web archiving started with a rescue operation: a non-profit foundation that had created more than 80 websites, including the Alberta Online Encyclopedia, went out of business. This was extremely valuable content, and it needed to be rescued fast.
When a time bomb is ticking …
The University of Alberta decided to use Archive-It, a service developed by the Internet Archive. It is a light-weight tool that is easy to get up and running immediately. Plus, said Harder, there is a well-established tool-kit including a dashboard and workflows; you become part of an instant community of users; and your collection becomes part of a larger, global web archive. That last point is in fact a precondition for working with Archive-It: by default, everything that is harvested becomes publicly available globally. Harder: “It is an economical tool for saving orphaned and at-risk web content … where we know a time bomb is ticking.”
Have a look at the collections built with Archive-It, I would say to research libraries’ subject specialists. You can include anything that is interesting in your field, such as important blogs, for as long as they are relevant.
Yunhyong Kim of HATII, Glasgow, takes blogging very seriously and is doing research into the dynamics of the blogosphere.
Is Archive-It durable enough? asked Yunhyong Kim of Glasgow (HATII). Donovan appeared confident that the Internet Archive would be able to continue developing the tool. And I would repeat Harder: when a time bomb is ticking, you have got to go with what is available.
What about preventing redundancy? was another question. Should we not keep a register somewhere of what is being archived? Fellon thought that was a good idea, but perhaps it was too early for that. “There are many different reasons for web archiving, different frequencies.” Sorting out exactly what overlaps and what does not is perhaps more work than just accepting some “collateral damage”.
If you want to know more about Archive-It, you can sign up for one of their live online demos. There’s one scheduled for November 29 and one for December 6. See the website.