Duurzame toegang (long-term access): Web archiving: the international arena (IIPC 2011)

Tuesday, 10 May 2011


Last month we reviewed the Dutch web archiving landscape (blog in Dutch), and yesterday the international web archiving community descended on The Hague for the annual General Assembly of the IIPC (International Internet Preservation Consortium). So, Dutch fans, for once a blog in English, to report on the kick-off: an open conference entitled ‘Out of the Box: Building and Using Web Archives’, which was held at the KB in The Hague on 9 May. Despite cuts in travel budgets, some 100 professionals from all over the world attended the General Assembly. The day was divided into three sessions: Collection treasures; Web archiving short stories; and The use of web archives. Let me share some highlights with you.

Collection treasures – and how to organize the international treasure hunt

Early in the day, Gildas Ilien of the Bibliothèque nationale de France (BnF) articulated some fundamental questions facing (national) web archiving institutions: ‘The Web challenges our national boundaries and policies. What’s in? What’s out? There is a need to define consistent selection and scoping criteria while projects and events develop. And: who should take care of what is in between nations, or everywhere? What is the risk? What is the value?’ Ilien referred specifically to event harvesting. The challenges there are emergency, scope and collaboration.

Ilien’s case was illustrated by the 2011 revolutions in Tunisia and Egypt, which sparked a buzz of event archiving activity. In Egypt, the internet was first shut down completely on 25 January, but the moment the ban was lifted, on 9 February, the Library of Alexandria started crawling the web for information, pictures and video to document the Egyptian revolution. Across the ocean, the Library of Congress realized something was going on which deserved capturing. Abbie Grotke of the LoC (standing in for Kris Carpenter of the Internet Archive) reported how the LoC used the informal IIPC network to call around and find out who was capturing what. This led to an impromptu collaborative effort between the Library of Congress, the Internet Archive, the BnF, the British Library, the American University in Cairo and Stanford University to capture the events in the Middle East. They archived not only websites, but blogs and social media as well. As Arthur Thomas would say later in the day: ‘It is more accurate to talk about collections of stuff from the Internet.’

As inspiring as this impromptu collaboration surrounding events in the Middle East was, the fundamental questions posed by Ilien are yet to be answered. Who does what, and how do we collaborate to capture a web that is fundamentally international and interinstitutional, and where, moreover, events develop at a much faster pace than our institutional decision-making procedures can keep up with (emergency!)? We will no doubt continue the debate on these issues at the upcoming Aligning national approaches to digital preservation conference in Tallinn on 23-24 May. The slide to the right (by Carpenter, Grotke, Ilien) frames the important questions to be debated, including the many legal restrictions applicable to sharing data from web archiving activities – and the question of how we can organize central operators (such as the Internet Archive, maybe) and a possible World Watch.

The many faces of web archiving were demonstrated by two young archivists from the Czech National Library, Zuzana Kratochvílová (left) and Lukas Gruber (right). After the turmoil from the Middle East, they reported that web archiving allows a national library to capture not only the traditional media, which specialize in bad news, but also many smaller websites that concentrate on good news, such as Czech citizens sharing their hobbies on the internet. ‘Content is more important for us than format’, Lukas told his audience, and so the team organized a ‘blog of the year’ contest to allow the public at large to help decide which blogs to include in the national collection. And Zuzana could not help but smile when she conceded that memories from their own childhoods had prompted the team to select a site on a popular Czech children’s programme for archiving. It is part of the national cultural history.

Tjarda de Haan of the Amsterdam Museum enthusiastically recounted all the efforts that are going into recreating a milestone website, ‘The Digital City’ (De Digitale Stad), which was online from 1994 to 2001 and which was designed to help the general public learn to use the new possibilities of the internet. A huge cultural treasure, which tells the story of internet adoption in the Netherlands, was lost when the commercial interests which owned the site by then simply pulled the plug in 2001. The Amsterdam Museum is now doing everything it can to find whatever content and period hardware have survived. Their efforts include a true Gravediggers’ Party, to be held on 13 May in Amsterdam.

To conclude the ‘treasures’ section, Gillian Lee of the National Library of New Zealand reported on a small-scale project to capture the Canterbury Earthquake of 22 February, and Mark Williamson of Hanzo, a commercial web archiving company, showed how Hanzo archives Coca-Cola’s marketing history.

Web Archiving Short Stories – a parade of what we’ve got

In seven five-minute mini presentations, a wide array of web archiving projects was presented to the audience. A worthwhile show for anyone still doubting the need for web archiving. The web is where our lives are happening, the web is where our lives are being documented, the web is worth preserving. The parade included the Swiss, Croatian, Dutch, British and American national libraries and the Rotterdam Municipal Archive, and was concluded by a great film documenting an Internet Archive K12 project to train students and teachers about the need for and use of web archiving. Memorable quotes from a high school teenager: ‘We don’t usually realize that we’ll be history in like … two years.’ and: ‘Fifty years from now someone will read a textbook and it might be about us.’

Use of Web Archives – is what we’ve got good enough for researchers?

Having captured all of this wonderful stuff from the web, the question remains how these treasures are being used. More often than not, copyright restrictions keep national libraries from putting the material online so it can benefit the general public. Which means that all that work is being done to serve researchers. What do they think about these collections? The organizers had invited social scientists to debate the question, beginning with Ralph Schroeder, Arthur Thomas and Eric Meyer (picture at right) from the Oxford Internet Institute. They recently completed a draft of a report entitled Web Archives: the Future(s). One of their conclusions: present web archives are harvesting content, especially html/http, while that part of the web is becoming less significant with every passing day. Increasingly the web is dominated by social platforms, location-based services and other very complex two-way and peer-to-peer mechanisms, semantic web/linked data, which are extremely complicated in a technical sense and thus are hardly being archived at all. Meyer: ‘Virtual worlds are our social life.’ Social scientists want to study how the networks function and who is interacting with whom about what, and that information is not being recorded.

So much for taking pride in our treasures ;-)

Anne Helmond and Esther Weltevrede, of the University of Amsterdam (in first picture of this blog above), demonstrated the type of research the Oxford team refers to. They analyzed the Dutch blogosphere 1999-2009. How did the network evolve? What platforms are being used by bloggers? (increasingly Dutch platforms!) Etcetera. Studies on the workings of the internet itself rather than the content it offers.

The Oxford team had a number of recommendations for web archives: try to archive server logs; try to archive traffic itself (an ambitious project!); allow researchers to trigger (frequency of) crawls; work on mechanisms to organize small and big events archiving (let machines trigger the harvesting, perhaps); think in terms of collections and collections of collections rather than individual sites. Quite an agenda! But Eric Meyer had more down-to-earth advice as well for institutions wishing to promote use of their web resources: ‘The best way to promote use of your digital resources is to provide examples of meaningful uses by others.’ and: ‘Institutions must put more effort into advertising what they have got.’ Gildas Ilien also recommended keeping in touch with the general public. ‘Not many people will use the web archive, but they will pay taxes for it if they know about it.’

Inevitably, the question of selection came up in the discussions. How do researchers feel about selection? The Oxford team: researchers increasingly say: collect everything, because we do not know what the future will need. The fact that we cannot analyze and make meaningful use of this material at present should not stop us from collecting. Arthur Thomas drew a comparison with his old field: biology. It all started with people collecting specimens, and often they were amateurs. These collections were incomplete but they often generated great ideas, such as evolution and classification. Gildas Ilien of the BnF agrees with this philosophy: ‘We should collect as much as possible – do not wait until you lose it.’

‘Can we please everyone?’ asked Helen Hockx-Yu of the British Library rhetorically when describing a BL project to encourage use of the web archives and get feedback from researchers. As yet, the question remains unanswered. But help is on the way, as represented by Paul Girard of the largest French social sciences lab (Sciences Po). He described an open source software project, the ‘Hypertext Corpus Initiative’, in which a community of researchers and web experts collaborates to develop mechanisms that will enable researchers to sift through the massive amounts of data from the internet and find exactly what they need – without needing too many technical skills. In so doing they hope to deal with the quantity/quality dilemma. [NB: a corpus consists of web entities.]

The panel: ‘Document everything about your web archive’

The concluding panel (photo at the top of this blog), ably chaired by Martha Anderson of the Library of Congress, included the researchers mentioned above and two web archivists: Birgit Nordsmark Henriksen of the Danish National Library and Cathy Hartman of the University of North Texas. The Danish KB carries out both an annual .dk domain harvest (without quality checks) and more selective web archiving projects with better quality assurance – researchers specifically ask for experimental sites to be harvested. The University of North Texas started harvesting government websites when nobody else was doing it (that is vision for you), and in so doing it gained an important position in the US web archiving community. Cathy explained that their web archive has now become an important part of the university’s identity: it serves the curriculum and attracts students.

In the discussions it transpired that what researchers need most from web archives is to know What is In the Box. How was the collection assembled? What was included and what not, for what reasons? Which harvests failed? Which breakdowns occurred at what times? Even curation activities and preservation actions should be carefully documented. All in the interest of sound research. Preferably, also, researchers should be enabled to use their own tools to analyse the data. And Paul Girard dreams of an index of all web archives in the world.
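To make the documentation wish list above a little more concrete, here is a minimal sketch of what such a record could look like as a data structure. It is an illustration only, not something presented at the conference: the class names, field names and example URLs are all hypothetical, and a real archive would more likely rely on its crawler's own logs and metadata formats.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class HarvestRecord:
    """One crawl attempt for one seed URL (hypothetical field names)."""
    seed_url: str
    started: datetime
    succeeded: bool
    failure_reason: Optional[str] = None


@dataclass
class CollectionLog:
    """Provenance notes a researcher might want alongside a web collection."""
    name: str
    selection_policy: str  # how and why the collection was assembled
    excluded: Dict[str, str] = field(default_factory=dict)  # URL -> reason left out
    harvests: List[HarvestRecord] = field(default_factory=list)
    actions: List[str] = field(default_factory=list)  # curation/preservation notes

    def failed_harvests(self) -> List[HarvestRecord]:
        """Return the harvests that did not succeed."""
        return [h for h in self.harvests if not h.succeeded]


# Example usage with made-up data.
log = CollectionLog(
    name="example-event-collection",
    selection_policy="Sites nominated by curators during the event",
)
log.excluded["http://example.org/private"] = "blocked by robots.txt"
log.harvests.append(
    HarvestRecord("http://example.org/", datetime(2011, 5, 9, 10, 0), True)
)
log.harvests.append(
    HarvestRecord("http://example.net/", datetime(2011, 5, 9, 10, 5), False,
                  "server timeout")
)
log.actions.append("2011-05-10: removed duplicate captures")

print([h.seed_url for h in log.failed_harvests()])
```

Keeping failures and exclusions next to the successful harvests is the point of the sketch: it lets a researcher see what is missing from the box as well as what is in it.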

Dark Archive or Online Access?

As mentioned before, most web archives are either closed entirely or accessible only on site, because of the many copyright and privacy regulations. The institutions are not happy with this situation, but they are reluctant to break laws – no matter how old-fashioned they may be. But Martha Anderson had some encouraging news for the group. The Library of Congress has kept meticulous records of the use of its web archives over the past ten years, and now it seems the lawyers are beginning to agree that opening up the archives may not be such a dangerous thing to do. We hope to hear from the LoC soon.

And the Dutch KB had good news as well: at the end of the day it opened up its web archive for the first time. As not all technical and legal problems have as yet been sorted, access is limited to 700 websites which have passed (manual!) quality control. 2,300 websites will be added as they too pass quality checks. In about three years’ time, 10,000 Dutch websites should be harvested regularly. As for on-line access: the KB will strive to provide as much on-line access as possible in due course.

The IIPC General Assembly continues this week with a number of technical workshops – I hope that someone else, who is more knowledgeable when it comes to technical matters, will report on those.

(See also next blog.)

Comments:

Zuzana Kratochvílová said: Well written :-) Zuzana; 13 May 2011 at 22:55
Krishna Kumar Chalise said: Hey!
Thanks for the post. It will be useful for my next project to organize an international treasure hunt.
Keep posting these types of posts.; 30 May 2019 at 10:11
