Tuesday 10 May 2011

Web archiving: the international arena (IIPC 2011)

Last month we reviewed the Dutch web archiving landscape (blog in Dutch), and yesterday the international web archiving community descended on The Hague for the IIPC (International Internet Preservation Consortium) annual General Assembly. So, Dutch fans, for once a blog in English, to report on the kick-off, an open conference entitled ‘Out of the Box: Building and Using Web Archives’, which was held at the KB in The Hague on 9 May. Despite cuts in travel budgets, some 100 professionals attended the General Assembly, from all over the world. The day was divided into three sessions: Collection treasures; Web archiving short stories; and The use of web archives. Let me share some highlights with you.

Collection treasures – and how to organize the international treasure hunt

Early in the day, Gildas Ilien of the Bibliothèque nationale de France (BnF) articulated some fundamental questions facing (national) web archiving institutions: ‘The Web challenges our national boundaries and policies. What’s in? What’s out? There is a need to define consistent selection and scoping criteria while projects and events develop. And: who should take care of what is in between nations, or everywhere? What is the risk? What is the value?’ Ilien referred specifically to event harvesting. The challenges there are emergency, scope and collaboration.

Ilien’s case was illustrated by the 2011 revolutions in Tunisia and Egypt, which sparked a buzz of event archiving activity. In Egypt, the internet was first shut down completely on 25 January, but the moment the ban was lifted, on 9 February, the Library of Alexandria started crawling the web for information, pictures and video to document the Egyptian revolution. Across the ocean, the Library of Congress realized something was going on which deserved capturing. Abbie Grotke of the LoC (standing in for Kris Carpenter of the Internet Archive) reported how the LoC used the informal IIPC network to call around and find out who was capturing what. This led to an impromptu collaborative effort between the Library of Congress, the Internet Archive, the BnF, the British Library, the American University in Cairo and Stanford University, to capture the events in the Middle East. They archived not only websites, but blogs and social media as well. As Arthur Thomas would say later in the day: ‘It is more accurate to talk about collections of stuff from the Internet.’

As inspiring as this impromptu collaboration surrounding events in the Middle East was, the fundamental questions posed by Ilien are yet to be answered: who does what, and how do we collaborate to capture a web that is fundamentally international and interinstitutional, and where, moreover, events develop at a much faster pace than our institutional decision-making procedures can keep up with (emergency!). We will no doubt continue the debate on these issues at the upcoming Aligning national approaches to digital preservation conference in Tallinn on 23-24 May. The slide to the right (by Carpenter, Grotke, Ilien) frames the important questions to be debated, including the many legal restrictions applicable to sharing data from web archiving activities – and the question of how we can organize central operators (such as the Internet Archive, maybe) and a possible World Watch.

 

The many faces of web archiving were demonstrated by two young archivists from the Czech National Library, Zuzanna Kratochvilova (left) and Lukas Gruber (right). After the turmoil from the Middle East, they reported that web archiving allows a national library not only to capture the traditional media, which specialize in bad news, but also many smaller websites that concentrate on good news, such as Czech citizens sharing their hobbies on the internet. ‘Content is more important for us than format’, Lukas told his audience, and so the team organized a ‘blog of the year’ contest to allow the public at large to help decide which blogs to include in the national collection. And Zuzanna could not help but smile when she conceded that memories from their own childhoods had prompted the team to select a site on a popular Czech children’s programme for archiving. It is part of the national cultural history.


Tjarda de Haan of the Amsterdam Museum enthusiastically recounted all the efforts that are going into recreating a milestone website, ‘The Digital City’ (De Digitale Stad), which was online from 1994 to 2001 and which was designed to help the general public learn to use the new possibilities of the internet. A huge cultural treasure, which tells the story of internet adoption in the Netherlands, was lost when the commercial interests which owned the site by then simply pulled the plug in 2001. The Amsterdam Museum is now doing everything it can to find whatever content and period hardware have survived. Their efforts include a true Gravediggers’ Party, to be held on 13 May in Amsterdam.

To conclude the ‘treasures’ section, Gillian Lee of the National Library of New Zealand reported on a small-scale project to capture the Canterbury Earthquake on 22 February, and Mark Williamson of Hanzo, a commercial web archiving company, showed how Hanzo archives Coca-Cola’s marketing history.

Web Archiving Short Stories – a parade of what we’ve got

In seven five-minute mini presentations, a wide array of web archiving projects was presented to the audience. A worthwhile show for anyone still doubting the need for web archiving. The web is where our lives are happening, the web is where our lives are being documented, the web is worth preserving. The parade included the Swiss, Croatian, Dutch, British and American national libraries and the Rotterdam Municipal Archive, and was concluded by a great film documenting an Internet Archive K12 project to train students and teachers about the need for and use of web archiving. Memorable quotes from a high school teenager: ‘We don’t usually realize that we’ll be history in like … two years.’ and: ‘Fifty years from now someone will read a textbook and it might be about us.’

Use of Web Archives – is what we’ve got good enough for researchers?

Having captured all of this wonderful stuff from the web, the question remains how these treasures are being used. More often than not, copyright restrictions keep national libraries from putting the material online so it can benefit the general public. Which means that all that work is being done to serve researchers. What do they think about these collections? The organizers had invited social scientists to debate the question, beginning with Ralph Schroeder, Arthur Thomas and Eric Meyer (picture at right) from the Oxford Internet Institute. They recently completed a draft of a report entitled Web Archives: the Future(s). One of their conclusions: present web archives are harvesting content, especially html/http, while that part of the web is becoming less significant with every passing day. Increasingly the web is dominated by social platforms, location-based services and other very complex two-way and peer-to-peer mechanisms, as well as semantic web/linked data, which are technically extremely complicated and thus are hardly being archived at all. Meyer: ‘Virtual worlds are our social life.’ Social scientists want to study how the networks function and who is interacting with whom about what, and that information is not being recorded.

So much for taking pride in our treasures ;-)

Anne Helmond and Esther Weltevrede, of the University of Amsterdam (in first picture of this blog above), demonstrated the type of research the Oxford team refers to. They analyzed the Dutch blogosphere 1999-2009. How did the network evolve? What platforms are being used by bloggers? (increasingly Dutch platforms!) Etcetera. Studies on the workings of the internet itself rather than the content it offers.

The Oxford team had a number of recommendations for web archives: try to archive server logs; try to archive traffic itself (an ambitious project!); allow researchers to trigger (frequency of) crawls; work on mechanisms to organize small and big events archiving (let machines trigger the harvesting, perhaps); think in terms of collections and collections of collections rather than individual sites. Quite an agenda! But Eric Meyer had more down-to-earth advice as well for institutions wishing to promote use of their web resources: ‘The best way to promote use of your digital resources is to provide examples of meaningful uses by others.’ and: ‘Institutions must put more effort into advertising what they have got.’ Gildas Ilien also recommended keeping in touch with the general public. ‘Not many people will use the web archive, but they will pay taxes for it if they know about it.’

Inevitably, the question of selection came up in the discussions. How do researchers feel about selection? The Oxford team: researchers increasingly say: collect everything, because we do not know what the future will need. The fact that we cannot analyze and make meaningful uses of this material presently, should not stop us from collecting. Arthur Thomas drew a comparison with his old field: biology. It all started with people collecting specimens, often they were amateurs. These collections were incomplete but they often generated great ideas, such as evolution and classification. Gildas Ilien of the BnF agrees with this philosophy: ‘We should collect as much as possible – do not wait until you lose it.’

‘Can we please everyone?’ asked Helen Hockx-Yu of the British Library rhetorically when describing a BL project to encourage use of the web archives and get feedback from researchers. As yet, the question remains unanswered. But help is on the way, as represented by Paul Girard of the largest French social sciences lab (Sciences Po). He described an open source software project, the ‘Hypertext Corpus Initiative’, in which a community of researchers and web experts collaborates to develop mechanisms that will enable researchers to sieve the massive amounts of data from the internet and find exactly what they need – without needing too many technical skills. In so doing they hope to deal with the quantity/quality dilemma. [NB: a corpus consists of web entities.]

The panel: ‘Document everything about your web archive’

The concluding panel (photo at the top of this blog), ably chaired by Martha Anderson of the Library of Congress, included the researchers mentioned above and two web archivists: Birgit Nordsmark Henriksen of the Danish National Library and Cathy Hartman of the University of North Texas. The Danish KB carries out both an annual .dk domain harvest (without quality checks) and more selective web archiving projects with better quality assurance – researchers specifically ask for experimental sites to be harvested. The University of North Texas started harvesting government websites when nobody else was doing it (that is vision for you), and in so doing it gained an important position in the US web archiving community. Cathy explained that their web archive has now become an important part of the university’s identity: it serves the curriculum and attracts students.

In the discussions it transpired that what researchers need most from web archives is to know What is In the Box. How was the collection assembled? What was included and what not, for what reasons? Which harvests failed? Which breakdowns occurred at what times? Even curation activities and preservation actions should be carefully documented. All in the interest of sound research. Preferably, also, researchers should be enabled to use their own tools to analyse the data. And Paul Girard dreams of an index of all web archives in the world.


Dark Archive or Online Access?

As mentioned before, access to the contents of web archives is mostly restricted to on-site use, or not allowed at all, because of many copyright and privacy regulations. The institutions are not happy with this situation, but they are reluctant to break laws – no matter how old-fashioned they may be. But Martha Anderson had some encouraging news for the group. The Library of Congress has kept meticulous records of the use of its web archives over the past ten years, and now it seems the lawyers are beginning to agree that opening up the archives may not be such a dangerous thing to do. We shall hope to hear from the LoC soon.

And the Dutch KB had good news as well: at the end of the day it opened up its web archive for the first time. As not all technical and legal problems have as yet been sorted, access is limited to 700 websites which have passed (manual!) quality control. 2,300 websites will be added as they too pass quality checks. In about three years’ time, 10,000 Dutch websites should be harvested regularly. As for on-line access: the KB will strive to provide as much on-line access as possible in due course.

The IIPC General Assembly continues this week with a number of technical workshops – I hope that someone else, who is more knowledgeable when it comes to technical matters, will report on those.

(See also next blog.)


Comments:

Sabine said

Cool

Aedem said

Well written :-) Zuzana
