The world of the internet is changeable and volatile. If we are to secure long-term access to content on the internet, we have to find mechanisms to bring order to the seeming chaos. Standards, for instance – although I learned last month in Tallinn that we may be rushing into those (see blog post). Persistent identifiers (PIDs) are another building block for long-term access to digital objects, because PIDs make sure that we can find the object that is being preserved, even if it is moved from one URL to another. But I learned last week that persistent identifiers are not as persistent as one might hope. Another illusion down the drain?
In front of a famous painting by Rembrandt (The Anatomy Lesson of Dr. Nicolaes Tulp), a working group led by Andrew Treloar (standing, at right) dissects the truth about persistent identifiers and their complex relationship with Linked Open Data.
The setting was a two-day seminar on persistent object identifiers (or POID, thus #kepoid) organized by Knowledge Exchange, the PersID project, SURFfoundation and Data Archiving and Networked Services (DANS) in The Hague (14-15 June). Regrettably, I managed to attend only the second day, but it was enough to make me understand how complicated this business is.
This is how it should work: a (national) organization (national library, scientific organization) assigns a unique identifier to a digital object, a so-called persistent identifier (PID). If the object is moved from one URL (internet location) to another, the PID remains the same and a resolver service links the PID to the new URL.
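To make the mechanism concrete, here is a minimal sketch of what a resolver does. It is a toy in-memory lookup table, not how any real PID system is implemented; the class name `PidResolver`, the identifier `urn:example:report-42` and the example.org URLs are all made up for illustration.

```python
class PidResolver:
    """Toy in-memory resolver: maps a persistent identifier to its current URL."""

    def __init__(self):
        self._locations = {}

    def register(self, pid, url):
        # A PID is assigned once; re-registering it would break persistence.
        if pid in self._locations:
            raise ValueError(f"{pid} is already registered")
        self._locations[pid] = url

    def move(self, pid, new_url):
        # The object moved: only the binding changes, the PID stays the same.
        if pid not in self._locations:
            raise KeyError(pid)
        self._locations[pid] = new_url

    def resolve(self, pid):
        # Return the current location of the object behind the PID.
        return self._locations[pid]


resolver = PidResolver()
pid = "urn:example:report-42"  # hypothetical identifier
resolver.register(pid, "https://old.example.org/reports/42.pdf")
resolver.move(pid, "https://new.example.org/archive/42.pdf")
print(resolver.resolve(pid))
```

Anyone who cited the PID before the move still ends up at the right place afterwards, but only as long as somebody keeps calling the equivalent of `move` whenever the object changes location.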
Borrowing from Andrew Treloar’s presentation (Australian National Data Service), here are the main complications associated with object identifiers:
- Granularity: what do you assign a PID to? In FRBR terms: to the work? to the expression? to the manifestation? to the item? Or, I may add, to a chapter? to a paragraph? Perhaps we even need multiple PIDs at multiple levels.
- How do you assign PIDs to objects that are not static, but that change all the time (e.g., databases)?
- How trustworthy is the object that is being identified (e.g., short URL services)?
- How to point to something inside the object?
- Who owns the binding between the PID and the object?
And then there is the problem that there are a number of different PID systems (e.g., URN, DOI, PURL), which are not interoperable (comment by Juha Hakala: ‘It is encouraging that it is quite a long time since someone came up with a new PID system.’). And PIDs do not go well together with Linked Open Data (LOD).
‘Why is it so hard?’ – notes from Jeroen Rombouts’ computer (3TU.Datacenter)
Both Clifford Lynch and Andrew Treloar concluded that solving the technical problems of the PID challenge is the easiest part of the work to be done. Andrew built a pyramid of key success factors (photo above): at the bottom of the pyramid is a sustainability model, the second layer is about policies, the third is about procedures, and the top layer is about will or the intention of individuals to follow the rules and make the system work.
A room full of persistent identifiers – at right, seminar chair Bas Cordewener (SURFfoundation).
In the end, the attendees concluded that building interoperability between the existing PID systems is not a top priority. But getting PIDs to work with Linked Data is. Treloar proposed a ‘Den Haag manifesto’ to bring this about:
The Hague Manifesto on persistent identifiers and Linked Open Data (LOD) (draft version)
- Make sure PIDs can be referred to as HTTP URIs, including content negotiation
- Use LOD vocabularies for schema elements
- Identify the minimum common set of schema elements across identifiers in the scholarly communication space.
- Use same-as relations to help PID interoperability across PID systems/schemas
- Work with the LOD community on simple policies/procedures to improve persistence of HTTP URIs.
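As an illustration of the first and fourth points, here is one possible reading in code. It is a sketch under my own assumptions, not something shown at the seminar: the PID is wrapped in an HTTP URI, a deliberately naive content negotiation step decides whether a client gets a landing page or machine-readable data, and `owl:sameAs` statements link the identifier to its equivalent in another PID system. The function names and all example.org/example.net URIs are hypothetical.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triples(http_uri, equivalents):
    """Return N-Triples lines stating that http_uri is the same as each equivalent."""
    return [f"<{http_uri}> <{OWL_SAME_AS}> <{other}> ." for other in equivalents]


def negotiate(accept_header, landing_page, triples):
    """Naive content negotiation: ignores q-values and wildcards."""
    wanted = [part.split(";")[0].strip().lower() for part in accept_header.split(",")]
    if "application/n-triples" in wanted:
        # Machine clients get the same-as statements as N-Triples.
        body = "\n".join(triples) + "\n"
        return 200, {"Content-Type": "application/n-triples"}, body
    # Everyone else is redirected to the human-readable landing page.
    return 303, {"Location": landing_page}, ""


# Hypothetical HTTP form of a PID and its equivalent in another PID system.
pid_uri = "https://pid.example.org/urn:example:report-42"
triples = same_as_triples(pid_uri, ["https://other-pid.example.net/10.9999/report-42"])

status, headers, body = negotiate(
    "application/n-triples", "https://new.example.org/archive/42.html", triples
)
print(status, headers)
print(body, end="")
```

A browser sending `Accept: text/html` would get the 303 redirect instead. A real service would need proper handling of the Accept header and a decision about which RDF serializations to offer.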
Treloar will work with anybody who is ‘ready, willing and able’ to develop these principles.
Some other recommendations from the meeting:
- Do an inventory of different PID systems and make transparent how they work, so that organizations contemplating using PIDs know how to choose a system
- Find the common ground between the systems and use it to widen awareness of PID problems and systems
- Organize regular meetings between those who are involved in building PID infrastructures to facilitate alignment.