Tuesday, 28 September 2010

To migrate or not? The debate continues – part 3

Following yesterday's post featuring David Rosenthal's comment on this blog, I received a response from René Voorburg, who works on the KB's web archiving project. His response did not fit in the blog's comment box, so I am happy to publish it here as a guest post:

[Rosenthal, quoted from the previous post:] "My statement that significant formats have not gone obsolete over the last 15 years is not a theory but a *fact*."

[Voorburg's comment:] That sounds like circular reasoning. Perhaps 'significant formats' are defined as formats that have not gone obsolete?

[Photo: René Voorburg at the IIPC meeting in Vienna last week]

I have some personal experience with files that have gone obsolete, or with no longer being able to use older files. In each case I seriously tried to fix the problem but didn't keep a record. So unfortunately my experiences lack detail and precision, but perhaps sharing them will be of some anecdotal value. The relevant context is that at home I used Apple System 7 through 9, moved to MacOSX, then used Linux as my primary OS for about a year, and then switched back to MacOSX. At work I've used everything from DOS up to Windows XP. I am a fairly knowledgeable computer user, with experience in various programming languages, for example.

My experiences:

  • At some point I discovered I couldn't open some pictures I had shot years before with one of the first digital cameras, the Apple QuickTake 150. I couldn't open them with any of the multi-format image programs (GraphicConverter, I recall).
  • The QuickTake 150 images I did manage to open all seem to have a faulty (or is it?) embedded color profile today. The colors now look far too saturated; I recall that they used to look normal.
  • I had some movies (since deleted) that I could no longer play, because they were encoded with a Sorenson codec I no longer had access to (this was on Linux). I don't know whether they would have been accessible on current MacOSX systems.
  • I lost emails, or parts of emails, that were stored in a proprietary mailbox format used (only!) by Microsoft Outlook for Mac when I tried to migrate them to another, more common format (mbox).
  • I've found that newer versions of MS Word for Windows couldn't open some files created with an older version of MS Word for Windows.
  • It is still quite common that I can't use a website because it was built for Internet Explorer, which I don't (always) have access to.

Perhaps in a professional environment all these issues could have been overcome. In hindsight every solution is easy, but hindsight usually means you are already too late…

René Voorburg, KB web archiving project

As always, new comments on this blog are very welcome…

4 comments:

David. said

Thanks for the examples. I have three questions for each of them.

First, when exactly were the files in question created? I certainly accept that formats that were already obsolete by the mid-90s require software archeology to recover.

Second, for each example what was a feasible migration strategy that could have been performed at a suitable time in the past? Why didn't you perform it then?

Third, for each example what is a feasible migration strategy that you could perform now?

René Voorburg said

Hello David,

I gathered some more detailed information on the main issues I described.

Regarding the QuickTake images: they were made around 1997. The ones I can still use (the ones I kept) were inadvertently migrated to JPEG when I cropped and saved them back in 1997. Since I wasn't aware of the potential problem, I didn't migrate the others. A quick Google search turned up no definitive answer on what to do with them now. Some say certain applications can handle the files, but apparently only when the original QuickTake QuickTime extension is installed. That requires a PPC Mac (emulated, perhaps?) running an older system (pre-Mac OS X, and presumably an older version of QuickTime).

Regarding the Sorenson codec (movies): they were made around 1999/2000. I read that the codec is still available on current Macs (as a legacy install option for QuickTime), so instead of deleting the movies from my Linux setup, I should have migrated them (to format ...?) using a Macintosh.

Regarding the mail issue: around 2000 I used MS Outlook Express for Mac OS, whose mailbox format was used only by this program on the Mac. Around 2001 I switched to Linux and wanted to migrate my mail to an mbox-based application. I did the migration with a custom Java-based tool I found after a lot of searching. This went fairly well, but some metadata (such as some date headers) and attachments didn't come over correctly. In hindsight, the best way to do that migration would probably have been through an intermediate IMAP server. Maybe the current MS Entourage still reads the Outlook Express files? Perhaps the format is not obsolete at all?
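The loss described above (date headers and attachments not coming over) is the kind of damage a migration can check for automatically. A minimal sketch using Python's standard `mailbox` and `email` modules, assuming the messages have already been extracted from the old mailbox as raw RFC 822 bytes (that extraction from the Outlook Express for Mac format is the hard part and is not shown). The function name `migrate_to_mbox` is hypothetical, and it only compares the Date header and the attachment count of each message before and after migration.

```python
import email
import email.policy
import mailbox


def migrate_to_mbox(raw_messages, mbox_path):
    """Append raw RFC 822 messages (bytes) to an mbox file, then re-read
    each one and report messages whose Date header or attachment count
    changed. Returns a list of (key, problem) tuples; empty means OK."""
    box = mailbox.mbox(mbox_path)  # created if it does not exist yet
    box.lock()
    try:
        keys = [box.add(raw) for raw in raw_messages]
        box.flush()
    finally:
        box.unlock()

    problems = []
    for key, raw in zip(keys, raw_messages):
        original = email.message_from_bytes(raw, policy=email.policy.default)
        # get_bytes() returns the stored message without the mbox "From " line
        copy = email.message_from_bytes(box.get_bytes(key),
                                        policy=email.policy.default)
        if original["Date"] != copy["Date"]:
            problems.append((key, "Date header changed"))
        if (len(list(original.iter_attachments()))
                != len(list(copy.iter_attachments()))):
            problems.append((key, "attachment count changed"))
    box.close()
    return problems
```

Checking only two properties is a deliberate simplification; a real migration would compare more headers and the attachment contents themselves.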

Regarding the MS Word issue: the specific problem I mentioned might have been an isolated incident (around 2003). However, I found that reading MacWord 5.1 files (from ~1992-1995) can be a problem: current Word versions don't read all of them properly. Some suggest using emulation to do the migration. Specialized conversion tools might also do the trick.

My conclusion based on this exercise is that it is essential:
- to know what formats you are keeping
- and to know what formats are about to become obsolete and need to be migrated.
Proprietary software formats appear to

Thanks, René
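René's first point, knowing which formats you are keeping, can be partly automated by looking at the leading "magic bytes" of each file rather than trusting file extensions. A minimal sketch in Python, assuming a tiny hand-picked signature table; production tools such as DROID use the much larger PRONOM registry instead. The names `SIGNATURES`, `identify` and `inventory` are hypothetical. Files reported as "unknown" (old QuickTake images, for instance) are the ones to look at first.

```python
import os
from collections import Counter

# Hypothetical, deliberately small table of well-known leading byte patterns.
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG image"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"%PDF-", "PDF document"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (e.g. older MS Office)"),
    (b"PK\x03\x04", "ZIP container (e.g. .docx, .odt)"),
]


def identify(path):
    """Return a format name based on the first bytes of the file."""
    with open(path, "rb") as f:
        head = f.read(16)
    for magic, name in SIGNATURES:
        if head.startswith(magic):
            return name
    return "unknown"


def inventory(root):
    """Count files per identified format under a directory tree."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            counts[identify(os.path.join(dirpath, filename))] += 1
    return counts
```

Such an inventory only covers the first half of the conclusion; deciding which of the identified formats are about to become obsolete still requires human judgement.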

David. said

So, if I understand correctly we have:

1. QuickTake 100/150 images ~1997. Obsolete, but not a counter-example to my argument, because no-one could claim that QuickTake was a widely used format. Even Apple abandoned it for the QuickTake 200, switching to JPEG. Migration is now likely not cost-effective, because the QuickTake image format doesn't appear to have been documented. Emulation is likely possible.

2. Sorenson-codec movies ~2000. Not obsolete, in that current Macs and Linux systems have support for them.

3. MS mailbox format ~2000. Not obsolete, and about to become an open format.

4. MS MacWord files pre-1995. Not a counter-example to my argument, all the more so because MacWord 5 was no longer the current version as of 1994. The OpenOffice that I use supports reading and writing Microsoft formats back to Word 6 (1993), fully supports reading WordPerfect formats back to version 6 (1993), and has basic support back to version 4 (1986).

David. said

I believe that at the meeting at the KB I used Microsoft Project 98 as an example of an obsolete format. I was repeating information I got at the 2009 PASIG meeting, but I should have checked this claim before repeating it.

In a comment on a post on my blog critiquing a post by Rob Sharpe that also used Project 98 as an example of obsolescence, Chris points out that there are at least two commercial viewers for Project 98, so it cannot be used as an example of an obsolete format.

I'm still looking for a counter-example to my argument.