Thursday, January 17, 2008

DOI Cookie makes finding full text easier

Journal articles are frequently cited with just the DOI (Digital Object Identifier) as a link. Following that link would normally take you to the copy held on the publisher's site. As a member of an institution, you may have access to the article elsewhere. Your institution's OpenURL resolver can tell you where, if we can get it to resolve the link for you. A simple way of achieving this is to set a cookie that asks the DOI site to redirect you to your resolver.
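To make the resolver step concrete, here is a minimal sketch of how a DOI might be turned into an OpenURL link. The resolver address is a made-up placeholder, and the function name is my own; only the `url_ver` and `rft_id=info:doi/...` keys come from the OpenURL 1.0 (Z39.88-2004) key/value format. It does not show the cookie mechanism itself.

```python
from urllib.parse import urlencode

# Hypothetical resolver address: substitute your own institution's.
RESOLVER_BASE = "https://resolver.example.edu/openurl"


def doi_to_openurl(doi: str, base: str = RESOLVER_BASE) -> str:
    """Build an OpenURL 1.0 link asking a resolver to locate a DOI."""
    doi = doi.strip()
    # Accept "doi:10.xxxx/..." as well as a bare DOI.
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    params = {
        "url_ver": "Z39.88-2004",      # OpenURL 1.0 version key
        "rft_id": f"info:doi/{doi}",   # the referent, identified by its DOI
    }
    return f"{base}?{urlencode(params)}"


print(doi_to_openurl("10.1016/j.serrev.2006.03.007"))
```

Whether a given resolver accepts a link this sparse depends on the product, so treat it as a starting point rather than a guaranteed recipe.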

Test this button and the following link.

DOI Cookie set

Thursday, January 10, 2008

Archiving digital copies of electronic journals

Blogging on Peer-Reviewed Research
There was a time, back in the 'Middle Ages', when monks would cross the continent of Europe to study and copy precious manuscripts. Umberto Eco captures this world in his novel 'The Name of the Rose'. In making an accurate copy and taking it back home with them, the monks were both distributing and preserving the work. Having a copy of a work in its archive made an institution more prestigious, and helped to preserve the work from a variety of medieval threats. But not everything was copied everywhere, and the very use of manuscripts also wore them out, leaving them unusable for future readers.

Later scholars may puzzle over the complex social model, the reciprocal relationships, the transmission history of any individual manuscript and its relation to the original work. Which manuscripts have integrity as authentic copies? What 'trust' should be placed in a Saxon manuscript found in the collection of a monastery in Lombardy? Where two or more copies exist, detailed comparisons would need to take place to remove doubt about integrity (all copies agree on every word of chapter 1, but there are variant readings of sentences in chapter 3).

Fast forward to the present, and similar problems of integrity arise when it comes to preserving electronic material, such as the articles published in electronic journals. There are various schemes which libraries might look to or participate in, and some are based on very different models. Locking files away in a safe place for ever, or until they are the only remaining copies, is the strategy of schemes like Portico. Libraries might support this model as insurance against future loss of access.

Another model is the foundation of the LOCKSS strategy. Here the emphasis is on maintaining the integrity of a widespread network of copies. Having multiple copies stored is a safety feature, a defence against the many threats of the modern age, but it is not the only benefit. The network also enables an ongoing work of comparison to take place, ensuring that variant readings of texts do not get a chance to develop.
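The idea of comparing copies across a network can be illustrated with a very simplified sketch. This is not the LOCKSS polling protocol, which is considerably more elaborate; it only assumes that each holder's copy is hashed and that the most common digest is treated as the agreed reading. The library names and texts are invented for the example.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """Fingerprint a copy so that copies can be compared without reading them."""
    return hashlib.sha256(content).hexdigest()


def find_divergent_copies(copies: dict[str, bytes]) -> list[str]:
    """Return the holders whose copy disagrees with the most common digest."""
    if not copies:
        return []
    digests = {holder: digest(data) for holder, data in copies.items()}
    majority_digest, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(h for h, d in digests.items() if d != majority_digest)


# Invented example: three libraries hold what should be the same text.
copies = {
    "library_a": b"Chapter 3: the original sentence.",
    "library_b": b"Chapter 3: the original sentence.",
    "library_c": b"Chapter 3: a variant sentence.",
}
print(find_divergent_copies(copies))  # ['library_c']
```

A copy flagged in this way could then be repaired from one of the agreeing holders, which is the modern equivalent of correcting a manuscript against its siblings.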

Comparing electronic objects to medieval manuscripts will only get you so far in thinking about digital preservation, but considering the social model behind any scheme is an important way of deciding between their relative claims. There is more to digital preservation than an effective backup and disaster recovery program. The threats faced by digital materials include format obsolescence (no viewers for certain file types), temporary or permanent lack of access to original versions (the publishers and their archives have 'gone away'), degradation of media (no one can play 12 inch laser discs anymore), digital degradation (the saved bitstream is 'corrupt' and will not load) and lack of context (the metadata describing the object is no longer attached).

LOCKSS may not have all the problems of digital preservation solved, but it does have some advantages for libraries: there is a sense of taking up the responsibility for preservation and keeping it within your control; it relies on open source software with a track record; and it allows librarians to decide for themselves on their collection development policy for preservation.

Some UK libraries, including De Montfort University, have been involved in testing LOCKSS. As this Pilot comes to a close in February 2008, libraries will be deciding whether and how to take their participation forward. It will be interesting to see which preservation technology, and which social model, lies behind the strategies that do get adopted.

SEADLE, M. (2006). A Social Model for Archiving Digital Serials: LOCKSS. Serials Review, 32(2), 73-77. DOI: 10.1016/j.serrev.2006.03.007

Thursday, January 03, 2008

Institutional Repositories and Digital Preservation

Hockx-Yu, H. (2006). Digital preservation in the context of institutional repositories. Program: electronic library and information systems, 40(3), 232-243. DOI: 10.1108/00330330610681312

This article looks at the digital preservation dimension of developing institutional repositories. Helen Hockx-Yu was a programme manager working in this area for the JISC (Joint Information Systems Committee), so she is also able to outline the JISC view on institutional repositories and some of their initiatives to assist people developing them.

I am interested because De Montfort University Library is responsible for developing DORA: the De Montfort University Open Research Archive. I also administer the Library's LOCKSS Archive, which attempts to digitally preserve content from the electronic journal services to which the Library subscribes. There is not necessarily an overlap between these two activities: we are not using LOCKSS to preserve content from the Institutional Repository, though in theory, that may be possible.

Institutional repositories
The question 'What is an institutional repository?' is answered as 'digital collections that capture and preserve the intellectual output of a single or multi-university community'. DORA counts, at least in that it claims to be an attempt to capture De Montfort's intellectual output. DORA also falls within the pattern of most currently established institutional repositories in collecting 'e-prints' of scholarly publications. An IR could aim to include images, video, sound and other file formats. The wider the intake of file formats, the more problems are created from the digital preservation point of view.

Institutional Repositories are still new arrivals in the information landscape. Hockx-Yu can point to potential benefits for:
  • Authors: gain visibility.

  • Users: find information more easily (usually without charge).

  • Institutions: increase research profiles.

  • Funders: see wider dissemination of research outputs.
The push to develop IRs comes partly from institutions and from those Research Councils who have mandated that research outputs be made freely available. Whether these motivating factors reach and mobilise the people doing the research remains to be seen. The JISC has set up a number of projects, like SHERPA and RoMEO, to help overcome some of the obstacles.

Digital preservation
Digital preservation refers to the series of managed activities necessary to ensure continued access to digital materials for as long as necessary. Backing up the files may be part of this activity, but is not the whole process. For example, you can save a file, but unless you know what application you need to open it, it will not be of much use to anyone. There may be an argument for not needing to bother with digital preservation within institutional repositories. On this view, much of the material held is a poor version of the polished articles published in commercial journals, and such journals should take on the burden of preservation as part of their service. Some publishers are more willing than others to take on this responsibility. In my view this is one of the lessons of the LOCKSS Pilot project.
There may be other materials held in a repository, such as images or research databases, where the repository holds the primary preservation responsibility. Other JISC-sponsored projects can help to assess the costs and practices used to preserve such material: LIFE and eSPIDA.

One question that we as institutional repository managers need to address is that of format migration. Should we accept any file format submitted, or migrate those deemed to be 'non-standard' to a format based on an open standard? Hockx-Yu does not say whether PDF or the archive version of PDF is appropriate for this, though the James study 1 of file formats may offer some guidance here.
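As a rough illustration of what a migration check at deposit time might involve, the sketch below identifies a file by its leading 'magic' bytes and flags anything outside an accepted list. The accepted list is purely hypothetical, not a recommendation from Hockx-Yu or the James study, and a real repository would want a proper format registry rather than this handful of signatures.

```python
# Well-known leading byte signatures for a few common formats.
SIGNATURES = {
    b"%PDF-": "PDF",
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"GIF87a": "GIF",
    b"GIF89a": "GIF",
    b"PK\x03\x04": "ZIP container",  # also used by OOXML and ODF documents
}

# Hypothetical repository policy: formats accepted without migration.
ACCEPTED = {"PDF", "PNG"}


def identify(header: bytes) -> str:
    """Guess a format from the first bytes of a file."""
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return "unknown"


def needs_migration(header: bytes) -> bool:
    """True if the file is not in a format the (hypothetical) policy accepts."""
    return identify(header) not in ACCEPTED


print(identify(b"%PDF-1.4\n"), needs_migration(b"%PDF-1.4\n"))
print(identify(b"GIF89a\x01\x00"), needs_migration(b"GIF89a\x01\x00"))
```

In practice the header would come from reading the first few bytes of the deposited file, for example `open(path, "rb").read(16)`.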

The one JISC-sponsored preservation project that we have been involved with does not get any detailed treatment in Hockx-Yu's article. The LOCKSS Pilot attempts to preserve material published in commercial or open access journals. Some LOCKSS members have set up private networks to share the preservation work for material they hold. That would not be covered by the Pilot sponsored by JISC, but LOCKSS may still touch upon material held in institutional repositories.

There is a debate about who has the responsibility for digital preservation: institutions or commercial publishers. One way of gathering evidence on this problem would be to look at a set of an institution's intellectual output and check on how much was being actively preserved in a way appropriate for that institution. This could be to check against the titles preserved within Portico, if the institution is a Portico member. A similar test could be done against the 'archival units' held by a LOCKSS member. You could also compare for overlaps with the content held in an institutional repository. If only the IR is making an effort to preserve material, then by default it may have found itself with the primary preservation responsibility.
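The evidence-gathering exercise described above amounts to a few set comparisons. Here is a minimal sketch, assuming you can reduce both the institution's output and each scheme's holdings to comparable identifiers (journal titles, ISSNs or DOIs). All of the identifiers and holdings below are invented for illustration.

```python
def preservation_coverage(
    output: set[str], schemes: dict[str, set[str]]
) -> dict[str, set[str]]:
    """For each scheme, list which items of the institution's output it preserves.

    An extra "unpreserved" entry lists output that no scheme covers.
    """
    report = {name: output & holdings for name, holdings in schemes.items()}
    covered = set().union(*report.values())
    report["unpreserved"] = output - covered
    return report


# Invented data: where an institution's output appeared, and who preserves what.
output = {"journal-a", "journal-b", "journal-c", "journal-d"}
schemes = {
    "portico": {"journal-a", "journal-x"},
    "lockss": {"journal-a", "journal-b"},
    "repository": {"journal-c"},
}

report = preservation_coverage(output, schemes)
for name, items in report.items():
    print(name, sorted(items))
```

Anything that appears only under the repository entry is material for which the IR has, by default, ended up with the primary preservation responsibility.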

1 James, H., Ruusalepp, R., Anderson, S., Pinfield, S. (2003), Feasibility and Requirements Study on Preservation of E-Print,