The Australian Library Journal
Digital continuity: the role of the National Library of Australia
Pam Gatenby
A paper presented at Digital Continuity: a Forum for Australian Universities, Swinburne University of Technology, 19 November 2001
Manuscript requested January 2002
This is a refereed article
Background
In Australia, as in many other countries of the world, the internet is now an essential mechanism for information access and delivery. A major issue for those concerned with long-term access to information is that many providers of information in digital form do not recognise the documentary or cultural value of the information they are producing, or if they do, are not doing anything to protect the information from loss, corruption or erosion. While this might also be the case for information in printed form, a number of circumstances make the concern more urgent in the digital domain. These include the transient and volatile nature of the digital medium, the rapid uptake of publishing on the internet, the breakdown of traditional publishing controls and conventions, and the sheer volume of information being produced. Highly significant information published today may well have disappeared tomorrow if action is not taken from the time it is created to build in safeguards to ensure on-going access. It is therefore pleasing that collecting institutions here and overseas have taken up the challenge of responding to the imperatives of digital continuity, accepting that they have an enduring responsibility to ensure knowledge from and of the past is available to users of the future. Deposit libraries in particular are well placed to carry out an active role in ensuring continuing access to digital resources. 
Most are committed to the long term and have been legally charged since their inception to protect the published output of their respective jurisdictions - as a consequence, they have developed philosophies and practices that equip them to move confidently from managing information in print form to managing digital resources. However, this does not mean that the journey into the digital domain is a straightforward one for libraries or for any collecting institution. While overall objectives and roles might remain fundamentally the same, new approaches to carrying out traditional roles are required. New modes of thinking and operating are necessary and deposit libraries must forge new strategic alliances and partnerships to be successful in their role. There are many issues on the digital continuity agenda. There are various ways of looking at them, but I think they can be usefully categorised as:
Whilst it might be stating the obvious, it is nevertheless worth stressing that the only way to make progress in resolving the issues is to tackle them - to learn from practical action without being overwhelmed by the complexities. As our experience at the National Library has shown us, a huge amount can be learnt from making a fairly modest start.
What we are doing
So, what are we doing at the National Library of Australia? The Library's role in ensuring on-going access to digital resources is shaped by our documentary heritage responsibilities for Australiana generally. Our role is:
The Australian context
The uptake of the internet in Australia for dissemination of information has been rapid and wide-ranging. Viewed from a publishing perspective, there is a considerable amount of significant information being issued in online form that is not also published in a more traditional form and the vast majority of this is available free of charge. There is little commercial publishing occurring yet. The big, mainstream publishers in Australia - like Penguin, Macmillan, McGraw-Hill - have not yet made the transition to online publishing - no doubt for good reasons. But the internet has spurred the emergence of a large number of newcomers to publishing and new publication models are appearing - for instance, the Library has collected the online publications of several publishers of Australian fiction who did not exist three years ago. In the government sector, all governments now have policies and agendas that give pre-eminence to providing information to the public in online form. This includes information about services as well as policies and reviews. The uptake of web publishing by the academic sector has been mixed and the situation is quite complex. A survey of online publishing in the academic sector carried out by the Library in late 1999 found that a wide range of people with diverse roles were involved in issuing information on university servers. Three broad categories of publications were identified:
Based on our experience, publishing on the internet in Australia in most sectors is for the main part decentralised, unco-ordinated and generally not supported by institutional policy and procedures - the situation would appear to be predominantly 'laissez-faire'. We have found through our digital archiving activity at the Library that it can be very difficult to discover significant information that is being published online. It would also appear that for the main part little attention is being given nationally to the range of digital continuity issues. This is illustrated by the Commonwealth government's Online Agenda for which the National Office for the Information Economy (NOIE) has overall policy responsibility. This agenda gives scant recognition to the need to ensure on-going public access to online government information, and does not address preservation issues at all.
The Library's action plan
Against this background, the Library's action plan for digital resources gives priority to the following activities:
In infrastructure development
In collection development we are:
In terms of access we are:
We have also:
Persistent identification
The issue of persistent citation and access is central to the viability of scholarship based on the internet: if the location (or URL) of a resource changes then all links to the resource - in catalogues, bibliographies, research papers, journal articles - are broken and the resource is effectively lost to use and citations are rendered useless. To overcome this common problem, unique, unchanging names or identifiers are required for resources as well as a system for resolving these identifiers to the current location of the resource - it then would not matter that the resource might change location. In recognition of the importance of this issue the Library has devoted considerable time to investigating viable solutions over the last eighteen months. A consultancy was carried out to look at international developments and to advise on a suitable course of action for the Library in managing its own digital resources and in implementing a national approach to persistent identification. The consultancy found that even though several naming schemes are emerging overseas none of these is mature or reliable enough at this stage to commit to. As a consequence, the Library has developed a unique naming scheme for its own digital resources (surrogates and 'born digital' or those in PANDORA) and developed an in-house resolving system that works for all types of digital resource. We are also, in association with the Consortium of Australian State Libraries and in particular the State Library of Tasmania, developing a national naming schema, at this stage called the Australian digital resource identifier (ADRI). We intend to consult widely on the ADRI and we are currently considering whether the Library should establish a National Agency Service for persistent identifiers, much like that offered for ISBNs and ISSNs. 
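The principle of naming and resolving described above can be illustrated with a minimal sketch. It is not the Library's resolving system or the ADRI scheme: the class, its method names, the identifier string and the example.org addresses are all hypothetical, chosen only to show how a citation that records an unchanging identifier survives a change of location.

```python
class UnknownIdentifierError(KeyError):
    """Raised when an identifier has never been registered (hypothetical name)."""


class Resolver:
    """Illustrative resolver: maps unchanging identifiers to current locations."""

    def __init__(self):
        self._locations = {}  # identifier -> current URL

    def register(self, identifier, url):
        # An identifier is assigned once and never reused for another resource.
        if identifier in self._locations:
            raise ValueError("identifier already registered: " + identifier)
        self._locations[identifier] = url

    def relocate(self, identifier, new_url):
        # Only the location changes; the identifier cited elsewhere stays the same.
        if identifier not in self._locations:
            raise UnknownIdentifierError(identifier)
        self._locations[identifier] = new_url

    def resolve(self, identifier):
        # Return the current location for a cited identifier.
        try:
            return self._locations[identifier]
        except KeyError:
            raise UnknownIdentifierError(identifier) from None


resolver = Resolver()
# "example-id-0001" is a made-up identifier, not a real naming format.
resolver.register("example-id-0001", "http://old.example.org/report.html")

citation = "example-id-0001"  # what a catalogue or bibliography would record

# The resource moves; the citation is left untouched.
resolver.relocate("example-id-0001", "http://new.example.org/archive/report.html")

current_location = resolver.resolve(citation)
print(current_location)
```

In this sketch a catalogue stores only the identifier, so moving the resource needs one update in the resolver rather than a correction to every citation.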
(The Library has a website on persistent identifiers that records all our work on the subject, including the consultant's report, http://www.nla.gov.au/initiatives/persistence.html)
Collecting and archiving
In recognition of its statutory collecting responsibilities, the National Library commenced selecting and archiving significant Australian web publications in 1996. This is now a routine activity for us, integrated with other mainstream collection development activities. It involves five full-time staff. From the outset the Library realised that preserving our online documentary heritage must involve collaboration as the task is beyond the resources and capacity of any one institution. A co-ordinated approach involving a range of stakeholders and a national strategy would be required. In this way, the range and scope of digital resources safeguarded for future use would be increased. Important groups for the National Library to work with include other collecting agencies, and the academic, Commonwealth government and commercial publishing sectors. However, our major partnership so far is with the state and territory libraries and ScreenSound, the National Film and Sound Archive. Together we are building the National Collection of Australian Online Publications, which resides in the PANDORA digital archive (http://pandora.nla.gov.au/index.html). The National Collection partnership is based on a formal exchange of letters and entails each institution taking responsibility to varying degrees for selecting, archiving, preserving and providing access to selected Australian online publications, according to agreed criteria and processes. (In this context, the term publications is used to cover websites, parts of websites, individual documents and both static and dynamic resources.)
Characteristics of the National Collection
The main characteristics of the National Collection are as follows. It is:
Scope
Detailed selection guidelines determine eligibility for the National Collection. The guidelines stress the 'Australianness' of the resource and take account of considerations such as subject content, authorship, the authoritative nature of the resource and whether it has been indexed by a recognised indexing service. To be selected, a resource must be judged to have some research value, to provide a cultural or social snapshot of Australia, or provide insights into particular ways in which the internet is used for information dissemination. To some extent the eligibility requirements build on the principles underpinning the traditional national bibliography role of national libraries but without giving pre-eminence to place of publication - a tricky concept in the online domain. Currently online publications with print equivalents are out of scope for the National Collection but this situation is under review - and we expect to commence including these from early next year. In fact, I should say that the selection criteria undergo regular review to ensure they remain relevant in the light of changes in the way the internet is used to disseminate information. The approach to building the National Collection is highly selective and it is acknowledged that many resources relating to Australia are not collected by the Library and its partners. The policy decision on selective acquisition was a pragmatic one, influenced by resource considerations, by the significance of information being distributed on the internet and by the requirement that wherever possible, everything archived must remain accessible to the public in the form in which it was originally issued. In spite of the selective approach, the National Collection already constitutes a strongly representative sample of Australian web publishing by academic, government, commercial and community organisations. 
Several of the websites captured - including the official site for the Sydney Olympic Games - have already disappeared from the live internet and are now only available through the PANDORA archive. The types of resources in the archive include the following:
So, the content of the archived collection is diverse and it has now reached a size at which it is valuable as a research and reference tool - it provides an evaluated, quality-controlled record of Australian research and scholarly information published on the internet, as well as more general interest information relating to cultural and social issues. It currently includes around 1700 titles (310 GB) and is growing at the rate of about forty-five new titles per month - about one quarter of the titles in the archive are re-gathered on a regular basis. To exploit the potential of the Collection as a research tool we are encouraging libraries to point to it from their public web pages and to consider including, in their own local catalogues, bibliographic records for titles in the Collection that are of particular interest to their client group.
While all formats of publications are in scope for the National Collection in PANDORA, no matter how diverse the media used or complex the software on which they are based, we have not yet worked out how to deal with resources with underlying database structures - and these are becoming more the norm. This subject is a priority for us at the moment and we hope to find a solution that will at least secure the content of the databases, if not the full interactivity.
An alternative approach to archiving online resources being pursued by some national libraries is to harvest periodically the whole web domain for their country, using harvesting software. This approach has the obvious advantage over selective archiving of collecting much more information but there are still some major problems associated with it to do with data quality and providing public access. Nevertheless, the National Library is keeping a close watch on developments with whole domain harvesting and we will shortly commence a project to trial this approach ourselves, perhaps restricted to one domain sector. 
If the domain harvesting approach is feasible we would see it as a way of supplementing the selective, quality-controlled approach we would continue to take using the PANDORA system.
The PANDORA digital archiving system was built by the Library to support development and management of the National Collection of Australian Online Publications. The system comprises harvesting software, a digital repository for storing and accessing titles collected, and underpinning management software. The PANDORA system is supported by IT staff at the Library and is developed in response to identified needs. A major upgrade to the management system was released recently and delivered greatly improved functionality and considerable work-flow efficiencies. The management system is designed to support:
The recent upgrade to the management functionality of the PANDORA system allows partner organisations to manage a very efficient workflow using the central management system. The management system is accessed using Internet Explorer 5.5 (although public access to the archive may use any web browser). Web access means that partners have no need for special desktop software to use the management system. Currently all titles selected by partner institutions are stored and delivered from the National Library's server but the management system will support distributed storage of content as an option from early next year. The Library regards the PANDORA Digital Archiving System as a part of the national information infrastructure and is willing to make it available to others for use in support of national distributed archiving objectives. Future co-operative development of the system to support various distributed archiving models would be considered by the Library as long as interoperability was assured and the objective of national access to archived resources (subject to reasonable restrictions) observed.
Since it commenced collecting and archiving online publications in 1996, the Library has confronted a number of issues. Some I have already referred to - for instance, persistent identification and citation; selective versus comprehensive collecting; archiving databases; and problems associated with identification of significant resources - and some, such as authentication, we are still to come to grips with. Two other issues that are particularly relevant, and which interest the National Library because they have direct bearing on the effectiveness with which we carry out our role, can be summarised as:
The selective approach to digital archiving drives the question of significance to the fore. Significance can be determined differently by different stakeholders, depending on their information needs and role. The National Library has defined it from the perspective of our role as a documentary heritage repository, conscious that this excludes resources that could well be significant to others. However, the Library's strategy has always been based on the belief that the National Collection that it is building with the state libraries should ideally form a component or node of a much broader national distributed archive. To be effective, a national distributed archiving approach would need to be based on shared responsibilities and understandings of what was required and what should be done. These and related questions are currently being addressed by an important project being undertaken by RLG and OCLC, two big library consortia in the United States. They are attempting to define the attributes of a trusted digital repository of research resources. A draft report was released for comment some weeks ago. The question for us then is who else should or could take on responsibility for archiving Australian digital resources in order to increase coverage at the national level? Clearly, the academic sector and its libraries have a valuable role to play here, either collectively or as individual institutions. In order to bring some discipline to the creation of intellectual property in digital form within academic institutions, a centralised publishing and archiving system which provides stable long-term storage of the digital resources produced by institutions, could be considered - and I believe some institutions have already made progress in this area. 
With any undertaking, I would urge consideration be given to the benefits of cooperation and using existing national infrastructure (including PANDORA) if suitable, in order to avoid costly duplication of effort and development costs. Questions the National Library is keen to explore with universities who are developing digital archives include the relationship of the National Collection of Online Publications to the archives, the suitability of the PANDORA digital archiving system in meeting the archiving needs of other sectors; and respective preservation roles. There is no doubt an extensive range of issues and questions that universities need to address themselves in determining their digital archiving roles, a major one being how they see themselves fitting into a national distributed archiving model. To operate as an effective component of a distributed national Australian archive, academic repositories would, for instance, need to be interoperable with other archives, be standards based, permit a level of national access, and accept responsibility for the long-term which could necessitate a commitment to preservation action. Another advantage in the academic institutions becoming involved in digital archiving is that the creators and/or distributors of information are often best placed to judge the lasting importance or significance of that information. It is quite legitimate to view some information to be of temporary value and not worth saving. This decision is best made at the time of creation so that certain actions can be taken at the outset that will greatly facilitate later preservation of those resources that are deemed to be significant. These actions involve things like use of standard software and hardware, applying metadata and naming resources, and are outlined in this set of guidelines we issued in January this year (http://www.nla.gov.au/guidelines/2000/webresources.html). 
The guidelines are intended to raise awareness of the need for creators to think in terms of long-term access when they produce a resource. Centralisation of academic online publishing and archiving would greatly facilitate the adoption of standardised approaches and implementation of common procedures, and allow responsibilities to be more clearly defined. It would also allow the decision-making process concerning significance and consequent action, to be built into the production cycle. The other major issue I want to touch on is the question of how digital archives will be used in future or how they could best support the different types of research needs that might emerge? These questions have bearing on how digital archives are developed or built and the research community is well placed to provide valuable assistance with attempting to answer them. For instance, researchers might want to be able to analyse large amounts of data rapidly for specific information or to identify trends and patterns over time using specially tailored computer programs. Such analyses are best carried out on the original source files rather than the form rendered to the user, so files would need to be organised in a manner to allow such rapid processing. Alternatively, the user might wish to view the online version of the resource as it was when archived, to experience sites as they originally appeared - or the 'look and feel' of the original. The Library's PANDORA archive enables this form of use. The way in which users find the resource they might want to view in this way is, however, another issue and is an area of growing research interest - for instance, will known item searches continue to be relevant or will browsing based on descriptors be the more common approach? If so, are different kinds of descriptors to the ones normally associated with print resources more appropriate? 
In conclusion, the National Library of Australia is committed to archiving and preserving significant Australian online publications for long-term access. It views its undertaking as part of a larger, national endeavour and is keen to work with other groups and sectors to find solutions to issues, to strengthen the national digital archiving infrastructure, and to make the concept of distributed national archiving a reality.
Biographical information
Pam Gatenby is assistant director-general, Collection Management Division, National Library of Australia. She has worked at the National Library of Australia since 1979 where she has held a number of senior positions in collection development, access and preservation, and for delivery of information services to the public. In her current position she has overall responsibility for the technical services and preservation operations at the Library. Collection development responsibility includes collecting significant Australian online publications for long-term access, development of a preservation strategy for digital resources, and exploring new ways of providing enhanced access to information. E-mail pgatenby@nla.gov.au.nospam (please remove '.nospam' from address).