Australian Library and Information Association

AARL

Volume 35 Nº 4, December 2004

Australian Academic & Research Libraries

Australian digital collections: metadata standards and interoperability

Philip Hider

Abstract A questionnaire survey was e-mailed to various institutions in Australia hosting digital collections. Nineteen institutions, including libraries, museums, archives, and other bodies, responded to the survey, representing a wide range of digital resources. It was found that metadata format standards are more concentrated than might have been expected, whereas the reasons given for their selection vary considerably. The relationship between format and content standards is quite close; supplementary, in-house guidelines are prevalent, as are controlled vocabularies. Only a few institutions had added to their collections by importing other digital resources together with metadata. Most institutions are working towards interoperability in specific ways, but these ways vary in two important respects. First, some institutions focus on internal interoperability, while others emphasise cross-institutional development. Second, in terms of how to achieve interoperability, some institutions emphasise adherence to metadata standards, while others stress the way in which new technologies can work with divergent metadata formats and content. A graph of interoperability is constructed from the survey responses, reflecting these different positions.

One of the main thrusts of information systems development over recent years has been towards interoperability. As Paepcke et al observe: [1]

Interoperability is a central concern when building digital libraries as collections of independently developed components that rely on each other to accomplish larger tasks. The ultimate goal for such systems is for the components to evolve independently yet be able to call on one another efficiently and conveniently. Digital libraries designed to scale to international dimensions need to be constructed from such interoperable pieces - not only for technical reasons but because information repositories and information processing services often need to be operated by independent organisations scattered around the world. Interoperability has been a critical problem in the 1990s and will be for the foreseeable future, as the number of computer systems, information repositories, applications, and users multiplies at an explosive rate.

A key strategy for achieving interoperability is the standardisation of metadata and their formats. The application of new metadata standards designed for the contemporary online environment has been seen as a step forward by many interested in developing more ambitious information systems, including many of those providing access to digital collections. [2] This article examines the kind of interoperability Australian digital information providers are achieving, and the use they are currently making of metadata standards.

A questionnaire survey was e-mailed in April 2004 to 40 individuals who had been identified as contact personnel for various institutions with digital collections. The starting point for identifying digital collections in Australia was the National Library of Australia's list of digitisation projects on its website. [3] Responses to the survey were also invited from subscribers to the 'Catlibs' (QUT) and 'Aus-archivists' lists. A covering letter asked for the questionnaire to be answered by the most appropriate member of staff, on behalf of the institution. The definition of institution was left open, in order to cover different organisational circumstances.

The questionnaire consisted of 25 questions. Many of the questions required simple check-the-box responses, and many were of a factual nature. However, some questions were open-ended, and some asked for opinion. The questionnaire defined, at the outset, digital collections as 'collections of digital resources stored on the institution's server(s).'

Nineteen completed questionnaires were received. These represented mostly libraries and related information services, as shown in Table 1, including the National Library of Australia, most of the state libraries, and several university libraries. No university as a whole responded to the survey; instead, it was usually the library that responded on its own behalf (though one response stated that a co-ordinated, university-wide policy was being developed).

Table 1
Survey respondents by type

Type of institution Number
Library/information service 13
Museum 2
Archive 2
Other government department/agency 2
Total 19

Survey results

Accessibility and retrievability

Fourteen respondents (78 per cent) stated that most of their institution's digital collections were freely accessible to the public through the world wide web (one respondent was unsure). In three other cases, the digital collections comprised, or included, digitised text accessible through the catalogues to their general collections. In another case, the digital collection consisted of learning resources in a wide range of formats. All of the institutions had digital collections supported by some kind of retrieval system that indexed metadata for query-based searching.
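The 78 per cent figure is consistent with a denominator that leaves out the one unsure respondent (14 of 18, rather than 14 of 19). The sketch below shows that calculation; the function name `valid_percentage` is illustrative only, and the exclusion rule is an inference from the reported figure rather than a method stated in the survey.

```python
def valid_percentage(count, total, excluded=0):
    """Percentage of `count` among respondents who gave a usable answer.

    `excluded` is the number of unsure or non-responding institutions
    left out of the denominator (an assumed convention).
    """
    valid = total - excluded
    return round(100 * count / valid)

# 14 of 19 respondents, with the one unsure respondent excluded
freely_accessible = valid_percentage(14, 19, excluded=1)

# The same count over all 19 respondents, for comparison
over_all_respondents = valid_percentage(14, 19)

print(freely_accessible, over_all_respondents)
```

Under this reading, percentages elsewhere in the results are best interpreted against the number of usable answers to each question, not always against all nineteen respondents.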

Original formats

The original formats of the institutions' digital collections varied widely, as Table 2 shows, with no format predominant. Many of the institutions had digital objects for a range of original formats.

Table 2
Original formats represented

Format Number of institutions
Digital (i.e. born-digital) 13
Printed text 13
Manuscript 11
Photograph (stills) 15
Film 4
Pictures 13
Three-dimensional objects 8
Sound recordings 6
Book covers 1

Metadata format standards

The format standards used for the metadata of the digital collections did not vary as much as might have been expected, with the two leading standards being MARC21 (not surprising given the large proportion of libraries represented in the survey responses) and Dublin Core (DC). AGLS (Australian Government Locator Service) and EAD (Encoded Archival Description) are also significant schemas. The distribution of standards is shown in Table 3. Many leading schemas (eg CIMI, TEI headers and IEEE LOM) [4] were not represented at all. However, only one institution did not apply any format standard; several applied more than one.

Table 3
Format standards employed

Standard Fully Partially Total
MARC21 10 3 13
Dublin Core 9 4 13
AGLS 3 4 7
EAD 3 2 5
METS 0 1 1

There were a considerable number of reasons offered as to why the institutions chose their respective metadata formats. The reasons given by the respondents were categorised into types, as listed below in Table 4, with the frequencies indicated.

Table 4
Reasons for format selection

Reason Frequency
Most appropriate standard for nature of collection 7
Existing standard for non-digital collections (so facilitates integration) 6
Community's favoured standard 3
Government standard 3
'Interoperability' 3
Supported by system 3
Existing expertise in the standard at the institution 2
Requirement for participation in a cross-institution project 2
'Simplicity' 2
'Extensibility' 1
Facilitates record exchange 1
'International' 1
'Lowest common denominator' 1
Suitable for local context 1
Supplements other standards 1

Local metadata elements

In twelve cases (71 per cent, with two non-responses), institutions included locally defined elements in their metadata schema. As might be expected, these elements represented a wide range of metadata, including that listed below, without any particular common categories.

  • collection names
  • collection type
  • comments
  • dimensions
  • exhibitions and events
  • further reading
  • genre headings
  • incidental art works
  • links
  • local course requirements
  • original format
  • resource type
  • review
  • source of item
  • subject type
  • website category
  • administrative aspects (2)
  • navigation (2)
  • site display (2)

Metadata content standards

With respect to metadata content, five institutions (26 per cent) do not apply any external standards. Interestingly, of these, two were using MARC21, while three were using Dublin Core. Of the institutions which do employ content standards, eight (57 per cent) apply AACR2; one institution also mentioned the Library of Congress Rule Interpretations (LCRI) and Kinetica policies. All of the institutions which apply AACR also use MARC. Five institutions apply guidelines based on AGLS (the usage guide produced by the National Archives of Australia). [5] Three institutions apply guidelines based on EAD; one of these institutions also cited ISAD(G). Three institutions apply the National Library of Australia's Guidelines for the Creation of Content for Resource Discovery. [6] Another institution follows DC and MARC21 usage guides. It would appear that the relationship between content standards and metadata schemas is generally quite close, with AACR and MARC historically interdependent, and newer content standards often based on a particular schema (rather than vice-versa).

All but one of the institutions had developed in-house guidelines, either in addition to external standards, or as the basis for their metadata creation. This might suggest that cross-institutional content standards may need more development, but it probably also means that institutions (still) pay considerable attention to local context.

The majority of responding institutions use one or more controlled vocabularies, mostly for subject indexing purposes. The Library of Congress Subject Headings (LCSH) are assigned by eleven institutions (58 per cent). Other vocabularies used include the Australian Pictorial Thesaurus (APT), the Art & Architecture Thesaurus (AAT), the Getty Thesaurus of Geographic Names (TGN), the Australian Series System, and the APAIS Thesaurus. Six institutions use in-house schemes. Thus the importance of control is recognised, particularly in relation to subject access, but LCSH is the only widely used vocabulary. Unfortunately, few mappings between the various subject indexing systems are available, and this must be recognised as a serious impediment to interoperability.

New standards

Seven of the institutions (37 per cent) are considering using a new metadata standard in the short or medium term. The format standards under consideration are: EAD (3), IEEE LOM (2), METS (2), CIMI, DC, and MODS. One subject standard is also being looked at, the Australian Pictorial Thesaurus.

Alternative metadata formats

Seven of the respondents stated that the retrieval system(s) supporting their digital collections could accommodate other metadata formats not used for their own metadata. Formats that can be accommodated include Dublin Core, MARC21, AGLS, TEI headers, METS, CIMI, and any XML-based format; however, no alternative format was mentioned by more than two of the respondents.

Respondents were also asked how useful it would be, or is, for the retrieval system(s) supporting their digital collections to accommodate other metadata formats. The responses are shown in Table 5. We see that most institutions considered this capacity very useful, though a significant number considered it of no particular use. Of those who did not consider it particularly useful, two respondents were from institutions with restricted access to their digital collections, another respondent was from an institution registered as an OAI (Open Archives Initiative) provider and harvester of resources with DC metadata, while the fourth respondent was more focussed on internal standardisation (within the institution's collections). It may be noted that one of the respondents who considered the alternative format capacity 'critical' was also from an institution with restricted access to its digital collection.

Table 5
Usefulness of format accommodation

Value Number of respondents
Not particularly useful 4
Moderately useful 1
Very useful 11
Critical 2
No response 1

Resource sharing - importing

Only six of the institutions (32 per cent) had imported digital objects into their collections from outside, and in only three of these cases were they accompanied (at least sometimes) by metadata, indicating a fairly low level of metadata importing amongst Australian digital collections at present.

Resource sharing - exporting

On the other hand, eight (42 per cent) of the responding institutions had allowed digital objects to be exported from their collection to an external collection. Of these, only two also imported digital resources; whereas four of the importing institutions had not exported. There seems little correspondence, then, between importing and exporting of digital resources.
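The overlap between importing and exporting can be checked with simple inclusion-exclusion arithmetic. The sketch below uses only the counts reported above; the derived figures (exporters that did not import, and institutions that did neither) are not stated in the survey and assume that all 19 respondents answered both questions.

```python
TOTAL = 19       # responding institutions
importers = 6    # had imported digital objects
exporters = 8    # had allowed digital objects to be exported
both = 2         # exporters that had also imported

import_only = importers - both          # imported but never exported
export_only = exporters - both          # exported but never imported
either = importers + exporters - both   # inclusion-exclusion
neither = TOTAL - either                # no resource sharing in either direction

print(import_only, export_only, either, neither)
```

The result for `import_only` matches the four importing institutions reported as not having exported, which suggests the reported counts are internally consistent.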

Importance of metadata standards

Respondents were asked how important it was for their institution to describe its digital resources using established metadata standards. The responses are shown in Table 6. Given that all but one of the institutions applied external standards, it is not surprising that most thought established standards were very important.

Table 6
Importance of metadata standards

Value Number of respondents
Not important 1
Somewhat important 3
Very important 15
No response 1

More significant were the main reasons given for the importance of standards. The responses were categorised by the author as listed in Table 7 below. There was a range of reasons offered, which ties in with the range of reasons given for choice of format standards. The word 'interoperability' was used by many of the respondents. This suggests that standardisation is seen as a key facilitator of interoperability. The optimisation of external access was also a common reason, which aims for a kind of cross-institution interoperability. The economic benefits of standardisation, in terms of metadata sharing, are perhaps represented, at least partly, by the third most common reason, namely, institutional co-operation. Economics also underlies the next reason, portability. The quality of the metadata itself, from a retrieval perspective, may be partly represented by the reasons 'enhance retrieval' and 'standardisation,' but appears to be a relatively minor consideration. The other significant reason given here, federated searching, might be glossed as the optimisation of internal access. We may thus point to three important considerations which help standardise Australian digital collections: 'interoperability'; economics; and optimisation of access, both internally and externally. To what extent 'interoperability' overlaps with the other two considerations, economics and access, we shall discuss below.

Table 7
Reasons for importance of standards

Reason Frequency
Interoperability 9
Optimise external access 6
For co-operative ventures 5
Portability 4
Enhance retrieval 3
For federated searching 3
Standardisation 3
Government condition 1
To fully utilise retrieval technologies 1

Federated searching

Respondents were then asked whether it would be beneficial for their end-users to be able to search on digital collections provided by other institutions at the same time as they search on the digital collections provided by their own institution. The responses are shown in Table 8. Although the digital collections represented in this survey were fairly diverse, all respondents thought their users would benefit from simultaneous access to other collections (though not necessarily others represented in the survey). Cross-institutional types of federated searches were, in fact, considered 'extremely beneficial' in a majority of cases, which would indicate that access has become one of the main goals for Australian digital information services.

Table 8
Cross-collection searching

Value Number of respondents
Not at all beneficial 0
Moderately beneficial 4
Considerably beneficial 5
Extremely beneficial 10

Interoperability initiatives

Many of the respondents indicated that their institution was working towards achieving greater interoperability for its digital collections in specific ways. These initiatives were categorised as set out in Table 9 below. From this, we may obtain an impression of how Australian digital information professionals are striving for interoperability, according to what this means to them. Access to their collections through cross-institutional portals, external search engines, and through the Open Archives Initiative is being developed, accounting for 44 per cent (11/25) of the initiatives cited by the respondents. Integration of internal access through federated search systems is also being developed, accounting for eight per cent (2/25) of initiatives. Achieving interoperability through standardisation of metadata accounted for 16 per cent (4/25) of initiatives, while vague 'system development' projects accounted for 20 per cent (5/25).

Table 9
Interoperability initiatives

Type of initiative Number
Part of cross-institutional federated search projects / portals 6
System development 5
Adherence to metadata standards 4
Part of open archives initiative 3
Development of internal federated search 2
Providing access through external search engines 2
Implementation of Z39.50 client 1
Cross-institutional resource sharing 1
Partnerships with NLA 1
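The percentages quoted for the initiatives follow directly from the Table 9 counts. As a check, the sketch below recomputes them; the grouping of categories into 'external access' mirrors the text, while the shortened dictionary keys and the helper name `share` are illustrative.

```python
# Counts taken from Table 9 (labels shortened for readability)
initiatives = {
    "cross-institutional portals": 6,
    "system development": 5,
    "adherence to metadata standards": 4,
    "open archives initiative": 3,
    "internal federated search": 2,
    "external search engines": 2,
    "Z39.50 client": 1,
    "cross-institutional resource sharing": 1,
    "partnerships with NLA": 1,
}
total = sum(initiatives.values())

def share(*categories):
    """Percentage of all cited initiatives falling in the given categories."""
    return round(100 * sum(initiatives[c] for c in categories) / total)

external_access = share("cross-institutional portals",
                        "open archives initiative",
                        "external search engines")
internal_access = share("internal federated search")
standardisation = share("adherence to metadata standards")
system_development = share("system development")

print(total, external_access, internal_access, standardisation, system_development)
```

The four groups account for 88 per cent of the 25 initiatives; the three single-mention categories make up the remaining 12 per cent.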

Summary and discussion

The survey found a higher degree of concentration of metadata format standards amongst Australian digital information providers than might have been expected, given the wide range of resources and institutions represented. On the other hand, the reasons given for the selection of the schemas varied considerably. The relationship between format and content standards was quite close, although there was notably less use of the latter than of the former. The number of in-house guidelines suggested that development of fuller content standards may be needed, to complement the emerging format standards. Controlled vocabularies were frequently employed, although LCSH appeared to be the only candidate for a switching language role.

While several institutions had allowed their digital objects to be exported to other collections, few had imported digital resources with metadata attached. A lack of interoperability may be one reason for this, but another is likely to be a dearth of appropriate and affordable resources embedded with quality metadata.

Almost all institutions responding to the survey valued standardisation of metadata, and all of them desired federated search functionality. Most institutions were working towards interoperability in concrete ways. However, interoperability would appear to mean different things in different institutional contexts, a point we shall discuss in the remainder of this article.

Interoperability can facilitate resource exchange, including metadata exchange, and it can also facilitate access, access to one's own collections and to other collections. In this way, interoperable systems allow for the integration of resources and of access to resources. Such integration means significant cost savings; it also enhances information retrieval, particularly with respect to integration of access. For these reasons, the survey respondents were unanimous that interoperability was very desirable.

Nevertheless, although the respondents would be familiar with the concept of interoperability, in terms of information systems being able to 'work together,' their views on how and why the systems should work together varied. A major cause of this variation is likely to be institutional perspective. For example, the extent to which cross-institutional interoperability saves costs depends very much on the uniqueness of the resources in a given institution. Some digital collections are situated in institutions which, for one reason or another, take a more domestic approach. That is, they focus on increasing interoperability within their own institution and amongst their own collections. Other institutions, perhaps catering more to the public at large, take a more panoramic view of the interoperability cause, seeking to help integrate information retrieval across institutions, perhaps even across different domains and the entire world wide web. Naturally, several of the institutions were working on both internal and external interoperability.

As well as the different kinds of interoperability, internal and external, there are also different means to achieving it. Again, different information contexts may be a factor in the types of initiative that an institution might undertake. One can either standardise content, or one can develop systems which can handle different content, or both (perhaps to different extents). Some of the survey responses emphasised adherence to particular well-established and widely used metadata standards; others emphasised technological development, working around divergences of metadata and their formats.
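The 'translation' route can be illustrated with a minimal sketch in which records held in a MARC-like form are mapped to Dublin Core elements so that one search runs across both. Everything here is an assumption made for illustration: the field mapping is a simplified one and not an authoritative crosswalk, the records are hypothetical, and the function names `marc_to_dc` and `search` do not come from any system mentioned by the respondents.

```python
# Illustrative, simplified mapping from MARC tags to Dublin Core elements
# (an assumption for this sketch, not an authoritative crosswalk).
MARC_TO_DC = {
    "245": "title",
    "100": "creator",
    "650": "subject",
    "520": "description",
    "260": "publisher",
}

def marc_to_dc(record):
    """Translate {MARC tag: [values]} into {DC element: [values]}."""
    dc = {}
    for tag, values in record.items():
        element = MARC_TO_DC.get(tag)
        if element is None:
            continue  # unmapped (e.g. local) fields are dropped in this sketch
        dc.setdefault(element, []).extend(values)
    return dc

def search(records, element, term):
    """Case-insensitive substring search over records in Dublin Core form."""
    term = term.lower()
    return [r for r in records
            if any(term in value.lower() for value in r.get(element, []))]

# Hypothetical records: one in MARC-like form, one natively in Dublin Core
marc_record = {
    "245": ["Views of Sydney Harbour"],
    "650": ["Harbours"],
    "999": ["local note"],
}
dc_record = {
    "title": ["Sydney Harbour Bridge under construction"],
    "subject": ["Bridges"],
}

unified = [marc_to_dc(marc_record), dc_record]
hits = search(unified, "title", "sydney")
print(len(hits))
```

Note that the local field is lost in translation, which hints at the trade-off: working around divergent formats avoids re-cataloguing, but only the elements that both formats share remain searchable.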

The method (standardisation/translation) and scope (internal/external) of interoperability are two different variables which did not appear to be particularly associated in the survey responses. We may thus wish to think of them in terms of a graph, with internal and external interoperability at the ends of one axis, and standardisation and translation at the ends of the other. Many institutions managing digital information may be involved in more than one of the four corners of this graph, of course, but the nature of their collections' interoperability is unlikely to be evenly spread.

From the survey responses, we may therefore plot a graph of interoperability, which provides a rough picture of the nature of interoperability as practised and being developed by Australian institutions, with respect to their digital collections. To determine each institution's co-ordinates, responses to key questions in the survey were assigned particular scores for metadata standardisation, system orientation/translation, and internal or external orientation. A single co-ordinate, based on the net scores for both dimensions, was plotted to represent each institution's overall position, according to its survey response. The result is shown in Graph 1 below. The x axis (scope) represents the scope of the institutions' interoperability practice and objectives, from an intra-institution emphasis to a cross-institutional one (left to right). The y axis (method) represents the means practised and the ways planned by the institutions to achieve interoperability, from an emphasis on system development and translation of metadata, to a focus on standardisation of metadata (bottom to top).

Graph 1
Scope and method of interoperability
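One way to read the 'net score' plotting described above is as a simple difference of opposing scores on each axis. The sketch below shows that reading only; the actual questions scored and the weights used are not given in the article, so the `Scores` fields, the subtraction rule, and the two example institutions are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Scores:
    """Hypothetical per-institution scores derived from survey answers."""
    standardisation: int  # emphasis on adherence to metadata standards
    translation: int      # emphasis on system development / translation
    internal: int         # intra-institution orientation
    external: int         # cross-institutional orientation

def coordinate(s):
    """Return (x, y): x runs internal (left) to external (right),
    y runs translation (bottom) to standardisation (top)."""
    x = s.external - s.internal
    y = s.standardisation - s.translation
    return (x, y)

# Two made-up institutions, purely to show the arithmetic
standards_led = Scores(standardisation=3, translation=1, internal=0, external=2)
systems_led = Scores(standardisation=1, translation=4, internal=2, external=1)

print(coordinate(standards_led), coordinate(systems_led))
```

Under this assumed scheme, an institution pursuing both methods equally, or both scopes equally, lands on the corresponding axis, which is one reason a single point can understate the range of an institution's activity.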

We should not read too much into the graph, given the large degree of interpretation involved in scoring the survey responses, and the fairly blunt weighting assigned to the interpretations. However, without any need for moderation, the graph substantiates the general impression the author gained from the survey, in terms of the nature of interoperability offered to users of Australian digital collections. A deeper investigation might reveal specific reasons for current approaches to interoperability, and allow for a prognosis of future levels. Such an investigation may require more sophisticated sampling, and a more extensive exchange with leading players in the field. Given the specialised nature of interoperability activities and the relatively small number of experts, the use of focus groups might be a particularly effective methodology.

If the survey provides a snapshot of current digital collection practice and development in Australia, then we may conclude that while there is considerable attention paid to the concept of interoperability, the ways in which Australian institutions are helping to achieve it are diverse. There would appear to be slightly more emphasis on systems development and cross-institutional approaches, but significant efforts towards the further standardisation of content and the integration of local systems are also taking place. This diversity is probably healthy. As commentators such as Paepcke et al [7] have made clear, the goal of interoperability is a mammoth one, and needs to be tackled from many angles.

Notes

  1. A Paepcke et al 'Interoperability for Digital Libraries Worldwide' Communications of the ACM vol 41 no 4 1998 p33
  2. R Tennant 'The Engine of Interoperability' Library Journal vol 128 no 20 2003 p33
  3. National Library of Australia Australian Digitisation Projects http://www.nla.gov.au/libraries/digitisation/projects.html [accessed 14 July 2004]
  4. A list of abbreviations not defined in the text appears at the end of this article
  5. National Archives of Australia AGLS Metadata Element Set Part 2 Version 1.3 Canberra National Archives of Australia 2002 http://www.naa.gov.au/recordkeeping/gov_online/agls/AGLS_usage_guide.pdf [accessed July 2004]
  6. National Library of Australia Guidelines for the Creation of Content for Resource Discovery http://www.nla.gov.au/guidelines/metaguide.html [accessed 14 July 2004]
  7. A Paepcke et al 'Interoperability for Digital Libraries Worldwide' Communications of the ACM vol 41 no 4 1998 pp33-43

List of abbreviations

AACR Anglo-American Cataloguing Rules
APAIS Australian Public Affairs Information Service
CIMI Computer Interchange of Museum Information
IEEE LOM Institute of Electrical and Electronics Engineers, Inc Learning Object Metadata
ISAD(G) General International Standard Archival Description
METS Metadata Encoding and Transmission Standard
MODS Metadata Object Description Schema
TEI Text Encoding Initiative
XML Extensible Markup Language

