The Australian Library Journal, vol. 54, no. 1

Case Studies in implementing Functional Requirements for Bibliographic Records [FRBR]: AustLit and MusicAustralia

Marie-Louise Ayres

AustLit: Australian Literature Gateway - the world's first major FRBR implementation - was developed as a co-operative service involving eight universities and the National Library of Australia in 2000-2001. This paper traces the reasons for adopting the FRBR information model, implementation experiences, and user responses to the service. The paper also considers the ways in which AustLit's nature as an academically oriented, value-adding service produced by a tightly knit group of contributors facilitated the adoption of the model, and how this might differ from a more standard bibliographic production and exchange economy. In particular, the paper raises issues about re-purposing existing MARC records for FRBR storage and display in the context of the MusicAustralia project.

Manuscript received September 2004

These case studies were presented in February 2004 to a seminar on FRBR run by the Australian Committee on Cataloguing.

Introduction

The Functional Requirements for Bibliographic Records (1998)[1] model of the International Federation of Library Associations and Institutions (IFLA) has made a major contribution to theorising bibliographic description, and to showing how this activity needs to be rethought in the internet age. This paper describes the experiences of a small group that implemented and extended the FRBR model in AustLit: the Australian Literature Gateway,[2] and the desire of a similarly small group to implement the model in MusicAustralia. AustLit is a web-based discovery service about Australian writers and writing.
It is the result of collaboration between eight Australian universities (each of which had previously developed specialist but non-standards-based biographical and bibliographic databases[3]) and the National Library of Australia,[4] with the Australian Research Council[5] providing development funding from 2000 to 2003. AustLit is a single, unified, stand-alone database with all data entry and editing undertaken via a single custom-built interface, and by a cohesive group of librarians and bibliographers - cohesive, albeit scattered across Australia. While the service and its data standards were carefully designed to support future data exchange, no pressing business case for sharing records between AustLit and other services has yet emerged.

MusicAustralia[6] - due for production release in late 2004 - will also be a web-based discovery service about Australian music and musicians. It is being jointly developed by the National Library of Australia and ScreenSound Australia, with funding drawn entirely from existing institutional budgets. The service will be populated primarily by 're-purposed' records: records created for other systems and then made available to MusicAustralia. These records are created by librarians and non-librarians in a wide range of organisations, for their own local needs and in a range of data storage formats. MusicAustralia will therefore depend almost entirely on the capacity to exchange data and to re-use what has already been created.

These two scenarios present very different challenges to would-be FRBR implementers. This paper investigates some of those issues, reflecting on why the AustLit implementation was so successful and what challenges lie ahead for MusicAustralia.

The AustLit Team

The nature of the AustLit development team was a significant influence on our adoption of FRBR and other models, and on the project's outcomes.
The small size of the team, the fact that we could give the project our entire attention for a specified period, and the fact that we came from different intellectual traditions and professions[7] contributed both to our willingness to take a large risk and to our ability to manage that risk and deliver a successful outcome.

Choice and extension of models

We placed a very high value on representing the publishing histories of works. The bibliographers and researchers among us had a strong desire to represent the many different versions in which literary works can appear, and to show the relationships between those versions much more satisfactorily than traditional library catalogues and databases had done. IFLA's FRBR seemed to offer a way of meeting these needs. The model was untested, but we believed we had an almost unique opportunity to attempt an implementation, and that we would be doing our users a great service if we succeeded. As FRBR aficionados will already know, the model includes the concepts of:
AustLit augmented the FRBR bibliographic description model with 'event modelling':
In the AustLit model, Works, Expressions and Manifestations all have attributes, and so do Creation, Realisation and Embodiment events. AustLit also augmented the model by incorporating the concept of a Super Work, as suggested by a number of FRBR commentators.[9] AustLit is also much more concerned with people and organisations than the standard library catalogue, so the service includes contextual records about people and organisations - a feature of an increasing number of specialist services, including MusicAustralia.

Implementation: building the database

Once the desired functionality had been specified - by posing a series of outrageously detailed research questions[10] and developing a model that could support answering them - it was clear that we would need to build, rather than buy, a system: there are currently no commercial systems that support all the data models, or that support the complex relationship concepts of Topic Mapping[11] in database design.

All AustLit entities, including events and attributes, are topics, and the relationships between those entities are also topics: the AustLit Gateway includes more than 4 million topics. Although the topics and their relationships are stored in conventional (but unusually highly normalised) relational database tables, the system converts the data into a common XML format at an early stage of output processing. From this common XML format, information is transformed into the desired final output format (typically HTML) using XSL (eXtensible Stylesheet Language).
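The idea that entities, events and relationships are all topics can be made concrete with a small sketch. The Python below is a minimal illustration under stated assumptions, not AustLit's schema: the `Topic` and `TopicStore` classes, the role names `realisation` and `embodiment`, and the XML element and attribute names are all hypothetical. As a simplification, each Realisation or Embodiment event is stored as the relationship topic that links two entities and carries its own date attribute; the store is then serialised into a common XML tree of nested work, expression and manifestation elements, from which other output formats could be derived.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Topic:
    """Hypothetical topic: entities, events and relationships are all stored as topics."""
    id: int
    kind: str
    attrs: dict


class TopicStore:
    """A toy in-memory stand-in for normalised relational tables (assumption)."""

    def __init__(self):
        self.topics = {}

    def add(self, kind, **attrs):
        topic = Topic(len(self.topics) + 1, kind, attrs)
        self.topics[topic.id] = topic
        return topic

    def relate(self, role, source, target, **attrs):
        # The relationship (here doubling as the event) is itself a topic,
        # so it can carry attributes such as a date.
        return self.add("relationship", role=role, source=source.id,
                        target=target.id, **attrs)

    def links(self, source, role):
        """Yield (relationship topic, target topic) pairs leaving `source`."""
        for topic in self.topics.values():
            if (topic.kind == "relationship" and topic.attrs["role"] == role
                    and topic.attrs["source"] == source.id):
                yield topic, self.topics[topic.attrs["target"]]


def work_to_xml(store, work):
    """Serialise one work and everything it encloses into a common XML tree."""
    root = ET.Element("work", id=str(work.id), title=work.attrs["title"])
    for realisation, expr in store.links(work, "realisation"):
        expr_el = ET.SubElement(root, "expression", id=str(expr.id),
                                label=expr.attrs["label"],
                                date=realisation.attrs["date"])
        for embodiment, man in store.links(expr, "embodiment"):
            ET.SubElement(expr_el, "manifestation", id=str(man.id),
                          publisher=man.attrs["publisher"],
                          date=embodiment.attrs["date"])
    return root


# Hypothetical sample data: one work, two expressions, three manifestations.
store = TopicStore()
work = store.add("work", title="A Hypothetical Poem")
first = store.add("expression", label="first published text")
revised = store.add("expression", label="revised text")
store.relate("realisation", work, first, date="1900")
store.relate("realisation", work, revised, date="1910")
for expr, publisher, date in [(first, "Magazine A", "1900"),
                              (first, "Anthology B", "1905"),
                              (revised, "Collected Poems C", "1910")]:
    manifestation = store.add("manifestation", publisher=publisher)
    store.relate("embodiment", expr, manifestation, date=date)

print(len(store.topics), "topics")
print(ET.tostring(work_to_xml(store, work), encoding="unicode"))
```

For this one work the store ends up holding 11 topics: six entity topics plus five relationship (event) topics. The nesting of the XML mirrors the 'enclosures' of the model, which is one reason a single intermediate format makes downstream transformations straightforward.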
The XML representation contains enough information to generate alternative encodings such as EndNote or MARC, or to augment the HTML with Dublin Core or RDF metadata.[12]

At the outset of the implementation phase, we believed that the major risks lay in the complexity of designing a database to accommodate the FRBR, INDECS and Harmony models along with all the multitudinous relationships we had mapped out, and in the likely performance of a highly normalised database consisting of some millions of 'topics'. In the event, neither risk materialised: we took just over a year to design and build the system, to migrate more than 400 000 records from twelve different databases, all with different data structures and standards, and to develop a maintenance interface to support new work by staff at eight Australian universities. Performance is excellent, despite heavy use by users and maintainers.

Implementation: converting the data

Our major implementation issues had little to do with the models we chose. We substantially underestimated the risk involved in migrating a range of existing non-standards-based databases to the new structure: every new database brought new problems, and we were not able to reapply previous conversion solutions.

We did, however, encounter significant issues relating to the interpretation of FRBR and the pragmatics of implementation. The model was clearly written with a 'whole monograph' emphasis (although the model demonstrates that it can be used for other types of works, such as performances). AustLit's implementation was complicated because only a small portion of AustLit records fit this model: AustLit includes a wide variety of individual non-monograph items (individual poems, reviews and articles), and represents complex clusters of items such as poem sequences and author series.
However, as would be expected of any catalogue or index, the overwhelming majority of AustLit records have one-to-one relationships between work, expression and manifestation, and these records were relatively simple to convert. Our conversion methods evolved as we worked, with quite a number of mistakes made along the way, none of them irretrievable. Our conversion methodology was (roughly):
Implementation: maintenance interface and retraining the staff

The AustLit maintenance interface tightly couples the model elements of work, expression and manifestation with interface elements: staff work within a single but highly customisable 'record' which visually mimics the 'enclosures' inherent in the model (these particular manifestations belong inside this expression; these expressions belong inside this work). The maintenance interface makes extensive use of the scripting facilities and Document Object Model (DOM) interface provided by Internet Explorer version 5.5 (or above). This means that AustLit maintainers require no client software beyond the browser, that start-up costs are minimal (all that is needed is a reasonable PC, IE5.5 or above and access to a network), and that staff have great flexibility in choosing which record level, events and attributes they wish to work with. As the number of events and attributes which staff can include is considerable, separate start-up 'templates' are available which include the events and attributes most commonly associated with particular work types, forms and genres (only the 'poetry' template, for instance, automatically presents the field for the work attribute 'first line of verse').

Retraining AustLit staff to work within the FRBR model was a high priority for the development team. Once they were familiar with the model, staff became very appreciative of the opportunity to represent works in a rich context. They enjoy using the maintenance interface, which gives them many choices about how to describe works and authors - in many cases recording information which had always been 'to hand' when describing items, but simply could not be represented in previous data models. The fundamentals of the model are easily understood by professional staff. It must be said, however, that distinguishing between new expressions and new manifestations of works can pose significant challenges.
Applying the model to the 'real world' of describing real items in hand involves considerable ongoing discussion among AustLit staff, and requires both regular guidance from content managers and thoughtful revision and enrichment of the manual. Of course, inconsistency of cataloguing practice is not confined to FRBR description,[13] and future large-scale implementations will probably need to accept some level of inconsistency.

Implementation: the user interface

Throughout the development of the AustLit database and user interface, the team worried about how to present this new concept of works, expressions and manifestations to users. It seemed a very complex notion to convey through a web interface, especially given our own need to keep drawing diagrams and verbalising relationships for our own benefit. In the end, we chose to use 'light' visual cues such as dot points and separator lines, and simple prose statements such as 'This work has appeared in x different versions' and 'This version of this work has been published x times'.

Like the maintenance interface, the AustLit user interface tightly couples the FRBR model with the presentation layer. Once users proceed beyond summary data, all expression and manifestation information is displayed: users cannot choose to view only one expression record, for instance. Users appear to find this interface easy to understand and navigate, and the relatively small number of Australian literary works with more than half a dozen expressions means that users are not overwhelmed by very long results screens. Other visualisations of FRBR data, including the 'card catalogue' and Windows-like 'directory tree' implemented in the Research Libraries Group's RedLightGreen service, are certainly possible, and it is likely that as more FRBR databases are developed, an optimal 'OPAC' or even a standardised representation will emerge.
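The two summary statements quoted above are simple counts over the work, expression and manifestation enclosure. A minimal Python sketch of how a presentation layer might derive them, together with dot points and separator lines, is shown below. It assumes a hypothetical nested dictionary in place of AustLit's common XML; the `render_summary` function and the sample records are illustrative only.

```python
def render_summary(work):
    """Render a plain-text summary of one work using dot points and separators.

    `work` is a hypothetical nested dict:
    {"title": str, "expressions": [{"label": str, "manifestations": [str, ...]}]}
    """
    versions = work["expressions"]
    lines = [work["title"]]
    if len(versions) > 1:
        lines.append(f"This work has appeared in {len(versions)} different versions.")
    for version in versions:
        lines.append("-" * 40)  # separator line between versions
        count = len(version["manifestations"])
        noun = "time" if count == 1 else "times"
        lines.append(f"{version['label']}: This version of this work "
                     f"has been published {count} {noun}.")
        for manifestation in version["manifestations"]:
            lines.append(f"  \u2022 {manifestation}")  # one dot point per publication
    return "\n".join(lines)


# Hypothetical sample record.
poem = {
    "title": "A Hypothetical Poem",
    "expressions": [
        {"label": "First published text",
         "manifestations": ["Magazine A, 1900", "Anthology B, 1905"]},
        {"label": "Revised text",
         "manifestations": ["Collected Poems C, 1910"]},
    ],
}
print(render_summary(poem))
```

Because every expression and manifestation is rendered, the output grows with the number of versions, which stays manageable when few works have more than half a dozen expressions.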
In any case, with these services built using XML schemas and XSL style sheets, individual database owners will have much greater flexibility to change their presentation layers for local audiences - or even to generate multiple presentation formats for different audience segments - without affecting underlying models or data integrity.

Results

The AustLit team succeeded in everything it set out to do. The implementation demonstrates that:
and perhaps most importantly, that users of this particular FRBR database find the presentation of information about related works to be both useful and comprehensible.

MusicAustralia

These are all significant achievements. But how helpful are AustLit's successes to those aiming to implement FRBR in a music environment, and in larger, less unified environments? Is it feasible to apply FRBR to records sourced from multiple cataloguing organisations, where that multi-contributor model is not merely a legacy situation to be overcome in a new unified environment, but is in fact the ongoing service model, as is the case for our Australian union catalogue and for MusicAustralia? What factors might contribute to or detract from our ability to use FRBR under this model?

The FRBR and music communities are grappling with a number of modelling issues. There is as yet no consensus on these, but a significant body of literature is now available, and a number of music issues have been raised on the FRBR email list.[14] An important question is how to define the abstract Work, especially for vocal music. Patrick Le Boeuf of the Bibliothèque nationale de France characterises this as the problem:
In the context of Waltzing Matilda, this could be expressed as a choice between modelling either:
Or:
In an ideal data model, music works and lyric works would be modelled separately, and a single 'manifestation' of a song (a published score or released track) could be modelled as manifesting expressions of two distinct works, or even, as in one case from the MusicAustralia pilot, a single expression of a music work together with two expressions of a lyric work: I'm going back to Yarrawonga/I'm going back to Wanganui![16] This would be particularly useful in modelling expressions of lyrics that have also been published (expressed and manifested) as poems; Waltzing Matilda is an excellent example. The current consensus on the FRBR list appears to be that music and lyric works should be recorded separately where it is important to do so. This would mean that a cataloguing record for a 'manifestation' would need to represent the fact that the manifestation is a container for expressions and manifestations of both the music work and the lyric work.

Projects such as Levy 2[17] - which is using MCR and OCR processes to derive both MIDI files and text files from digitised printed music, and which plans to create indexed and searchable databases of both sets of coded data[18] - will presumably need to address this problem, as the project has disaggregated a piece of printed song music into its two constituent parts. It should be noted that the ability to search for matching melodies and text elements via these databases would itself contribute information about version relationships, which could be fed back into metadata descriptions. Similarly, analytical processes applied to other music facets (harmony, timbre, rhythm) may enrich our knowledge of relationships.

Even if we could overcome this 'two in one' problem, a number of modelling issues remain. Should all notated and performed expressions of music be modelled as a single expression category:
Should expressions themselves be further modelled to include sub-categories for notated and performed expressions:
Should performed expressions based on particular notated expressions be modelled as expressions of expressions:
For music in the Western art tradition, modelling performance expressions as subsequent expressions of notated expressions has considerable merit, especially for scholars trying to demonstrate relationships between particular notated editions and particular performances. However, in the popular, jazz and folk traditions, it is almost impossible to argue for a linear relationship of score first and performance second. In some cases, printed music scores are clearly based on performances: the notated expression is subsequent to the performance expression. In many cases, it is impossible to determine which came first, as dates of publication and performance are inexact, or fall in the same year. In any case, our analysis of the MusicAustralia pilot highlighted the loose relationship between score and performance: popular music performances are relatively independent of notation, and time signatures, instrumentation, words and structure are routinely changed in the space between notation and performance.

In the MusicAustralia context, in which existing catalogue records are being re-purposed, these modelling issues are largely academic at the moment, as most of the originating records do not include sufficient information to establish any Expression-level attributes, let alone support the more complex models alluded to above. In any future 'FRBR-isation' of MusicAustralia, we would certainly need to decide on 'prima la musica' (music first), and undertake any aggregation - at least in the first instance - to our shared understanding of an identifiable music Work, which in many instances can be identified bibliographically. The Variations2[19] project at Indiana University has adopted this approach, modelling Works and Instantiations but not, for the moment, Expressions.
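The 'two in one' problem discussed above, in which one published score or released track manifests expressions of both a music work and a lyric work, and the music-first aggregation just described, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `Work`, `Expression` and `Manifestation` classes, the `group_by_music_work` function and the sample records are hypothetical, and notated and performed expressions are distinguished only by a `form` attribute rather than by any of the alternative models raised above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Work:
    title: str
    kind: str  # "music" or "lyric" in this sketch


@dataclass(frozen=True)
class Expression:
    work: Work
    label: str
    form: str  # e.g. "notated", "performed" or "text"


@dataclass
class Manifestation:
    description: str
    embodies: list = field(default_factory=list)

    def works(self):
        """Distinct works whose expressions this manifestation embodies, in order."""
        found = []
        for expression in self.embodies:
            if expression.work not in found:
                found.append(expression.work)
        return found


def group_by_music_work(manifestations):
    """Music first: aggregate manifestations under the music work(s) they embody."""
    groups = {}
    for manifestation in manifestations:
        for work in manifestation.works():
            if work.kind == "music":
                groups.setdefault(work, []).append(manifestation)
    return groups


# Hypothetical records modelled on the Yarrawonga/Wanganui case:
# one music expression and two expressions of the same lyric work.
tune = Work("I'm going back to Yarrawonga", kind="music")
words = Work("I'm going back to Yarrawonga", kind="lyric")
score = Manifestation("published song score (hypothetical record)", [
    Expression(tune, "notated music", form="notated"),
    Expression(words, "Yarrawonga lyrics", form="text"),
    Expression(words, "Wanganui lyrics", form="text"),
])
track = Manifestation("released track (hypothetical record)", [
    Expression(tune, "recorded performance", form="performed"),
])

for work, items in group_by_music_work([score, track]).items():
    print(work.title, f"({work.kind}):", [m.description for m in items])
```

Keeping the lyric work separate means the same words could also be linked to their appearances as poems, while the music-first grouping still gathers the score and the track under a single music work.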
It is likely that international theory and practice will conclude that the simplest and most sustainable approach to Expressions is to model all versions as expressions of equal 'weight', but to allow specific annotated links between expressions of a single work where the relationship is sufficiently strong. We will be closely monitoring international practice in this regard.

Automating FRBR

As MusicAustralia does not incorporate processes for music or textual transcription and analysis, any automated matching of items and aggregation to Expression and Work levels could only occur at the bibliographic level. Our pilot and subsequent investigations have identified a number of barriers to automatically aggregating existing MARC records to the FRBR entities.

Mismatching title and author information. A few of these mismatches were cataloguer errors; most arose from different sectors using different (or no) name authority files, and from the fact that music titles can change enough to prevent automated matching.

Lack of granularity in bibliographic records for publication containers. The desired relationships are between items that are parts of publication containers such as albums and CDs. In many MARC records, individual titles are included only in contents notes rather than as added title or name/title entries. For example, we would be unable to create an explicit relationship between the printed score of the song Banish the Budget Blues and its commissioned performance by the Song Company unless we attempted to match title fields with text in the contents notes of the National Library MARC records for each item. An additional concern is that contents titles are frequently abbreviated, and sometimes contain extraneous text such as numbers that prevents matching.

Lack of granularity in bibliographic records for unpublished material containers. The desired relationships are between items that are parts of non-publication containers such as manuscript collections and oral history recordings. Titles of individual music works are often 'buried' within a finding aid or a transcript.

We concluded that the potential to automate aggregation of manifestation-level records could increase if:
Of course, having more - or even any - shared authority work across institutions and sectors would greatly improve our ability to achieve the ideal. We also concluded that in order to store and fully expose these Work, Expression and Manifestation entities in ideal FRBRised MusicAustralia displays:
Of course, in a MARC union catalogue situation, achieving this ideal would mean that this kind of automation plus inspection routine would need to become part of everyday resource-sharing business. Resources for cataloguing are scarce, and most institutions - at least in Australia - do not have access to 'new' money to make this ideal a reality.

However, while there are significant barriers to using FRBR in large databases, or to improving cataloguing to maximise FRBR's potential, important activities under way in the library profession give some hope of moving towards this goal in the future. The Joint Steering Committee for Revision of the Anglo-American Cataloguing Rules[20] is currently revising the Rules to harmonise them with the FRBR entities. There have also been some recent pragmatic attempts to adapt, wholly or partially, existing MARC catalogues. The Research Libraries Group's RedLightGreen[21] project recognised that 'the expression is the problem', and has opted to aggregate manifestation records to the work level, with some basic expression entities (translations) more fully represented. At least one ILMS vendor[22] claims to have an FRBR system. The Library of Congress has made its experimental FRBR display tool[23] available to libraries wishing to experiment; the tool tries to use whatever data is available to display FRBR entities 'on the fly'.

Conclusion

Implementing FRBR for the MusicAustralia resource database, which will consist of MARCXML records, is not on the National Library's current work plan. While the AustLit implementation demonstrates that it is possible to develop good matching algorithms and to design efficient web interface tools to support human inspection and decision making about relationships, the MusicAustralia business model does not support such ongoing data enrichment.
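The 'automation plus inspection' routine and the matching algorithms mentioned above can be illustrated with a deliberately simple sketch. The Python below is not the AustLit or MusicAustralia algorithm. It normalises titles (case, punctuation, leading item numbers), splits a contents note into individual titles on the ' -- ' separator conventionally used in MARC formatted contents notes (field 505), links exact normalised matches automatically, and queues near matches for a cataloguer to inspect. The function names, the use of `difflib.SequenceMatcher` as the similarity measure, the 0.8 threshold and the sample contents note are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher


def normalise_title(title):
    """Lower-case, drop a leading item number and punctuation, collapse spaces."""
    title = title.lower()
    title = re.sub(r"^\s*\d+[.):]?\s*", "", title)  # e.g. "3. " or "12) "
    title = re.sub(r"[^\w\s]", " ", title)           # punctuation becomes a space
    return " ".join(title.split())


def contents_titles(note):
    """Split a formatted contents note into individual titles on ' -- '."""
    return [part.strip() for part in note.split(" -- ") if part.strip()]


def match_against_contents(title, note, threshold=0.8):
    """Return (automatic links, review queue) for one title against one contents note."""
    target = normalise_title(title)
    automatic, review = [], []
    for candidate in contents_titles(note):
        score = SequenceMatcher(None, target, normalise_title(candidate)).ratio()
        if score == 1.0:
            automatic.append(candidate)                  # identical after normalisation
        elif score >= threshold:
            review.append((candidate, round(score, 2)))  # a cataloguer decides
    return automatic, review


# Hypothetical contents note with item numbers and an abbreviated title.
note = ("1. Banish the budget blues -- 2. Something else entirely -- "
        "3. Banish budget blues")
automatic, review = match_against_contents("Banish the Budget Blues", note)
print("link automatically:", automatic)
print("send to a cataloguer:", review)
```

Exact matches after normalisation are linked directly, while the abbreviated third title scores about 0.9 and goes to a person rather than being silently linked or discarded. A production routine would need more care, for example with titles that genuinely begin with a number, and would draw on name authority data for creators as well as titles.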
However, the current Kinetica redevelopment project includes an FRBR feasibility component, and any lessons learned may be applied to MusicAustralia in the future. MusicAustralia certainly represents an ideal test bed of data. Music poses significant FRBR challenges, not least because music works are much more likely to exist in more than one expression and manifestation than most other forms of cultural production. Conversely, the benefits to users of an enriched [FRBR] view of the Australian music universe would be very significant, furthering understanding of music works in their notated and performed representations. So it is possible that those users - especially the music research community - may place such value on this 'version view' of the materials of their discipline that they will seek research funding to implement this long-term vision.

Endnotes
Biographical information

Marie-Louise Ayres was manager of Special Collections at the University of New South Wales at the Australian Defence Force Academy 1994-2004 and project manager of AustLit: Australian Literature Gateway (http://www.austlit.edu.au), based at ADFA, 1999-2002. She was appointed project manager of MusicAustralia (http://www.musicaustralia.org) in 2002. She holds a PhD in Australian Literature from the Australian National University and is not [she wishes to point out] a librarian.