Australian Library and Information Association

The Australian Library Journal

Metadata: pure and simple, or is it?

Marilyn Chalmers

Manuscript received August 2002

In the interests of immediacy, the author waived her right to have this article refereed.


The grand plan

Southbank Institute of TAFE is the largest TAFE Institute in Queensland. Over the past few years, the library has implemented a number of very innovative services for the TAFE sector, due in part to the creativity and change management processes introduced by the systems librarian. This involved updating the library website with online services such as eReserve, eReference (incorporating live chat sessions), online tutorials, pathfinders or subject gateways, and internet subject links. Hitherto these services were to a large extent only being offered by universities to their students and were virtually non-existent in the vocational education sector in Queensland.

Just when everyone had come to terms with these innovations, Southbank decided to revamp its web presence and designed a whole new site. It was late in 2001 when I was asked to add metadata to the new web pages to standardise the structure and terminology of the information. This was a fairly major undertaking as there were many thousands of web pages concerned and there was a timeframe of less than three months before we were to go live.

As the major cataloguing librarian for Southbank, I had been inserting metadata into records in the library catalogue for years to ensure efficient client retrieval. However, my understanding of metadata standards and schemas was quite limited when the project began. I started with only a one-page document from the web developer on the basics of metadata, acceptable element lengths and very little else.

My library research skills at once came to the fore and I proceeded with all haste to locate as much information on the subject as I could to supplement my meagre knowledge. I was staggered by the wealth of information available and was soon drowning in articles. Consequently, I am not going to explain what constitutes metadata or why it should be used, nor list all the schemas available. I think these issues have been exceedingly well addressed by others and do not need reaffirmation. I will only say that project staff recommended the selection of three of Dublin Core's fifteen elements, namely: title, subject (keywords) and description (DCMI, 2002).

Why only three?

This decision was based on simplicity and the market segment which we were trying to attract and service, namely potential and current students who only required information on vocational education courses, support and administrative functions. In addition, extensive research of available literature indicated that many search engines did not support Dublin Core or many of the other schemas.

Southbank's mandatory metatags closely resembled Dublin Core's elements in the content group and were selected as serving the most useful purpose to our community since content-related information ranks higher than all else. Our clients do not search for intellectual property or instantiation data from our website, so these elements were assigned low importance. Publisher, rights information and date were already included at the base of all pages, so it was considered that reiteration was not required. We focused not on the number of elements to add, but on the terms or content which should be added to the web pages to provide the best possible search query matches.

Of the Dublin Core elements, the metatags 'title, subject or keywords, and description' were identified as meeting the set criteria because they appropriately described the content of each web page. However, it was decided to use the basic language tags (for example, description) rather than the Dublin Core tags (for example, DC.description) because search engines offered only limited support for this schema. Metatags, if read by a search engine, are read first to ascertain and rank the relevance of the page.
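The difference between the two naming styles can be shown with a minimal sketch. The function name meta_tag and the sample description text are hypothetical, and the 'DC.' prefix follows the common convention for embedding Dublin Core in HTML rather than anything specific to Southbank's pages.

```python
from html import escape


def meta_tag(name: str, content: str, dublin_core: bool = False) -> str:
    """Render one meta element, optionally using a Dublin Core style name."""
    # Assumption: Dublin Core names are written with a 'DC.' prefix in HTML.
    tag_name = f"DC.{name}" if dublin_core else name
    return (
        f'<meta name="{escape(tag_name, quote=True)}" '
        f'content="{escape(content, quote=True)}">'
    )


# Hypothetical description text, used only for illustration.
plain = meta_tag("description", "List of online databases.")
qualified = meta_tag("description", "List of online databases.", dublin_core=True)

print(plain)
print(qualified)
```

Only the name attribute changes between the two forms. The content is the same, which is why the choice could be made purely on the basis of search engine support.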

So how do search engines work?

Search resources are composed of search engines and directories. Search engines rely on crawlers or spiders, which read a web page, follow links to other pages and index the contents. The crawler returns every month or two to pick up any changes to the site (Sullivan, 2001). However, changes may take some time to appear in the index, and in the interim potential users may miss new or updated information.
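The read, follow and index cycle described above can be sketched for a single page using Python's standard html.parser module. This is an illustration only, not how any particular search engine works: the class name PageIndexer and the sample page are hypothetical, and fetching pages over the network is left out.

```python
from html.parser import HTMLParser


class PageIndexer(HTMLParser):
    """Collects the title, named meta tags and outgoing links of one page."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.meta = {}   # meta name (lower-cased) -> content
        self.links = []  # href values a crawler would visit next
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        name = attributes.get("name")
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and name:
            self.meta[name.lower()] = attributes.get("content") or ""
        elif tag == "a" and attributes.get("href"):
            self.links.append(attributes["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Hypothetical page used only to exercise the parser.
SAMPLE_PAGE = """<html><head>
<title>Databases, eLibrary</title>
<meta name='description' content='List of online databases.'>
</head><body>
<a href='/library/ejournals.html'>Electronic journals</a>
</body></html>"""

indexer = PageIndexer()
indexer.feed(SAMPLE_PAGE)
indexer.close()

print(indexer.title)
print(indexer.meta)
print(indexer.links)
```

A real crawler would repeat this for every address collected in links and store the results in an index, which is why a change to a page is not visible to searchers until the next visit.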

Directories, on the other hand, depend on people to create listings. Clients can submit a short description of their site, or editors may review a site and create a description for it (Sullivan, 2001). Matches are based only on these descriptions and can overlook some vital information in presenting results. Both types of search resources are at work on the web and can present an array of confusing, out-of-date and poor quality information due to the way they index and present information.

The description metatag is very important as it displays immediately below the title of the page in search results. This means that users can instantly determine what the page is about and whether they want to open it. If the description tag is not found, the search engine attempts to create a description, often with unimpressive results (Krause, 2001). Similarly, keywords should be relevant to the web page and only contain words or phrases mentioned within a page. Based on these characteristics, it was resolved that the three metatags, carefully placed between the <head> and the </head> tags in the HTML code of a page, should provide enough information to search engines to extract, index and return useful and relevant results to users. A simple template was devised in Microsoft Word to enter the data for each file. The web files were grouped by their subject content. Once a section was complete, it was embedded into the HTML code of each page and then uploaded to the internet.

Example:

<head>

<title>Databases, eLibrary, Library, Southbank Institute of TAFE</title>

<meta name='keywords' content='tafe, southbank, library, libraries, database, databases, information resources, reference material, research tool, online databases, electronic journals, indexes'>

<meta name='description' content='List of online databases available through the Southbank Institute of TAFE library website.'>

</head>
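The guideline that keywords should only contain words or phrases mentioned within a page can be checked mechanically. The sketch below is one possible approach, not the procedure used at Southbank: the function name, the sample page text and the whole-word matching rule are all assumptions.

```python
import re


def _normalise(text: str) -> str:
    """Lower-case the text and reduce it to words separated by single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def keywords_missing_from_page(keywords: str, page_text: str) -> list:
    """Return the comma-separated keywords that do not occur in the page text."""
    normalised_text = _normalise(page_text)
    missing = []
    for phrase in keywords.split(","):
        normalised_phrase = _normalise(phrase)
        if not normalised_phrase:
            continue  # skip empty entries such as a trailing comma
        # Whole-word match, so 'database' is not found inside 'databases'.
        pattern = rf"\b{re.escape(normalised_phrase)}\b"
        if not re.search(pattern, normalised_text):
            missing.append(phrase.strip())
    return missing


# Hypothetical page text and keyword list for illustration.
page_text = "Online databases and electronic journals available through the library."
keywords = "library, online databases, electronic journals, indexes"

missing = keywords_missing_from_page(keywords, page_text)
print(missing)
```

A keyword flagged this way is not necessarily wrong. It simply prompts the cataloguer to decide whether the term should be dropped or the page content revised.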

An evaluation of online metadata generators was undertaken to ascertain their worth in metadata creation. The vocabulary they generated did not meet our expectations, so the generators were discounted in favour of manually selecting and inserting terms and descriptions from our own vocabulary, which would aid the search functionality. Neither metadata nor schemas can ensure that users will locate the information most relevant to them. However, every year search engines are refining their strategies and librarians are creating new ways of indexing and locating information. Metadata is a design element created to assist in this client/information interaction process.

Controlled vocabulary

Once the issue of which elements would be used as metatags was resolved, the question of vocabulary arose. What vocabulary would be used? After evaluating a number of thesauri, the decision was made to compile a vocabulary which incorporated terms from three major thesauri: the Library of Congress Subject Headings (LCSH), the Australian Thesaurus of Educational Descriptors (ATED) and the Vocational Education and Training Research Database Thesaurus (VOCED). To these were added terms or phrases which related to Southbank content and could not be found elsewhere. Between them, the three thesauri provided a broad range of terms with a good educational focus. This hybrid collection of free-text and controlled terms has also been recommended by the TAFENSW Online Metadata Project as offering the best combination of standardisation and subject terminology in metadata creation (Flack & Ryan, 2001).

Soon after commencement of this project, the web developer presented me with a site search report on the keywords which users had entered in their searches. The report was used to assist in the formation of the vocabulary and to get a feel for the type of words which were relevant to our users. It was clear that simple words worked best in most cases. For example, those users wanting to locate information on our English language courses consistently inserted 'learn English' into their search terms. As a result, this was incorporated into our vocabulary, along with accepted terms from the thesauri. This means that the vocabulary is very much tailored to suit our client base, and is flexible and appropriate to their needs.

Southbank now has a substantial vocabulary which was created in simple alphabetical form, incorporating cross references to words which have a broader, narrower or related meaning. The terms included synonyms and both singular and plural forms of many words to provide a broad base of words or concepts to aid in information retrieval. A set of metadata guidelines was created to provide future 'metadatarians' with basic information about metadata and the vocabulary. The vocabulary is a valuable tool in metadata creation and can now be used by any staff member undertaking this activity without having to resort to other means. It forms the standard by which all Southbank web pages are created. The vocabulary is still growing as more pages are either updated or added to the website. Metadata creation for web pages now forms part of my routine tasks and takes only a small amount of time to perform.
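An alphabetical vocabulary with cross references of this kind could be held in a simple data structure. The sketch below is an assumption about one workable layout, not a description of the actual Southbank file: the field names and sample entries are hypothetical, and treating 'learn English' as a variant of a preferred term is an illustrative choice.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One vocabulary entry with its cross references."""
    preferred: str
    use_for: list = field(default_factory=list)   # synonyms, plurals, misspellings
    broader: list = field(default_factory=list)
    narrower: list = field(default_factory=list)
    related: list = field(default_factory=list)


def build_lookup(terms: list) -> dict:
    """Map every preferred term and variant (lower-cased) to its preferred term."""
    lookup = {}
    for term in terms:
        lookup[term.preferred.lower()] = term.preferred
        for variant in term.use_for:
            lookup[variant.lower()] = term.preferred
    return lookup


# Hypothetical entries for illustration only.
vocabulary = [
    Term("English language courses", use_for=["learn English"]),
    Term(
        "databases",
        use_for=["database", "online databases"],
        broader=["information resources"],
    ),
]

lookup = build_lookup(vocabulary)

# Print the vocabulary in simple alphabetical form.
for term in sorted(vocabulary, key=lambda t: t.preferred.lower()):
    print(term.preferred, "| used for:", ", ".join(term.use_for))
```

Keeping the variants alongside the preferred term lets anyone adding metadata check quickly whether a word a client is likely to type is already covered.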

Performance measurement

Performance is constantly being monitored and improved through statistical data gathering techniques. One technique is a monthly run of the Southbank site's activity logs, which provides a variety of figures for assessing usage patterns and gauging how well our efforts are facilitating information dissemination to our users. An extract of the data, averaged by month, is included below.

Nº of sessions per month: 20 000
Nº of page view hits: 159 000
Nº of library page hits: 26 000
Page views per session: 8
Most popular pages: courses and library
Top overseas traffic: North America fifteen per cent; Asia six per cent

With around fifty per cent of the traffic originating from Australia, Southbank is attracting a reasonable amount of traffic from overseas, particularly North America. These interactions could provide long-term benefits to the organisation as Southbank has a large international student population and a growing online learning community. Whilst we have no comparative data to benchmark our figures, the results are pleasing nonetheless.
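The averaged monthly figures can be cross-checked with simple arithmetic. The sketch below assumes that library page hits are counted in the same way as page view hits, which the report does not state, so the library share is indicative only.

```python
# Monthly averages taken from the extract above.
sessions_per_month = 20_000
page_views_per_month = 159_000
library_page_hits = 26_000

# 159 000 / 20 000 = 7.95, which rounds to the reported figure of 8.
views_per_session = page_views_per_month / sessions_per_month

# Assumption: library hits are a subset of all page views.
library_share = library_page_hits / page_views_per_month

print(f"{views_per_session:.2f} page views per session")
print(f"{library_share:.1%} of page views were library pages")
```

On these figures, library pages account for roughly one page view in six.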

Another report created by the web developer, and run on a regular basis, is a keyword matching report. It logs the keywords that have been used in the Southbank search box, and the number of matches and/or failures to locate information via the metatags. This report provides quality control, as it tests whether our vocabulary is in fact doing its job. To date, the terminology has met with varying degrees of success due generally to spelling mistakes and unique terms entered by users, something beyond our control. Whilst the vocabulary does incorporate American spellings or very popular misspellings, it is impossible to account for all contingencies. However, we are still editing and incorporating new keywords into web pages and the vocabulary based on these reports.
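A report of this kind might be produced along the following lines. This is a sketch under stated assumptions, not the web developer's actual report: the function name, the sample log and page data are hypothetical, and a search is counted as a match only when it equals a whole keyword after lower-casing.

```python
from collections import Counter


def keyword_match_report(search_log, page_keywords):
    """Count searches that match, or fail to match, any page's keyword metatag."""
    # Build the set of every keyword used across all pages.
    indexed = {
        keyword.strip().lower()
        for keywords in page_keywords.values()
        for keyword in keywords.split(",")
        if keyword.strip()
    }
    matches, failures = Counter(), Counter()
    for query in search_log:
        normalised = query.strip().lower()
        if not normalised:
            continue  # ignore empty searches
        if normalised in indexed:
            matches[normalised] += 1
        else:
            failures[normalised] += 1
    return matches, failures


# Hypothetical data: page address -> content of its keywords metatag.
page_keywords = {
    "/courses/english.html": "learn English, English language courses",
    "/library/databases.html": "database, databases, online databases",
}
# Hypothetical search box log, including one misspelling.
search_log = ["learn English", "Databases", "libary", "learn english"]

matches, failures = keyword_match_report(search_log, page_keywords)
print("Matched:", matches.most_common())
print("Failed:", failures.most_common())
```

Terms that recur in the failure list are candidates for adding to the vocabulary, either as new keywords or as recognised misspellings.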

Conclusion

The metadata project was one that gave me both satisfaction and frustration. Satisfaction resulted from the fact that as a cataloguer, I was performing a sort of labour of love by classifying or metatagging Southbank's web pages to enhance client retrieval. Frustration resulted from keyword compilations which often did not match client usage despite repeated search pattern reports and alterations. However, I do not think that this could ever really be overcome due to the uniqueness of individual approaches and thought processes. As a cataloguer, though, it is frustrating to consider that, despite all your hard work, some users are still unable to find the information they need.

Southbank's venture into metadata was a major investment in organising and maintaining data to enhance its operations and one that has produced mixed results. It was a steep learning curve for me at the time and has resulted in a deep interest in the subject. Metadata is but a means to an end and was introduced solely with good information retrieval in mind. However, the metadata venture has not stopped here. Trends will be monitored continuously to ensure that Southbank has the most relevant scheme in place, and the scheme will be updated if required to provide relevant search tools for our most valuable market segment - our clients.

References

Dublin Core Metadata Initiative (2002) Dublin Core Metadata Element Set, Version 1.1: Reference Description. [Online] URL: http://dublincore.org/documents/dces/ [6 pages, accessed 29/7/02].

Flack, I and Ryan, J (2001) 'Influencing the future: Metadata and TAFE libraries', in Passion, Power, People: TAFE Libraries Leading the Way, Proceedings of the ALIA 2001 TAFE Libraries Conference, 21-23 October, Brisbane. [Online] URL: http://www.moreton.tafe.net/alia/program2.htm [site no longer exists - 6/1/03].

Krause, KK (2001) Meta tags - A promotion guide. [Online] URL: http://www.apromotionguide.com/metatag.html [2 pages, accessed 24/7/02].

Sullivan, D (2001) How search engines work. [Online] URL: http://searchenginewatch.com/webmasters/work.html [3 pages, accessed 24/7/02].


Biographical information

Marilyn Chalmers is a liaison librarian at the Southbank campus library of Southbank Institute of TAFE. She is also the major cataloguing and serials librarian for Southbank and runs workshops in cataloguing for staff development. Marilyn has worked at Southbank for four and a half years and has been involved in a number of major projects, including designing an award-winning website on training package implementation and assuming the role of content editor for the new website. For further information, e-mail marilyn.chalmers@det.qld.gov.au.nospam (please remove '.nospam' from address).
