Australian Library and Information Association

AARL

Volume 35 Nº 2, June 2004

Australian Academic & Research Libraries

Wine on the web: Australian wine information on the web and its prospects for long-term preservation and access

Most Web publications are lost forever and can never be retrieved again. We are wasting our digital heritage.[1]

Wendy Smith

Abstract: Online information produced by Australian wine makers over the past eight years has been examined to determine what has been created and what is being preserved. The longevity of a number of Australian winery websites is examined, and sites in the digital archives of the National Library of Australia's Pandora project and the United States-based Internet Archive are compared and contrasted. Implications for future research if current preservation strategies remain unchanged are considered.

The way information is created and used changed significantly in the final decade of the 20th century. Digital information, in physical format or online, is the medium of choice for many producers, publishers and users of information. The problem of providing long-term access to digital materials is well documented. Future researchers will be disadvantaged if they do not have access to the full range of published materials being created today.

Information produced by the Australian wine industry represents a significant and varied body of work, covering diverse fields from scientific research to popular culture. The work reported below mainly concentrates on popular culture aspects, as demonstrated by the content of individual winery websites.

The Australian wine industry

Wine production in Australia has a very long history, dating back to the arrival of the First Fleet in 1788. Olive Farm, established in Western Australia in 1829, is the oldest winery in continuous operation. During the mid to late nineteenth century, many of today's major wine producers established flourishing wine making operations. These included those of George Wyndham (New South Wales), established in 1828, Dr Penfold at 'The Grange' (South Australia) in 1844 and Thomas Hardy (South Australia) in 1853.

Today, the wine industry is a major and important economic force in Australia. The Australian and New Zealand wine industry directory[2] reports that in 2003 there were almost 1800 wine producers in Australia, with 1.36 million tonnes of grapes crushed. The industry generated $A2.4 billion in exports, equivalent to exporting 1.9 million bottles of wine per day. The area under grapes has more than doubled from 70 000 hectares in 1993 to 160 000 hectares in 2003, while the number of individual wineries has doubled since 1996. At the same time, the wine industry is very skewed - the top 22 wine producers are responsible for 89 per cent of total sales. Five major groups - Southcorp Wines, the Hardy Wine Company, the Orlando Wyndham Group, Beringer Blass Wine Estates and McGuigan Simeon Wines - dominate the market.

The history of the Australian wine industry is well documented and preserved in print media. The existence of legal deposit regulations has meant that most mainstream published material on the industry has been captured into one or more of the major research libraries in the country, either at the time of publication, or by retrospective collection of earlier material. In particular, newspapers have been a major source of information for the history of the industry, an article on viticulture appearing in the first issue of the first newspaper published in the colony of New South Wales, the Sydney Gazette and New South Wales Advertiser of 5 March 1803.

Ephemera, non-print and manuscript materials have not been so comprehensively collected. Major libraries have only modest collections, although these include the occasional information-rich gem of significant research value, such as the Sutherland Smith ('All Saints') collection at the University of Melbourne. Wine labels figure in many collections. A small number of manuscript collections remain within the original company or its successors, sometimes managed by an archivist. These include Tahbilk (Victoria), Houghtons (Western Australia), now part of the Hardy group of companies, and Yalumba (South Australia).

Given that the past is reasonably well documented, what is the prospect for the future? Print media continues to be used extensively. Books are still being written on the history of the wine industry (based mainly on written sources with the occasional oral history) and serial publications, both serious and popular, continue to be produced. In popular culture, wine columnists write in most major and some minor newspapers on at least a weekly basis. All this material will continue to be preserved in the usual manner.

In addition to print based media, there is a large amount of online information being generated. The web is used extensively to record information about all aspects of the Australian wine industry. Governmental and statutory bodies; scientific and research organisations, including the Australian Wine Research Institute and universities; wine organisations; tourist groups; wine retailers; and commercial producers ranging from the largest conglomerates to the smallest boutique winery all publish on the web.

Scientific and other scholarly web publications and official government publications are likely to be preserved in some way by some agency. However, it is very unlikely that many small organisations or individual wine producers will themselves preserve their web output in perpetuity. If future generations of scholars want to research such material, it needs to be preserved somewhere, somehow. What is the likelihood of this information surviving into the future as a record of developments today?

This paper considers what Australian wine producers have been publishing on the web and how it is surviving. It is based on a research project undertaken over three years from 2001 to 2004 investigating Australian winery websites, and an analysis of some early wine sites from 1996/97. Most of the sites analysed are part of the '.com.au' domain.

Wineries on the web

The relatively small Western Australian Jane Brook Winery mounted the first Australian winery website, on 14 June 1995. One of the five major Australian producers, Southcorp, launched sites for its main brands a year later in July 1996.

Figure 1. Estimated percentage of Australian wineries with websites

Australian wine producers have been active adopters of web technology. The growth of winery websites has been rapid, although it appears to be slowing and stabilising at between 60 per cent and 70 per cent, as shown in Figure 1. [3] This represents over 1100 individual winery sites. All the major producers now have comprehensive sites, although some were slow to follow Southcorp's lead.

Most winery websites have a core of basic information - brief history, contact details, wines produced and purchasing information. Some have much more detail, and include viticultural information: soil, weather, harvesting conditions, winemaking information, winemaker's notes, tasting notes, cellaring information, and winery newsletters, sometimes no longer produced in print format. All have some photographic images, sometimes only one or two, while others have an extensive coverage of the vineyard, the winery and wines produced. Many include images of wine labels.

The size of individual winery sites ranges from only a few pages to at least 60 or 70 pages. The larger sites usually include some archival information on previous vintages and wines. Some have similarities to winery brochures held in a number of ephemera collections, but others are much more information-rich and do not have any obvious equivalent in print media.

Stability of websites

Two aspects of web stability are important when considering preservation of information. One is the stability or persistence of the actual URL, so that the information can be easily found online. The other is the reliability, integrity and accuracy of the content of the site.

The nature of web publishing is such that currency of information is the overriding consideration for many sites. As a result, many winery sites will only contain details, say, of the current vintage, with earlier information overwritten or discarded. As noted above, some winery sites contain 'archival' information. Jane Brook Winery, for instance, provided access to an archived copy of its first website until at least 1998. However, by the time it mounted the third version of its site it had dropped the archived copy.

A small number of longitudinal studies of web stability have been undertaken. Most of these studies concentrate on the stability of the URL, and the implications of using web citations. Koehler [4] has examined a number of sites since 1996 using a measurement of 'half-life' as an indicator of stability, that is, the time for 50 per cent of sample URLs to have disappeared or changed. Most studies listed in Koehler give half-lives of around 1.5 to 4.5 years.

This study reports on eleven Australian wine related websites, first listed in the 1996 edition of The Australian and New Zealand Wine Industry Directory as useful starting points for information about the Australian wine industry. The sites and their current status are given in Table 1.

Table 1
'Useful' Australian wine sites, c1996

Type Title URL in 1996 URL at 8 August 2001 Status and URL at 21 April 2004
Organisation - society Australian Society of Wine Education www.winebase.com.au/aswe www.aswe.org.au Still active in 2004, same URL as 2001
Commercial - gateway and wine producers Australian Wine Online www.winetitles.com.au/wineonline.html Same URL as 1996 Still active in 2004 www.winettitles.com.au/awol
Commercial - wine producer Bethany Wines www.wombat.com.au/wombat/attract/sa/barossa/bethany/index.html www.bethany.com.au Still active in 2004, same URL as 2001
Commercial - wine producer CSU Winery www.csu.edu.au/research/rpcgwr/winery.htm www.csu.edu.au/winery Still active in 2004 same URL to homepage, but links to /rpcgwr/winery
Commercial - wine producer Chateau Yaldara www.webmedia.com.au/yaldara www.simeon.com.au and www.yaldara.com.au Dormant since 2000/1 very brief information on Simeon site, Yaldara has more, but both are very out-of-date
Commercial - wine producer Jane Brook Winery www.highway1.com.au/business/janebrook www.janebrook.com.au Still active in 2004, same URL as 2001
Commercial - wine producer Kaesler Estate www.wombat.com.au/accom/sa/barossa/kaesler/index.html www.kaesler.com.au Still active in 2004, same URL as 2001
Academic - gateway (?) CSIRO Grapevine server www.adl.hort.csiro.au www.csiro.au gets to main home page, but no Grapevine server site Moribund by 1997(?)
Commercial - e-business e-wine Cellarmasters Online Wine Cellar www.e-wine.com.au www.cellarmasters.com.au Still active in 2004, same URL as 2001
Commercial - e-business Nicks Wine Merchants www.sofcom.au/sofcom/Nicks/index.html www.nicks.com.au Still active in 2004, same URL as 2001
Commercial - e-business Winetitles Online and Wine Industry Journal www.winetitles.com.au/index.html www.winetitles.com.au/wij/wij.html www.winetitles.com.au/wineonline.html Still active in 2004, same URL as 2001

By 2001, all except one of the sites had changed URL, although two were only minor changes (same host, changes of extension). One site, the CSIRO 'Grapevine server', appeared to have dropped out. For six out of the seven remaining wineries, the original URL in 1996 was a hosted site, but by 2001 they had moved to their own domain name.

By 2004, nine of the original eleven sites were still active (a success rate of over 80 per cent), with only one winery dropping out between 2001 and 2004. URL stability from 2001 to 2004 was very good, with only one minor change. The stability of both the URLs and of site ownership may be one indicator of a maturing of the web and of its users.
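
The 'half-life' measure described earlier can be related to the counts in Table 1 with a short calculation. The sketch below assumes, purely for illustration, that URLs and sites are lost at a constant proportional rate (simple exponential decay). That model is an assumption of this example and not necessarily the one Koehler used, and the function name and the elapsed times of about 5 and 8 years are likewise illustrative.

```python
import math

def half_life(initial, surviving, years):
    """Estimate a half-life in years, assuming exponential decay.

    initial   -- number of items in the original sample
    surviving -- number still unchanged or active after `years`
    years     -- elapsed time between the two observations
    """
    if not 0 < surviving < initial:
        raise ValueError("need 0 < surviving < initial for a finite half-life")
    # N(t) = N0 * 0.5 ** (t / T)  =>  T = t * ln 2 / ln(N0 / N(t))
    return years * math.log(2) / math.log(initial / surviving)

# Table 1: 1 of 11 URLs was unchanged between 1996 and 2001 (about 5 years).
url_half_life = half_life(initial=11, surviving=1, years=5)

# Table 1: 9 of 11 sites were still active in 2004 (about 8 years).
site_half_life = half_life(initial=11, surviving=9, years=8)

print(f"URL half-life:  {url_half_life:.1f} years")
print(f"Site half-life: {site_half_life:.1f} years")
```

With a sample of only eleven sites these estimates are rough, but they illustrate the distinction drawn above: the URLs changed quickly, while the sites themselves largely persisted.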

However, the content and the style of all the active sites have changed significantly over the eight-year period, as noted below. Several of the sites are good examples of evolving uses of the web and changes in web design. One of the commercial retailers now includes additional wine-related information that takes the site far beyond just a sales brochure. The original CSIRO site had useful scientific information on it, and although a redirect existed to another site, that second site was more of a commercial service than a 'self-help' one like the original.

How is wine on the web being archived and preserved?

Even if content persists at the original URL, there is no guarantee that the content viewed today is the same as the content viewed at some time previously. In order to preserve the content of sites that have either disappeared or that have changed with time, active intervention through some type of archiving process is needed. Two alternative approaches to archiving current websites for the future are considered below: the National Library of Australia's Pandora project [5] and the United States-based Internet Archive [6] and its associated WayBack Machine. [7]

Pandora

The National Library of Australia's Pandora project was established in 1996 for the purpose of preserving and providing access to significant Australian online publications. Pandora partners now include most of the state libraries. The State Library of Tasmania remains outside the Pandora partnership, operating its own individual project called 'Our Digital Island' (ODI) [8]. Since part of Pandora's brief is to include 'online publications on a wide range of subjects to document Australian society as it is represented on the Internet', [9] it is not unreasonable to suppose that some Australian wine industry material would be in scope for inclusion in Pandora or in ODI. Writing about the history of wine, Phillips says 'wine, in short, has a place in many kinds of history. It is integral to the histories of agriculture, industry, commerce and state regulation, and it is prominent in the histories of medicine, religion, gender, culture and the senses'. [10]

Pandora takes a very selective approach to building its archive. Before any title is selected for inclusion, it is tested thoroughly against the selection guidelines and the boundaries of the title established. Once selected and archived, it is tested for completeness and accuracy of capture. Any title in the Pandora archive is as near as possible a true representation of the original website. Some titles are archived once only, others are collected on a pre-determined harvesting schedule.

As at January 2004, Pandora contained around 5300 titles, with the National Library contributing around 3200 titles. For the National Library's contribution, this represents a capture rate of around 8 titles per week over an eight-year period. The State Library of New South Wales is the next largest contributor with around 1100 titles.
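
The weekly rate quoted above is simple arithmetic, and a quick check shows which total it corresponds to. The sketch below uses the rounded figures from the text and assumes 52 weeks per year; the function name is illustrative.

```python
WEEKS_PER_YEAR = 52

def titles_per_week(titles, years):
    """Average number of titles archived per week over `years`."""
    return titles / (years * WEEKS_PER_YEAR)

# Approximate figures quoted in the text for January 2004.
pandora_total = titles_per_week(5300, 8)  # all Pandora partners
nla_only = titles_per_week(3200, 8)       # National Library contribution

print(f"All partners:     {pandora_total:.1f} titles per week")
print(f"National Library: {nla_only:.1f} titles per week")
```

On these rounded figures, around 8 titles per week corresponds to the National Library's own contribution of about 3200 titles, while the archive as a whole works out at closer to 13 per week.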

The selection guidelines for the various partners in Pandora follow basically the same trend as those of the National Library of Australia, but with emphasis on state and local rather than national significance. Only South Australia picks out the wine industry for special mention. The State Library of Western Australia, in keeping with its policy for printed ephemera, intends to collect occasional samples of promotional and advertising sites, while the State Library of Queensland, under social and topical issues, states that 'ephemera will be collected in the same kinds of categories as have been established for the print collection.' [11]

The Pandora archive is biased against business sites as measured by the '.com' and '.net' domains. The distribution of domains in Pandora, analysed from a random sample of around 100 titles, indicates that overall government sites account for 35 per cent of the archive, with '.com' and '.net' together accounting for 34 per cent. Results are shown in Table 2. Although representation from the commercial domain may seem relatively high, the sites represented are not of the 'business' variety. Rather, they tend towards the personal homepage or community interest sites - writers, festivals and events, and tourism. Partner contributions to Pandora are not uniformly distributed over the average domain range, with the State Library of New South Wales being biased towards the '.gov' domain (58 per cent). The State Library of Victoria favours the '.com' or '.net' domain (70 per cent), particularly as mounted on the 'vicnet' site.

Table 2
Web domain types in Pandora (based on a 2 per cent sample)

| Domain type | Overall (96) | NLA (60) | SLNSW (26) | SLV (10) |
| Gov | 35 per cent | 27 per cent | 58 per cent | 20 per cent |
| Com/net | 34 per cent | 32 per cent | 23 per cent | 70 per cent |
| Org/asn | 15 per cent | 15 per cent | 12 per cent | 10 per cent |
| Edu | 12 per cent | 18 per cent | 7 per cent | - |
| Not known | 4 per cent | 3 per cent | - | - |
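
A tally of the kind shown in Table 2 could be produced by classifying each sampled title by its host name. The following is a minimal sketch only: the category rules, function names and the small sample (URLs taken from Table 1 rather than from Pandora) are assumptions made for illustration, not the method used in the study.

```python
from collections import Counter
from urllib.parse import urlparse

# Host-name labels mapped to the categories used in Table 2.
CATEGORIES = {
    "gov": "Gov",
    "com": "Com/net",
    "net": "Com/net",
    "org": "Org/asn",
    "asn": "Org/asn",
    "edu": "Edu",
}

def domain_type(url):
    """Return the Table 2 category for a URL, or 'Not known'."""
    # URLs in the tables omit the scheme, so add '//' to expose the host.
    host = urlparse(url if "//" in url else "//" + url).hostname or ""
    # Check labels from the right so 'www.example.com.au' matches 'com'.
    for label in reversed(host.lower().split(".")):
        if label in CATEGORIES:
            return CATEGORIES[label]
    return "Not known"

def domain_shares(urls):
    """Percentage of URLs in each category, rounded to whole numbers."""
    counts = Counter(domain_type(u) for u in urls)
    return {cat: round(100 * n / len(urls)) for cat, n in counts.items()}

# Illustrative sample only: four URLs from Table 1.
sample = [
    "www.aswe.org.au",
    "www.bethany.com.au",
    "www.csu.edu.au/winery",
    "www.janebrook.com.au",
]
shares = domain_shares(sample)
print(shares)
```

A host such as www.csiro.au, which carries none of the listed labels, falls into 'Not known'.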

The distribution of domains in Pandora appears to be quite different from the overall composition of the web. Lawrence and Giles [12] estimated that 83 per cent of servers contain commercial content; the other 17 per cent is distributed with science/education around 6 per cent, health, personal, societies, community around 2 per cent each, and government, pornography and religion around 1 per cent. Pandora itself strongly favours the smaller domains (except maybe pornography and religion!).

The Internet Archive

Also established in 1996 was the United States-based Internet Archive. This is an independent, not-for-profit organisation that attempts to capture and provide access to all materials made available worldwide on the public web. Although capture started in 1996, access through the WayBack Machine search engine has only been available since late 2001. The Internet Archive harvesting approach - with a complete scan of the web now being undertaken every two months or so - means that for any title in the Archive many instances will be preserved over time. The first harvests in 1996 and 1997, when the whole available internet was captured, were quite slow. Repeat harvests only collect changes, and so are faster.

Unlike Pandora, the Internet Archive takes an all-inclusive approach to gathering titles for its archive. Any publicly accessible web document will be collected unless the owner of the site excludes it. Also unlike Pandora, there is no quality control of the captured title. This lack of quality control means that the archived site is not necessarily a faithful representation of the original.
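
Site owners commonly express an exclusion of this kind through a robots.txt file. The sketch below uses Python's standard robots.txt parser to show how such a rule is interpreted. The user-agent name 'ia_archiver' is the one historically associated with Internet Archive harvesting, but it, the example rules and the placeholder address should all be treated as assumptions to be checked against the Archive's current documentation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking one crawler to stay out of the whole site
# while leaving the site open to every other crawler.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "http://www.example.com.au/index.html"  # placeholder address
archiver_allowed = parser.can_fetch("ia_archiver", url)
other_allowed = parser.can_fetch("SomeOtherBot", url)

print("ia_archiver may fetch:", archiver_allowed)
print("other crawlers may fetch:", other_allowed)
```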

The Internet Archive has always been aware of inadequacies in its coverage and completeness, particularly where complex or dynamic sites are involved, but believes that it is better to do something that is less than perfect rather than wait till perfection can be achieved. The originator of the Internet Archive, Brewster Kahle, has said that 'all the laundry from the past is shown to everybody in this collection!' [13] The Internet Archive frequently asked questions [14] page explains some of the problems in capturing websites, including that of broken images, where a small red cross appears instead of the picture, and the capture of dynamic sites. It makes the point that 'as a general rule of thumb, simple html is easiest to archive'.

For the information seeker, the Internet Archive provides several types of information. The first is as a catalogue - give it a URL 'title' and it will tell you, via the WayBack Machine search results, whether it has the work listed in its collection and how many issues or editions it holds. Interrogate it by clicking on a result, and it will lead you closer to your title. You may only get a basic homepage with no lower links, rather like a title or an index page. Alternatively you may get a 'not on shelves' type of response, which means you have to try again at some later date. If you are lucky, you will get the full contents that you asked for.

The problems of the deep web

Both Pandora and the Internet Archive are chiefly concerned with easily harvestable, simple html based documents - the 'surface' web. Most publications in the first few years of the web were in this format. Database driven sites, where information is retrieved via a query box or drop down menu, have become common in the past few years. Such content is known as the 'deep web'. The problem of how to capture the 'deep web', where websites have an underlying database of information, is only just being addressed, [15] although the Internet Archive claims to be able to capture some types of dynamic pages.

Australian wine in Pandora

Search method

The Pandora archive can be accessed by a variety of methods. Basic access is by keyword, subject and title searching, and by browsing alphabetically by title. All present some difficulties if the exact title is not known. Limited keyword searching and browsing by title were used to identify the sites discussed below.

Results - keyword searching

Any basic keyword search on wine related terms such as 'wine', 'wine*' or 'viticulture' produces thousands of hits. Many of these are largely incidental or irrelevant. Keyword searching rarely leads to the relevant top-level title, but more often to a lower linked page. For instance, a search on 'winery' produces a number of hits for individual wineries, such as Devils Lair, Wynns Coonawarra Estate and Coldstream Hills. These are in fact all part of the Southcorp 'Wines of distinction' title which was archived once in 2000. Without knowing this, it seems difficult to identify the title itself and to obtain information about archiving schedules or number of archived instances.

Using keyword searches, the major source of wine information in Pandora seems to be University of Adelaide titles. Wine related material includes information on training courses, economic models, library resources etc. Except for the economic material - which is a large active title archived regularly in Pandora as 'Discussion Paper (University of Adelaide. Centre for International Economic Studies)' - it was difficult to identify the main title associated with the other material. South Australia has probably contributed other wine related titles but it is not immediately obvious, from the list of titles archived in Pandora, just what those titles are. The Annual Report - South Australia, Primary Industries and Resources is the lead title for some material found by keyword search (archived by the State Library of South Australia regularly since the 2001 report). As noted above, only South Australia picks out the wine industry for special mention in its selection guidelines.

There are some other sources of wine related information in Pandora including material from the ABC Landline and Science programs; ABARE eReports; and annual reports of some State Departments of Agriculture.

Results - title browsing

Systematically examining the more than 5000 titles in Pandora is tedious and requires judgement on what constitutes a wine title. Results indicate that only a very small number of titles are readily identified as wine industry websites, and even fewer as wine producers, as noted in Table 3. Since browsing relies on recognising a likely title, the method is not entirely satisfactory and some titles may have been missed if they were not recognisably about wine. No titles from the 1996 list of 'useful wine sites' were found in Pandora.

Only seven wine related titles in total were found. Five were from the '.com' domain and two from '.gov'. The two titles from the State Library of Victoria were general tourist sites. There is only one specific web page of a wine producer regularly archived - the small winery Sirromet in Queensland. A relatively recent listing is for Penfolds Wines, archived annually since 2002. 'Wines of distinction', a broad-based Southcorp publicity site was archived once in 2000. Due to the volatility of the wine industry, much of the information it contains is no longer current. For instance, the wine producer Rosemount is not listed in 2000 (taken over in 2001), and several of the producers listed in 2000 are, in 2002, no longer part of the Southcorp group. So, the only archived copy in Pandora represents one very small snapshot of the company, and says nothing about its continuing fortunes or misfortunes.

Table 3
Pandora - Australian wine titles, January 2004

| Partner | Total titles | Wine titles found | Domain | Archiving schedule |
| National Library of Australia | 3235 | 'Wines of distinction' | .com.au | Archived once, 17 February 2000 |
| | | 'Penfolds Wines' | .com.au | Archived annually 2002-2003 and ongoing? |
| State Library of New South Wales | 1095 | 'Cellar door survey - wine tourism' | .gov.au | Archived once, 3 October 2002 |
| | | 'Review of...Wine Grape Marketing Board' | .gov.au | Archived once, 28 October 2003 |
| State Library of Victoria | 471 | 'Yarra Valley Expo' | .com.au | Archived regularly, but only two instances for January and July 2001 |
| | | 'Melbourne Food and Wine Festival' | .com.au | Archived regularly - approximately annually, 2000-2003 and ongoing? |
| State Library of Queensland | 212 | 'Sirromet Winery' | .com.au | Archived regularly, first instance 13 November 2003 |
| State Library of Western Australia | 93 | None found | | |
| Screensound | 77 | Not checked | | |
| State Library of South Australia | 63 | None found* | | |
| AWM | 12 | Not checked | | |
| Total | 5258 | | | |

*South Australia has contributed wine related titles, particularly relating to the University of Adelaide and other organisations, as noted above. It was not immediately obvious from the list of titles archived in Pandora just what those titles are.

The Tasmanian wine industry and Our Digital Island

The 'Our Digital Island' (ODI) project of the State Library of Tasmania operates separately from Pandora. It has been preserving titles related to the Tasmanian wine industry since 2000.

By 2001, three sites were archived under the broad subject heading 'Vineyards and wine merchants' - one vineyard and two general compilations of Tasmanian wineries. By July 2002, the number of wineries archived had increased to six. By 2004, a total of 20 entries were retrieved using the subject 'vineyards'. Of these, 12 archived in July 2001 were only of the entry home page. There were no active links beyond the title page, and no obvious link to the original publisher's site. This reduces the usefulness of the listing.

Australian wine in the Internet Archive

Search method

The Internet Archive was searched using the WayBack Machine. This relies on knowing the URL of the site to be investigated. Two sets of URLs related to the Australian wine industry were examined to determine the extent of their Internet Archive representation:

the eleven 1996 'useful Australian wine sites' discussed above, and

a random sample of around 130 Australian wine producers whose web status has been monitored for the past three years.

During the period January to April 2004, each site with a web presence was called up via the WayBack Machine using both its current URL and any other previously known URLs.
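
A lookup of this kind can be expressed as a web address built from the URL being checked. The sketch below assumes the public web.archive.org address pattern in use today, in which a '*' in the timestamp position requests the list of all captures; whether the 2004 interface behaved identically is not established by the text, and the function name is hypothetical.

```python
def wayback_listing_url(site_url):
    """Address of the WayBack Machine page listing all captures of a URL."""
    if "://" not in site_url:
        site_url = "http://" + site_url  # the study's URLs omit the scheme
    return "https://web.archive.org/web/*/" + site_url

# Current and earlier URLs for one winery, as given in Table 1.
jane_brook_urls = [
    "www.janebrook.com.au",
    "www.highway1.com.au/business/janebrook",
]

for u in jane_brook_urls:
    print(wayback_listing_url(u))
```

Checking every known URL for a site, and not only the current one, matters because captures are filed under the address at which the page was originally harvested.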

Using the WayBack Machine is not always satisfactory, although a reconstruction of the Internet Archive in December 2002 produced a more stable and 'tidied-up' environment, including the removal of some extraneous duplicate results. However, searching the Internet Archive can still lead to frustration - messages such as 'file not in archive' or 'data retrieval failure' appear quite frequently; images are often not captured or links not followed through. For some URLs, several attempts had to be made over a period of weeks to access the Internet Archive site. Sometimes the full URL may return a result 'sorry no matches', with a suggestion to 'click here' to search for all pages at the top level URL. This can sometimes produce a useful result. Occasionally, JavaScript can cause the displayed page to be the current online page.

The WayBack Machine display page was chosen to 'show all' entries, including duplicates. This means that it includes duplicate records harvested on the same day. The mechanism of harvesting allows for duplicate results as it does not specifically distinguish between collecting from the actual site itself and the site as collected from another linked site. On the 'lots of copies keeps stuff safe' theory, multiple gathering of the same site as part of the same harvest may well be justified. The harvesting process also checks for updated information, individual results being marked with a '*' to indicate this. Care needs to be taken in interpreting this, however, since although the results page states that '* denotes when site was updated', the FAQs refer to an updated 'page' not a 'site'. This was confirmed when some winery results were checked. The '*' appears to apply to the specific URL called up, not to lower links within the same site.

Since there is generally a six-month time delay after capture before the material is mounted on the Internet Archive, results as at April 2004 were available only up to July 2003 at best.

Results

The Internet Archive provides access to historic web based information that is not readily available anywhere else. It was possible to find entries in the Internet Archive for all eleven 'useful' sites listed in Table 1, even those that no longer exist. On average, 68 entries per title were found. The smallest number of entries was for the defunct CSIRO 'Grapevine server', which had four entries from December 1997 to December 1998.

Although all eleven titles were listed in the Internet Archive, the quality of the content was variable, as indicated in Table 4. Titles hosted out of another provider generally returned poor results. As wineries developed their own sites the quality of information retrieved improved, as long as the site was based on simple coding. As expected, the more complex the site the less successful the retrieval. Problems of broken images were quite common, particularly in some early sites.

On the positive side, however, the Internet Archive contains content for both of the defunct sites and a copy of the inaugural Jane Brook site. For several of the sites a range of styles and content over time has been archived.

When a larger sample of wineries was looked at, similar results were obtained. As Table 5 shows, all 72 wineries with a web presence were represented in the Internet Archive. The sites have been captured an increasing number of times per year - from once in 1996 (when harvesting commenced) to more than five times by 2001. Projecting results from the first half of 2003 suggests that the rate of capture could double to over 10 per annum. The maximum number of entries per year for any one winery was 16 and the minimum was one. Rosemount had the largest number of entries over its Internet Archive lifetime, with 53 entries over three years.
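
Counts of entries per year such as these can be derived from a list of capture dates. The sketch below assumes each capture is identified by a 14-digit timestamp (YYYYMMDDhhmmss), the form used in WayBack Machine addresses. The sample timestamps are invented for illustration and do not describe any winery in the study.

```python
from collections import Counter

def captures_per_year(timestamps, dedupe_same_day=False):
    """Count captures per calendar year from YYYYMMDDhhmmss strings.

    With dedupe_same_day=True, several captures on one day count once,
    which discards duplicate records harvested on the same day.
    """
    if dedupe_same_day:
        timestamps = {ts[:8] for ts in timestamps}  # keep one per date
    return dict(sorted(Counter(ts[:4] for ts in timestamps).items()))

# Invented capture timestamps for a single hypothetical winery site.
sample = [
    "19961219043512",
    "20010405120000",
    "20010405183000",  # second capture on the same day
    "20010811090000",
    "20030602151500",
]

all_entries = captures_per_year(sample)
one_per_day = captures_per_year(sample, dedupe_same_day=True)
print(all_entries)
print(one_per_day)
```

Whether same-day duplicates are counted affects the totals, so any comparison of entries per year should state which convention was used.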

Nearly 50 per cent of all sites were first captured into the Internet Archive in the same year that they were listed in the printed edition of the Australian and New Zealand wine industry directory. A further 30 per cent were first captured either the year before or the year after they were listed in print. This indicates that the Internet Archive operates a very efficient capture process and is a very important agent in capturing the existence of a particular website.

Table 4
Internet Archive results for 'useful' Australian wine sites, c1996

Australian Society of Wine Education
URLs: 1996 www.winebase.com.au/aswe; 2001, 2004 www.aswe.org.au
Internet Archive presence: 21 results from 1996 URL, but no ASWE information present; 22 results from second URL, from 12/00 to 6/03, links seem OK
Content of site: Host content only, first URL; some content post 2000

Australian Wine Online
URLs: 1996 www.winetitles.com.au/wineonline.html; 2001 www.winetitles.com.au/wineonline.html; 2004 www.winettitles.com.au/awol
Internet Archive presence: 118 results for 'winetitles' from 12/96 to 6/03
Content of site: Some content early, later only home page

Bethany Wines
URLs: 1996 www.wombat.com.au/wombat/attract/sa/barossa/bethany/index.html; 2001, 2004 www.bethany.com.au
Internet Archive presence: 4 results 'wombat' (truncated) URL from 12/96 to 4/97, but no access to 'attract/...'; 40 results from second URL, from 10/99 to 6/03
Content of site: Host content only, first URL; early content own URL only to home page, later content to lower links - gives a range of at least three styles, some broken images

CSU Winery
URLs: 1996 www.csu.edu.au/research/rpcgwr/winery.htm; 2001, 2004 www.csu.edu.au/winery
Internet Archive presence: No results with extensions beyond top level; 124 results 'csu' 2/97 to 6/03 but no links to winery
Content of site: Host content only, no winery content

Chateau Yaldara
URLs: 1996 www.webmedia.com.au/yaldara; 2001, 2004 www.simeon.com.au (but can also find it via www.yaldara.com.au); 2004 very brief information on Simeon site, Yaldara has more, but both are very out-of-date (2000/2001)
Internet Archive presence: No results from 'webmedia' for Yaldara; 40 results 'Simeon', 11/98 to 6/03, 1998 entry has no information on Yaldara, 2001 entry has information the same as the current online site; 63 results 'Yaldara', 6/98 to 6/03, home page of early entries differs from later entries but data retrieval failures on following some links
Content of site: Some content for 'Yaldara' URL, broken images, some inactive links

Jane Brook Winery
URLs: 1996 www.highway1.com.au/business/janebrook; 2001, 2004 www.janebrook.com.au
Internet Archive presence: One result 'highway1', 2/98; 56 results 'janebrook', from 1/98 to 6/03
Content of site: Some content from first URL including link to original site, but some broken images; some content from 'janebrook' URL including original site + images

Kaesler Estate
URLs: 1996 www.wombat.com.au/accom/sa/barossa/kaesler/index.html; 2001, 2004 www.kaesler.com.au
Internet Archive presence: No results from 'wombat'; 17 results from 'kaesler' from 4/01 to 7/03, broken images only for earliest result, some links OK in later results
Content of site: Some content from 'kaesler' URL, broken images only for earliest result, links OK in later results; reflects changing web styles

CSIRO Grapevine Server
URLs: 1996 cgswww.adl.hort.csiro.au; 2001 www.csiro.au gets to main home page, but no Grapevine Server site
Internet Archive presence: Original URL in IA - 4 results 12/97 to 12/98
Content of site: Some content from first entry, some links OK; information no longer there in later entries

e-wine Cellarmasters Online Wine Cellar
URLs: 1996 www.e-wine.com.au; 2001, 2004 www.cellarmasters.com.au
Internet Archive presence: Eight results for 'e-wine' from 12/98 to 4/99, but IA copy defaults to 'cellarmasters'; 77 results for 'cellarmasters', from 5/98 to 6/03, sample links OK
Content of site: Some content from 'cellarmasters' URL, broken images

Nicks Wine Merchants
URLs: 1996 www.sofcom.au/sofcom/Nicks/index.html; 2001, 2004 www.nicks.com.au (2004 includes some good general information on the site)
Internet Archive presence: No results 'sofcom'; 37 results 'nicks', from 1/99 to 7/03
Content of site: Host content only, first URL; some content 'nicks' URL, including general information after 2001

Winetitles Online and Wine Industry Journal
See Australian Wine Online above

Table 5
Sample Australian winery presence in the Internet Archive, 1996-2003

Year | No of wineries | Total entries | Average entries per annum per winery
2003 (to July only) | 72 | 378 | 5.25 (six months only)*
2002 | 64 | 357 | 5.58
2001 | 53 | 279 | 5.26
2000 | 26 | 104 | 4.00
1999 | 16 | 57 | 3.56
1998 | 7 | 15 | 2.14
1997 | 1 | 2 | 2.00
1996 | 1 | 1 | 1.00

* If the existing rate of capture is maintained, then an average of 10-11 entries for 2003 is projected.
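The averages in Table 5 and the projection in the footnote follow from simple arithmetic, which the short sketch below reproduces. The winery and entry counts are transcribed from the table. The variable names are illustrative, and the annualisation assumes, as the table's own note does, that the 2003 figure covers six months of captures.

```python
# Figures transcribed from Table 5: year -> (number of wineries, total entries).
TABLE_5 = {
    2003: (72, 378),  # part-year figure, treated in the table as six months
    2002: (64, 357),
    2001: (53, 279),
    2000: (26, 104),
    1999: (16, 57),
    1998: (7, 15),
    1997: (1, 2),
    1996: (1, 1),
}


def average_entries(wineries, entries):
    """Average number of Internet Archive entries per winery, to two decimals."""
    return round(entries / wineries, 2)


averages = {year: average_entries(*row) for year, row in TABLE_5.items()}

# Assumption: captures continue at the same rate for the rest of 2003.
MONTHS_OBSERVED_2003 = 6
projected_2003 = averages[2003] * 12 / MONTHS_OBSERVED_2003

for year in sorted(averages, reverse=True):
    print(year, averages[year])
print("Projected 2003 full-year average:", projected_2003)
```

Under that assumption the projected full-year figure for 2003 is 10.5 entries per winery, consistent with the 10-11 quoted in the footnote.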

Conclusions

Pandora and the Internet Archive take quite different approaches to archiving online information, and this is reflected in the results. The availability and accessibility of Australian wine producer websites are considerably better in the Internet Archive than in Pandora:

Every site examined has some Internet Archive presence. At the very least, the Internet Archive is acting as a worldwide bibliography of web publications. It contains good capture of early winery websites and provides a perspective on change of site content and design over time.

Winery sites are poorly represented in Pandora and ODI: there has been no capture of the eleven 'useful' wine related titles and there is only a very limited number of winery titles in Pandora and ODI.

The difference in approach taken by the two archives is analogous to the choices faced by a conservator:

to undertake full conservation of an individual item: the resource intensive 'single item salvation' approach (Pandora), or

to undertake a phased preservation strategy: the bulk treatment of whole collections (Internet Archive).

The approaches are complementary rather than in competition. The Internet Archive is not perfect. It does, however, provide a good historic perspective on web publishing, if the modest sample investigated above is any indication. It shows trends in the use of the web, its coverage, styles and content. It is not, nor does it pretend to be, like a library of perfectly conserved books. It is more like a phase-boxed collection of rather random papers or incomplete runs of decaying books with the odd page missing, but it is exciting to rummage through to see what has survived from the past. Researchers today are often grateful for what has been preserved from the past, no matter how incomplete. The Internet Archive is showing the way by preserving and making available a massive collection of publications mounted on the web since 1996.

Pandora, on the other hand, resembles a small collection of perfectly conserved books. Every item in it is complete and well preserved, but it does not come close to meeting the broad-based 'warts and all' coverage of the web that the Internet Archive provides.

During 2001-02, the National Library contemplated a broader collecting scope for Pandora, including 'whole of domain' harvesting either in full or selectively. Cooperation with the Internet Archive was also discussed.[16] However, due to resource constraints and institutional priorities, Pandora's selectivity seems set to continue, at least at the national level. When a review of Pandora's selection guidelines was undertaken in 2002-03 as part of the National Library's Balanced Scorecard Initiative,[17] it was hoped that this would lead to the library increasing its collecting of online publications. In fact, it led to a reduction, with a concentration on six categories of publications, and with more general publications to be collected on a rolling three-year cycle. 'Food and drink' will be included in 2005-2006, ten years after the first winery website was mounted. This represents a lot of lost information.

When talking about Pandora and cutbacks in collecting, Phillips predicted in 2003 that 'what the library and partners do not archive is likely to be lost because, at this point in time, it is unlikely that anyone else will do so.' [18] However, as demonstrated above, since 1996 the Internet Archive has been a real alternative, picking up Australian material not archived by Pandora. The early Jane Brook website is just one example of this.

The wine industry is a major contributor to the Australian economy and its activities touch a large percentage of the population. Much wine information on the internet relates to general and popular 'public interest' or 'lifestyle' information rather than the scientific and technical. This does not make it any less deserving of preservation - it forms part of the history of the total industry and of the social history of Australia.

There is nothing unique about the Australian wine industry; it was chosen as an example of the broader use of the web by Australian society. If the Australian wine industry is poorly represented in Australia's online archiving program, then this probably also applies to almost any other industry and to any other broad-based cultural activity in Australia.

The Internet Archive represents the only viable alternative and, as indicated above, performs as both a record of existence and of content for the web worldwide. As long as the Internet Archive survives, this historic information and the record of it will remain available. However, as a non-governmental, not-for-profit organisation, the Internet Archive relies for its survival on the availability of its mostly private sponsorship. If the existence of the Internet Archive is in any way threatened in the future, then some action may need to be taken urgently to ensure that its content is preserved. If it is considered important to preserve historic Australian information in the Internet Archive, then resources will need to be made available. This would of necessity be quite time consuming and resource intensive. However, without some active intervention to save a decade of Australian online publishing beyond what is in Pandora, a large body of information would be lost forever.

Notes

  1. J Mannerheim 'The New Preservation Tasks of the Library Community' International Preservation News p8 26 December 2001
  2. The Australian and New Zealand Wine Industry Directory, Winetitles, Adelaide, 2004
  3. Based on an analysis, over the period 1997 to 2003, of a sample of wine producers drawn from The Australian and New Zealand Wine Industry Directory
  4. W Koehler 'A Longitudinal Study of Web Pages Continued: A Consideration of Document Persistence' Information Research vol 9 no 2 January 2004 at http://informationr.net/ir/9-2/paper174.html, referenced 19 March 2004
  5. Pandora documents at http://pandora.nla.gov.au/index.html
  6. Home page for the Internet Archive at http://www.archive.org
  7. WayBack Machine at http://www.archive.org/web/web.php/
  8. 'Our digital island' at http://www.odi.statelibrary.tas.gov.au/
  9. Para 4.3.2 of National Library of Australia Pandora selection guidelines at http://pandora.nla.gov.au/selectionguidelines.html
  10. R Phillips A short history of wine, London, Allen Lane, 2000
  11. All Pandora partners selection guidelines are at http://pandora.nla.gov.au/documents.html
  12. S Lawrence and C L Giles 'Accessibility of information on the web' Nature pp107-109 400 8 July 1999
  13. B Kahle 'The Internet Archive' RLG Diginews vol 6 no 3 15 June 2002 available at http://www.rlg.org/preserv/diginews/v6-n3-a1.html
  14. FAQs for the Internet Archive at http://www.archive.org/about/faqs.php, referenced 11 May 2004
  15. 'Deep web archiving' Gateways February 2004 no 67 available at http://www.nla.gov.au/ntwkpubs/gw/67/html/10deepWater.html, referenced 12 March 2004
  16. 'Digital archiving: a progress report' Gateways no 55 February 2002 at http://www.nla.gov.au/ntwkpubs/gw/55/p22a01.html
  17. Collecting Australian online publications at http://pandora.nla.gov.au/BSC49.doc
  18. M Phillips 'Collecting Australian Online Publications' Australian Academic and Research Libraries, p218 vol 34, no 3, December 2003

Wendy Smith is a research scholar in the School of Information Studies, Charles Sturt University, Wagga Wagga NSW 2650. E-mail: wendy.smith@alianet.alia.org.au.nospam. (please remove '.nospam' from address)

