The Australian Library Journal
Coding online information seeking
Philip Hider
Introduction
New screen-recording software promises to provide much richer data about user-system interaction than earlier transaction loggers have offered. Products such as Camtasia are now being used in human-computer interaction research and appear robust enough to be used in naturalistic settings for recording online information seeking. However, the data produced by these recorders can only be utilised for in-depth analysis of information-seeking behaviour if they are coded in a systematic and comprehensive fashion.
For many people, interaction with computers is now a part of everyday life, indeed an integral part. A critical factor has been the advent of the internet and the world wide web. Tasks which previously entailed physical journeys to libraries, shops, banks, government agencies, and so forth, can now be performed through a personal computer and internet connection. Moreover, these tasks can be combined during one online session, in a similar way that a trip into town combined several different activities. The complexity of online behaviour has increased enormously over the past decade, in terms of the way tasks can be carried out and the way in which they can be integrated. The graphical user interface (GUI) has combined with the constantly expanding internet to create a new, virtual world in which people can exhibit behaviours almost as complex as those we see outside of it. It is a world ripe for study by behavioural scientists, including those who specialise in information-seeking (IS) behaviour.
Research into online behaviour is now well underway across a range of disciplines. Hargittai (2004) highlights three fields which have been examining online information seeking: marketing, human-computer interaction (HCI), and information science. Because the vast majority of online resources now exist only on the web, researchers studying contemporary online information-seeking behaviour need to focus on users' interaction with the web, through their web browsers.
Although some interest has been shown in various disciplines, much research into online information-seeking behaviour still needs to be conducted. As Hider (2005) and Hargittai (2004) both note, information scientists have tended to address the more 'basic' questions of user-system interaction, and have not yet progressed substantially from the earlier transaction log analyses (TLAs) that utilised the fairly rudimentary logs offered by individual information systems. For instance, Jansen and others (Jansen and Spink 2000, Jansen, Spink and Saracevic 2000) employ logs from the Excite search engine to answer such questions as how many documents users examine, how frequently Boolean and advanced syntax is used, what proportion of queries are incorrectly formed, what proportion of queries are modified, and how many terms are included in a query. A primary reason for this lack of analytical depth may be the lack of detail available in the data collected (Hider 2005). User-system interaction needs to be recorded in much more detail in order for researchers to undertake analyses of online information-seeking behaviour that can be compared to analyses of human behaviour in other areas and environments. A new generation of transaction loggers, namely screen recorders, promises to provide this level of detail.
Using screen recorders for online IS research
For many online activities, including many information-seeking activities, user input is mostly represented on the computer screen, so a screen recorder which captures all screen movement on a continuous basis is able to reveal a good deal about the output and input sides of user-system transaction, independently of any particular information system. This does not mean that screen recorders can capture everything: they do not necessarily capture audio input, for example, and they cannot by themselves capture user behaviour outside of the actual transactions, such as users' physical reactions to particular feedback. Nevertheless, the recorders provide enough data, potentially, for much deeper analyses than found in previous TLA research, allowing for interpretation of users' thought processes that can be supplemented by other methodologies, such as external videotaping, interviews, and protocol analysis.
There are three main ways of recording the computer screen: by external video camera, by internal screen-recording software, and by linking video output to a VCR. The latter two methods are far less obtrusive, and are likely to yield clearer images. A set-up involving a VCR and format conversion is not always a practical solution, however. Screen-recording software, on the other hand, is becoming much more robust, and new products, such as Morae from TechSmith, have recently been designed specifically for the HCI community. This is not to say, however, that there are no longer any problems working with such software (Hider 2005).
Coding screen recordings
The richness of the data captured by screen recorders can only be taken advantage of by coding the data in a consistent and detailed manner for in-depth analysis. Observation of online information seeking as represented on the screen logs needs to be recorded systematically, as it does for any ethological study. The quality of the coding determines the quality of the analysis. This article proposes a coding system for use in analysing online information-seeking behaviour and presents a brief case study featuring its application. It is by no means claimed that the scheme is ideal for screen-recording analyses of online information seeking; rather, it is put forward as an example of how a scheme can, and needs to, record a wide range of possible online behaviour, according to the specific objectives of the researcher.
Hargittai (2004) presents a coding scheme for online actions of information-seeking behaviour, grounded in the use of web browsers. As one of the first coding schemes to be published in this emerging research area, it is an important contribution to IS research. Hargittai publishes her code for the benefit of the research community at large, attempting to make it as comprehensive as possible. For instance, it is based on actions performed on a range of browsers, and on both PCs and Macs. It is also based on actions performed by a cross-section of the public, with different levels of information and ICT literacy. The scheme can be used in the context of an HCI laboratory, but is also meant for analyses of naturalistic online behaviour. The more standardised coding schemes can become, not only internally but also across different research projects, the more readily can the data be used for comparison. It might also make the task of coding easier, at least for as long as the task is a manual one.
My interest in developing a coding scheme preceded my awareness of Hargittai's paper and followed on from devising a scheme for an analysis of transaction logs from OCLC's FirstSearch service (Hider 2004). In that study, each logged query was assigned a code in the context of the session of which it was part, according to its conceptual differences or similarities, as interpreted by the coder, to previous queries. Twelve possible codes were defined by detailed rules, which were followed throughout the coding of approximately two thousand sessions, representing many thousands of queries. Since the coding was based primarily on the semantic content of the queries, rather than on the syntax, the TLA went further than most of its predecessors. Following on from this research, I have coordinated a new project, using Camtasia screen-logging software from TechSmith to record user-system interactions. As part of the piloting of this project, a coding scheme has been developed which is designed to cover most significant (from an information-seeking/retrieval perspective) 'events' captured in the recordings.
Several aspects of this coding scheme need to be evaluated in order for it to be considered successful. Success is defined here in terms of the scheme's effectiveness in describing those elements of online information-seeking behaviour that might be worth studying by information scientists; its reliability, such that different coders end up with the same set of codes; its manipulability, such that its product can be readily used for statistical and other forms of analysis; and its ease of use, such that coders find it easy to use, and not overly time-consuming.
Developing the coding scheme
The coding scheme was developed through observation of several hours' worth of screen recordings made during the pilot phase of the ongoing research project. These recordings captured online activities at PCs set aside for database searching and academic internet research at an academic library. The recordings were observed from an information retrieval viewpoint, although not with any one particular question in mind. The aim was for data that might be of interest and relevance in studying the way users interacted with the PC for the purposes of retrieving online information, to be included in the 'transcription'. Activities considered to have little or no bearing on information retrieval were not covered. For example, a recording of a user who disregarded the designated use of the PC and proceeded to write an assignment, without any reference to online information, was not coded. If, however, the user who was writing an assignment simultaneously searched the web for information resources, then the activities involved would be coded.
While watching screen recordings of real-life online activity, it does not take long to realise that there are huge numbers of possible variables that might be of relevance to an information science researcher. Thus, while we might consider that the more comprehensive a coding scheme can be the better, such a position needs to be balanced by consideration of practical issues - each research project will be limited to particular topics and applying the most comprehensive of schemes would mean many hours of work for human coders if anything but the most meagre amount of sample 'footage' was collected. This coding scheme, therefore, did not aim at taking account of everything of potential relevance, but at providing a framework that could be extended for application to other research projects. For instance, no account is taken of the position on the screen, or on the web page, of a hyperlink beyond that which is generated by a search engine. Nevertheless, it can be imagined how an additional set of codes might be added to record such a variable.
Unlike the data found in a traditional transaction log, this data had a temporal dimension. It could be viewed as a real-time 'movie' and was better viewed as such, because this helps contextualise the data and facilitates its interpretation (since the user would have also viewed the screen and interacted with the PC in real time). When viewing the recording in real time, it would be impossible for a human coder to note down much detail without the use of the pause button. In fact, 'rewinding' was often necessary, as several items of interest on the screen would appear and disappear in a matter of seconds. Some of the users interacted with the contents of the screen very quickly (apparently more quickly than most of those in Hargittai's study) and they were, of course, likely to know, more than the coder, what they were looking for on the screen, and what they were about to do. The more coding that can be done without recourse to rewinding the recording, the better, given that rewinding is time-consuming and often frustrating. Coding can be done more rapidly by making the codes quick to write down (for example, by using abbreviations) and by making the codes easy to remember, so that the coder does not need to keep checking the documentation. Although Hargittai's codes (2004), which consist of two or three digits, are certainly easy to write down, they are less easy to remember: the 130 numbers bear no relationship to the natural language terms for the actions.
My approach was to develop a system of coding which could be applied in a way similar to that which other systems of transcription, such as the International Phonetic Alphabet, can be used, with skilled practitioners being able to transcribe online actions in a kind of 'shorthand'. It would be easier to write and to read the transcription if the coding was based on everyday language and signs associated with the various online actions. Of course, there are many everyday languages, and this coding system is based only on English, but a more neutral scheme, such as Hargittai's numerical notation, could be adopted as a 'switching' system, where necessary.
When coding an online session, representing a piece of continuous interaction between user and computer, the first thing one needs to consider is the basic 'units' to be described. One could argue that the data breaks down into the different screens, given that ultimately there are a finite number of frames captured. However, this represents output only, whereas the study of online information seeking is the study of the user's interaction with the system. While we are interested in the content of the screens we are also very much interested in the users' actions - their input. Indeed, we are only interested in the screen's content in relation to the user's actions or reactions. Given that ultimately the content of the screen depends on the user - the user can get up and leave, or pull the plug - it seems reasonable to start with the actions and accompany codes for the actions with notes on the content of the screens. Since screen content can only have a bearing on subsequent actions on the part of the user, information about output will be noted alongside the action that produced the output, before the code for any subsequent action. It is, of course, also easier for the coder to record in this chronological manner.
Therefore, we need to identify important actions that the information seeker might take, and important aspects of visual output (for the information seeker). Actions cannot simply be derived from a single frame of a recording; rather, they need to be interpreted by the coder as he or she watches a sequence of frames. For example, a mouse is shown moving to the 'search' button. It changes into a 'hand' and then the hypertext beneath it changes colour; then a moment later, the screen is redrawn and 'results' come up. It could be that the server was pre-programmed to do just this, but the likely explanation in most contexts is that a user has just asked a search engine to perform a search, and, most likely, intentionally. The coder should be allowed to make use of the context of the previous part of the recording when interpreting the event. If they decide that it was indeed a request to enter a particular query, then they can assign the corresponding action code. Because ultimately this code is based on an interpretation of the user's intention, it represents a construct, and so what we need to do when we devise a set of action codes, is consider which constructs are most likely to be the subject of analyses of online information-seeking behaviour.
With respect to the codification of aspects of output, we again have to select important variables on the basis of their possible effect on users' actions. There are so many that we could account for - text position, size and style, whether it is hypertext, colours, images, menus, style and location of menus, time spent on screen redraws, how the screen is withdrawn - but probably many of these variables would only be investigated in a small minority of analyses. Only some of the major aspects will be covered in the 'core' scheme presented here.
Finally, we should note, of course, that time is a critical dimension of most user-system interaction, and that deeper information-seeking analyses would take account of the chronology of the session. A major criticism of earlier TLAs has been that this dimension is frequently neglected, due quite often to the summary nature of the data.
Details of coding scheme
A list of key actions and output features was identified. The final list, along with their respective codes, is shown below. It should be noted that some of the actions do not, or do not always, represent actions of information retrieval itself. However, they are actions that have been shown in the screen recordings to accompany and relate to information-seeking activity, at least sometimes.
Each line of code should represent one or more actions; codes are entered chronologically, left to right and, line-by-line.
Actions
The following codes to be entered on a separate line:

Start     start of session, indicated by start-up, log on, synchronization with timer, start of recording, etc.
End       end of session, indicated by shut-down, log off, synchronization with timer, end of recording, etc.
Pause     substantial pause in session, where no screen movement shown
+         opens application/new window (to be used with one of the following codes):
          WB   web browser
          WP   word processor
          EM   e-mail manager
          CM   citation manager
          PP   PowerPoint
÷         tiles windows
δ         closes some of screen's windows, when tiled/reduced
%         scrolls down/up in a browsing manner
do        downloads records/resources to disk
p/f       pastes into file
em        e-mails records/resources
pr        prints records/resources
#n        displays full record/resource of n (n = number on list)
Word      performs word processing
E-mail    reads/sends e-mails (except em above)
Chat      chats in real time

The following codes to be entered as strings on a single line, where applicable:

b         goes back to previous screen in window (eg using browser's Back button)
f         goes forward to next screen in window (eg using browser's Forward button)

The following codes to stand alongside a description of the resulting screen's contents:

Ω         closes the screen's window
=>        switches to another window, or fully opens a window when tiled/reduced
(abc) →   clicks on link (abc = link's label)
p/q       pastes as query
p/u       pastes as URL
rc        right-clicks

The following codes to stand alongside a description of target content:

x         checks
ux        unchecks a preselected option
c         copies
q         inputs query into IR system
~         drags
u         uploads file(s) to application
>>        points browser to a new web address

Screen content

Description of the resulting screen's content may include:

- title of webpage
- name of search engine/database
- number of results obtained, eg 0, 7, 169
- citation numbers displayed, eg 1-10, 11-20

Codes:

DT        desktop
HP        browser's home page
X         broken link
&&        screen redraw interrupted by another action
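The scheme's manipulability can be illustrated in code: a coded session amounts to a chronological sequence of (action code, screen note) pairs, which lends itself to simple tallying of the kind earlier TLAs produced. The sketch below is a hypothetical illustration, not from the study's data; only the codes themselves come from the scheme above.

```python
from collections import Counter

# A hypothetical coded session: each event is (action_code, screen_note).
# The codes follow the scheme above; the session content is invented.
session = [
    ("Start", ""),
    ("+WB", "HP"),
    ("q", "Google, 9180"),
    ("#3", "title of webpage"),
    ("b", ""),
    ("q", "Google, 25 100"),
    ("#1", "title of webpage"),
    ("do", ""),
    ("End", ""),
]

# Tally how often each action code occurs.
tallies = Counter(code for code, note in session)

queries_entered = tallies["q"]    # queries input into an IR system
records_displayed = sum(v for k, v in tallies.items() if k.startswith("#"))
downloads = tallies["do"]

print(queries_entered, records_displayed, downloads)  # 2 1 ... run to confirm
```

A transcript held in this form can be summarised per session, per user, or across a whole sample, without re-watching the recordings.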
Case study
Cross-database searching provision
The system was designed as a basis for analysing recorded online information seeking across a range of circumstances. The preliminary data comprised screen recordings made at an academic library, together with recordings provided by a volunteer who used the library's services remotely, on her own laptop computer, for literature searches. The 'transcriptions' of sessions involving the SFX and/or MetaLib technology provided by the library were examined and several observations concerning implementation were made.
SFX is a context-sensitive linking system intended to integrate the databases and other online services and resources that a library has to offer, thereby increasing their effectiveness. It is a product based on the OpenURL standard and is produced by exLibris, a library system vendor. The library in the case study procured SFX and its sister product, MetaLib, as a package that it could customise for its particular service context. The library's implementation of SFX provides the user with two linking options: a possible link to the full text of a cited article in another subscribed database, and a possible link to the full text of a cited article in a serial to which the library subscribes individually, in either printed or electronic form, or both. There are many other possible types of link which could be implemented using SFX and OpenURL technology (Collins and Ferguson 2002, Grogg and Ferguson 2003, 2004), but these are the basic ones.
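The mechanics of such a link can be sketched briefly: under OpenURL, the source database appends the citation's metadata to the resolver's web address as key-value pairs, and the resolver (here, SFX) then works out which copies the library can offer. The snippet below is a minimal sketch using OpenURL 0.1 key names; the resolver address and citation details are invented for illustration and do not describe the case-study library's configuration.

```python
from urllib.parse import urlencode

# Hypothetical resolver address - each library's SFX server has its own URL.
resolver = "http://sfx.example.edu/sfx_local"

# Citation metadata as OpenURL 0.1 key-value pairs ('sid' identifies the
# source database that generated the link). All values are invented.
citation = {
    "sid": "ProQuest:ABI",
    "genre": "article",
    "issn": "1234-5678",
    "volume": "12",
    "issue": "3",
    "spage": "45",
    "date": "2004",
    "atitle": "Online information seeking",
}

openurl = resolver + "?" + urlencode(citation)
print(openurl)
```

Because the citation travels with the request, the resolver can, in principle, deep-link straight to the article rather than leave the user at a blank search form.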
Early implementations of the new technology have been reported in a very positive light (Grogg 2003, Lewis 2003, Stubbings 2003). Use of databases and other resources linked through SFX increases, while the functionality is said to be popular amongst end-users. However, deeper analyses of how the technology is being used, and of whether its success is uniform across different interfaces, types of user, and so on, have not yet been forthcoming. Some transaction data can be collected from the SFX statistics module (Cummings and Johnson 2003), but given the potential complexity of SFX use, this data cannot possibly inform us of all aspects of linking activity.
Similar success stories have been reported by implementers of MetaLib and other federated search products. MetaLib, also from exLibris, facilitates the simultaneous interrogation of independent databases through a simple search interface. By linking together its online resources, both MetaLib and SFX can make a library's online services more visible, an objective which has become critical given the rise of Google. Implementing this technology is no simple matter, however. Each database needs to be configured individually and its linking thoroughly tested. It is also important for users to be made aware that not all of a library's resources are necessarily covered by these tools.
Coded session
An example of a transcription using the prototype coding scheme is given in Appendix 1, constituting a session which involves SFX functionality.
Observations
Given that OpenURL and federated search technology have been widely heralded as the great leap forward for online library provision, the way in which the user interacts with this technology is a prime candidate for a screen log analysis. From the small sample of recordings featuring its use in this preliminary study, it was observed that there are reasons why the implementation of this technology needs to be both extensive and accurate if it is to enhance online information seeking. The transcription (Appendix 1) shows how SFX is not always so helpful; if a larger sample were to show that this search session typifies the use of SFX, then the cost-effectiveness of the tool would need to be questioned. From the coding, we can summarise the searches carried out in this session as follows:
PsycINFO
- 0 hits
- 5 hits, 1 citation opened, 1 document saved
- 3 hits, 1 citation opened, full-text retrieval attempted
ProQuest
- 17 hits, 7 citations opened, 7 documents saved
- 95 hits, 4 citations opened, 3 documents saved
- 0 hits
- 55 hits, 6 citations opened, 4 documents saved, for other two full-text retrieval attempted
- 0 hits
- 1 091 761 hits
- 15 hits, 4 citations opened, 4 documents saved
- 0 hits
- 7 hits, 3 citations opened, 3 documents saved
Google
- 1 280 000 hits, 1 citation opened, 1 document saved
- 9180 hits, 0 opened/saved
- 25 100 hits, 0 opened/saved
- 32 000 hits, 0 opened/saved
Examining the transcription, we can see that the SFX option was tried on three citations: results numbered 8 and 12 of a ProQuest query, and number 2 of a PsycINFO query. In the first case, full text was not found in any database, and the library catalogue option was taken up. A query for the relevant journal (based on the citation) was automatically entered into the catalogue, and a hit found; unfortunately, this hit led to the e-copy of the journal represented on the ProQuest database. In the second case, full text for two items was found, the first link pointing again to the ProQuest database, which led to the previous citation (with no full text). The user then tried the second link, to another database (Factiva) and was presented with a blank search form to fill in manually. Unfortunately, we observed that he/she was unable to perform a sufficiently accurate search to locate the document - the first search contained a misspelling, the second was too broad. In the third case, the SFX option did not find any full text, and the user declined the library catalogue option.
The transcription tells us more than the summary statistics of three attempts and three 'failures'. For instance, it shows us how the user twice followed up on an option that could not lead to the full text, when he/she opened up the ProQuest database again. In the first case, the issue needs to be addressed in bibliographic instruction: the catalogue option should be followed up only if it reveals a hard copy of the journal or an e-copy belonging to a non-SFX database - an e-copy pointing to the same database as offered by the SFX option is not going to help. In the second case, the issue is one of implementation: hits should only lead to full text, and not to citations without full text as well. This is a serious problem, since a citation-only hit will otherwise always be present, when the SFX starting point is a citation with no full text. It is inevitable that on some, if not many, occasions this will mislead the user (as well as waste their time) and reduce their confidence in the ability of the SFX option to deliver.
The second SFX case reveals another problem. When the user does select another database that promises a full-text version (whether it is so is another matter), he/she is faced with a blank search interface and required to input a query for the article manually. There are actually two problems with this: first, it is the native interface with which the user may not be familiar; second, the user may well be unable to perform an effective query without careful reference back to the citation - as was the case here. Again, this may lead not only to a frustrating 'miss', and waste users' time, but also reduce their confidence in the reliability of the SFX technology. Such pessimism may be gauged by the rate at which the SFX option is taken up: it would be interesting to see whether such 'misses' affected this rate. Even in this single transcription, we see a possible indicator of growing pessimism in the catalogue option, when it was not taken up in the third case.
The SFX option was tried after only two of the searches, but for a good reason: only after these two was it available. As we can see from the details of the queries, this was because in most cases the user set the condition of 'full text' as a query parameter. Indeed, the only time this was not the case, apart from when Google was used, was when the user followed up the results of a query by using the 'suggested topics' option, which performed another query automatically without the full-text option set. The other query that allowed for the SFX option was when the PsycINFO search engine offered an additional result set, without the full-text parameter.
There are two other points to make about this SFX implementation, which are brought out in the transcription. First, in both the ProQuest cases, the user later 'toggles' back to the results screen of the 'find a copy' option, even after unsuccessful SFX efforts. This might indicate not only a large interest in finding the full text, but an uncertainty about whether or not the option has been fully utilised. It would be interesting to determine in what situations such toggling most frequently occurs, perhaps with the aid of post-session interviews or protocol analysis. Second, we can see in the transcription how the SFX option is in fact a two-step process here: first, the user has to click on the 'find a copy' option under the citation; then on the next screen, they need to click on 'SFX'. It is by no means clear to the user that this second step is necessary, especially given its use of the jargon term, 'SFX'.
Perhaps the most important point to be made, however, concerns not so much the problems that this SFX implementation needs to address, but of its objective, limited as it is to providing more full-text options. What has so far been found is that users' search behaviour may frequently make this kind of option less relevant. That is, in this sample of sessions, the user would often either choose a database which provided only full text, or would set full text as a parameter for his/her searches. We noted that it was only really by 'accident' that the user in the session analysed in Appendix 1 came to use the SFX option. Two courses of action come to mind: first, users might be encouraged to drop their full-text precondition, given an effective SFX option; second, this implementation of SFX could be supplemented with types of linking options other than for full text, such as interlibrary loan, article and author citations, affiliations, etc. As linking enthusiasts have pointed out, the potential of linking technology goes a long way beyond full text (Soderdahl 2003).
As well as the SFX option linking from a particular database, the library offered two generic search engines based on the exLibris suite: CitationLinker and SmartSearch (ie MetaLib). CitationLinker is used to trace full text across all the SFX-compliant databases and library catalogue (in much the same way as does the SFX option from a particular database). Transcriptions showed a certain perseverance with the tool on the part of the user, though not necessarily impressive results: more often than not, no full text was found.
SmartSearch is the library's name for its MetaLib tool, which aims to provide a fully fledged federated search. Sessions involving SmartSearch, however, were few and far between - probably because it had only recently been launched and was still in pilot mode. The importance of effective database groupings - searches were limited to a maximum of six targets, thus necessitating such groupings - was demonstrated by some unhelpful results. In one session, for example, the two resulting hits were 0/22/NA and 0/51/4461 - only one database had provided helpful results. Moreover, the same search topic was the subject of another session by the same user, where the individual database had been used to much greater effect. In that session, the user had changed the display setting, an action not accommodated in the SmartSearch facility. This user is by no means the only one to employ 'advanced search' techniques, and a larger sample will reveal the extent to which users might miss these options in the federated search interface - perhaps more so than has previously been assumed.
Discussion
The kind of analysis outlined in the case study above can only be undertaken using a detailed coding system, consistently applied, of user-system interaction. In larger samples, online behavioural phenomena can be identified using algorithms based on sequences of codes. It is very possible that more recordings will reveal certain actions and information-seeking situations which have not been covered by the coding system detailed in this paper, but all the sessions in the preliminary sample were successfully 'transcribed', such that many aspects of online information seeking were recorded for analysis. Many analyses concerning such things as searching styles and interface problems can be undertaken using the coding scheme and extensions of it.
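As an illustration of such an algorithm, the sketch below scans a sequence of action codes for one hypothetical phenomenon: queries reformulated before any result was displayed. The codes follow the scheme proposed earlier; the transcript and the phenomenon chosen are invented for illustration, not drawn from the case-study data.

```python
# Count queries that were immediately reformulated: a 'q' followed by
# another 'q' with no record displayed ('#n') in between.
def abandoned_queries(codes):
    abandoned = 0
    last_was_query = False
    for code in codes:
        if code == "q":
            if last_was_query:
                abandoned += 1  # previous query abandoned unviewed
            last_was_query = True
        elif code.startswith("#"):
            last_was_query = False  # a result was displayed
    return abandoned

transcript = ["Start", "q", "q", "#2", "q", "q", "q", "#1", "do", "End"]
print(abandoned_queries(transcript))  # 3 of the 5 queries were abandoned
```

More elaborate detectors of the same kind could flag, say, the 'toggling' behaviour noted in the case study, or runs of zero-hit searches, across a large sample of transcripts.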
Hargittai (2004) claimed comprehensiveness for her scheme, but this would appear to be something of an overstatement. Even in the prototype scheme developed here we may identify several aspects of online action not covered by Hargittai. First and foremost, Hargittai's scheme covers only browser-related actions, yet a good deal of real-life online information seeking involves other applications, such as word processors and e-mail managers, and actions that transcend the browser, such as downloading files from the internet, uploading files, printing files and tiling windows. Some of these actions, such as printing and downloading, may provide crucial clues to the progress of an information seeker. Among other common actions not in her scheme are copying and pasting, checking boxes, and dragging.
Hargittai's scheme is not altogether action-based. Many of the codes relate to very particular content, such as specific search engines (AOL, Google, Yahoo). Given the dynamic nature of online content, it would seem better to avoid codifying such aspects at the generic level. An alternative would be to allow space in the scheme for local codes, defined according to the objectives of individual research projects.
Although Hargittai also aims to classify information-seeking behaviour, her scheme appears to cover online actions more generally, and there are several instances where technical differences of action are identified in Hargittai's scheme but glossed over in mine. For example, Hargittai has codes for the use of the Refresh and Stop buttons. She also has several codes for the different ways in which the browser can be pointed to a new URL, such as the location bar, favourites and the history file; my prototype has only one code. Some of these distinctions may not be especially critical for analyses of information seeking. Hargittai's codes for various scrolling movements, for instance, might not tell us much more than a single code would, given that scrolling action depends very much on the position of the various content elements. Other codes in Hargittai's scheme, however, may be adopted in due course, such as the distinction between the use of offsite and in-site links. I have ignored other actions coded by Hargittai because of my focus on information seeking in and around online systems, rather than on seeking of a more incidental nature.
Earlier in this paper, a coding system's success was defined according to four criteria:
- Effectiveness in describing those elements of online information-seeking behaviour that might be worth studying by information scientists.
- Reliability, such that different coders end up with the same set of codes.
- Manipulability, such that its product can be readily used for statistical and other forms of analysis.
- Ease of use, such that coders find the scheme straightforward and not overly time-consuming to apply.
With respect to effectiveness (or validity), both Hargittai's scheme and mine provide data for worthwhile analyses of online information seeking. It is perhaps too early to assess the extent to which either scheme might cover the range of variables of interest to researchers, and in any case the scheme detailed in this paper is presented as a prototype only. Moreover, it has been designed to accommodate new codes as new actions are identified for analysis, and also to allow for supplementary description when the scope of a particular research project requires it. What the scheme proposes is a core set of codes for core actions and content on which the researcher may build. With respect to reliability, this prototype scheme needs further testing, but a preliminary study involving the parallel coding of several recordings suggests that it can be used consistently by different coders. Few differences in the lines of transcription were found. The openness of the scheme allows details of transcription to vary according to research objectives, but not to vary very significantly due to errors or differences of interpretation.
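One simple way of quantifying the inter-coder consistency described above is a position-by-position agreement ratio between two coders' transcriptions. The sketch below assumes each transcription has been reduced to an ordered list of action codes; the example lists are invented, and a fuller reliability study would use a formal statistic such as Cohen's kappa rather than this naive alignment.

```python
# Rough sketch of an inter-coder agreement check, assuming each coder
# produces an ordered list of action codes for the same recording.
# The position-by-position alignment is deliberately naive.

def percent_agreement(coder_a, coder_b):
    """Proportion of positions at which two transcriptions agree."""
    if not coder_a and not coder_b:
        return 1.0
    longest = max(len(coder_a), len(coder_b))
    matches = sum(1 for a, b in zip(coder_a, coder_b) if a == b)
    return matches / longest

# Invented example transcriptions: the coders differ only on the final action.
coder_a = ["q", "#1", "em", "b", "q", "do"]
coder_b = ["q", "#1", "em", "b", "q", "p/u"]

print(round(percent_agreement(coder_a, coder_b), 2))  # 5 of 6 codes agree
```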
With respect to manipulability, Hargittai's scheme can be readily analysed using statistical software, whereas transcriptions based on the scheme proposed here require additional processing. In theory, the symbols and special characters used in my scheme need be no more problematic for the computer than the numerical notation used in Hargittai's scheme. However, my prototype also separates description of actions and content across two dimensions, making the design of reliable algorithms a much more difficult task. Both schemes allow for chronological analyses through mathematical studies of code sequences, but it should be borne in mind that such studies are likely to be extremely complex given the options available to end-users.
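To illustrate the additional processing mentioned above, the sketch below treats a transcription as comma-separated fields so that the action and content dimensions can be separated for statistical analysis. The field layout (action, content, hits) and the transcription lines themselves are assumptions for illustration, not the scheme's prescribed file format.

```python
import csv
from collections import Counter
from io import StringIO

# A hypothetical transcription in which each line separates the action
# code from its content and hit count. The layout is assumed for
# illustration; a real project would define its own field mapping.
transcription = """\
q,remote therapy,0
=>,databases (HP),
q,distance therapy,5
#1,,
em,,
"""

# Once the dimensions are split into fields, standard tools can tabulate
# them: e.g. counting how many queries the session contained.
rows = list(csv.reader(StringIO(transcription)))
actions = Counter(row[0] for row in rows)

print(actions["q"])
```

Frequency tables of this kind can then be exported to statistical software, which goes some way towards closing the manipulability gap between the two schemes.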
A strength of the scheme proposed is its 'coder-friendliness'. The number of codes is manageable and relatively easy to remember. After a few coding sessions, it was possible to gain a fluency so that there was little need to refer to the scheme's documentation, nor to rewind the recording constantly. When the coder is able to achieve such fluency, the technique becomes akin to other forms of transcription, and screen log analysis becomes a methodology similar to other forms of systematic observation. It is hoped that many more screen recordings will be made and coded by ethnographers of online information seeking.
Acknowledgement
The author thanks Rachel Salmond for her assistance, particularly with respect to the recording and coding discussed in this paper.
References
Collins, M D D and Ferguson, C L (2002) 'Context-sensitive linking: it's a small world after all' Serials Review 28(4): 267-282.
Cummings, J and Johnson, R (2003) 'The use and usability of SFX: context-sensitive reference linking' Library Hi Tech 21(1): 70-84.
Grogg, J E (2003) 'Linking e-journals to databases: full text linking practices' Serials Librarian 45(2): 145-151.
Grogg, J E and Ferguson, C L (2003) 'Linking services unleashed' Searcher 11(2): 26-31.
Grogg, J E and Ferguson, C L (2004) 'Oh, the places linking will go! a state of the union report' Searcher 12(2): 48-58.
Hargittai, E (2004) 'Classifying and coding online actions' Social Science Computer Review 22(2): 210-227.
Hider, P (2004) 'User redefinition of search goals through interaction with an information system' PhD thesis, City University, London.
Hider, P (2005) 'A new generation of transaction logging systems: a new era of transaction log analysis?' Information Online: 12th Exhibition and Conference: Proceedings, 1-3 February, http://conferences.alia.org.au/online2005/papers/c7.pdf (viewed 17 February 2005).
Jansen, B J and Spink, A (2000) 'Methodological approach in discovering user search patterns through Web log analysis' Bulletin of the American Society for Information Science 27(1): 15-17.
Jansen, B J, Spink, A and Saracevic, T (2000) 'Real life, real users, and real needs: a study and analysis of user queries on the web' Information Processing and Management 36: 207-227.
Lewis, N (2003) '"I want it all and I want it now!": managing expectations with MetaLib and SFX at the University of East Anglia' Serials 16(1): 89-95.
Soderdahl, P A (2003) 'Implementing the SFX link server at the University of Iowa' Information Technology and Libraries 22(3): 117-119.
Stubbings, R (2003) 'MetaLib and SFX at Loughborough University Library' VINE 33(1): 25-32.
Biographical information
Philip Hider joined the School of Information Studies at Charles Sturt University after extensive experience as a librarian and a lecturer in the United Kingdom and Singapore. He has recently completed his PhD thesis in information retrieval at City University, London. He is a member of the Australian Committee on Cataloguing.
Appendix 1
An information search and retrieval session coded using the prototype system.
| Start | | | |
| (PsycINFO)→ | | | |
| q | remote therapy | 0 | |
| => | databases (HP) | | |
| (PsycINFO)→ | | | |
| q | distance therapy | 5 | 1-5 |
| #1 | | | |
| % | | | |
| rc | properties | | |
| c | file's URL | | |
| +WB | | | |
| p/u | | | |
| >> | X | | |
| => | 1 | | |
| em | | | |
| => | psychology databases | | |
| (ProQuest)→ | | | |
| q | face-to-face therapy | 17 | 1-10 |
| #1 | | | |
| em | | | |
| b | | | |
| #2 | | | |
| % | | | |
| em | | | |
| b | | | |
| #5 | | | |
| em | | | |
| % | | | |
| (link in article)→ | website | | |
| => | | | 1-10 |
| #9 | | | |
| em | | | |
| b | | | |
| #10 | | | |
| em | | | |
| => | website | | |
| Ω | 10 | | |
| => | | | 1-10 |
| => | | | 11-20 |
| #12 | | | |
| em | | | |
| #16 | | | |
| em | | | |
| => | psychology databases | | |
| % | | | |
| (American Research Library)→ | | | |
| (ProQuest)→ | | | |
| q | telephone counselling (full-text only) | 95 | 1-10 |
| #7 | | | |
| em | | | |
| b | | | |
| #8 | | | |
| em | | | |
| % | | | |
| b | | | |
| => | | | 11-20 |
| #16 | (abstract, though has full text) | | |
| b | | | |
| #19 | | | |
| % | | | |
| em | | | |
| bb | | | |
| q | internet counselling and psychotherapy (scholarly/f-t) | 0 | |
| q | counselling and internet (suggested topics) | 55 | 1-10 |
| #1 | | | |
| % | | | |
| em | | | |
| b | | | |
| ('find a copy' for #8)→g | && | | |
| #7 | | | |
| em | | | |
| % | | | |
| b | | | |
| => | 'find a copy' result for 8 (abstract only) | | |
| (SFX)→ | | 0 full-text | |
| (catalogue search)→ | | | |
| q | [journal] | 1 (e-version) | |
| (ProQuest)→ | | | |
| => | | 1 (e-version) | |
| => | 'find a copy' result for 8 (abstract only) | | |
| em | (abstract) | | |
| % | | | |
| b | (display results) | | 1-10 |
| #10 | | | |
| % | | | |
| em | | | |
| b | | | |
| => | | | 11-20 |
| => | 'find a copy' result for 8 (abstract only) | | |
| => | | | 11-20 |
| ('find a copy' for #12)→ | (abstract only) | | |
| (SFX)→ | | 2 full-text | |
| (ProQuest)→ | | | |
| q | [citation automatically entered] | 1 (abstract) | |
| b | | 2 full-text | |
| => | Factiva (blank search interface) | | |
| q | telephone counselling | 0 | |
| q | therapy | 1 091 761 | 1-10 |
| => | | | 11-20 |
| #16 | | | |
| % | | | |
| em | | | |
| => | 'find a copy' result for 12 (abstract only) | | |
| => | | | 11-20 |
| b | | | |
| q | counselling and internet (scholarly journals, full-text) | 15 | 1-10 |
| => | | | 11-15 |
| #11 | (PDF) | | |
| do | | | |
| b | | | |
| #13 | | | |
| c | title of article | | |
| do | | | |
| b | | | |
| #15 | | | |
| do | | | |
| bb | | | 1-10 |
| #4 | (PDF) && | | |
| q | counselling and internet | 0 | |
| => | 4 | | |
| do | | | |
| => | display results (0) | | |
| q | telephone counseling (scholarly/f-t) | 7 | 1-7 |
| #1 | | | |
| do | | | |
| b | | | |
| #5 | | | |
| do | | | |
| b | | | |
| #7 | | | |
| do | | | |
| => | psychology databases | | |
| (PsycINFO)→ | | | |
| q | internet counselling | 3/0 (limiters) | |
| => | | | 1-3 |
| (SFX #2)→ | | 0 full-text | |
| Ω | databases (HP) | | |
| >> | Google | | |
| q | internet counselling | 1 280 000 | 1-10 |
| #4 | | | |
| % | | | |
| c | webpage (text) | | |
| +WP | | | |
| p/f | | | |
| do | | | |
| => | | | |
| q | internet counselling blue Canberra | 9180 | 1-10 |
| q | internet counselling blue Canberra gym | 25 100 | 1-10 |
| q | internet depression blue gym | 32 000 | 1-10 |
| End | | | |