Inter-Linker Consistency in the Manual Construction of Hypertext Documents

Jonathan Furner

University of California, Los Angeles Web: http://www.ucla.edu/
Department of Information Studies Web: http://dlis.gseis.ucla.edu/
405 Hilgard Avenue, Los Angeles, CA 90095-1520, USA
Email: jfurner@ucla.edu
Web: http://scow.gseis.ucla.edu/faculty/jfurner/HTML/jfurner.html

David Ellis, Peter Willett

University of Sheffield Web: http://www.shef.ac.uk/
Department of Information Studies Web: http://www.shef.ac.uk/uni/academic/I-M/is/home.html
Western Bank, Sheffield, S10 2TN, England
Email: d.ellis@sheffield.ac.uk, p.willett@sheffield.ac.uk
Web: http://www.shef.ac.uk/uni/academic/I-M/is/people/ellis.html, http://www.shef.ac.uk/uni/academic/I-M/is/people/willett.html

Abstract: The creation of hypertext links by manual means is difficult and costly, but is often preferred to automatic methods in attempts to optimize hypertext `quality'. Experiments show, however, that levels of consistency amongst human link-creators---in common with those historically observed amongst human indexers---are generally low and variable, and that the effectiveness of retrieval from manually-constructed hypertexts can be expected to suffer as a result. This finding motivates further comparison both of manual and of automatic methods of link creation.
Categories and Subject Descriptors: H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing - Indexing methods
H.5.4 [Information Interfaces and Presentation]: Hypertext/Hypermedia - Architectures
I.7.2 [Document and Text Processing]: Document Preparation - Hypertext/Hypermedia
General Terms: Experimentation, Performance, Design
Additional Key Words and Phrases: Graph theory, inter-indexer consistency, link creation, similarity, topological indices

1 Introduction

The creation of the intra-document links between the individual components of a hypertext document is a difficult and time-consuming task [Westland 1991], but one in which human intervention has commonly been thought necessary if the semantic relationships that exist between the components of the document are to be made explicit. A similar view has prevailed for many years with regard to the indexing of documents in information retrieval (IR) systems, where the existence of well-established systems for automatic indexing [Salton 1989] has not prevented the widespread use of trained library and information specialists for indexing and classifying documents prior to their incorporation in an online database [Lancaster 1986].

The importance of the indexing task in IR has led to much interest in studies of inter-indexer consistency---i.e., of the extent to which agreement exists among different indexers on the sets of index terms to be assigned to individual documents. These studies have consistently concluded that recorded levels of consistency vary markedly, and that high levels of consistency are rarely achieved [Leonard 1977]. Similar levels of inconsistency are routinely observed in the manual execution of related cognitive tasks [Saracevic 1991], such as the selection of terms to use as names for concepts or objects [Furnas 1987], the formulation of queries for searching document databases [Bates 1986], and the estimation of relevance of the documents retrieved in such a search [Lesk 1969]. The insertion of links in hypertext documents may be viewed as being analogous to the assignment of index terms to such documents [Liebscher 1994]: the present paper summarizes and discusses the main results of a study [Furner 1996] that drew this analogy in seeking to determine the extent to which different people produced similar link structures for the same hypertext documents.

2 Experimental Design

The hypertext documents used in our experiments derived from five printed full-text documents, each a thesis, journal article or book written by a member of the Department of Information Studies at the University of Sheffield. The number of paragraphs in each document ranged from 23 to 347. Five copies were made of each of the chosen documents, and each of the twenty-five copies was allocated to a different student volunteer from the Department. The volunteers were instructed in the use of an interactive system, developed using the authoring system Guide, that allowed them to create explicit representations of links between paragraphs whose contents they decided were related. On completion of the linkers' work, the results were five hypertext versions of each of five different source documents, each sharing a common set of nodes (i.e., paragraphs) with four others, but each having a different set of links inserted amongst the nodes. Inter-linker consistency was then investigated by calculating measures of the similarities between hypertexts generated from the same source document.

Most measures of similarity are designed for use with standard data matrices, in which each row represents one of the objects in a dataset and each column represents one of the variables used to characterise these objects. In order that our hypertexts might be represented in matrical form, it was necessary to adopt some of the graph-theoretic techniques that have been similarly applied in measurements of the similarities between molecules in computer-based chemical information systems [Willett 1987]. In these latter studies, a molecule is represented by a graph in which the vertices and edges stand for the atoms and bonds, respectively, of that molecule, these corresponding to the nodes and inter-nodal links, respectively, in our hypertext documents. It is possible to produce several derived representations from such graphs, including adjacency matrices, distance matrices and converted distance matrices. The adjacency matrix for a p-vertex graph is a p × p element matrix in which the ijth element is set to one or zero depending on whether vertices i and j are, or are not, linked. The value of the ijth element of a distance matrix is the number of edges in the shortest path linking these two vertices (or a null element if the two nodes are not connected). A converted distance matrix is a related representation described by Botafogo et al. [Botafogo 1992].

In the comparison of directed graphs, the elements of such matrices can be used in several different ways. Any matrix may itself be represented (i) as a single n-tuple of elements (where n= p² - p, i.e., discarding those elements on the leading diagonal where i= j), (ii) as a sequence of p vertex-centered n-tuples (where n= p - 1), (iii) by a topological index, a single-valued real number that is obtained by operations on the matrix elements, or (iv) by a sequence of p vertex-centered topological index values. The similarity between two graphs (representative, for example, of two hypertexts derived from the same document) can then be calculated by comparing the two n-tuples, the two sequences of p n-tuples, the two topological index values, or the two sequences of p topological index values.

If each graph is characterized solely by a single-valued descriptor such as a topological index value, then it is trivial to calculate the numerical difference between two such values, and thus to derive a value for a primitive measure of similarity: the smaller the difference, the more similar are the objects. A range of such indices were used in our study [Ellis 1994a], including examples that have been suggested in the literature for the characterization of both chemical [Basak 1987] and hypertext [Botafogo 1992] graphs. Smeaton and Morrissey [Smeaton 1995] and Blustein et al. [Blustein 1997] have since made use of similar methods of quantifying the structural characteristics of hypertexts.

If, however, a multivariate description is used (as in most of our experiments) then a similarity coefficient must be used to quantify the degree of similarity between a pair of matrices. Very many coefficients can be used for this purpose [Ellis 1993], and we used no less than twenty seven in our experiments. However, the results that were obtained were not found to vary significantly from one coefficient to another; indeed, we were able to demonstrate that many of the coefficients were equivalent to, or in some cases monotonic with, each other [Ellis 1994a].

3 Experimental Results

Five hypertext versions were produced of each source document, giving a total of ten possible pairs of sets of hypertext links for that document (i.e., fifty possible pairs for the entire dataset), for each of which similarity values could be calculated using all of the many combinations of matrix representation, descriptor and similarity coefficient noted above. When this was done, it was possible to draw a simple, unequivocal conclusion (unequivocal in that it was obtained across the full range of conditions that were studied): that levels of inter-linker consistency are generally low, but can also be variable. Thus, there is usually (but not always) very little agreement between the sets of links identified by different people. For example, using the well-known Dice coefficient [Salton 1989] with one of the combinations of matrix representation and descriptor, a highly skewed distribution of values was obtained with a median similarity as low as 0.03, i.e., well towards the lower bound of zero for this coefficient. However, the variable nature is illustrated by the fact that a few of the Dice coefficients for one of the documents (a journal article about probabilistic methods for predicting the biological activity of chemical compounds) were as high as 0.50, causing the mean of the distribution to be 0.12.

4 Related Work

In our earlier reports, we indicated our keenness that these experiments should be replicated under variously differing conditions. Green's subsequent evaluation of a novel, thesaurus-based method of generating hypertext links automatically [Green 1998] was motivated by a preliminary study in which he responded precisely to this call. Green compared the link-sets generated, for each of three short, regularly-structured, strictly-edited newspaper articles, by fourteen volunteer subjects. Green observed slightly higher levels of consistency amongst his linkers (suggesting that the characteristics of documents, as much as of the linkers themselves, are indeed influential in this respect), but was led to confirm that the conclusions we had reached earlier `are equally valid for short, well-structured texts'.

5 Discussion

Our conclusion is that different people tend to have very different views of the semantic relationships that exist among the components of a full-text document, and consequently tend to impose different hypertext link structures on the same source documents---just as different people tend, also, to assign different sets of index terms to the same document, to use different terms to represent the same query concept, and to judge different documents as being relevant to the same query. (Studies have also confirmed, in fact, that the same person, carrying out the same labelling task on different occasions, will tend to produce different results.)

One inference that is often drawn from studies of inter-indexer and intra-indexer consistency---which commonly record low and/or variable levels of consistency comparable with those found in our study of linkers---is that they are somehow indicative of indexer/searcher consistency, and ultimately of retrieval effectiveness. If (so the argument runs) there is little likelihood of professional indexers agreeing on the terms that should be selected to represent the content of documents, then what hope can there be that a searcher will be able correctly to predict (and thus use in a query) the terms used by the indexer to represent any document that might be relevant to the searcher?

An analogous inference can be drawn from our hypertext study. If linkers cannot agree on the links to be inserted among the nodes of a hypertext, then what use are those links likely to be in support of the browsing strategies of the reader? In other words, if inter-linker consistency is found to be low, then linker/reader consistency---the level of agreement as to the semantic relationships that exist between the components of the text, and that therefore should explicitly be represented by links that the reader then has the opportunity to follow---can also be expected to be low [Ellis 1996].

Our study thus has somewhat alarming implications for those who might otherwise make bold claims for the effectiveness of any large hypertext that incorporates manually-generated links (such as the World Wide Web) in support of users' information-seeking strategies. Historically, browsing has been seen as a core component of the Web experience. More recently, however, the notion that Web users should need, or indeed would ever want, to rely on other people's divergent decisions as to what they should read next has been called into question, with search engines (`traditional' IR) supplanting link-traversal as the tool of choice for information seekers [Bieber 1998a].

On the other hand, our results contribute to a justification of the numerous and continuing efforts to develop, evaluate, and improve techniques for automatic or semi-automatic link creation [Agosti 1997] [Crestani 1998] [Melucci 1999]. These methods are typically based on traditional IR techniques, such as the representation of documents by vectors of features, and use of the degree of similarity between such vectors as a measure of the strength of the relationship between the documents they represent. Blustein [Blustein 1999], for example, compares the performance of two methods of automatic link creation, one based on Cornell's SMART, and the other on Bellcore's Latent Semantic Indexing. Blustein cites our work on inter-linker consistency as an important motivation for his study: the assumption is `although an automated approach is not likely to find all "correct" links it will at least be consistent and fast' (p. 1). Indeed, one central assumption of this kind of work is that automatic techniques are inherently reproducible and thus highly predictable: given the same set of initial conditions, any automatic link-generation algorithm will always proceed in the same way. The sets of links created by a particular method will thus be produced in a consistent manner to which users might become accustomed as they browse. Thistlewaite [Thistlewaite 1995], in his placement of `consistency' at the head of a list of criteria for assessing the quality of hypertext links, concurs with this latter view.

Blustein [Blustein 1999] suggests that the observation of inconsistency amongst linkers supports the argument that hypertext authoring systems will turn out to be most useful as personal annotation assistants---i.e., as devices that allow readers to become link creators. This idea is at the heart both of Bush's famous conception of the Memex [Bush 1945], and of contemporary `information appliances' such as Xerox's XLibris---an `active reading machine' [Price 1998].

6 Conclusion and Further Work

A long tradition of IR experimentation has demonstrated the high levels of performance that may be attained by systems utilizing automatic indexing techniques. Yet even at this relatively mature stage of development of hypertext systems, few detailed comparative evaluations of automatic and manual link-generation techniques---or, for that matter, of alternative automatic mechanisms---have been carried out, and the field remains open for further research. It is hoped that our study will encourage others to consider such comparisons.

Another recurring theme of studies of consistency in indexing is the assertion that consistency---both inter-indexer and, more helpfully, indexer/searcher---can be improved through the imposition of some form of control over the vocabulary from which terms may be selected for indexing and searching [Tarr 1974]. Many commercial document database systems build inverted files of keywords and phrases extracted mechanically not only from the uncontrolled text of original titles, abstracts or (in full-text systems) documents, but also from controlled index terms, subject headings and classification codes that have been manually assigned by indexers. It would thus be of interest to investigate the effect of the complementary use of controlled-vocabulary indexing (whether manual or automatic) in the generation of node surrogates on the consistency and/or quality of subsequent linking (whether manual or automatic).

Acknowledgments

The experimental work reported in this paper was supported by the British Library Research and Development Department, London, under grant number RDD/G/142. At the time of this award, Jonathan Furner was at the Department of Information Studies, University of Sheffield. We would like to thank the anonymous referees of this paper for their valuable suggestions.

References

[Agosti 1997] Maristella Agosti and James Allan. "Methods and tools for the construction of hypertext" in Information Processing and Management, 33(2), 129-271, 1997.<

[Basak 1987] Subhash C Basak, Gerald J Niemi, R. R. Regal, and Gene D. Veith. "Topological Indices: their Nature, Mutual Relatedness, and Applications" in Mathematical Modelling 8, 300-305, 1987.<

[Bates 1986] Marcia J. Bates. "Subject Access in an Online Catalog" in Journal of the American Society for Information Science 37, 357-376, 1986.<

[Bieber 1998a] Michael Bieber, Catherine C. Marshall, Donald S. Retallack, and Anne-Marie Vercoustre. "Will hypertext become the Web's missing link?" in Computer Networks and ISDN Systems, 30, 754-756, 1998.<

[Blustein 1997] W. James Blustein, Robert E. Webber, and Jean Tague-Sutcliffe. "Methods for Evaluating the Quality of Hypertext Links" in Information Processing and Management, 33, 255-271, 1997.<

[Blustein 1999] W. James Blustein. Hypertext Versions of Journal Articles: Computer-aided Linking and Realistic Human-based Evaluation. Ph. D. thesis, University of Western Ontario, London, Ontario, 1999.<

[Botafogo 1992] Rodrigo A. Botafogo, Ehud Rivlin, and Ben Shneiderman. "Structural Analysis of Hypertexts: Identifying Hierarchies and Useful Metrics" in ACM Transactions on Information Systems, 10(2), 142-180, 1992.<

[Bush 1945] Vannevar Bush. "As We May Think" in The Atlantic Monthly, 176(1),101-108, [Online: http://www.isg.sfu.ca/~duchier/misc/vbush/], July 1945.<

[Crestani 1998] Fabio Crestani and Massimo Melucci. "A case study of automatic authoring: from a textbook to a hyper-textbook" in Data & Knowledge Engineering 27, 1-30, 1998.<

[Ellis 1993] David Ellis, Jonathan Furner-Hines, and Peter Willett. "Measuring the Degree of Similarity between Objects in Text Retrieval Systems" in Perspectives in Information Management 3, 128-149, 1993.<

[Ellis 1994a] David Ellis, Jonathan Furner-Hines, and Peter Willett. "On the Creation of Hypertext Links in Full-Text Documents: Measurement of Inter-Linker Consistency" in Journal of Documentation 50, 67-98, 1994.<

[Ellis 1996] David Ellis, Jonathan Furner, and Peter Willett. "On the Creation of Hypertext Links in Full-Text Documents: Measurement of Retrieval Effectiveness" in Journal of the American Society for Information Science 47, 287-300, 1996.<

[Furnas 1987] George W. Furnas, Thomas K. Landauer, Louis M. Gomez, and Susan T. Dumais. "The Vocabulary Problem in Human-System Communication" in Communications of the ACM (CACM), 30, 964-971, 1987.<

[Furner 1996] Jonathan Furner, David Ellis, and Peter Willett. "The Representation and Comparison of Hypertext Structures using Graphs" in Information Retrieval and Hypertext, Maristella Agosti and Alan F. Smeaton (editors), Kluwer, Norwell, MA, 75-96, 1996.<

[Green 1998] Stephen J. Green. "Automated Link Generation: Can We Do Better than Term Repetition?" in Proceedings of the Seventh International World Wide Web Conference, Brisbane, Australia, 75-84, 1998.<

[Lancaster 1986] F. Wilfred Lancaster. Vocabulary control for information retrieval. Information Resources Press, Arlington, VA, 1987.<

[Leonard 1977] Lawrence E. Leonard. Inter-indexer consistency studies, 1954-1975: a review of the literature and summary of study results. Graduate School of Library Science, University of Illinois, Urbana-Champaign, IL, 1977.<

[Lesk 1969] Michael E. Lesk and Gerald Salton. "Relevance Assessments and Retrieval System Evaluation" in Information Storage and Retrieval 4, 343-359, 1969. <

[Liebscher 1994] Peter Liebscher. "Hypertext and Indexing" in Challenges in Indexing Electronic Text and Images, Raya Fidel, Trudi Bellardo Hahn, Edie M. Rasmussen, ad Philip J. Smith (editors,) Learned Information, Medford, NJ, 82-86, 1994.<

[Melucci 1999] Massimo Melucci. "An Evaluation of Automatically Constructed Hypertexts for Information Retrieval" in Information Retrieval 1, 91-114, 1999.<

[Price 1998] Morgan N. Price, Gene Golovchinsky, and William N. Schilit. "Linking by Inking: Trail-Blazing in a Paper-Like Hypertext" in Proceedings of ACM Hypertext '98, Pittsburgh, PA, 30-39, June 1998.<

[Salton 1989] Gerald Salton. Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley, Reading, MA, 1989.<

[Saracevic 1991] Tefko Saracevic. "Individual differences in organizing, searching and retrieving information" in Proceedings of ASIS '91, 82-86, 1991.<

[Smeaton 1995] Alan F. Smeaton and Patrick J. Morrissey. "Experiments on the automatic construction of hypertexts from texts" in New Review of Hypermedia and Multimedia 1, 23-39, 1995.<

[Tarr 1974] D. Tarr and Harold Borko. "Factors influencing inter-indexing consistency. in Proceedings of the American Society for Information Science (ASIS) 37th Annual Meeting, 11, 50-55, 1974.<

[Thistlewaite 1995] Paul Thistlewaite. "Automatic construction of open webs using derived link patterns" in IR and Automatic Construction of Hypermedia: A Research Workshop, Maristella Agosti and James Allan (editors), ACM SIGIR, Seattle, WA, 1995.<

[Westland 1991] Christopher J. Westland. "Economic constraints in hypertext" in Journal of the American Society for Information Science (ASIS), 42(3), 178-184, 1991.<

[Willett 1987] Peter Willett. Similarity and clustering in chemical information systems. Research Studies Press, Letchworth, England, 1987.<

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept, ACM Inc., fax +1 (212) 869-0481, or permissions@acm.org.