Query expansion via conceptual distance in thesaurus indexed collections

Tudhope D, Binding C, Blocks D, Cunliffe D (2006) Query expansion via conceptual distance in thesaurus indexed collections. Journal of Documentation 62(4), 509–533.

This article looks at using thesauri as a mediating interface between users and collections indexed with a controlled vocabulary, which may help users who are unfamiliar with the terminology search or browse for resources.  Three main thesaurus relationships can aid query expansion and reformulation: equivalence (synonyms), hierarchical (broader/narrower), and associative (see also).  The authors also conducted user studies with the FACET web demonstrator.
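To make the idea of expansion by conceptual distance concrete, here is a minimal sketch in Python. It is not the FACET implementation: the toy thesaurus, the `COSTS` values, and the `expand` function are all assumptions made up for illustration. Each relationship type is given a traversal cost, and a term is expanded to every term reachable within a distance budget.

```python
import heapq

# Hypothetical traversal cost per relationship type (illustrative values only,
# not taken from the article).
COSTS = {"equivalent": 0.0, "narrower": 1.0, "broader": 1.5, "related": 2.0}

# Toy thesaurus (assumed data): term -> list of (relationship, neighbouring term).
THESAURUS = {
    "chairs": [
        ("equivalent", "seats"),
        ("narrower", "armchairs"),
        ("broader", "furniture"),
        ("related", "upholstery"),
    ],
    "armchairs": [("broader", "chairs"), ("narrower", "wing chairs")],
    "furniture": [("narrower", "chairs"), ("narrower", "tables")],
}


def expand(term, max_distance):
    """Return {term: distance} for every term reachable within max_distance."""
    best = {term: 0.0}
    queue = [(0.0, term)]
    while queue:
        dist, current = heapq.heappop(queue)
        if dist > best.get(current, float("inf")):
            continue  # a cheaper route to this term was already processed
        for relation, neighbour in THESAURUS.get(current, []):
            new_dist = dist + COSTS[relation]
            if new_dist <= max_distance and new_dist < best.get(neighbour, float("inf")):
                best[neighbour] = new_dist
                heapq.heappush(queue, (new_dist, neighbour))
    return best


if __name__ == "__main__":
    for name, distance in sorted(expand("chairs", 2.0).items(), key=lambda kv: kv[1]):
        print(f"{distance:.1f}  {name}")
```

Raising `max_distance` widens the expansion and lowering it keeps the query closer to the original term, which is one simple way to think about how much flexibility a searcher wants.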

Some issues discovered:

  • “Specific indexing allows for greater discrimination when searching but also carries the possibility of indexer variation (or error) in concept selection” (p. 525).
  • “When considering thesaurus-based query expansion, it is necessary to distinguish the expansion mechanism from the matching function or query engine it informs in a particular system” (p. 525).

Some arguments for thesaurus-based query expansion:

  • “Incorporating semantic expansion in the controlled vocabulary matching function allows for some recovery from variations due to indexing specificity or error” (p. 525).
  • “Partial matching allows for possible recovery from situations where indexer and searcher differ in choice of concepts” (p. 530).
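The partial-matching argument in the last quote can be illustrated with a small Python sketch. This is an assumption-laden toy, not the matching function described in the article: the `DISTANCE` table, `MAX_DISTANCE`, and the averaging rule in `score` are all hypothetical. The point is only that an item indexed with nearby concepts still receives a non-zero score when the searcher picks different terms.

```python
# Hypothetical precomputed conceptual distances between term pairs (assumed data).
DISTANCE = {
    ("armchairs", "chairs"): 1.0,
    ("chairs", "furniture"): 1.5,
    ("mahogany", "wood"): 1.0,
}

# Assumed cut-off: terms at or beyond this distance count as unrelated.
MAX_DISTANCE = 3.0


def closeness(a, b):
    """Map a conceptual distance to a closeness value between 0 and 1."""
    if a == b:
        return 1.0
    d = DISTANCE.get((a, b), DISTANCE.get((b, a)))
    if d is None or d >= MAX_DISTANCE:
        return 0.0
    return 1.0 - d / MAX_DISTANCE


def score(query_terms, index_terms):
    """Average, over the query terms, of the best closeness to any index term."""
    if not query_terms or not index_terms:
        return 0.0
    total = sum(max(closeness(q, t) for t in index_terms) for q in query_terms)
    return total / len(query_terms)


if __name__ == "__main__":
    # The searcher asks for "chairs" and "wood"; the indexer chose more specific terms.
    print(score(["chairs", "wood"], ["armchairs", "mahogany"]))
```

An exact-match engine would return nothing for this query, whereas the sketch ranks the item below an exact match but above unrelated items.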

There seem to be tradeoffs between precision and recall; browsing might take a little longer initially, but in some cases this method is helpful when negotiating an anomalous state of knowledge.  Finding the right level of flexibility depends on the individual user’s need.  According to the user studies conducted, there appears to be a need for more “active system support for the search process itself (particularly for reformulation)” (p. 517).  It might be interesting to explore interactive search/browse interfaces that dynamically generate previews of query results.  However, this may depend on how distributed the collections are and how many controlled vocabularies are being used.

It also seems that users’ varying levels of expertise with thesauri or advanced search interfaces should be taken into account.  Features that could introduce complexity for the average user should be optional.  It is also important to understand who the users of the system will be in order to develop a set of requirements that the system must meet.  This in turn will help us understand how to evaluate the system and how to select participants for user studies, if desired.
