Associative and spatial relationships in thesaurus-based retrieval

Alani, H., Jones, C. and Tudhope, D. (2000). Associative and spatial relationships in thesaurus-based retrieval. In: Proc. 4th European Conference on Digital Libraries (ECDL), Lisbon, Portugal. pp. 45-58.

Research Questions and Goals

  • How to perform query expansion without getting search results that are too large or noisy.
  • How AAT’s related term (RT) relationships can be helpful to users in exploring topics around an information need and performing query expansion.
  • Describe “an integrated spatial and thematic schema and discuss two novel approaches to the application of thesauri, from both spatial and thematic points of view” (p. 2).
  • “Investigate different factors relevant to RT expansion rather than relative weighting of relationships” (p. 5).
  • “Take advantage of more structured approaches to thesaurus construction where different types of RTs are employed” (p. 6).

Summary

The work for this paper is part of the OASIS (Ontologically Augmented Spatial Information System) project, which explores “terminology systems for thematic and spatial access in digital library applications” (p. 1).  There is a “vocabulary problem” in using thesaurus structure for retrieval.  One aspect is that the indexer and the searcher often work at different levels of specificity or granularity.  Another is that indexers are often inconsistent in how they assign terms to a resource.  Alani et al. discuss semantic distance measures that allow automatic query expansion by ranking lists of candidate terms.

Spatial data includes hierarchical and adjacency relations.  RTs are non-hierarchical and are sometimes viewed as weaker relationships than NTs or BTs.  The authors use the term “ontology” to mean a “conceptualization of a domain, in effect providing a connecting semantics between thesaurus hierarchies with specifications of roles for combining thesaurus elements” (p. 3).

Semantic Distance:

  • Often based on minimum number of semantic relationships that have to be traversed for two terms to be connected where each traversal has an associated cost.
  • Can also depend on depth within the hierarchy, where the distance between terms is greater towards the top than the bottom, reflecting specificity, density, or importance.
  • AAT has been known to have a mono-hierarchical nature.
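The path-based notion of semantic distance described above can be sketched as a minimum-cost search over a thesaurus graph. This is only an illustration, not the OASIS implementation: the terms, the links between them, and the cost values in `COSTS` are all hypothetical, and Dijkstra's algorithm is used here simply as a standard way to find the cheapest chain of relationships.

```python
import heapq

# Hypothetical per-relationship traversal costs (illustrative values only,
# not the weights reported in the paper).
COSTS = {"BT": 1.0, "NT": 1.0, "RT": 1.5}

# Toy thesaurus (made-up terms): term -> list of (relationship, neighbour).
THESAURUS = {
    "castles": [("BT", "fortifications"), ("RT", "moats")],
    "fortifications": [("NT", "castles"), ("NT", "forts")],
    "forts": [("BT", "fortifications")],
    "moats": [("RT", "castles")],
}


def semantic_distance(source, target, graph=THESAURUS, costs=COSTS):
    """Return the minimum total traversal cost from source to target.

    Returns None when no chain of relationships connects the two terms.
    """
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        dist, term = heapq.heappop(queue)
        if term == target:
            return dist
        if dist > best.get(term, float("inf")):
            continue  # stale queue entry; a cheaper path was already found
        for rel, neighbour in graph.get(term, []):
            new_dist = dist + costs[rel]
            if new_dist < best.get(neighbour, float("inf")):
                best[neighbour] = new_dist
                heapq.heappush(queue, (new_dist, neighbour))
    return None
```

Candidate expansion terms could then be ranked by calling `semantic_distance` from the query term to each candidate and sorting by the result.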

The paper includes three scenarios:

  • BT/NT expansion only
  • RT expansion included
  • Filtering by RT expansion

Relationship specializations allow the treatment of AAT as a poly-hierarchical system during retrieval.  By filtering on relationship specializations, it can be treated as either mono-hierarchical or poly-hierarchical.

Relationship Specializations

  • 1A—Alternate BT
  • 1B—Alternate NT
  • 2A—Part
  • 2B—Whole
  • 3—Inter/intra facet relationship
  • 4—Distinguished from relationship
  • 5—Frequently conjuncted term (p. 7).
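Filtering RT expansion by specialization can be sketched as follows. The codes come from the list above, but the terms, the links, and the function name `expand_rt` are hypothetical and only illustrate the idea: allowing codes 1A and 1B lets alternate BT/NT links take part in expansion (a poly-hierarchical view), while excluding them leaves the mono-hierarchical view.

```python
# Hypothetical RT links (made-up terms), each tagged with a specialization code
# from the list above: (source term, related term, code).
RT_LINKS = [
    ("castles", "fortifications", "1A"),  # 1A: alternate BT
    ("castles", "keeps", "2A"),           # 2A: part
    ("castles", "sieges", "3"),           # 3: inter/intra facet relationship
    ("castles", "palaces", "4"),          # 4: distinguished from
]


def expand_rt(term, allowed_codes, links=RT_LINKS):
    """Return related terms of `term` whose specialization code is allowed."""
    return [dst for src, dst, code in links
            if src == term and code in allowed_codes]


# Poly-hierarchical view: follow only the alternate BT/NT links.
alternate_hierarchy = expand_rt("castles", {"1A", "1B"})

# Mono-hierarchical view: exclude alternate BT/NT links, keep other RT types.
other_related = expand_rt("castles", {"2A", "2B", "3", "4", "5"})
```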

Methodology

  • Thematic data is from the Royal Commission on the Ancient and Historical Monuments of Scotland (RCAHMS) in addition to data from the Getty AAT and TGN thesauri.
  • Schema is implemented using the object-oriented SIS (Semantic Index System).
  • In order to allow query expansion for identifying similar terms, OASIS implements a set of thematic and spatial measures.
  • Used the AAT RT editorial manual to identify valid RT relationships.
  • Uses weightings of BT 3, NT 3, and RT 4.  The weighting is inversely proportional to the hierarchy depth of the destination term from the original term.  The authors assign the lowest costs to NTs and favor RTs over BTs at higher depths in the hierarchy.  The distance threshold for expansion was set at 2.5 (p. 5).
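The depth-dependent weighting and the expansion threshold can be illustrated with a small sketch. The base weights (BT 3, NT 3, RT 4) and the threshold of 2.5 are taken from the notes above; the exact formula is not given here, so dividing the base weight by the destination term's depth is an assumption made purely for illustration, as are the function names.

```python
# Base weights and threshold as reported in the notes above.
BASE_WEIGHT = {"BT": 3.0, "NT": 3.0, "RT": 4.0}
THRESHOLD = 2.5


def traversal_cost(rel, destination_depth):
    """Cost of one traversal; ASSUMED to be base weight / destination depth.

    destination_depth is the hierarchy depth of the destination term,
    counted from 1 at the top of the hierarchy.
    """
    return BASE_WEIGHT[rel] / destination_depth


def within_threshold(steps, threshold=THRESHOLD):
    """Check whether a chain of (relationship, destination depth) steps
    stays within the expansion threshold."""
    total = sum(traversal_cost(rel, depth) for rel, depth in steps)
    return total <= threshold
```

Under this assumed formula, a single BT step to a depth-1 term costs 3.0 and exceeds the threshold, while an NT step to depth 3 followed by an RT step to depth 4 costs 2.0 and stays within it, which matches the idea that terms higher in the hierarchy are further apart.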

Findings

  • Thesaurus relationships can provide us with semantic distance measures that can enable interactive and automatic query expansion.
  • AAT can be treated as a poly-hierarchical system by filtering by RT specialization.
  • “Specialising RTs allow the possibility of dynamically linking RT type to query content and, in cases like the AAT, treating alternate hierarchical RT relationships more flexibly for retrieval purposes” (p. 9).

Future Work

  • Researching the “effects of combining RT and BT/NT expansion, or chains of hierarchical and non-hierarchical relationships” (p. 8).
  • The retrieval potential of geographical metadata schema (e.g., rich place name data, locational data), spatial extent, or footprint.