Using Visualization for Exploring Relationships between Concepts in Ontologies

Isabel Cristina Siqueira da Silva, Carla Maria Dal Sasso Freitas (2011). Using Visualization for Exploring Relationships between Concepts in Ontologies. IV ’11 Proceedings of the 2011 15th International Conference on Information Visualisation, 317-322.

The most important contribution of this paper is a 2.5D scheme applied to an ontology hierarchy/graph.

This paper focuses on visualizing the hierarchy of an ontology and the relationships between its concepts using multiple views. For the hierarchy, it employs a 2D hyperbolic tree, which reduces the cognitive overload and user disorientation that can occur while expanding and contracting nodes, especially in ontologies with many concepts (Figure 1a).

It uses a third dimension to display one or more relationships (object properties) selected by the user in a second view. To show them, it takes the plane where the tree is displayed and rotates it 90° around the X-axis (Figure 1b). The rotated plane, positioned in 3D as an XZ-plane, displays the hyperbolic tree, and the selected relationships are drawn as curved lines in space connecting the related concepts, without interfering with the display of the hierarchical relationship.
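
The rotation step can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the rotation direction, the function names, and the choice of a quadratic Bézier curve for the relationship lines are all assumptions made here.

```python
import math

def rotate_about_x(point, degrees=90.0):
    """Rotate a 3D point (x, y, z) about the X-axis by the given angle."""
    x, y, z = point
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return (x, y * c - z * s, y * s + z * c)

def to_xz_plane(node_2d):
    """Place a 2D tree node (x, y) on the rotated plane (assumed +90 degrees)."""
    x, y = node_2d
    rx, ry, rz = rotate_about_x((x, y, 0.0))
    # Round away floating-point residue left by cos(90 degrees).
    return (round(rx, 9), round(ry, 9), round(rz, 9))

def relationship_arc(a, b, height=1.0, steps=8):
    """Sample a quadratic Bezier curve from a to b that rises above the plane.

    The control point sits at Y = height over the midpoint of a and b,
    so for endpoints at Y = 0 the curve itself peaks at height / 2.
    """
    control = ((a[0] + b[0]) / 2, height, (a[2] + b[2]) / 2)
    points = []
    for i in range(steps + 1):
        t = i / steps
        points.append(tuple(
            (1 - t) ** 2 * p0 + 2 * (1 - t) * t * p1 + t ** 2 * p2
            for p0, p1, p2 in zip(a, control, b)
        ))
    return points
```

A node at (0.5, 0.25) in the 2D tree ends up at (0.5, 0.0, 0.25), which leaves the Y direction free for the curved relationship lines.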

Figure 1c shows the proposed 2.5D scheme applied to an ontology hierarchy/graph.

(I can’t upload the image, so you have to read it from the full text.)

The use of 2.5D visualization might be a solution to common problems presented by 2D and 3D ontology visualization tools, mainly cognitive overload and user disorientation.


Challenges and issues in terminology mapping

A good review article: McCulloch, E.; Shiri, A.; & Nicholson, D. (2005).  Challenges and issues in terminology mapping:  A digital library perspective.  The Electronic Library, 23(6), 671-677.

The abstract:

Purpose – In light of information retrieval problems caused by the use of different subject schemes, this paper provides an overview of the terminology problem within the digital library field. Various proposed solutions are outlined and issues within one approach – terminology mapping – are highlighted.
Design/methodology/approach – Desk-based review of existing research.
Findings – Discusses benefits of the mapping approach, which include improved retrieval effectiveness for users and an opportunity to overcome problems associated with the use of multilingual schemes. Also describes various drawbacks such as the labour intensive nature and expense of such an approach, the different levels of granularity in existing schemes, and the high maintenance requirements due to scheme updates, and not least the nature of user terminology.
Originality/value – General review of mapping techniques as a potential solution to the terminology problem.
Keywords — Classification, Information retrieval, Digital libraries, Open systems
Paper type General review

 

 


User-centered indexing

Fidel, R. (1994). User-centered indexing. Journal of the American Society for Information Science, 45(8), 572-576.
  1. User-centered indexing: requires that we index in a way that reflects the approach users would take to find a document.
  2. Document-centered indexing: holds that indexing, like abstracting, creates surrogates for documents.
    Purpose:  to represent the content or features of a document:
    “Aboutness” what is the document about?
    Generally poor inter-indexer agreement.

    • Process: content-analysis of the document to select concepts that represent it.
    • Translation, expressing the concepts in the indexing language.
      Requires rules (policy):
      a. Sources of terms (controlled vocabulary, other?)
      b. Specificity (how narrow or broad)
      c. Weights (reflect the importance of the concept)
      d. Accuracy (how to translate when there is no equivalent?)
      e. Degree of precombination (decide to use pre or post)
      f. User language (assign terms approximate to the users)
      — and some content analysis policies —
      g. Exhaustivity (how comprehensive)
      h. Indexable matter (what parts of the doc should be represented)

Request-Oriented – Soergel (1985):
Checklist indexing: check each document against the descriptors in a vocabulary (but this is costly and time-consuming). A classified structure for indexing can help improve the efficiency of this approach.

Automated approach: computerized
– dynamic
– objective and consistent
– natural language requests, relevance feedback, ranked output, query expansion
– “indexing and searching are two sides of the same coin”


A Faceted Classification as the Basis of a Faceted Terminology: Conversion of a Classified Structure to Thesaurus Format in the Bliss Bibliographic Classification, 2nd Edition

Citation: Vanda Broughton. A Faceted Classification as the Basis of a Faceted Terminology: Conversion of a Classified Structure to Thesaurus Format in the Bliss Bibliographic Classification, 2nd Edition. Axiomathes. 2008, 18(2): 193-210

This paper develops a faceted thesaurus based on a faceted classification system. It describes the creation of a thesaurus using the Bliss Bibliographic Classification Second Edition (BC2) as the starting point. It examines various aspects of the relationship between the two forms of vocabulary and how these might be resolved to create a truly integrated and interdependent structure that can be managed to some extent mechanically.

A regular thesaurus uses relationship tags such as USE, UF (Used For), BT (Broader Term), NT (Narrower Term), and RT (Related Term) to control the vocabulary and represent the relationships between terms. To build a faceted thesaurus, these term relations must be detected in the hierarchical structure of the classification and converted.

1. Relationships in the Faceted Classification

(1)   Broader Term (BT)/ Narrower Term (NT) Relationships

Facet structure provides paradigmatic BT and NT relationships. For BC2, every parent is the BT of each of its children. Every child is the NT of its parent.

(2)   Related Terms (RT) Derived from Hierarchy

RTs are derived from terms in the same array (sibling terms), and terms not necessarily siblings, but at a coordinate level in the hierarchy.
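
As a rough sketch of how these relationships could be read off a hierarchy mechanically: the function name and the sample terms below are hypothetical (they are not taken from BC2), and the sketch only covers sibling RTs, not coordinate terms that sit in different arrays.

```python
from collections import defaultdict

def derive_relations(children):
    """Derive BT, NT and sibling RT links from a parent -> children mapping.

    `children` maps each parent term to the ordered list of terms in its array.
    """
    bt = {}                    # child -> its broader term
    nt = defaultdict(list)     # parent -> its narrower terms
    rt = defaultdict(set)      # term -> sibling terms in the same array
    for parent, kids in children.items():
        for kid in kids:
            bt[kid] = parent
            nt[parent].append(kid)
            rt[kid].update(k for k in kids if k != kid)
    return bt, dict(nt), {term: sorted(sibs) for term, sibs in rt.items()}

# Hypothetical fragment of a classified structure.
tree = {
    "Therapy": ["Drug therapy", "Surgery"],
    "Surgery": ["Keyhole surgery"],
}
bt, nt, rt = derive_relations(tree)
print(bt["Surgery"])   # Therapy
print(nt["Therapy"])   # ['Drug therapy', 'Surgery']
print(rt["Surgery"])   # ['Drug therapy']
```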

2. Vocabulary Control in the Faceted Classification

(1)   Equivalence Relationships

The conceptual structure of the classification collocates synonyms (equivalent terms within the same class heading or caption). They are converted as equivalence relationships.

For the most part the BC2 schedules observe some stylistic conventions for synonyms. Usually two or more synonymous terms are separated by a comma, and the evident preferred term is listed first.
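
A minimal sketch of that convention, assuming a caption is a plain comma-separated string with the preferred term first. The function name and the sample caption are made up for illustration, and since the convention only holds "usually", the output of a step like this would still need human review.

```python
def split_caption(caption):
    """Split a class caption into a preferred term and its non-preferred synonyms.

    Assumes the stylistic convention that synonyms are comma-separated
    and that the first one is the preferred term.
    """
    terms = [t.strip() for t in caption.split(",") if t.strip()]
    if not terms:
        raise ValueError("empty caption")
    preferred, *non_preferred = terms
    return {
        "preferred": preferred,
        "UF": non_preferred,                              # preferred UF each synonym
        "USE": {alt: preferred for alt in non_preferred}, # synonym USE preferred
    }

entry = split_caption("Neoplasms, Tumours, Tumors")
print(entry["preferred"])  # Neoplasms
print(entry["UF"])         # ['Tumours', 'Tumors']
```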

(2)   Compound Terms

When building a thesaurus, it may be necessary to convert a single term into a compound phrase, because once the term is taken out of the hierarchical structure, more information is needed to identify or understand it. The rules for when compounds may be included in a thesaurus are very complex, but broadly embrace:

a. Adjectival phrases where one term generates a species or subclass of the other;

b. Compounds where the terms in combination mean something other than the sum of their parts, or where the parts are meaningless when separated;

c. The term is conceptually compound but represented by a single term.

Conclusion

It is clear that the faceted structure of the BC2 terminologies supports the generation of a compatible thesaurus in a number of ways. For example, it allows the precise identification of broader/narrower terms, and sibling and coordinate associative terms. Equivalence relationships are acknowledged through collocation of equivalent or near equivalent terms. BC2 has the potential for syntagmatic relationships to be automatically detected in a populated system.

While the structure is excellent for managing relationships, questions of vocabulary control are only now in the process of being addressed, and it is clear that more rigor must be introduced into the formatting of classes if terms are to be handled as effectively as concepts.


ThManager: an open source tool for creating and visualizing SKOS

Citation: ThManager: an open source tool for creating and visualizing SKOS. Javier Lacasta, Francisco Javier Lopez-Pellicer, Pedro Rafael Muro-Medrano, Javier Nogueras-Iso, and Francisco Javier Zarazaga-Soria. Information Technology and Libraries. 26.3 (Sept. 2007).

Also: http://thmanager.sourceforge.net/index.html

ThManager is an Open Source Tool for creating and visualizing SKOS RDF vocabularies, a W3C initiative for the representation of knowledge organization systems such as thesauri, classification schemes, subject heading lists, taxonomies, and other types of controlled vocabulary. ThManager facilitates the management of thesauri and other types of controlled vocabularies, such as taxonomies or classification schemes. The tool has been implemented in Java and has the following features:

  • Multi-platform (Windows, Unix). As it has been developed in Java and the storage of metadata records is managed directly through the file system, the application can be deployed in any platform with the minimum requirement of having installed a Java virtual machine.
  • Multilingual. The application has been developed following the Java internationalization methodology. Nowadays, there are Spanish and English versions. With little effort, other languages could be supported.
  • Selection and filtering of the thesauri stored in the local repository.
  • Description of thesauri by means of metadata in compliance with a Dublin Core based application profile for thesaurus. These metadata can be either visualized in HTML or edited through a form.
  • Visualization of thesaurus concepts. The visualization interface includes the following widgets:
    • Alphabetic viewer: It provides the list of thesaurus concepts alphabetically ordered in the selected language.
    • Hierarchical viewer: It provides a tree showing the hierarchical structure of thesaurus concepts.
    • Concept viewer: For a selected concept it shows all the properties allowing additionally the navigation to the related concepts by means of hyperlinks.
    • Search tool: It facilitates search of concepts. The searching process is based on preferred labels allowing the following criteria: “equals”, “starts with” and “contains”.
  • Edition of thesaurus content. The tool provides an edition interface to modify the content of a thesaurus: creation of concepts, deletion of concepts, and update of concept properties.
  • Exchange of thesauri according to SKOS format. The export operation includes the export of thesaurus metadata.
  • Extraction of related concepts in WordNet. It generates an automatic mapping of thesaurus concepts against the concepts of Wordnet lexical database.
  • On-line help by means of PDF visualization.
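
The three search criteria on preferred labels can be illustrated with a short Python sketch. This is not ThManager's code (the tool itself is written in Java); the function name, the concept identifiers, and the case-insensitive matching are assumptions made for the example.

```python
def search_concepts(pref_labels, query, mode="contains"):
    """Return the ids of concepts whose preferred label matches the query.

    `pref_labels` maps a concept id to its preferred label in one language.
    Matching is case-insensitive here, which is an assumption of this sketch.
    """
    q = query.casefold()
    criteria = {
        "equals": lambda label: label == q,
        "starts with": lambda label: label.startswith(q),
        "contains": lambda label: q in label,
    }
    if mode not in criteria:
        raise ValueError(f"unknown search mode: {mode!r}")
    matches = criteria[mode]
    return sorted(cid for cid, label in pref_labels.items()
                  if matches(label.casefold()))

# Hypothetical concept ids and labels.
labels = {"c1": "Water", "c2": "Water pollution", "c3": "Groundwater"}
print(search_concepts(labels, "water", "equals"))       # ['c1']
print(search_concepts(labels, "water", "starts with"))  # ['c1', 'c2']
print(search_concepts(labels, "water", "contains"))     # ['c1', 'c2', 'c3']
```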

The Lacasta et al. article provides an overview of many other thesaurus creation and management tools, including Lexico, MultiTes and TemaTres, and presents the ThManager architecture and functionality.


TemaTres: open-source controlled vocab manager

I think TemaTres looks like a good candidate to manage a controlled vocabulary. It’s open source, web-based, and runs on PHP/MySQL. I spent some time using it and it looks like a nice, lightweight manager. It’s pretty good out of the box, and with a little bit of effort we could customize it to our liking.
The project homepage: http://www.vocabularyserver.com/

Here’s a favorable review of the software: http://databits.lternet.edu/spring-2011/managing-controlled-vocabularies-tematres

I created a demo installation to give it a try at:
http://uxtheory.com/controlledvocab/vocab/login.php
User: demo@drexel.edu
Pass: demo

Controls to add/modify terms appear as a dropdown in the top menu under the label “Menu”


SIS – TMS : A Thesaurus Management System for Distributed Digital Collections

Doerr M., Fundulaki I. (1998). SIS-TMS: A thesaurus management system for distributed digital collections. Proc. 2nd European Conference on Digital Libraries (ECDL’98), (C. Nikolaou and C. Stephanidis eds.) Lecture Notes in Computer Science 1513, Springer-Verlag: Berlin, 215-234.

Introduction

This paper presents methods and a working system for storing, maintaining, and providing access to the knowledge structures that are in use or needed for the respective auxiliary system interfaces and for three tasks:

– Guide the user from his/her naïve request to the use of a set of terms optimal for his purpose and for the characteristics of the target information source.

– Expand naïve user terms or the terms optimal for the purpose of the user into sets of terms optimal for each different information source.

– Classify all information assets of a certain collection with controlled vocabulary from a specific thesaurus.

This paper proposes the following requirements for Thesaurus Management:

(1) interaction with the thesaurus contents, excluding manipulation; (2) maintenance, i.e. the manipulation of the contents and the necessary and desirable support of associated work processes; and finally (3) analysis, i.e. the logical structure needed to support (1), (2), and the thesaurus semantics in the narrower sense.

SIS-TMS

The SIS-TMS is a multilingual thesaurus management system and a terminology server for classification and distributed access to electronic collections following the above analysis. Its distinct features are its capability to store, develop, display and access multiple thesauri and their interrelations under one database schema, to create arbitrary graphical views of them, and to specialize dynamically any kind of relation into new ones. It further implements the version control necessary for cooperative development and data exchange with other applications in the environment.

It originates in the terminology management system (VCS Prototype) developed by ICS-FORTH in cooperation with the Getty Information Institute in the framework of a feasibility study. It was enhanced within the AQUARELLE project, in particular by the support of multilinguality. An earlier version is part of the AQUARELLE product. A full product version was available summer ’98.

The SIS-TMS is an application of the Semantic Index System (SIS), a product of the Institute of Computer Science-FORTH. SIS is an object-oriented semantic network database used for the storage and maintenance of formal reference information as well as for other knowledge representation applications. It implements an interpretation of the data model of the knowledge representation language TELOS, omitting the evaluation of logical rules.

This paper discusses the thesaurus structure from several perspectives, including assumptions about concepts, the modeling of thesaurus notions, intrathesaurus relations, the representation of multiple interlinked thesauri, and interthesaurus relations.

Collection Management Systems (CMS) such as digital libraries, library systems, and museum documentation systems will continuously change. The CMS can also propose new terms to the TMS. The TMS will be updated with new terms from many sides, and old concepts and terms may be renamed, revised and reorganized. The essential problem is to ensure and maintain consistency between the contents of the vocabularies in the underlying CMS and the contents of the Local Thesaurus Management Systems.

The user interacts with the SIS-TMS via its graphical user interface, which provides unconstrained navigation within and between multiple interlinked thesauri. The user can retrieve information from the SIS-TMS knowledge base using a number of predefined, configurable queries and receive the results in either textual or graphical form. Beyond providing graphical representations, an essential feature of SIS-TMS is its ability to represent any combination of relationships, to arbitrary depth, in a single graph (see Fig. 1, central window). Updates in the SIS-TMS are performed through the Entry Forms in a task-oriented way (see Fig. 1, right window).
Fig.1. SIS-TMS User Interface, Browser and Data Entry facility.

Conclusion

The authors argue that integrated terminology services in distributed digital collections are going to become an important subject, and that the SIS-TMS provides a valuable contribution to it. It solves a major problem, the consistent maintenance of the necessarily central terminological resources between semiautonomous systems. The terminological bases themselves need not be internally distributed: access needs low bandwidth, read-only copies can easily be sent around at the given low update rates, and term servers can be cascaded. In the near future, the functionality of this system will be further enhanced to make its usability as wide as possible. Whereas there are several standards for thesaurus contents, no one has so far tried to standardize the three component interfaces: (1) Term Server to retrieval tools, (2) TMS to CMS, (3) TMS to Term Server. Because a distributed information system contains many components from many providers, these three interfaces must become open and standardized to make wide use a reality.


Commercial Controlled Vocabulary Software Evaluation

Comparative evaluation of thesaurus creation software. (Hedden, 2008)

This article compares three commercial thesaurus creation and maintenance tools: MultiTes, Term Tree 2000, and WebChoir TCS-10. The author sets out requirements for thesaurus maintenance, taken from published standards, that the three tools meet: hierarchical relationships, associated term relationships, ‘used-for’ terms, and optional notes for each term. In addition, all tools support the creation of candidate and approved terms, polyhierarchies, and “disallowing illegal relationships (e.g. circular relationships).”

The following six evaluation measures are used to compare the three tools:

  1. Thesaurus display:
    – alphabetical and/or hierarchical.
  2. Term editing and display:
    – the user interface and controls for creating and maintaining terms.
  3. Searching:
    – ability to search for terms in the thesaurus.
  4. User-defined relationships and attributes:
    – the ability to create relationships between terms, such as “broader,” “narrower,” or “related” term; and “use” or “used for.” Additionally, more advanced relationships may be defined by the user to support ontology creation. Terms used in the thesaurus may be categorized for use in a faceted taxonomy. For example, a category of named-entity terms may be useful.
  5. Rules enforcement:
    – the tool should help users follow rules such as prohibiting orphan terms (those that have no broader or narrower terms).
  6. Importing, exporting, and reports:
    – does the tool support batch import of existing thesauri, including their relationships? Does it allow easy export to a format that can be imported into other tools? Reports may be useful in multiple formats.
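
To make the rules-enforcement measure concrete, here is a small Python sketch of two checks mentioned in this review: finding orphan terms and refusing a circular broader-term relationship. It does not reflect how any of the three tools is implemented; the data layout, function names, and sample terms are assumptions.

```python
def find_orphans(terms, broader):
    """Return terms that have neither a broader nor a narrower term.

    `broader` maps a term to the set of its broader terms
    (a set, so that polyhierarchies are allowed).
    """
    has_broader = {term for term, bts in broader.items() if bts}
    has_narrower = {bt for bts in broader.values() for bt in bts}
    return sorted(set(terms) - has_broader - has_narrower)

def would_create_cycle(broader, term, new_bt):
    """Return True if adding 'term BT new_bt' would make term its own ancestor."""
    stack, seen = [new_bt], set()
    while stack:
        current = stack.pop()
        if current == term:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(broader.get(current, ()))
    return False

# Hypothetical thesaurus fragment.
terms = ["Animals", "Dogs", "Poodles", "Miscellaneous"]
broader = {"Dogs": {"Animals"}, "Poodles": {"Dogs"}}
print(find_orphans(terms, broader))                       # ['Miscellaneous']
print(would_create_cycle(broader, "Animals", "Poodles"))  # True
```

A tool that only reports orphans would call something like `find_orphans` after the fact, while one that disallows them would run the check before accepting a deletion.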

Tools:

MultiTes Pro (http://www.multites.com/)

  1. Thesaurus display:
    – alphabetical display (hierarchical shown in reports).
  2. Term editing and display:
    – terms can be created and related to existing entries simultaneously
    – relationships cannot be edited; they must first be deleted and a new one created
  3. Searching:
    – advanced search supported (i.e., search within a note, etc.)
  4. User-defined relationships and attributes:
    – does not support user-defined attributes
  5. Rules enforcement:
    – user can delete a term that has narrower terms, allowing for orphans (tool can report orphans however)
  6. Importing, exporting, and reports:
    – imports structured text files, exports and reports in many formats

Term Tree (www.termtree.com.au)

  1. Thesaurus display:
    – alphabetical / hierarchical view
  2. Term editing and display:
    – new terms can be created from existing
  3. Searching:
    – supported
  4. User-defined relationships and attributes:
    – addition of new relationships is not supported
    – addition of categories/attributes is supported
  5. Rules enforcement:
    – user can delete a term that has narrower terms, allowing for orphans (tool can report orphans however)
  6. Importing, exporting, and reports:
    – cannot import XML.
    – exports to many formats: Excel, CSV, XML
    – many report types supported, including KWIC and similar

TCS-10 (www.webchoir.com)

  1. Thesaurus display:
    – hierarchical/alphabetical
  2. Term editing and display:
    – a unique implementation of “use” and “use for”
  3. Searching:
    – guided Boolean search (search within results, etc)
  4. User-defined relationships and attributes:
    – supports user-defined relationships/attributes
  5. Rules enforcement:
    – allow/disallow duplicates and orphans
  6. Importing, exporting, and reports:
    – supports many formats, MARC, XML, ASCII, etc…

Summary

There is no clear winner; all three fulfill the basic requirements, and each has pros and cons that depend on the user's needs. The author recommends MultiTes as a good value. Customization does not seem to be possible, and only TCS-10 appears to support ontology-like user-defined relationships and attributes.

Citation:
Hedden, H. (2008). Comparative evaluation of thesaurus creation software. The Indexer, 26(2), 50-59.


Collaborative design research: The visualization of medical concepts

Citation

Zender, M., & Crutcher, K.A. (2007). Collaborative design research: The visualization of medical concepts. Proceedings of the International Association of Societies of Design Research (pp. 1-24). Hong Kong: http://www.sd.polyu.edu.hk/iasdr/proceeding/papers/Collaborative%20Design%20Research_%20The%20Visualization%20of%20Medical%20Concepts.pdf.

Abstract

The proliferation of data is threatening to swamp our ability to convert data into knowledge. Visualization promises to facilitate this conversion. Yet visual communication designers have not been deeply involved. One potential impediment to involvement is the lack of collaboration between visual communication designers and knowledge workers in specialized domains.

This paper describes a collaborative research project that integrates medical science and visual communication design. The project involves the development of a visual language to represent medical concepts by deriving propositions from papers, breaking propositions into concept objects, designing a visual object system (consisting of icons, glyphs and combinations) to represent the objects, and displaying the objects as a network of concepts with links to the original papers. Prototypes have proven to be highly condensed and accurate yet readable in seconds. If the visualization approach proves successful, the results would be groundbreaking in science and design.

Summary

The problem identified by the authors is as follows: Can key concepts in fields with controlled vocabularies, such as medicine, be efficiently communicated with images such as glyphs or icons, and, if so, are these images able to effectively illustrate the conceptual web surrounding hundreds or thousands of journal articles and papers within a specific area of investigation? If such a system were interactive then it might lead to insights more quickly and if it remained linked to individual papers then the visual display might be an improved means of exploring literature databases such as PubMed (p.4). The authors described and defined the parameters for making a comprehensive visual language for the expression of scientific concepts and contexts.

In developing a visual language, the approach taken by the authors was to “identify key concepts, connect those concepts to summary statements, break those statements into their essential conceptual objects, illustrate those concepts using icons and glyphs, and present these visual objects in an interactive concept space where they could be immediately perceived and understood in relation to each other (p.6).” Graphic forms that serve both icon and glyph function were combined allowing for glyphs to signify category while icons signify the meaning of an object within a category. Families of object icons were designed to mimic the parent/child structure of the UMLS. To depict actions, the authors developed Proposition Statements (“Object A – does something relative to – Object B”), then used thick lines with small graphic representations of action concepts (bind, modulate, produce) to connect Object A and Object B. Upon interaction with the line (roll-over) the objects within the line are animated to better depict the action. All other icons have a tool-tip-like name that pops-up when the mouse hovers for more than one half second.

Methodology

First, the icons of the Visualization Systems were informally evaluated for their communicative quality and then the icon-based display was compared against a similar text-based display. In order to develop a unified Visualization System for formal evaluation, 4 expert reviewers were recruited to rate the communication effectiveness of the icon for each object. A rating scale was used by the reviewers in evaluating the icons, the scores were then averaged, and then icons were redesigned based on the evaluation.

 Figure 1: ApoE beta-amyloid icon evaluation result report page, p. 19

Based on the evaluation of the Visualization System, an experiment was designed to compare the icon-based display and a text-based display. For the testing, a total of 27 subjects were recruited, 13 for the icon display and 14 for the text display. The subjects were a representative population of domain experts who performed tasks to evaluate the displays. The tasks were designed to measure three effects: speed of recognition of concepts; speed of identification of related concepts; and speed of identification of the type of relationship between concepts. The tasks for each group were identical. Some required simple recognition and identification, while others required interpretation and association.

 Figure 2: The icon-based display test

Figure 3: The text-based display test

Results/Conclusions

Overall, the icon-based display was both faster and more accurate. For simple identification tasks, the two displays were nearly equal in speed of identification. For identification tasks requiring reasoning/association of similar concepts, the icon-based display was overall 18% faster. For the tasks requiring the identification of similar concepts, the icon-based display was nearly twice as fast as the text-based display. Accuracy of the icon-based display was equal to the text-based display on simple identification tasks but far more accurate on tasks that required the recognition of relationships. On task 3, “Count the number of diseases in the display,” the icon-based display was 4.43 times more accurate than the text-based display (p.22). While the results are far from comprehensive, they are promising in that design and visualization can effectively communicate complex scientific content.

Key Points

  • “One problem is language. In scientific literature, as in most other areas, findings are reported in writing and the concepts are embodied in words. Yet words are often difficult to define, requiring a context to determine their meaning (p.5)”
  • “Icon families mimic the parent/child structure of the UMLS and by doing so created icons with a ‘proximate context’ that enables one icon’s meaning, the parent, to inform the meaning of other icons, the children (p.10)”
  • “In the propositional statements we analyzed, “neuronal degeneration” is one example of a process object: a neuron (thing, noun) degenerates (dies). This would be a conceptual entity in the UMLS.” (Zender and Crutcher, 2007)

Enhancing Digital Libraries with Social Navigation: The Case of Ensemble

Brusilovsky, P., Cassel, L., Delcambre, L., Fox, E., Furuta, R., Garcia, D., Shipman, F., et al. (2010). Enhancing Digital Libraries with Social Navigation: The Case of Ensemble. In M. Lalmas, J. Jose, A. Rauber, F. Sebastiani, & I. Frommholz (Eds.), Research and Advanced Technology for Digital Libraries, Lecture Notes in Computer Science (Vol. 6273, pp. 116-123). Springer Berlin / Heidelberg. Retrieved from http://dx.doi.org/10.1007/978-3-642-15464-5_13

Abstract. A traditional library is a social place, however the social nature of the library is typically lost when the library goes digital. This paper argues social navigation, an important group of social information access techniques, could be used to replicate some social features of traditional libraries and to enhance the user experience. Using the case of Ensemble, a major educational digital library, the paper describes how social navigation could be used to extend digital library portals, how social wisdom can be collected, and how it can be used to guide portal users to valuable resources.

The Ensemble project aims to replicate the social features of a traditional library in a digital library by collecting user interaction data. This enables “social navigation,” which “guides users to useful and interesting resources through adaptive link annotation and link recommendation.” The interactions collected in Ensemble include “both traditional low-level user actions such as resource browsing, rating, commenting, and tagging; and higher level structural actions such as fragment extraction and composition.” Collecting this data allows a form of collective wisdom to be modelled. The system can then suggest resources to users based on this social activity.
