Semantic Interoperability in Archaeological Datasets: Data Mapping and Extraction Via the CIDOC CRM

Citation:

Binding, C., May, K., & Tudhope, D. (2008). Semantic Interoperability in Archaeological Datasets: Data Mapping and Extraction Via the CIDOC CRM. Research and Advanced Technology for Digital Libraries, 5173, 280-290.

Goals of Research:

  • To investigate the utility of mapping different datasets to a common overarching ontology (CRM-EH) in order to improve the effectiveness of cross-database searching.

Research Questions:

  • What are the potential benefits of expressing different databases as RDF and conforming them to a common ontology?

Methodology:

  • Semi-automatic data mapping/extraction
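The semi-automatic workflow this entry describes (an expert-authored mapping, automated extraction, then expert review) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual tool: the column names, the `EXPERT_MAPPING` table, the `ex:` prefix and the `extract` function are all hypothetical, and the `crm:` properties are general CIDOC CRM properties rather than the CRM-EH mapping itself. Anything the mapping cannot handle is set aside for a domain expert instead of being guessed.

```python
# Hypothetical expert-authored mapping: source column -> (property, object template).
# The crm: properties are general CIDOC CRM ones, not the actual CRM-EH mapping.
EXPERT_MAPPING = {
    "material": ("crm:P2_has_type", "ex:type/{}"),  # value becomes a node (E55 Type)
    "note": ("crm:P3_has_note", '"{}"'),            # value becomes a string literal
}


def extract(table, rows, mapping):
    """Apply the mapping automatically; collect anything it cannot handle for expert review."""
    triples, review = [], []
    for row in rows:
        subject = f"ex:{table}/{row['id']}"  # hypothetical URI pattern
        for column, value in row.items():
            if column == "id":
                continue
            cleaned = str(value).strip()
            if column not in mapping:
                # No expert decision exists for this column yet.
                review.append((subject, column, "unmapped column"))
            elif not cleaned:
                # Blank values are flagged rather than emitted as empty nodes.
                review.append((subject, column, "empty value"))
            else:
                prop, template = mapping[column]
                triples.append((subject, prop, template.format(cleaned)))
    return triples, review
```

The split between `triples` and `review` is the point of the design: the automatic step does the repetitive work, while the review list is what the domain experts actually look at.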

Summary:

    The rationale behind converting the databases to a semantically enabled format such as RDF is to provide access to “deep Web” content and to produce Linked Data. In this study, data were extracted from five relatively large archaeological databases and mapped to CRM-EH, an extension of the CIDOC-CRM that models the archaeological processes and concepts used by English Heritage (EH). The intention is that a common ontology will improve cross-database searching.

    The study raised several practical issues: the need for data cleansing, common unique identifiers, intellectual work by domain experts, the abstractness of some concepts, and lengthy relationship chains (because the CRM is an event-based ontology). These chains also complicate user interface design. In the project, domain experts mapped the data models; the data were then extracted by an automatic tool and checked by experts for inconsistencies. Finally, an initial prototype application was developed for cross-searching: “Retrieved query results are displayed as a series of entry points to the structured data; it is then possible to browse to other interrelated data items, by following chains of relationship within the CRM-EH, beaming up from data items to concepts as desired.” The application makes use of SKOS-based terminology services, which provide the ability to browse concepts via semantic relationships in a thesaurus.
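To illustrate why an event-based ontology produces longer relationship chains than the source tables, here is a minimal sketch assuming a single flat find record. It uses a handful of general CIDOC CRM classes and properties (E7 Activity, P12, P14, P7, P4, P82), not the published CRM-EH mapping; the `ex:` URIs, the record fields and both helper functions are hypothetical. A date that sits one column away from the find in the table ends up three edges away in the graph.

```python
from collections import deque


def slug(text):
    """Make a crude URI-safe local name (illustrative only)."""
    return "-".join(text.lower().split())


def map_find_record(row):
    """Map one flat find record to event-based, CRM-style triples (hypothetical mapping)."""
    find = f"ex:find/{row['find_id']}"
    event = f"ex:event/{row['context_id']}"
    span = f"ex:timespan/{row['context_id']}"
    return [
        # E22 is called "Human-Made Object" in recent CRM versions.
        (find, "rdf:type", "crm:E22_Man-Made_Object"),
        (find, "crm:P2_has_type", f"ex:material/{slug(row['material'])}"),
        # The excavation is an activity that links object, person, place and time.
        (event, "rdf:type", "crm:E7_Activity"),
        (event, "crm:P12_occurred_in_the_presence_of", find),
        (event, "crm:P14_carried_out_by", f"ex:person/{slug(row['excavator'])}"),
        (event, "crm:P7_took_place_at", f"ex:place/{row['context_id']}"),
        (event, "crm:P4_has_time-span", span),
        (span, "crm:P82_at_some_time_within", f'"{row["date"]}"'),
    ]


def hops(triples, start, goal):
    """Shortest path length between two nodes, ignoring edge direction and rdf:type links."""
    neighbours = {}
    for s, p, o in triples:
        if p == "rdf:type":
            continue  # class nodes would create misleading shortcuts
        neighbours.setdefault(s, set()).add(o)
        neighbours.setdefault(o, set()).add(s)
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # goal not reachable
```

For example, with `row = {"find_id": "F123", "context_id": "C45", "material": "Worked flint", "excavator": "A N Other", "date": "1998-07-14"}`, `hops(map_find_record(row), "ex:find/F123", '"1998-07-14"')` returns 3 (find, activity, time-span, date). This extra indirection is what makes browsing and interface design harder.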

Findings:

  • The initial search prototype shows useful cross-searching and browsing functionality
  • Consistency will improve when semi-automatic tools are combined with the intellectual work of domain experts
  • Data cleansing and consistent unique identifiers are necessary
  • Technical extension to the CIDOC-CRM is necessary
  • Sometimes it is important to model events that are not explicitly surfaced in the data model
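The need for data cleansing and consistent unique identifiers can be illustrated with a small sketch. It assumes, hypothetically, that context numbers are recorded in varying forms (`Ctx 0045`, `C45`, `45`) and that each database has a short dataset code; the `example.org` URI pattern and all function names are illustrative only. Qualifying the cleaned number with the dataset code keeps context 45 in one database distinct from context 45 in another, while conflicting values for the same cleaned identifier are surfaced for expert review.

```python
import re


def normalise_context_id(raw):
    """Reduce variant spellings such as 'Ctx 0045', 'C45' or '45' to '45' (assumed formats)."""
    match = re.search(r"\d+", raw)
    if match is None:
        raise ValueError(f"no numeric context identifier in {raw!r}")
    return str(int(match.group()))  # int() drops leading zeros


def mint_uri(dataset, raw_id):
    """Build a unique URI by qualifying the cleaned identifier with its dataset code."""
    return f"http://example.org/{dataset}/context/{normalise_context_id(raw_id)}"


def find_conflicts(records):
    """Return URIs whose records disagree on a value once whitespace and case are cleaned."""
    by_uri = {}
    for dataset, raw_id, value in records:
        cleaned = value.strip().lower()
        by_uri.setdefault(mint_uri(dataset, raw_id), set()).add(cleaned)
    return {uri: vals for uri, vals in by_uri.items() if len(vals) > 1}
```

Given `[("site-a", "Ctx 0045", "pit"), ("site-a", "C45", "ditch"), ("site-b", "45", "pit")]`, only `http://example.org/site-a/context/45` is reported, because the two site-a spellings refer to the same context but disagree, whereas site-b's context 45 is a different entity.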

Future Works:

  • “The next phase of the project will investigate interactive and automated traversal of the chains of semantic relationships in an integrated data/concept network, incorporating the EH thesauri to improve search capability.” (p. 288)
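The thesaurus-assisted search described in this entry can be approximated with simple query expansion over SKOS-style relationships. In this sketch the concepts and the `NARROWER` and `RELATED` tables are toy data, not the EH thesauri, and `expand` and `cross_search` are hypothetical helpers; only the idea of following `skos:narrower` and `skos:related` links comes from SKOS itself. A query for a broad concept then also retrieves records indexed with more specific or related terms, whichever source database they came from.

```python
from collections import deque

# Toy SKOS-like thesaurus (hypothetical concepts, not the EH thesauri).
NARROWER = {  # concept -> skos:narrower concepts
    "ex:concept/cut-feature": ["ex:concept/pit", "ex:concept/ditch"],
    "ex:concept/pit": ["ex:concept/storage-pit", "ex:concept/rubbish-pit"],
}
RELATED = {  # concept -> skos:related concepts
    "ex:concept/ditch": ["ex:concept/gully"],
}


def expand(concept, max_depth=2, include_related=True):
    """Collect the concept and its narrower concepts (breadth-first, up to max_depth),
    optionally adding concepts skos:related to any of them (one step only)."""
    found = {concept}
    queue = deque([(concept, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        for child in NARROWER.get(node, []):
            if child not in found:
                found.add(child)
                queue.append((child, depth + 1))
    if include_related:
        for node in list(found):
            found.update(RELATED.get(node, []))
    return found


def cross_search(records, concept, **kwargs):
    """Return sorted IDs of records, from any dataset, whose type is in the expanded set."""
    wanted = expand(concept, **kwargs)
    return sorted(r["id"] for r in records if r["type"] in wanted)
```

Following `skos:related` links raises recall but can add noise, which is why it is optional here; limiting `max_depth` serves the same purpose for deep hierarchies.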
