Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Jun 12, 2017
Open Peer Review Period: Jun 16, 2017 - Jul 21, 2017
Date Accepted: Sep 27, 2017

The final, peer-reviewed published version of this preprint can be found here:

Segura Bedmar I, Martínez P, Carruana Martín A

Search and Graph Database Technologies for Biomedical Semantic Indexing: Experimental Analysis

JMIR Med Inform 2017;5(4):e48

DOI: 10.2196/medinform.7059

PMID: 29196280

PMCID: 5732329

Search and Graph Database Technologies for Biomedical Semantic Indexing: Experimental Analysis

  • Isabel Segura Bedmar; 
  • Paloma Martínez; 
  • Adrián Carruana Martín

ABSTRACT

Background:

Biomedical semantic indexing is a very useful support tool for human curators in their efforts to index and catalog the biomedical literature.

Objective:

The aim of this study was to describe a system to automatically assign Medical Subject Headings (MeSH) to biomedical articles from MEDLINE.

Methods:

Our approach relies on the assumption that similar documents should be classified by similar MeSH terms. Previous work has already exploited document similarity by using a k-nearest neighbors algorithm; in contrast, we represent documents as vectors built by search engine indexing and then compute the similarity between documents using cosine similarity. Once the most similar documents for a given input document are retrieved, we rank their MeSH terms to choose the most suitable set for the input document. To do this, we define a scoring function that takes into account the frequency of the term in the set of retrieved documents and the similarity between the input document and each retrieved document. In addition, we implement guidelines proposed by human curators to annotate MEDLINE articles; in particular, the heuristic stating that if 3 MeSH terms proposed to classify an article share the same ancestor, they should be replaced by this ancestor. Representing the MeSH thesaurus as a graph database allows us to employ graph search algorithms to quickly and easily capture hierarchical relationships such as the lowest common ancestor between terms.
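
The ranking and replacement steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact scoring formula, so the sketch assumes a term's score is the sum of the cosine similarities of the retrieved documents that carry it (which reflects both frequency and similarity). It also assumes that "ancestor" means the immediate parent, and it uses a plain dictionary in place of the graph database. The function names `cosine`, `rank_terms`, and `collapse_siblings`, and the labels `A1`, `A`, `B1` in the usage note, are hypothetical.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two sparse vectors given as {token: weight} dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_terms(query_vec, neighbours, top_n=3):
    """Rank the MeSH terms of the retrieved documents.

    neighbours is a list of (vector, mesh_terms) pairs. Assumed scoring:
    each term accumulates the similarity of every neighbour that carries it,
    so frequent terms from close neighbours rank highest.
    """
    scores = defaultdict(float)
    for vec, terms in neighbours:
        sim = cosine(query_vec, vec)
        for term in set(terms):
            scores[term] += sim
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


def collapse_siblings(terms, parent_of, min_group=3):
    """Replace groups of at least min_group terms sharing a parent by that parent.

    parent_of is a toy {term: parent} map standing in for the graph database.
    """
    by_parent = defaultdict(list)
    for term in terms:
        by_parent[parent_of.get(term)].append(term)
    result = []
    for parent, group in by_parent.items():
        if parent is not None and len(group) >= min_group:
            result.append(parent)
        else:
            result.extend(group)
    return result
```

For example, with a toy hierarchy in which `A1`, `A2`, and `A3` all have parent `A` and `B1` has parent `B`, calling `collapse_siblings(["A1", "A2", "B1", "A3"], parent_of)` replaces the three siblings with `A` and leaves `B1` untouched.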

Results:

Our experiments show promising results with an F1 of 69% on the test dataset.

Conclusions:

To the best of our knowledge, this is the first work that combines search and graph database technologies for the task of biomedical semantic indexing. Due to its horizontal scalability, ElasticSearch is a practical solution for indexing large collections of documents (such as the bibliographic database MEDLINE). Moreover, the use of graph search algorithms for accessing MeSH information could provide a support tool for cataloging MEDLINE abstracts in real time.



© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.