Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Nov 12, 2018
Open Peer Review Period: Nov 22, 2018 - Jan 17, 2019
Date Accepted: Mar 23, 2020

The final, peer-reviewed published version of this preprint can be found here:

Identification of the Best Semantic Expansion to Query PubMed Through Automatic Performance Assessment of Four Search Strategies on All Medical Subject Heading Descriptors: Comparative Study

Massonnaud CR, Kerdelhué G, Grosjean J, Lelong R, Griffon N, Darmoni SJ

JMIR Med Inform 2020;8(6):e12799

DOI: 10.2196/12799

PMID: 32496201

PMCID: 7303830

Finding the best semantic expansion to query PubMed: automatic performance assessment of four search strategies on all MeSH descriptors.

  • Clément R Massonnaud
  • Gaetan Kerdelhué
  • Julien Grosjean
  • Romain Lelong
  • Nicolas Griffon
  • Stefan J Darmoni

ABSTRACT

Background:

With the continuous expansion of available biomedical data, efficient and effective information retrieval has become of utmost importance. Semantic expansions of queries using synonyms may improve information retrieval.

Objective:

The aim of this study was to propose an innovative method that could automatically estimate the three main metrics used in information science (precision, recall, and F-measure) for four different semantic expansion strategies, assessed on all the descriptors in the MeSH thesaurus (n=28,313).

Methods:

Four search strategies were assessed in this study: the standard Automatic Term Mapping (ATM) of PubMed and three strategies that use semantic expansion. ATM queries were of the form: (“preferred term”[MH] OR “preferred term”[All fields]). The queries of the other three strategies were of the form: (“preferred term”[MH] OR “preferred term”[All fields] OR “synonym 1”[All fields] OR “synonym 2”[All fields] OR etc.). These three strategies differed in the number and provenance of the synonyms used to build the queries: MeSH synonyms, UMLS mappings, and custom mappings (CISMeF). Metrics were assessed by computing (A) the number of all relevant citations, using NLM indexing as the gold standard (“preferred term”[MH]); (B) the number of citations retrieved by the added terms; and (C) the number of relevant citations retrieved by the added terms (combining the previous two queries with an “AND” operator). It was therefore possible to compute the metrics programmatically for each strategy, using every MeSH descriptor as a “preferred term”. In total, 239,724 different queries were built and sent to the PubMed API. The four search strategies were ranked and compared for each metric.
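The query forms and counts described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the "added terms" are the [All fields] terms of a strategy, that precision is C/B, and that recall is C/A; the abstract does not spell these out. The function names, the synonym "Bronchial asthma", and the counts in the usage note are hypothetical, and no request is sent to PubMed.

```python
from typing import Iterable, NamedTuple


def tagged(term: str, field: str) -> str:
    """Quote a term and attach a PubMed field tag, e.g. "Asthma"[MH]."""
    return f'"{term}"[{field}]'


def gold_standard_query(preferred: str) -> str:
    """Query A: citations indexed with the descriptor (NLM indexing)."""
    return tagged(preferred, "MH")


def added_terms_query(preferred: str, synonyms: Iterable[str] = ()) -> str:
    """Query B (assumption): the free-text terms only, joined with OR."""
    terms = [preferred, *synonyms]
    return "(" + " OR ".join(tagged(t, "All fields") for t in terms) + ")"


def relevant_retrieved_query(preferred: str, synonyms: Iterable[str] = ()) -> str:
    """Query C: queries A and B combined with an AND operator."""
    a = gold_standard_query(preferred)
    b = added_terms_query(preferred, synonyms)
    return f"({a} AND {b})"


def full_query(preferred: str, synonyms: Iterable[str] = ()) -> str:
    """Full strategy query; with no synonyms this is the ATM form."""
    parts = [tagged(preferred, "MH")]
    parts += [tagged(t, "All fields") for t in [preferred, *synonyms]]
    return "(" + " OR ".join(parts) + ")"


class Metrics(NamedTuple):
    precision: float
    recall: float
    f_measure: float


def compute_metrics(a: int, b: int, c: int) -> Metrics:
    """Derive metrics from the counts A, B, and C (assumed definitions)."""
    precision = c / b if b else 0.0  # relevant retrieved / retrieved
    recall = c / a if a else 0.0     # relevant retrieved / all relevant
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return Metrics(precision, recall, f_measure)
```

As a usage example with made-up counts, `compute_metrics(100, 50, 25)` gives a precision of 0.5 and a recall of 0.25, and `full_query("Asthma", ["Bronchial asthma"])` produces a query of the expanded form shown in the paragraph above.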

Results:

ATM had the worst performance of the four strategies for all three metrics. The MeSH strategy had the best mean precision (50.93%, SD = 23%). The UMLS strategy had the best recall and F-measure (40.57%, SD = 31% and 35.51%, SD = 24%, respectively). CISMeF had the second-best recall and F-measure (40.11%, SD = 31% and 35.10%, SD = 24%, respectively). However, considering a cut-off of 5%, CISMeF had better precision than UMLS for 1180 descriptors, better recall for 793 descriptors, and better F-measure for 678 descriptors.
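A per-descriptor comparison of this kind could be counted as in the sketch below. It assumes the 5% cut-off means an absolute difference of more than 0.05 in the metric, which the abstract does not state; the function name, descriptor keys, and scores are hypothetical.

```python
from typing import Dict


def count_better(
    scores_x: Dict[str, float],
    scores_y: Dict[str, float],
    cutoff: float = 0.05,
) -> int:
    """Count descriptors where strategy X exceeds strategy Y by more than cutoff.

    Scores are fractions in [0, 1], keyed by descriptor. Only descriptors
    present in both mappings are compared.
    """
    shared = scores_x.keys() & scores_y.keys()
    return sum(1 for d in shared if scores_x[d] - scores_y[d] > cutoff)


# Made-up precision values for three hypothetical descriptors.
cismef = {"D1": 0.60, "D2": 0.40, "D3": 0.30}
umls = {"D1": 0.50, "D2": 0.38, "D3": 0.45}
print(count_better(cismef, umls))
```

Running the same function once per metric, and in both directions, would give the kind of descriptor counts reported here.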

Conclusions:

This study highlights the importance of using semantic expansion strategies to improve information retrieval. However, the performance of a given strategy relative to the others varied greatly depending on the MeSH descriptor. These results confirm that there is no ideal search strategy for all descriptors. Different semantic expansions should be used depending on the descriptor and the user’s objectives. This led our team to develop an interface that allows users to input a descriptor and then proposes the best semantic expansion to maximize one of the three main metrics: precision, recall, or F-measure.




© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.