Maintenance Notice

Due to necessary scheduled maintenance, the JMIR Publications website will be unavailable from Wednesday, July 01, 2020 at 8:00 PM to 10:00 PM EST. We apologize in advance for any inconvenience this may cause you.

Who will be affected?

Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Jan 25, 2021
Date Accepted: Nov 10, 2021

The final, peer-reviewed published version of this preprint can be found here:

Improving Diabetes-Related Biomedical Literature Exploration in the Clinical Decision-making Process via Interactive Classification and Topic Discovery: Methodology Development Study

Ahne A, Fagherazzi G, Tannier X, Czernichow T, Orchard F

Improving Diabetes-Related Biomedical Literature Exploration in the Clinical Decision-making Process via Interactive Classification and Topic Discovery: Methodology Development Study

J Med Internet Res 2022;24(1):e27434

DOI: 10.2196/27434

PMID: 35040795

PMCID: 8808347

Improving literature exploration in the clinical decision-making process: Interactive classification and topic discovery on diabetes-related biomedical literature

  • Adrian Ahne; 
  • Guy Fagherazzi; 
  • Xavier Tannier; 
  • Thomas Czernichow; 
  • Francisco Orchard

ABSTRACT

Background:

The amount of available textual health data such as scientific and biomedical literature is constantly growing and it becomes more and more challenging for health professionals to properly summarise those data and in consequence to practice evidence-based clinical decision making. Moreover, the exploration of large unstructured health text data is very challenging for non experts due to limited time, resources and skills. Current tools to explore text data lack ease of use, need high computation efforts and have difficulties to incorporate domain knowledge and focus on topics of interest.

Objective:

We developed a methodology which is able to explore and target topics of interest via an interactive user interface for experts and non-experts. We aim to reach near state of the art performance, while reducing memory consumption, increasing scalability and minimizing user interaction effort to improve the clinical decision making process. The performance is evaluated on diabetes-related abstracts from Pubmed.

Methods:

The methodology consists of four parts: 1) A novel interpretable hierarchical clustering of documents where each node is defined by headwords (describe documents in this node the most); 2) An efficient classification system to target topics; 3) Minimized users interaction effort through active learning; 4) A visual user interface through which a user interacts. We evaluated our approach on 50,911 diabetes-related abstracts from Pubmed which provide a hierarchical Medical Subject Headings (MeSH) structure, a unique identifier for a topic. Hierarchical clustering performance was compared against the implementation in the machine learning library scikit-learn. On a subset of 2000 randomly chosen diabetes abstracts, our active learning strategy was compared against three other strategies: random selection of training instances, uncertainty sampling which chooses instances the model is most uncertain about and an expected gradient length strategy based on convolutional neural networks (CNN).

Results:

For the hierarchical clustering performance, we achieved a F1-Score of 0.73 compared to scikit-learn’s of 0.76. Concerning active learning performance, after 200 chosen training samples based on these strategies, the weighted F1-Score over all MeSH codes resulted in satisfying 0.62 F1-Score of our approach, compared to 0.61 of the uncertainty strategy, 0.61 the CNN and 0.45 the random strategy. Moreover, our methodology showed a constant low memory use with increased number of documents but increased execution time.

Conclusions:

We proposed an easy to use tool for experts and non-experts being able to combine domain knowledge with topic exploration and target specific topics of interest while improving transparency. Furthermore our approach is very memory efficient and highly parallelizable making it interesting for large Big Data sets. This approach can be used by health professionals to rapidly get deep insights into biomedical literature to ultimately improve the evidence-based clinical decision making process.


 Citation

Please cite as:

Ahne A, Fagherazzi G, Tannier X, Czernichow T, Orchard F

Improving Diabetes-Related Biomedical Literature Exploration in the Clinical Decision-making Process via Interactive Classification and Topic Discovery: Methodology Development Study

J Med Internet Res 2022;24(1):e27434

DOI: 10.2196/27434

PMID: 35040795

PMCID: 8808347

Download PDF


Request queued. Please wait while the file is being generated. It may take some time.

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on it's website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressively prohibit redistribution of this draft paper other than for review purposes.