Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Jun 25, 2023
Date Accepted: Apr 17, 2024
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Retrieval-Based Diagnostic Decision Support
ABSTRACT
Background:
Diagnostic errors pose significant health risks and contribute to patient mortality. With the growing accessibility of electronic health records, machine learning models offer a promising avenue for improving diagnostic quality. Current research has focused primarily on a limited set of diseases with ample training data and has neglected diagnostic scenarios in which little data is available.
Objective:
This study aims to develop an information retrieval (IR)-based framework that accommodates data sparsity and so supports diagnostic decision-making across a broader range of diseases.
Methods:
We present CliniqIR, an IR-based diagnostic decision support framework. It uses clinical text records, the Unified Medical Language System (UMLS) Metathesaurus, and 33 million PubMed abstracts to classify a broad spectrum of diagnoses regardless of how much training data is available. We compare CliniqIR's performance with that of pre-trained clinical transformer models (such as ClinicalBERT) in supervised and zero-shot settings. We then combine the strengths of a supervised, fine-tuned ClinicalBERT and CliniqIR in an ensemble framework that delivers state-of-the-art diagnostic predictions.
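The abstract does not describe CliniqIR's retrieval pipeline in detail. As a rough illustration of the general idea of retrieval-based diagnosis ranking, the toy sketch below scores a small hypothetical corpus of (abstract, diagnosis) pairs against a clinical note with TF-IDF cosine similarity and lets each retrieved document vote for its diagnosis. Everything here is an assumption for illustration: the corpus, the whitespace tokenizer (a real system would likely map text to UMLS concepts), and the names `rank_diagnoses` and `top_k_docs` are hypothetical, and the scoring scheme is not claimed to be the authors' method.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical lowercase whitespace tokenizer; strips simple punctuation.
    tokens = (t.strip(".,;:()").lower() for t in text.split())
    return [t for t in tokens if t]


def tfidf_vectors(docs: list[list[str]]) -> tuple[list[dict[str, float]], dict[str, float]]:
    # Smoothed inverse document frequency, then term-frequency * IDF per document.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}
    vecs = [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]
    return vecs, idf


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_diagnoses(note: str, corpus: list[tuple[str, str]], top_k_docs: int = 3) -> list[str]:
    # corpus: hypothetical (abstract_text, diagnosis_label) pairs standing in
    # for a literature index whose documents are linked to diagnosis concepts.
    docs = [tokenize(text) for text, _ in corpus]
    vecs, idf = tfidf_vectors(docs)
    # Query terms unseen in the corpus get zero weight.
    query = {t: c * idf.get(t, 0.0) for t, c in Counter(tokenize(note)).items()}
    hits = sorted(((cosine(query, v), i) for i, v in enumerate(vecs)), reverse=True)[:top_k_docs]
    # Each retrieved document adds its similarity to its diagnosis's score.
    scores: Counter = Counter()
    for sim, i in hits:
        scores[corpus[i][1]] += sim
    return [dx for dx, _ in scores.most_common()]


corpus = [
    ("Fever, cough and lung infiltrates suggest pneumonia.", "pneumonia"),
    ("Chest pain radiating to the arm with troponin elevation indicates myocardial infarction.", "myocardial infarction"),
    ("Productive cough, fever, and crackles are typical of bacterial pneumonia.", "pneumonia"),
    ("Polyuria, polydipsia and hyperglycemia are hallmarks of diabetes.", "diabetes"),
]
print(rank_diagnoses("Patient presents with fever and productive cough.", corpus))
```

Because the diagnosis scores come from a document collection rather than from labeled training notes, a diagnosis can be ranked even when no training cases for it exist, which is the property the abstract emphasizes.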
Results:
On a rare disease dataset (DC3) with no training data, CliniqIR ranks the correct diagnosis among its top 3 predictions, on average. On the MIMIC-III dataset, CliniqIR outperforms ClinicalBERT on diagnoses with fewer than 5 training samples by an average of 9% in mean reciprocal rank (MRR). In a zero-shot setting, where no task-specific training was performed, CliniqIR also outperforms the pre-trained transformer models by 10% in MRR. Furthermore, our ensemble framework surpasses each of its constituent models by at least 8% in MRR.
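The results are stated in terms of MRR and top-3 placement. For reference, MRR averages, over test cases, the reciprocal of the rank at which the correct diagnosis appears (1 for rank 1, 1/2 for rank 2, 0 if it is absent), and a top-k hit rate counts how often the correct diagnosis appears within the first k predictions. The minimal sketch below computes both on made-up rankings; the function names are illustrative, and the abstract does not say whether its reported MRR differences are absolute or relative.

```python
def reciprocal_rank(ranked: list[str], gold: str) -> float:
    # 1/position of the first correct prediction (1-indexed), or 0 if missing.
    for pos, dx in enumerate(ranked, start=1):
        if dx == gold:
            return 1.0 / pos
    return 0.0


def mean_reciprocal_rank(rankings: list[list[str]], golds: list[str]) -> float:
    return sum(reciprocal_rank(r, g) for r, g in zip(rankings, golds)) / len(golds)


def hits_at_k(rankings: list[list[str]], golds: list[str], k: int = 3) -> float:
    # Fraction of cases whose correct diagnosis is among the top-k predictions.
    return sum(g in r[:k] for r, g in zip(rankings, golds)) / len(golds)


rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
golds = ["a", "a", "a"]
print(mean_reciprocal_rank(rankings, golds))  # (1 + 1/2 + 1/3) / 3 = 11/18, about 0.611
print(hits_at_k(rankings, golds, k=1))        # 1/3
print(hits_at_k(rankings, golds, k=3))        # 1.0
```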
Conclusions:
Our experiments highlight the importance of IR in leveraging unstructured knowledge resources to identify infrequently encountered diagnoses. In addition, our ensemble framework benefits from combining the complementary strengths of the supervised and retrieval-based models to diagnose a broad spectrum of diseases.
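The abstract does not specify how the supervised and retrieval-based models are combined. Purely as an assumed illustration of how complementary rankers can be merged, the sketch below min-max normalizes each model's per-diagnosis scores and takes a weighted average; the weight `alpha`, the function names, and the example scores are all hypothetical. A diagnosis missing from one model's output (for example, a rare diagnosis the supervised model never saw in training) simply contributes 0 from that model.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    # Rescale scores to [0, 1] so the two models are on comparable scales.
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def ensemble_rank(supervised: dict[str, float], retrieval: dict[str, float],
                  alpha: float = 0.5) -> list[str]:
    # Weighted interpolation of normalized scores (an assumed fusion scheme).
    s, r = min_max(supervised), min_max(retrieval)
    labels = set(s) | set(r)
    combined = {dx: alpha * s.get(dx, 0.0) + (1 - alpha) * r.get(dx, 0.0) for dx in labels}
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(combined, key=lambda dx: (-combined[dx], dx))


supervised = {"pneumonia": 0.7, "sepsis": 0.2, "influenza": 0.1}
retrieval = {"pneumonia": 3.0, "legionellosis": 2.5, "sepsis": 1.0}
print(ensemble_rank(supervised, retrieval))
# ['pneumonia', 'legionellosis', 'sepsis', 'influenza']
```

In this toy example, "legionellosis" is absent from the supervised model's label set but still reaches second place through the retrieval scores.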
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.