Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Jun 3, 2025
Date Accepted: Oct 31, 2025
Date Submitted to PubMed: Oct 31, 2025
Medical Feature Extraction from Clinical Exam Notes: Development and Evaluation of a Two-Phase Large Language Model Framework
ABSTRACT
Background:
Medical feature extraction from clinical text is challenging due to limited data availability, variability in medical terminology, and the critical need for trustworthy outputs. Existing approaches struggle to balance accuracy with reliable confidence estimates, particularly when handling ambiguous or complex medical descriptions.
Objective:
This study aims to develop a robust framework for medical feature extraction that enhances accuracy and confidence while minimizing hallucination risks, even with limited training data.
Methods:
We introduce Multi-CONFE (Multi-dimensional CONfidence-aware Feature Extractor), a novel end-to-end framework that integrates instruction-tuned large language models with multi-dimensional confidence calibration. Multi-CONFE employs dynamic adjustment of calibration thresholds during training, complexity-aware confidence scaling, and bidirectional semantic mapping to improve feature detection and reduce errors.
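As a rough illustration of what complexity-aware confidence scaling with a threshold might look like, the following minimal sketch shrinks a model-reported confidence as a note's complexity grows and keeps only features that clear a cutoff. The abstract does not specify the actual rule, so the `Candidate` fields, the linear scaling formula, `alpha`, and `threshold` are all hypothetical assumptions and not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    feature: str           # extracted feature text
    raw_confidence: float  # model-reported confidence in [0, 1] (assumed)
    complexity: float      # hypothetical complexity score in [0, 1]


def scaled_confidence(c: Candidate, alpha: float = 0.5) -> float:
    """Shrink confidence linearly as complexity grows (hypothetical rule)."""
    return c.raw_confidence * (1.0 - alpha * c.complexity)


def keep_confident(cands: List[Candidate],
                   threshold: float = 0.6,
                   alpha: float = 0.5) -> List[str]:
    """Return features whose scaled confidence meets the threshold."""
    return [c.feature for c in cands
            if scaled_confidence(c, alpha) >= threshold]


# Illustrative inputs only; these are not data from the study.
candidates = [
    Candidate("chest pain", raw_confidence=0.9, complexity=0.2),
    Candidate("intermittent symptoms", raw_confidence=0.7, complexity=0.8),
]
kept = keep_confident(candidates)
```

In this toy setup, a fixed threshold is used; a training-time scheme that adjusts the threshold dynamically, as the Methods describe, would replace the constant `threshold` with a value updated during training.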
Results:
Evaluations on USMLE Step-2 Clinical Skills notes demonstrate that Multi-CONFE achieves an F1 score of 0.983, significantly surpassing prior benchmarks, including INCITE (F1=0.888) and DeBERTa-based models (F1=0.958). Multi-CONFE reduces hallucination risk by 89.9% and improves clinical feature detection by 89.6% compared with the vanilla model. Furthermore, using only 12.5% of the training data (100 of 800 clinical notes), the framework achieves a competitive F1 score of 0.973.
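For readers less familiar with the metric, the sketch below shows how an F1 score is computed from true-positive, false-positive, and false-negative counts, and checks the reported training-data fraction. The counts passed to `f1_score` are illustrative placeholders, not results from the study, and the function name is an assumption made for this example.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall from raw counts."""
    if tp == 0:
        return 0.0  # no correct predictions: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only (not from the paper): 8 correct features,
# 2 spurious features, 2 missed features.
example_f1 = f1_score(tp=8, fp=2, fn=2)

# Fraction of the training set used in the low-data setting (100 of 800 notes).
train_fraction = 100 / 800  # 0.125, i.e. 12.5%

# Absolute F1 gaps implied by the scores reported above.
gap_vs_incite = 0.983 - 0.888
gap_vs_deberta = 0.983 - 0.958
```

Because F1 is a harmonic mean, it stays high only when precision and recall are both high, which is why it is a common summary metric for extraction tasks.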
Conclusions:
Multi-CONFE demonstrates exceptional efficacy and robustness in medical feature extraction, delivering high performance with minimal data requirements. Its ability to significantly reduce hallucination risks and improve feature detection accuracy positions it as a leading solution for clinical text analysis.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.