Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Dec 15, 2022
Open Peer Review Period: Dec 15, 2022 - Feb 9, 2023
Date Accepted: Jun 3, 2023
Identifying Risk Factors Associated With Lower Back Pain in Electronic Medical Record Free Text: A Deep Learning Approach Using Clinical Note Annotations
ABSTRACT
Background:
Lower back pain is a common, debilitating condition that affects a large share of the population. It is a leading cause of disability and lost productivity, and the associated medical costs and lost wages place a significant burden on individuals and society. Recent advances in artificial intelligence (AI) and natural language processing (NLP) have opened new opportunities for identifying and managing risk factors for lower back pain. In this paper, we propose a deep learning model, train it on a dataset of clinical notes annotated with relevant risk factors, and evaluate its performance in identifying risk factors in new clinical notes.
Objective:
The primary objective is to develop a novel deep learning approach to detect, in clinical encounter notes, risk factors for underlying disease in patients presenting with lower back pain. The secondary objective is to propose solutions to potential challenges of using deep learning and NLP techniques to identify risk factors in electronic medical record (EMR) free text and to make practical recommendations for future research in this area.
Methods:
We manually annotated clinical notes for the presence of six risk factors for severe underlying disease in patients presenting with lower back pain. The data were highly imbalanced, with only 12% of the annotated notes having at least one label. To address this imbalance, a combination of semantic matching and regular expressions was used to identify additional notes to annotate. Further analysis was conducted to study the impact of down-sampling, a binary formulation of multi-label classification, and unsupervised pre-training on classification performance. Lastly, the proposed BERT-based model was compared with original BERT baselines for detecting lower back pain risk factors.
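To illustrate the down-sampling step described above, the following is a minimal sketch rather than the authors' implementation. It assumes each note is stored as a `(text, labels)` pair with six 0/1 flags, one per risk factor; the function name `downsample`, the data layout, and the toy data are all hypothetical.

```python
import random


def downsample(notes, seed=0):
    """Balance notes with and without labels by sampling the majority class.

    `notes` is a list of (text, labels) pairs, where `labels` holds six
    0/1 flags, one per risk factor (an assumed layout, not the authors').
    """
    positives = [n for n in notes if any(n[1])]
    negatives = [n for n in notes if not any(n[1])]
    rng = random.Random(seed)
    # Keep every labeled note and an equal-sized random subset of unlabeled ones.
    kept = rng.sample(negatives, min(len(positives), len(negatives)))
    balanced = positives + kept
    rng.shuffle(balanced)
    return balanced


# Toy data: 10 of 80 notes carry a label, close to the imbalance reported here.
toy = [
    ("note %d" % i, [1, 0, 0, 0, 0, 0] if i % 8 == 0 else [0] * 6)
    for i in range(80)
]
balanced = downsample(toy)
```

In this sketch only the training split would be passed to `downsample`; the evaluation split would keep its natural class ratio so that reported metrics reflect real-world prevalence.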
Results:
Of 2749 clinical notes labeled, 347 had at least one label, while 2402 had no labels. Down-sampling the training set to equalize the ratio of clinical notes with and without risk factors improved the average AUC by 21% for the BERT baseline. The proposed BERT-based model performed 3% better than the BERT baseline in multi-task learning. Unsupervised pre-training using causal language modeling on clinical notes further improved performance by 1%.
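The abstract does not state how the average AUC is aggregated across the six risk factors. As one plausible reading, the sketch below computes a per-label AUC with the pairwise (Mann-Whitney) definition and takes the unweighted mean over labels. The function names `auc` and `macro_auc` and the toy scores are hypothetical and are not results from the study.

```python
def auc(y_true, y_score):
    """Area under the ROC curve for one label, via pairwise comparison."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def macro_auc(Y_true, Y_score):
    """Unweighted mean of per-label AUCs (assumed averaging scheme)."""
    n_labels = len(Y_true[0])
    per_label = [
        auc([row[j] for row in Y_true], [row[j] for row in Y_score])
        for j in range(n_labels)
    ]
    return sum(per_label) / n_labels


# Toy example with two labels and four notes (illustrative values only).
Y_true = [[1, 0], [0, 1], [1, 0], [0, 0]]
Y_score = [[0.9, 0.2], [0.1, 0.8], [0.4, 0.3], [0.5, 0.1]]
average_auc = macro_auc(Y_true, Y_score)
```

The pairwise form is quadratic in the number of notes, which is acceptable for a dataset of a few thousand notes; a rank-based implementation would be preferable at larger scale.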
Conclusions:
Primary care clinical notes are likely to require manipulation before meaningful free-text analysis can be performed. Applying BERT Transformer models for multi-label classification to down-sampled annotated clinical notes is useful for detecting risk factors that suggest an indication for imaging in patients with lower back pain.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.