Accepted for/Published in: JMIR Medical Informatics
Date Submitted: May 11, 2021
Date Accepted: Jan 2, 2022
Evaluation of Natural Language Processing for the identification of Crohn’s Disease-related variables in Spanish Electronic Health Records: a study of the PREMONITION-CD project
ABSTRACT
Background:
The exploration of unstructured data contained in electronic health records (EHRs) has the potential to positively impact both clinical practice and research in Crohn’s disease (CD), an inflammatory bowel disease in which lesions may be found anywhere along the gastrointestinal tract. There is a clinical need to extract this information rapidly and early, a task for which state-of-the-art technology is advantageous. In this regard, the EHRead® Technology is a natural language processing (NLP) system developed by SAVANA to retrieve prominent biomedical information from the narratives of clinical notes contained in EHRs.
Objective:
The aim of this study was to validate the EHRead® Technology in identifying CD reports from clinical narratives.
Methods:
We employed the EHRead® Technology to explore and extract CD-related clinical information from EHRs. To validate this tool, we compared the EHRead® output with a gold standard, created through manual review by experienced researchers, to assess the quality of the NLP-identified records containing any reference to CD and its related variables.
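The record-level comparison against the gold standard can be sketched as follows. This is a minimal illustration, not the EHRead® implementation: it assumes that, for a given variable, both the NLP output and the manual review can be reduced to sets of record identifiers, and the function name `validation_metrics` and the record IDs are hypothetical.

```python
from __future__ import annotations


def validation_metrics(predicted: set[str], gold: set[str]) -> dict[str, float]:
    """Precision, recall and F1 for one variable (hypothetical helper).

    predicted: IDs of records the NLP system flagged as mentioning the variable.
    gold: IDs of records the human reviewers marked as mentioning it.
    """
    tp = len(predicted & gold)  # flagged by NLP and confirmed by reviewers
    fp = len(predicted - gold)  # flagged by NLP but absent from the gold standard
    fn = len(gold - predicted)  # in the gold standard but missed by NLP
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up record IDs for illustration only; these are not study data.
    nlp_flagged = {"r1", "r2", "r3", "r4"}
    gold_standard = {"r1", "r2", "r3", "r5"}
    print(validation_metrics(nlp_flagged, gold_standard))
```

Working with sets keeps the definitions explicit: true positives are the intersection, while false positives and false negatives are the two set differences.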
Results:
The validation metrics for the main variable (i.e., CD) were a precision of 0.88, a recall of 0.98, and an F1-score of 0.93. Regarding the secondary variables, CD flare yielded a precision of 0.91, a recall of 0.71, and an F1-score of 0.80, while vedolizumab (treatment) yielded a precision, recall, and F1-score of 0.86, 0.94, and 0.90, respectively.
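The F1-score is the harmonic mean of precision $P = TP/(TP+FP)$ and recall $R = TP/(TP+FN)$. Substituting the rounded values reported above is a quick consistency check on the results; small discrepancies in the last digit would be expected because the inputs are themselves rounded.

```latex
\begin{aligned}
F_1 &= \frac{2PR}{P+R} \\
F_1^{\mathrm{CD}} &= \frac{2(0.88)(0.98)}{0.88+0.98} = \frac{1.7248}{1.86} \approx 0.93 \\
F_1^{\mathrm{flare}} &= \frac{2(0.91)(0.71)}{0.91+0.71} = \frac{1.2922}{1.62} \approx 0.80 \\
F_1^{\mathrm{vedolizumab}} &= \frac{2(0.86)(0.94)}{0.86+0.94} = \frac{1.6168}{1.80} \approx 0.90
\end{aligned}
```

The harmonic mean penalizes imbalance: for CD flare, the lower recall (0.71) pulls the F1-score to 0.80 despite a precision of 0.91.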
Conclusions:
This evaluation demonstrates the ability of the EHRead® Technology to identify patients with CD and CD-related variables in the unstructured information contained in EHRs. To the best of our knowledge, the present study is the first to use an NLP system to identify CD in reports written in Spanish.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.