
Accepted for/Published in: JMIR Medical Informatics

Date Submitted: May 11, 2021
Date Accepted: Jan 2, 2022

The final, peer-reviewed published version of this preprint can be found here:

Evaluation of Natural Language Processing for the Identification of Crohn Disease–Related Variables in Spanish Electronic Health Records: A Validation Study for the PREMONITION-CD Project

Montoto C, Gisbert JP, Guerra I, Plaza R, Pajares Villaroya R, Moreno Almazán L, López Martín MDC, Domínguez Antonaya M, Vera Mendoza MI, Aparicio J, Martínez V, Tagarro I, Fernández-Nistal A, Canales L, Menke S, Gomollón F, PREMONITION-CD Study Group

JMIR Med Inform 2022;10(2):e30345

DOI: 10.2196/30345

PMID: 35179507

PMCID: 8900906

Evaluation of Natural Language Processing for the identification of Crohn’s Disease-related variables in Spanish Electronic Health Records: a study of the PREMONITION-CD project

  • Carmen Montoto; 
  • Javier P. Gisbert; 
  • Iván Guerra; 
  • Rocío Plaza; 
  • Ramón Pajares Villaroya; 
  • Luis Moreno Almazán; 
  • María Del Carmen López Martín; 
  • Mercedes Domínguez Antonaya; 
  • María Isabel Vera Mendoza; 
  • Jesús Aparicio; 
  • Vicente Martínez; 
  • Ignacio Tagarro; 
  • Alonso Fernández-Nistal; 
  • Lea Canales; 
  • Sebastian Menke; 
  • Fernando Gomollón; 
  • PREMONITION-CD Study Group

ABSTRACT

Background:

Exploring the unstructured data contained in electronic health records (EHRs) has the potential to benefit both clinical practice and research in Crohn’s disease (CD), an inflammatory bowel disease in which lesions may appear anywhere along the gastrointestinal tract. Clinicians need this information to be extracted rapidly and early, a task for which advanced technology is well suited. The EHRead® Technology, developed by SAVANA, is a natural language processing (NLP) system designed to retrieve key biomedical information from the free-text narratives of clinical notes in EHRs.

Objective:

The aim of this study was to validate the EHRead® Technology for identifying reports of CD in clinical narratives.

Methods:

We used the EHRead® Technology to explore and extract CD-related clinical information from EHRs. To validate the tool, we compared the EHRead® output with a gold standard of records manually reviewed by experienced researchers, in order to assess the quality of the NLP-identified records containing any reference to CD and its related variables.
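The abstract does not state exactly how the NLP output was matched against the gold standard. As a minimal sketch, assuming validation is done per record and each record has an identifier, the comparison reduces to set operations: records flagged by both are true positives, records flagged only by the NLP system are false positives, and gold-standard records the system missed are false negatives. The function `validate` and the record IDs below are hypothetical and are not part of EHRead®.

```python
from typing import Dict, Set


def validate(predicted: Set[str], gold: Set[str]) -> Dict[str, float]:
    """Compare NLP-flagged record IDs with manually annotated gold-standard IDs.

    Assumption: evaluation is per record; IDs are hypothetical placeholders.
    """
    tp = len(predicted & gold)  # flagged by NLP and confirmed by reviewers
    fp = len(predicted - gold)  # flagged by NLP but absent from the gold standard
    fn = len(gold - predicted)  # in the gold standard but missed by NLP
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


# Toy example with invented record IDs, for illustration only
nlp_output = {"r1", "r2", "r3", "r4"}
gold_standard = {"r1", "r2", "r3", "r5"}
print(validate(nlp_output, gold_standard))
```

In this toy case three records agree, one is a false positive and one a false negative, so precision, recall and F1 are all 0.75.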

Results:

The validation metrics for the main variable (ie, CD) were a precision of 0.88, a recall of 0.98 and an F1-score of 0.93. For the secondary variables, we obtained a precision of 0.91, a recall of 0.71 and an F1-score of 0.80 for CD disease flare, while the variable vedolizumab (treatment) yielded a precision, recall and F1-score of 0.86, 0.94 and 0.90, respectively.
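The F1-score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short sketch below recomputes it from the reported precision and recall values as a consistency check; `f1_score` is a hypothetical helper written here, not a function from the study.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as given in the Results
reported = {
    "CD": (0.88, 0.98, 0.93),
    "CD disease flare": (0.91, 0.71, 0.80),
    "vedolizumab": (0.86, 0.94, 0.90),
}

for variable, (p, r, f1_reported) in reported.items():
    f1 = f1_score(p, r)
    print(f"{variable}: recomputed F1 = {f1:.2f} (reported {f1_reported:.2f})")
```

All three recomputed values round to the reported F1-scores. Because the inputs are themselves rounded to two decimals, this check can only confirm agreement to about the last reported digit.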

Conclusions:

This evaluation demonstrates the ability of the EHRead® Technology to identify CD patients and their related variables from the unstructured information contained in EHRs. To the best of our knowledge, the present study is the first to use an NLP system for the identification of Crohn’s disease in reports written in Spanish.

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.