Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Sep 5, 2022
Date Accepted: Sep 7, 2023

The final, peer-reviewed published version of this preprint can be found here:

Bazoge A, Morin E, Daille B, Gourraud PA

Applying Natural Language Processing to Textual Data From Clinical Data Warehouses: Systematic Review

JMIR Med Inform 2023;11:e42477

DOI: 10.2196/42477

PMID: 38100200

PMCID: 10757232

Natural language processing on data from clinical data warehouse: A systematic review

  • Adrien Bazoge
  • Emmanuel Morin
  • Béatrice Daille
  • Pierre-Antoine Gourraud

ABSTRACT

Background:

In recent years, health data collected during clinical care have often been repurposed for various secondary uses through clinical data warehouses (CDWs), which interconnect disparate data from different sources. The majority of clinical data are stored as unstructured text. Natural language processing (NLP), which provides algorithms that can operate on unstructured text at massive scale, has the potential to make clinical data more accessible.

Objective:

The objective of this paper is to provide an overview of studies applying Natural Language Processing (NLP) to clinical text from CDWs.

Methods:

We searched for relevant articles in three bibliographic databases: PubMed, the Association for Computational Linguistics (ACL) Anthology, and Google Scholar. We reviewed the titles and abstracts and included articles that focused on NLP applied to data from CDWs. Included articles were then reviewed a second time, in full text, to retrieve the following information: NLP tasks, NLP methods used, the language of the data, and the CDW involved in each study.

Results:

We identified a total of 1,251 articles, published between 1995 and 2021, of which 196 met the inclusion criteria. Among studies that applied NLP to text, information extraction was the predominant task (55.33%). Symbolic methods were the most common NLP methods (52.7%), followed by machine learning (32.3%) and deep learning (15%). These NLP methods were mostly applied to data written in English (75.63%).

Conclusions:

Although the use of NLP in CDWs is growing, CDWs remain underexploited for clinical NLP research, and challenges remain in this field. Clinical NLP is an effective strategy for accessing, extracting, and transforming data from CDWs. Information retrieved with NLP can assist clinical research and impact clinical practice.



© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.