Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Sep 5, 2022
Date Accepted: Sep 7, 2023
Natural language processing on data from clinical data warehouse: A systematic review
ABSTRACT
Background:
In recent years, health data collected during clinical care have often been repurposed for various secondary uses through clinical data warehouses (CDWs), which interconnect disparate data from different sources. The majority of clinical data is stored as unstructured text. Natural language processing (NLP), which implements algorithms that can operate on massive amounts of unstructured text, has the potential to make clinical data more accessible.
Objective:
The objective of this paper is to provide an overview of studies applying Natural Language Processing (NLP) to clinical text from CDWs.
Methods:
We searched for relevant articles in three bibliographic databases: PubMed, the Association for Computational Linguistics (ACL) Anthology, and Google Scholar. We reviewed the titles and abstracts and included articles that focused on NLP applied to data from CDWs. Included articles were then reviewed a second time, in full text, to retrieve the following information: NLP tasks, NLP methods used, data language, and the CDWs involved in the studies.
Results:
We identified a total of 1,251 articles, published between 1995 and 2021, of which 196 met the inclusion criteria. Among studies that applied NLP to text, information extraction was the predominant task (55.33%). Symbolic methods were the most common NLP methods (52.7%), followed by machine learning (32.3%) and deep learning (15%). These NLP methods were mostly applied to data written in English (75.63%).
Conclusions:
Although the use of NLP in CDWs is growing, CDWs are still underexploited for clinical NLP research, and challenges remain in this field. Clinical NLP is an effective strategy for accessing, extracting, and transforming data from CDWs. Information retrieved with NLP can assist clinical research and have an impact on clinical practice.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.