Natural language processing of clinical notes for cancer research and patient care prior to widespread adoption of generative AI: a scoping review of methods, applications and challenges
ABSTRACT
Background:
The widespread adoption of Electronic Health Records (EHRs) in oncology practice has provided a valuable data resource for both research and patient care. Clinical notes are the most abundant data type in EHRs. However, their highly unstructured format presents significant challenges for analysis with natural language processing (NLP). We conducted a scoping review of the literature examining the application of NLP to clinical notes in the cancer domain.
Objective:
The review was conducted with the following specific objectives: (1) characterize the clinical notes used, (2) identify NLP techniques employed to analyse these notes, (3) determine the clinical applications of NLP in cancer research and patient care, and (4) highlight the challenges encountered by researchers in this field.
Methods:
We systematically searched MEDLINE, EMBASE, Scopus, and Web of Science for English-language studies published between January 2014 and March 2024 that applied NLP to clinical notes in the cancer domain. Retrieved references were imported into Covidence, a web-based collaboration software platform. Two authors independently screened studies for eligibility and extracted data using a predefined data extraction template. Data were analysed descriptively, providing counts and percentages. The review follows Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews (PRISMA-ScR) guidelines.
Results:
A total of 227 studies were included in the review. Research employing NLP to process clinical notes grew substantially, from 4 studies in 2014 to 43 in 2023. NLP methods evolved over time, from predominantly rule-based and ontology-driven approaches (2014-2017) to hybrid approaches that combine these with deep neural networks, including pretrained language models (PLMs) (2018-2024). Large language models (LLMs) were largely absent; only 3 studies used them. Most studies (175; 77.1%) focused on information extraction (IE), with a subset applying the extracted data to downstream tasks such as diagnostic and prognostic classification. Significant challenges continue to hinder the advancement of clinical NLP and the development of clinically viable solutions, the most prominent being restricted access to clinical notes (cited by 39 studies) and limited data (cited by 18 studies).
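The descriptive analysis reports each category as a count and a percentage of the 227 included studies. As a minimal sketch of that arithmetic (not the authors' actual analysis code), the function below tallies per-study labels and rounds percentages to one decimal place. The labels are hypothetical and reproduce only the IE total given above; the remaining 52 studies are lumped together as "other" purely for illustration.

```python
from collections import Counter


def describe(labels):
    """Return {category: (count, percentage rounded to 1 dp)} for per-study labels."""
    total = len(labels)
    counts = Counter(labels)
    return {cat: (n, round(100 * n / total, 1)) for cat, n in counts.items()}


# Hypothetical labels: 175 of 227 studies focused on IE (from the Results);
# the other 52 are grouped as "other" only to complete the total.
labels = ["information extraction"] * 175 + ["other"] * 52
summary = describe(labels)
print(summary["information extraction"])  # (175, 77.1)
```

With real extraction data, each study would carry its own application label, and the same tally would yield the full breakdown.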
Conclusions:
The application of NLP to clinical notes in the cancer domain has grown, and efficient techniques are increasingly being applied. However, much of the research focuses on IE. The analysis of clinical notes with NLP has significant potential to advance cancer research and patient care, but realizing the full potential of clinical NLP will depend greatly on clinical notes becoming available to researchers.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.