Accepted for/Published in: JMIR AI

Date Submitted: Apr 7, 2024
Date Accepted: Aug 2, 2025

The final, peer-reviewed published version of this preprint can be found here:

Exploring Named Entity Recognition Potential and the Value of Tailored Natural Language Processing Pipelines for Radiology, Pathology, and Progress Notes in Clinical Decision Support: Quantitative Study

Kocaman V, Cheng FY, Bonis J, Raut G, Timsina P, Talby D, Kia A

JMIR AI 2025;4:e59251

DOI: 10.2196/59251

PMID: 40911864

PMCID: 12449662

Exploring Named Entity Recognition potential and the value of tailored Natural Language Processing pipelines for radiology, pathology, and progress notes in Clinical Decision Support.

  • Veysel Kocaman
  • Fu-Yuan Cheng
  • Julio Bonis
  • Ganesh Raut
  • Prem Timsina
  • David Talby
  • Arash Kia

ABSTRACT

Background:

Clinical notes contain rich but unstructured patient data. Medical jargon, abbreviations, and synonyms introduce ambiguity that makes this data hard to analyze, which complicates real-time extraction for clinical decision support tools.

Objective:

This study focuses on the data curation, technology, and workflow behind named entity recognition (NER)-based clinical decision support tools, which distill key entities from clinical notes to guide patient care.

Methods:

We gathered progress, care, radiology, and pathology notes from 5,000 patients and divided them into five batches of 1,000 patients each. For each note type, we measured the number of notes or reports per patient, sentence count, token count, runtime, CPU usage, and memory usage. We also evaluated the NER and assertion models against ground truth data.
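The batching and per-note-type profiling described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline. The `Note` record and its fields are hypothetical, as are the helpers `make_batches` and `profile_by_note_type`. The period-based sentence splitting and whitespace tokenization are deliberate simplifications: a production pipeline would use trained sentence detectors and tokenizers. CPU and memory measurement are omitted.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
import time


@dataclass
class Note:
    # Hypothetical record layout; the study's actual schema is not described.
    patient_id: int
    note_type: str  # e.g. "progress", "radiology", "pathology"
    text: str


def make_batches(patient_ids, batch_size=1000):
    """Split sorted patient IDs into consecutive batches (5,000 -> 5 x 1,000)."""
    ids = sorted(patient_ids)
    return [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]


def profile_by_note_type(notes):
    """Per note type: notes per patient, mean sentences and tokens per note, runtime."""
    groups = defaultdict(list)
    for note in notes:
        groups[note.note_type].append(note)

    stats = {}
    for note_type, group in groups.items():
        start = time.perf_counter()
        # Naive splitting on periods and whitespace, for illustration only.
        sentence_counts = [
            len([s for s in n.text.split(".") if s.strip()]) for n in group
        ]
        token_counts = [len(n.text.split()) for n in group]
        patients = {n.patient_id for n in group}
        stats[note_type] = {
            "notes_per_patient": len(group) / len(patients),
            "sentences_per_note": mean(sentence_counts),
            "tokens_per_note": mean(token_counts),
            "runtime_s": time.perf_counter() - start,
        }
    return stats
```

Batching by patient rather than by note keeps all of a patient's notes in the same batch. This matters when the notes-per-patient distribution is long-tailed.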

Results:

Applying the Spark NLP clinical NER model to 138,250 clinical notes, we observed high NER performance. Precision peaked for the "procedures" entity at 0.989 (95% CI 0.977-1.000), and the assertion model reached an accuracy of 0.889 (95% CI 0.856-0.922). Our analysis revealed long-tail distributions in the number of notes per patient, note length, and entity density. Progress care notes contained roughly four times as many entities per sentence as radiology notes and sixteen times as many as pathology notes.
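The entity-density comparison and the precision estimate can be illustrated with a short sketch. The helper names are hypothetical. The normal-approximation (Wald) interval clipped to [0, 1] is an assumed method: the abstract does not state how the reported 95% CIs were computed.

```python
import math


def entities_per_sentence(entity_count, sentence_count):
    """Entity density for one note type: total entities / total sentences."""
    if sentence_count <= 0:
        raise ValueError("sentence_count must be positive")
    return entity_count / sentence_count


def fold_difference(density_a, density_b):
    """How many times denser note type A is than note type B."""
    return density_a / density_b


def precision_with_ci(true_positives, false_positives, z=1.96):
    """Precision with a Wald 95% CI, clipped to [0, 1] (assumed method)."""
    n = true_positives + false_positives
    if n == 0:
        raise ValueError("no predicted entities to score")
    p = true_positives / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)
```

A Wald interval becomes unreliable when precision is close to 1 and the upper bound has to be clipped. Wilson or bootstrap intervals are often preferred in that regime.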

Conclusions:

Further research should extend the analysis to clinical note types beyond the scope of our study, including discharge summaries and psychiatric evaluation notes. Because each note type has distinct linguistic characteristics, specialized NER models or NLP pipeline configurations tailored to each type are important. Such tailoring could improve performance across a more diverse range of clinical scenarios.




© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.