Accepted for/Published in: JMIR Cancer

Date Submitted: Feb 15, 2021
Date Accepted: May 13, 2021

The final, peer-reviewed published version of this preprint can be found here:

A Natural Language Processing–Assisted Extraction System for Gleason Scores: Development and Usability Study

Yu S, Le A, Feld E, Schriver E, Gabriel P, Doucette A, Narayan V, Feldman M, Schwartz L, Maxwell K, Mowery D

JMIR Cancer 2021;7(3):e27970

DOI: 10.2196/27970

PMID: 34255641

PMCID: 8285739

Development of an NLP-Assisted Extraction System for Gleason Scores: Combining the Strengths of Humans and Machines

  • Shun Yu
  • Anh Le
  • Emily Feld
  • Emily Schriver
  • Peter Gabriel
  • Abigail Doucette
  • Vivek Narayan
  • Michael Feldman
  • Lauren Schwartz
  • Kara Maxwell
  • Danielle Mowery

ABSTRACT

Background:

Natural language processing (NLP) extracts variables significantly faster than traditional human extraction, but it cannot interpret complicated notes as well as humans can. We therefore hypothesized that an “NLP-assisted” extraction system, which uses humans for complicated notes and NLP for uncomplicated notes, could speed up extraction without compromising accuracy.

Objective:

To develop and pilot an “NLP-assisted” extraction system to leverage the strengths of both human and NLP extraction of prostate cancer Gleason scores.

Methods:

We collected all available clinical and pathology notes for prostate cancer patients in an unselected academic biobank cohort. We developed an NLP system to extract prostate cancer Gleason scores from both clinical and pathology notes. Next, we designed and implemented the NLP-assisted extraction system algorithm, which classifies each note as uncomplicated or complicated. Uncomplicated notes were assigned to NLP extraction and complicated notes were assigned to human extraction. We reviewed a random sample of 200 patients to assess the accuracy and speed of our NLP-assisted extraction system, and compared it to NLP extraction alone and human extraction alone.
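
The routing idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how notes were classified as complicated, so the regular expression, the function names `extract_gleason` and `triage`, and the routing rule (send a note to NLP only when exactly one distinct score is found) are all assumptions made for illustration.

```python
import re

# Hypothetical pattern: matches forms such as "Gleason 3+4=7" or
# "Gleason score 4 + 3". Real clinical notes vary far more than this.
GLEASON_RE = re.compile(
    r"gleason(?:\s+(?:score|grade))?\s*:?\s*([1-5])\s*\+\s*([1-5])(?:\s*=\s*\d{1,2})?",
    re.IGNORECASE,
)


def extract_gleason(note):
    """Return every (primary, secondary) pattern pair found in the note."""
    return [(int(m.group(1)), int(m.group(2))) for m in GLEASON_RE.finditer(note)]


def triage(note):
    """Route a note to 'nlp' or 'human' (assumed rule, for illustration only).

    A note with exactly one distinct score is treated as uncomplicated.
    A note with no score or with conflicting scores goes to a human.
    """
    scores = set(extract_gleason(note))
    if len(scores) == 1:
        return "nlp", scores.pop()
    return "human", None


simple = "Prostate biopsy: Gleason score 3+4=7 in 2 of 12 cores."
mixed = "Gleason 3+3 on the left, Gleason 4+5=9 on the right."
print(triage(simple))
print(triage(mixed))
```

Under this assumed rule, the first note would be handled by NLP and the second, which reports two different scores, would be queued for human review.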

Results:

Of the 2,051 patients in our cohort, the NLP system extracted a prostate surgery Gleason score from 1,147 (56%) patients and a prostate biopsy Gleason score from 1,624 (79%) patients. Our NLP-assisted extraction system had an overall accuracy rate of 98.7%, which was similar to the accuracy of human extraction alone (97.5%, P = 0.17) and significantly higher than the accuracy of NLP extraction alone (95.3%, P < 0.01). Moreover, our NLP-assisted extraction system reduced the workload of human extractors by approximately 95%, resulting in an average extraction time of 12.7 seconds per patient (vs 256.1 seconds per patient for human extraction alone).
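
As a quick consistency check on the figures reported above, the short Python sketch below recomputes the percentages and the time ratio from the stated counts and timings. The variable names are illustrative. Equating the per-patient time reduction with the reported workload reduction is an assumption, since the abstract does not say how workload was measured.

```python
# Counts and timings are taken from the Results paragraph.
cohort = 2051
surgery, biopsy = 1147, 1624

pct_surgery = round(100 * surgery / cohort)  # share with a surgery score
pct_biopsy = round(100 * biopsy / cohort)    # share with a biopsy score

assisted_s, human_s = 12.7, 256.1            # seconds per patient
time_reduction = 1 - assisted_s / human_s    # fraction of time saved
speedup = human_s / assisted_s               # how many times faster

print(pct_surgery, pct_biopsy)
print(round(100 * time_reduction, 1), round(speedup, 1))
```

The recomputed shares round to 56% and 79%, and the per-patient time falls by roughly 95%, about a 20-fold speedup, which is in line with the reported workload reduction.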

Conclusions:

We demonstrated that an NLP-assisted extraction system achieved much faster Gleason score extraction than traditional human extraction without sacrificing accuracy.

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.