Accepted for/Published in: JMIR AI

Date Submitted: Jul 30, 2024
Date Accepted: Aug 15, 2025

The final, peer-reviewed published version of this preprint can be found here:

Biscoe N, Leightley D, Murphy D

Developing a Tool for Identifying Clinical Risk From Free-Text Clinical Records: Natural Language Processing Study

JMIR AI 2025;4:e64898

DOI: 10.2196/64898

PMID: 40982796

PMCID: 12501529

Developing a tool for identifying clinical risk from free text clinical records using natural language processing and machine learning

  • Natasha Biscoe
  • Daniel Leightley
  • Dominic Murphy

ABSTRACT

Background:

Electronic patient records (EPR) are an under-utilised yet valuable data source that has been extensively explored through research using natural language processing (NLP).

Objective:

This study applied NLP to create a risk identification tool capable of distinguishing high-risk from low-risk veterans using EPR from a UK veteran mental health charity.

Methods:

A total of 20,342 notes were extracted for this purpose. To develop the risk tool, 70% of the records formed the training dataset, while the remaining 30% were allocated for testing and evaluation. The classification framework was devised and trained to categorise risk as a binary outcome: 1 for high risk and 0 for low risk.
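As an illustration of the 70%/30% partition described above, the following is a minimal sketch only. The abstract does not say how records were assigned to each set, so the seeded random shuffle, the function name `split_notes`, and the example notes are all assumptions made for illustration.

```python
import random


def split_notes(notes, train_percent=70, seed=0):
    """Shuffle notes reproducibly, then split them into train and test lists.

    Assumption: a simple random split. The study may have used a
    different assignment scheme (for example, stratified by label).
    """
    shuffled = list(notes)
    random.Random(seed).shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding at the cut point.
    cut = len(shuffled) * train_percent // 100
    return shuffled[:cut], shuffled[cut:]


# Hypothetical (note_text, label) pairs: 1 = high risk, 0 = low risk.
notes = [(f"note {i}", i % 2) for i in range(10)]
train, test = split_notes(notes)
print(len(train), len(test))
```

Fixing the seed keeps the partition reproducible, so that any classifier comparison is made on the same held-out notes.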

Results:

The efficacy of each classifier model was assessed by comparing its results with those from clinical risk assessments. This comparison allowed for the calculation of the positive predictive value, negative predictive value (0.73, 95% CI [0.71 to 0.75]), sensitivity (0.75, 95% CI [0.74 to 0.76]), F1 score (0.74, 95% CI [0.72 to 0.76]), and accuracy, which was measured using the Youden Index (0.73, 95% CI [0.71 to 0.76]).
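The metrics named above can all be derived from a binary confusion matrix. The sketch below uses the standard textbook definitions, which may differ in detail from the authors' own calculation. The function name `risk_metrics` and the counts passed to it are hypothetical and are not taken from the study.

```python
def risk_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion counts.

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives, with "positive" meaning high risk (label 1).
    """
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    sensitivity = tp / (tp + fn)   # recall for the high-risk class
    specificity = tn / (tn + fp)   # recall for the low-risk class
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    youden_j = sensitivity + specificity - 1
    return {
        "ppv": ppv,
        "npv": npv,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "youden_j": youden_j,
    }


# Hypothetical counts for illustration only (not the study's results).
metrics = risk_metrics(tp=75, fp=25, tn=75, fn=25)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Under these definitions the Youden index ranges from -1 to 1, where 0 corresponds to a classifier that performs no better than chance.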

Conclusions:

The risk identification tool successfully determined the correct risk category of veterans from a large sample of clinical notes. Future studies should investigate whether this tool can detect more nuanced differences in risk.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.