JMIR Preprints #64898: Developing a tool for identifying clinical risk from free text clinical records using natural language processing and machine learning


Natasha Biscoe;
Daniel Leightley;
Dominic Murphy

ABSTRACT

Background:

Electronic patient records (EPR) are a valuable yet under-utilised data source, although they have been extensively explored in research using natural language processing (NLP).

Objective:

This study applied NLP to develop a risk identification tool capable of distinguishing high-risk from low-risk veterans, using EPR from a UK veterans' mental health charity.

Methods:

A total of 20,342 clinical notes were extracted for this purpose. To develop the risk tool, 70% of the records formed the training dataset and the remaining 30% were held out for testing and evaluation. The classification framework was designed and trained to categorise risk as a binary outcome: 1 for high risk and 0 for low risk.
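The abstract does not describe the NLP features or the classifier, so only the data-handling step is sketched here. This is a minimal illustration, assuming a seeded, non-stratified random shuffle, of how notes paired with the 0/1 risk labels could be divided 70/30. The function `split_notes`, the constants `HIGH_RISK`/`LOW_RISK`, the seed and the placeholder notes are hypothetical and are not taken from the study.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Binary outcome coding described in the abstract.
HIGH_RISK = 1
LOW_RISK = 0


def split_notes(records: Sequence[T], train_fraction: float = 0.7,
                seed: int = 42) -> Tuple[List[T], List[T]]:
    """Reproducibly shuffle records and split them into (train, test).

    Assumptions: a plain seeded shuffle (not stratified by risk label) and
    seed=42 are illustrative choices, not the study's documented procedure.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(records)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)  # floor; the remainder goes to test
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    # Hypothetical placeholder notes, each paired with a 0/1 risk label.
    notes = [(f"note {i}", HIGH_RISK if i % 5 == 0 else LOW_RISK)
             for i in range(20342)]
    train, test = split_notes(notes)
    print(len(train), len(test))  # 14239 6103
```

With 20,342 notes this yields 14,239 training and 6,103 test notes. If high-risk notes are rare, a split stratified by label would keep the class balance similar in both sets. That is a design option, not something the abstract reports.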

Results:

The performance of each classifier model was assessed by comparing its output with the outcomes of clinical risk assessments. This comparison allowed calculation of the positive predictive value, negative predictive value (0.73, 95% CI [0.71 to 0.75]), sensitivity (0.75, 95% CI [0.74 to 0.76]), F1 score (0.74, 95% CI [0.72 to 0.76]), and overall classification performance, summarised using the Youden Index (0.73, 95% CI [0.71 to 0.76]).
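The abstract does not state how these metrics or their 95% CIs were computed. As an illustrative sketch only, the block below assumes the standard confusion-matrix definitions (high risk = 1 as the positive class), the Youden index J = sensitivity + specificity - 1, and a percentile bootstrap over notes for the intervals. The bootstrap is an assumed method, not the authors' stated one, and the function names `confusion_counts`, `risk_metrics` and `bootstrap_ci` are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), treating 1 (high risk) as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t not in (0, 1) or p not in (0, 1):
            raise ValueError("labels must be 0 (low risk) or 1 (high risk)")
        if p == 1 and t == 1:
            tp += 1
        elif p == 1:
            fp += 1
        elif t == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def _safe_div(num: float, den: float) -> float:
    # NaN marks an undefined metric, e.g. PPV when nothing is predicted high risk.
    return num / den if den else float("nan")


def risk_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Standard confusion-matrix metrics plus the Youden index J."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = _safe_div(tp, tp + fn)
    specificity = _safe_div(tn, tn + fp)
    return {
        "ppv": _safe_div(tp, tp + fp),
        "npv": _safe_div(tn, tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": _safe_div(2 * tp, 2 * tp + fp + fn),  # harmonic mean of PPV and sensitivity
        "youden_j": sensitivity + specificity - 1,   # J = sensitivity + specificity - 1
    }


def bootstrap_ci(y_true: Sequence[int], y_pred: Sequence[int], metric: str,
                 n_boot: int = 1000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap (1 - alpha) CI for one metric, resampling notes with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        value = risk_metrics([y_true[i] for i in idx], [y_pred[i] for i in idx])[metric]
        if value == value:  # NaN != NaN, so undefined resamples are skipped
            stats.append(value)
    if not stats:
        raise ValueError("metric was undefined in every bootstrap resample")
    stats.sort()
    lower = stats[int((alpha / 2) * len(stats))]
    upper = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lower, upper


if __name__ == "__main__":
    # Toy labels, not study data: 4 high-risk and 4 low-risk notes.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
    print(risk_metrics(y_true, y_pred))  # every metric is 0.75 except youden_j = 0.5
    print(bootstrap_ci(y_true, y_pred, "sensitivity", n_boot=200))
```

Note that the Youden index combines sensitivity and specificity. It is therefore a summary of discriminative performance rather than accuracy in the strict sense of (TP + TN) / N.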

Conclusions:

The risk identification tool was able to assign veterans to the correct risk category from a large sample of clinical notes. Future studies should investigate whether the tool can detect more nuanced differences in risk.

Citation

Please cite as:

Biscoe N, Leightley D, Murphy D. Developing a Tool for Identifying Clinical Risk From Free-Text Clinical Records: Natural Language Processing Study. JMIR AI 2025;4:e64898.

DOI: 10.2196/64898

PMID: 40982796

PMCID: 12501529


Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR AI

Date Submitted: Jul 30, 2024

Date Accepted: Aug 15, 2025
