JMIR Preprints #56243: Extraction of Substance Use Information from Clinical Notes: A GPT-Based Investigation

Extraction of Substance Use Information from Clinical Notes: A GPT-Based Investigation

Fatemeh Shah-Mohammadi;
Joseph Finkelstein

ABSTRACT

Background:

Understanding the factors that influence patient well-being requires a focused examination of addiction status within the broader context of health determinants. Identifying patients' addiction profiles equips clinical care teams to address addiction-related issues more effectively, provide targeted support, and ultimately improve patient outcomes.

Objective:

This study investigates the application of the generative pretrained transformer (GPT) model to extracting tobacco, alcohol, and substance addiction information from patient discharge summaries in zero-shot and few-shot learning settings. It contributes to the evolving landscape of healthcare informatics by showcasing the potential of advanced language models to extract nuanced information that is critical for enhancing patient care.
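The extraction target has two parts per substance: whether it is mentioned and, if so, its status. The sketch below shows one way such model output could be turned into structured records. It assumes, hypothetically, that the model is asked to reply in JSON; the field names and the status labels (`current`, `past`, `never`, `unknown`) are illustrative assumptions, not the categories or output format used in the study.

```python
import json
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical label set: the abstract does not specify the status categories.
SUBSTANCES = ("tobacco", "alcohol", "substance")
STATUSES = {"current", "past", "never", "unknown"}


@dataclass
class SubstanceFinding:
    substance: str
    mentioned: bool
    status: Optional[str]  # None when not mentioned or no status was returned


def parse_response(raw: str) -> List[SubstanceFinding]:
    """Parse a hypothetical JSON reply into one finding per substance."""
    data = json.loads(raw)
    findings = []
    for name in SUBSTANCES:
        entry = data.get(name, {})  # a missing key is treated as "not mentioned"
        mentioned = bool(entry.get("mentioned", False))
        status = entry.get("status") if mentioned else None
        if status is not None and status not in STATUSES:
            status = "unknown"  # map unexpected labels to a fallback
        findings.append(SubstanceFinding(name, mentioned, status))
    return findings
```

For example, `parse_response('{"tobacco": {"mentioned": true, "status": "past"}}')` yields a `past` tobacco finding and "not mentioned" entries for alcohol and other substances. Always emitting all three substances keeps per-category scoring straightforward.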

Methods:

The main data source for analysis in this paper is the Medical Information Mart for Intensive Care III (MIMIC-III) dataset. Among all notes in this dataset, we focused on discharge summaries. Prompt engineering was undertaken through an iterative exploration of diverse prompts. Using carefully curated examples and refined prompts, we investigated the model's proficiency in both zero-shot and few-shot learning settings.
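The only structural difference between the two settings is whether worked examples precede the target note. A minimal sketch of that difference follows. The instruction wording, the `Example`/`Note`/`Answer` layout, and the `build_prompt` helper are hypothetical and are not the prompts the authors arrived at through prompt engineering. Any example notes should be synthetic, not MIMIC-III text.

```python
from typing import Sequence, Tuple

# Hypothetical instruction text; the study's actual prompts are not reproduced here.
INSTRUCTION = (
    "Read the discharge summary below. For tobacco, alcohol, and other substance use, "
    "state whether each is mentioned and, if so, its status."
)


def build_prompt(note: str, examples: Sequence[Tuple[str, str]] = ()) -> str:
    """Build a zero-shot prompt when `examples` is empty, few-shot otherwise."""
    parts = [INSTRUCTION]
    # Each (note, answer) pair becomes one worked demonstration.
    for i, (example_note, example_answer) in enumerate(examples, start=1):
        parts.append(
            f"Example {i} note:\n{example_note}\nExample {i} answer:\n{example_answer}"
        )
    # The target note always comes last, followed by an open answer slot.
    parts.append(f"Note:\n{note}\nAnswer:")
    return "\n\n".join(parts)
```

Usage: `build_prompt("Pt denies smoking.")` gives a zero-shot prompt. Passing `examples=[("Former smoker, quit years ago.", "tobacco: mentioned, past")]` turns it into a one-shot prompt without changing the instruction, so the two settings can be compared directly.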

Results:

The results highlight the contrasting performance of GPT in extracting mentions of tobacco, alcohol, and substance use in zero-shot and few-shot learning scenarios. In the zero-shot setting, the accuracy of extracting tobacco, alcohol, and substance use mentions is notably high; in the few-shot setting, this accuracy diminishes significantly. Conversely, few-shot learning led to a significant improvement over zero-shot learning in determining addiction status, with significant increases in recall and F1-score. However, this improvement comes at the cost of reduced precision in both addiction mention and status extraction.
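The trade-off described above can be illustrated with the standard definitions of precision, recall, and F1 computed from true positive (TP), false positive (FP), and false negative (FN) counts. The counts in the usage note are toy numbers chosen only to illustrate the pattern; they are not results from this study. Accuracy, also reported above, would additionally require true negatives.

```python
from typing import NamedTuple


class Scores(NamedTuple):
    precision: float
    recall: float
    f1: float


def prf(tp: int, fp: int, fn: int) -> Scores:
    """Precision, recall, and F1 from raw counts; 0.0 when a ratio is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return Scores(precision, recall, f1)
```

With toy counts, `prf(8, 2, 8)` gives precision 0.8 and recall 0.5, while `prf(14, 6, 2)` gives precision 0.7 and recall 0.875. The second system finds more true cases and has a higher F1, but more of its predictions are wrong, which is the qualitative shape of the zero-shot versus few-shot comparison.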

Conclusions:

The excellence of zero-shot learning in precisely extracting addiction mentions demonstrates its effectiveness in situations where comprehensive recall is paramount. Conversely, few-shot learning offers advantages when accurately determining addiction status is the primary focus, even if this involves a trade-off in precision.

Citation

Please cite as:

Shah-Mohammadi F, Finkelstein J. Extraction of Substance Use Information From Clinical Notes: Generative Pretrained Transformer–Based Investigation. JMIR Med Inform 2024;12:e56243. DOI: 10.2196/56243. PMID: 39037700. PMCID: 11369538.

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Jan 10, 2024

Date Accepted: Jul 18, 2024

Date Submitted to PubMed: Jul 22, 2024
