JMIR Preprints #65483: Automating Emergency Medicine Documentation Using LLMs with Retrieval-Augmented Text Generation: Analytical Study

Automating Emergency Medicine Documentation Using LLMs with Retrieval-Augmented Text Generation: Analytical Study

Denis Moser;
Matthias Bender;
Murat Sariyar

ABSTRACT

Background:

In healthcare settings, especially in high-pressure environments such as emergency situations, the ability to document and communicate patient information rapidly and accurately is crucial. Manual documentation is often time-consuming and error-prone, which can adversely affect patient outcomes. To address these challenges, there is growing interest in integrating advanced technologies, especially Large Language Models (LLMs), into medical communication systems. However, deploying LLMs in clinical environments presents unique challenges, including the need to ensure the accuracy of medical content and to mitigate the risk of generating irrelevant or misleading information.

Objective:

This paper aims to address these challenges by developing a Natural Language Processing (NLP) pipeline for the extraction of text from German rescue services treatment dialogues. The objectives are twofold: (1) to generate realistic, medically relevant dialogues where the ground truth is known, and (2) to accurately extract essential information from these dialogues to populate emergency protocols.

Methods:

This study utilizes the MIMIC-IV-ED dataset, a de-identified, publicly available resource, to generate synthetic dialogue data for emergency department scenarios. By selecting and anonymizing data from 100 patients, we created a baseline for generating realistic dialogues and evaluating an NLP pipeline. We applied the Post Randomization Method (PRAM) for non-mechanical data perturbation, ensuring patient privacy and data utility. Dialogue generation was conducted in two stages: initial generation using the "Zephyr-7b-beta" model, followed by refinement and translation into German using GPT-4 Turbo. A Retrieval-Augmented Generation (RAG) approach was developed for extracting relevant information from these dialogues, involving chunking, embedding, and dynamic prompt templates. The model's performance was evaluated through manual review and sentiment analysis, ensuring that the generated dialogues maintained clinical relevance and emotional accuracy.
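The retrieval step described above (chunking, embedding, and a dynamic prompt template) can be illustrated with a minimal sketch. It is not the authors' implementation: the bag-of-words embedding stands in for whatever embedding model the study used, and the function names, chunk sizes, template wording, and the short English dialogue are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter

def chunk_words(text, size=12, overlap=4):
    """Split text into overlapping word windows (sizes are illustrative assumptions)."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def embed(text):
    """Toy bag-of-words 'embedding'; a real pipeline would call an embedding model here."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(chunks, query, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(embed(c), q), reverse=True)[:k]

# Hypothetical template; the field name is filled in per protocol item.
PROMPT_TEMPLATE = (
    "Context:\n{context}\n\n"
    "Extract the value of the field '{field}' from the context. "
    "Answer 'unknown' if it is not stated."
)

def build_prompt(chunks, field, query, k=2):
    """Fill the template with the retrieved chunks for one protocol field."""
    context = "\n---\n".join(retrieve(chunks, query, k))
    return PROMPT_TEMPLATE.format(context=context, field=field)

# Invented example dialogue, not taken from the study data.
dialogue = (
    "Paramedic: Can you rate your pain on a scale from zero to ten? "
    "Patient: I would say the pain is about seven. "
    "Paramedic: Do you take any medication regularly? "
    "Patient: Only aspirin in the morning."
)
chunks = chunk_words(dialogue)
print(build_prompt(chunks, field="Pain Score", query="pain score scale"))
```

The resulting prompt would then be passed to an LLM, which returns the value used to populate the corresponding protocol field. Overlapping windows are one way to reduce the chance that a question and its answer land in different chunks.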

Results:

The data generation pipeline produced 100 dialogues, with initial English dialogues averaging 2,000 tokens and German dialogues 4,000 tokens. Manual evaluation identified certain redundancies and formal language in the German dialogues. Sentiment analysis revealed a reduction in negative sentiment from 67% to 59% and an increase in positive sentiment from 27% to 38%, which may negatively impact text extraction, as positive sentiments may not align well with identifying critical topics such as suicidal thoughts. The RAG-based extraction system achieved high precision and recall in both nominal and numerical features in the initial dialogues, with F1-scores ranging from 86.21% to 100%. However, performance declined in the refined dialogues, with notable drops in precision, particularly for "Diagnosis" (60.82%) and "Pain Score" (57.61%).
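For readers unfamiliar with the reported metrics, the following sketch shows one way per-field precision, recall, and F1 could be computed from extracted values and known ground truth. The counting convention is an assumption, since the abstract does not state it: a correct extraction counts as a true positive, a wrong extracted value as a false positive, and a missing extraction where a value exists as a false negative. The function name and the sample values are hypothetical.

```python
def field_scores(predicted, truth):
    """Precision, recall and F1 for one protocol field.

    Assumed convention (not specified in the source):
      TP: extracted value equals the ground truth
      FP: a value was extracted but it is wrong
      FN: nothing was extracted although a ground-truth value exists
    """
    tp = fp = fn = 0
    for p, t in zip(predicted, truth):
        if p is not None and p == t:
            tp += 1
        elif p is not None:
            fp += 1
        elif t is not None:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Invented pain scores for four dialogues; None means no value was extracted.
predicted = [7, 5, None, 3]
truth = [7, 4, 2, 3]
precision, recall, f1 = field_scores(predicted, truth)
print(f"precision={precision:.2%} recall={recall:.2%} f1={f1:.2%}")
```

Under this convention, a drop in precision such as the one reported for "Diagnosis" and "Pain Score" would mean that the system more often extracted a value that did not match the ground truth.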

Conclusions:

The results of the study underscore the system's robust capabilities in processing structured data efficiently, demonstrating its strength in managing well-defined, quantitative information. However, the findings also reveal limitations in the system’s ability to handle nuanced clinical language, particularly when it comes to non-English and non-Chinese languages.

Citation

Please cite as:

Moser D, Bender M, Sariyar M

Automating Emergency Medicine Documentation Using LLMs with Retrieval-Augmented Text Generation: Analytical Study

JMIR Preprints. 16/08/2024:65483

DOI: 10.2196/preprints.65483

URL: https://preprints.jmir.org/preprint/65483

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Previously submitted to: JMIR Medical Informatics (no longer under consideration since Feb 04, 2025)

Date Submitted: Aug 16, 2024

Open Peer Review Period: Sep 5, 2024 - Oct 31, 2024

NOTE: This is an unreviewed Preprint
