JMIR Preprints #95417: Title: Retrieval‑Augmented Generation Enhanced GPT‑4.1 to Support Clinical Trial Informed Consent Review for Data Reuse

Title: Retrieval‑Augmented Generation Enhanced GPT‑4.1 to Support Clinical Trial Informed Consent Review for Data Reuse

Lameck Mbangula Amugongo;
Lena Schaller;
Helene Wendt;
R. Maarten van Dijk;
Enrica Zanuttigh;
Claudia Neumann;
Andreas Freisinger;
Jaroslaw Deska

ABSTRACT

Background:

Regulatory and ethical frameworks such as the Belmont Report, the Common Rule, and the Declaration of Helsinki require informed consent to ensure that participants understand a study’s purpose and can make voluntary decisions about their involvement. Regulations including the General Data Protection Regulation (Regulation (EU) 2016/679) further emphasise that consent must be freely given and revocable without disadvantage. Although informed consent forms (ICFs) are intended to be clear and accessible, they have become increasingly lengthy and complex. Large language models (LLMs) offer the potential to navigate and interpret this complexity and have shown promise in biomedical information extraction tasks. However, their susceptibility to hallucinations limits their reliability in high-stakes settings. Retrieval-augmented generation (RAG) can mitigate such errors by grounding model outputs in text retrieved from the source documents.
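To illustrate how RAG grounds an answer in the ICF itself, the sketch below splits a form into chunks, ranks them against a data reuse question, and builds a prompt that restricts the model to the retrieved excerpts. The abstract does not describe the retrieval component, so everything here is an assumption: bag-of-words cosine similarity stands in for whatever embedding search a production system would use, and the paragraph-based chunking, the prompt wording, and all function names are hypothetical. The GPT-4.1 call itself is omitted.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercased word tokens; \w is Unicode-aware in Python 3,
    # so accented characters in non-English ICFs are preserved.
    return re.findall(r"\w+", text.lower())


def chunk_paragraphs(document: str) -> list[str]:
    # Hypothetical chunking rule: one chunk per blank-line-separated paragraph.
    return [p.strip() for p in document.split("\n\n") if p.strip()]


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words term-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(document: str, query: str, k: int = 2) -> list[str]:
    # Rank chunks by similarity to the query and keep the top k as evidence.
    q = Counter(tokenize(query))
    chunks = chunk_paragraphs(document)
    return sorted(chunks, key=lambda c: cosine(Counter(tokenize(c)), q), reverse=True)[:k]


def build_prompt(question: str, evidence: list[str]) -> str:
    # Hypothetical prompt wording: restrict the model to the retrieved excerpts
    # and ask it to cite them, so every answer can be traced back to ICF text.
    context = "\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(evidence))
    return (
        "Answer using only the numbered ICF excerpts below and cite the excerpt numbers. "
        "If the excerpts do not answer the question, reply 'not stated'.\n\n"
        f"Excerpts:\n{context}\n\nQuestion: {question}"
    )


icf = (
    "Your samples will be stored for 15 years.\n\n"
    "Your coded data may be reused for future research on other diseases.\n\n"
    "You may withdraw consent at any time without disadvantage."
)
question = "May the participant's data be reused for future research?"
evidence = retrieve(icf, question, k=1)
print(build_prompt(question, evidence))  # this string would be sent to the LLM
```

Asking the model to cite excerpt numbers is one way to obtain the kind of supporting evidence that can later be checked by a reviewer; it does not by itself guarantee the absence of hallucinations.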

Objective:

This study evaluates an LLM integrated with RAG for reviewing data reuse language in ICFs, including the model’s ability to interpret complex textual structures.

Methods:

First, we processed 438 ICFs from different trials, spanning multiple countries, languages, and ICF versions. Using expert-curated prompts, we extracted information about data reuse with GPT-4.1. We compared the LLM-generated data reuse outputs with a human expert ground truth and evaluated accuracy and the time required to extract information from each ICF. To further validate the workflow, we evaluated an independent set of 488 ICFs spanning additional trials, languages, and regions. For this cohort, we assessed the correctness of the LLM outputs along with the quality of the supporting evidence provided by the model.
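The comparison against the expert ground truth can be sketched as a loop that times each extraction and counts matches. This is a minimal illustration under stated assumptions. The abstract does not say whether accuracy is computed per ICF or per extracted item, so the sketch assumes one categorical label per ICF. The normalisation rule, the function names, and the keyword-based `stub_extract` (used so the example runs offline in place of the GPT-4.1 pipeline) are all hypothetical.

```python
import time
from statistics import mean
from typing import Callable


def normalise(label: str) -> str:
    # Hypothetical normalisation so that "Yes " and "yes" count as the same answer.
    return label.strip().lower()


def evaluate(
    icfs: dict[str, str],
    ground_truth: dict[str, str],
    extract: Callable[[str], str],
) -> tuple[float, float]:
    """Return (accuracy, mean seconds per ICF) against expert labels."""
    correct = 0
    durations = []
    for icf_id, text in icfs.items():
        start = time.perf_counter()
        prediction = extract(text)  # stands in for the GPT-4.1 + RAG pipeline
        durations.append(time.perf_counter() - start)
        if normalise(prediction) == normalise(ground_truth[icf_id]):
            correct += 1
    return correct / len(icfs), mean(durations)


def stub_extract(text: str) -> str:
    # Keyword stand-in for the LLM so the sketch runs offline; not the study's method.
    return "yes" if "reused" in text else "no"


icfs = {
    "A": "Data may be reused for future research.",
    "B": "Data will be deleted after the trial.",
    "C": "Samples may be shared with partner institutions.",
}
truth = {"A": "Yes", "B": "No", "C": "Yes"}
accuracy, seconds_per_icf = evaluate(icfs, truth, stub_extract)
print(f"accuracy = {accuracy:.1%}")  # prints "accuracy = 66.7%": the stub misses ICF C
```

Exact matching suits categorical answers such as yes, no, or not stated; judging free-text answers or the quality of supporting evidence would need a different comparison, such as expert review.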

Results:

Across the 438 ICFs, the system achieved 81.6% accuracy; after prompt optimisation, accuracy increased to 90% in the subsequent evaluation of an additional 488 ICFs. Using a RAG-based approach, the system accurately extracted data reuse information across multiple languages and identified nuanced international regulatory requirements.

Conclusions:

This approach has the potential to significantly alleviate administrative burdens by automating labour-intensive processes, while also generating insights that could inform the standardisation of ICF creation. Ultimately, these advancements may help reduce the complexity of ICFs, thereby improving their readability and comprehensibility for participants.

Clinical Trial: NA

Citation

Please cite as:

Amugongo LM, Schaller L, Wendt H, van Dijk RM, Zanuttigh E, Neumann C, Freisinger A, Deska J

Title: Retrieval‑Augmented Generation Enhanced GPT‑4.1 to Support Clinical Trial Informed Consent Review for Data Reuse

JMIR Preprints. 16/03/2026:95417

DOI: 10.2196/preprints.95417

URL: https://preprints.jmir.org/preprint/95417

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Currently submitted to: JMIR Medical Informatics

Date Submitted: Mar 16, 2026

Open Peer Review Period: Apr 2, 2026 - May 28, 2026

(currently open for review)
