Currently submitted to: JMIR Medical Informatics
Date Submitted: Mar 16, 2026
Open Peer Review Period: Apr 2, 2026 - May 28, 2026
(currently open for review)
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Title: Retrieval‑Augmented Generation Enhanced GPT‑4.1 to Support Clinical Trial Informed Consent Review for Data Reuse
ABSTRACT
Background:
Regulatory frameworks such as the Belmont Report, the Common Rule, and the Declaration of Helsinki require informed consent to ensure participants understand a study’s purpose and can make voluntary decisions about their involvement. Regulations including the General Data Protection Regulation (Regulation (EU) 2016/679) further emphasise that consent must be freely given and revocable without disadvantage. Although informed consent forms (ICFs) are intended to be clear and accessible, they have become increasingly lengthy and complex. Large language models (LLMs) could help navigate and interpret this complexity and have shown promise in biomedical information extraction tasks. However, their susceptibility to hallucinations limits their reliability in high-stakes settings. Retrieval-augmented generation (RAG) can mitigate such errors by grounding model outputs in text retrieved from the source documents.
Objective:
This study evaluates the integration of LLMs with RAG for reviewing data reuse language in ICFs, and the ability of this combination to interpret complex textual structures.
Methods:
We first processed 438 ICFs from different trials, spanning multiple countries, languages, and ICF versions. Using expertly curated prompts, we extracted information about data reuse with GPT-4.1. We compared the LLM-generated data reuse outputs with human expert ground truth to evaluate accuracy, and we measured the time required to extract information from each ICF. To further validate the workflow, we evaluated an independent set of 488 ICFs spanning additional trials, languages, and regions. For this cohort, we assessed the correctness of the LLM outputs along with the quality of the supporting evidence provided by the model.
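The comparison against expert ground truth described above can be illustrated with a minimal sketch. It assumes each ICF yields one categorical data reuse label from the model and one from the expert, plus a processing time; the `Review` record, the label strings, and the `summarise` helper are hypothetical names introduced for illustration, and the toy records are invented rather than taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Review:
    icf_id: str        # hypothetical identifier for one consent form
    llm_label: str     # hypothetical model answer, e.g. "reuse_permitted"
    expert_label: str  # hypothetical expert ground-truth answer
    seconds: float     # hypothetical time taken to process this ICF


def normalise(label: str) -> str:
    """Ignore case and surrounding whitespace when comparing labels."""
    return label.strip().lower()


def summarise(reviews):
    """Return the count, accuracy, and mean processing time for a set of reviews."""
    if not reviews:
        raise ValueError("no reviews to summarise")
    correct = sum(
        normalise(r.llm_label) == normalise(r.expert_label) for r in reviews
    )
    total_seconds = sum(r.seconds for r in reviews)
    return {
        "n": len(reviews),
        "accuracy": correct / len(reviews),
        "mean_seconds": total_seconds / len(reviews),
    }


# Toy records, invented for illustration only (not study data).
toy = [
    Review("icf-001", "reuse_permitted", "reuse_permitted", 12.0),
    Review("icf-002", "reuse_not_permitted", "reuse_permitted", 9.0),
    Review("icf-003", "Reuse_Permitted ", "reuse_permitted", 15.0),
    Review("icf-004", "unclear", "unclear", 8.0),
]
summary = summarise(toy)
print(summary)
```

Exact-match scoring on a normalised label is only one possible choice; free-text outputs or supporting-evidence quality, as assessed for the second cohort, would likely need a rubric applied by a reviewer rather than string comparison.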
Results:
Across 438 ICFs, the system achieved 81.6% accuracy, which increased to 90% in a subsequent evaluation of an additional 488 ICFs after prompt optimisation. Using a RAG-based approach, the system accurately extracted data reuse information across multiple languages and identified nuanced international regulatory requirements.
Conclusions:
This approach has the potential to significantly alleviate administrative burdens by automating labour-intensive processes, while also generating insights that could inform the standardisation of ICF creation. Ultimately, these advancements may contribute to reducing the complexity of ICFs, thereby improving their readability and comprehensibility for participants.
Clinical Trial: NA
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.