Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Oct 29, 2024
Date Accepted: Jan 4, 2025
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Transforming Informed Consent Generation Using Large Language Models: Insights, Best Practices, and Lessons Learned for Clinical Trials
ABSTRACT
Background:
Informed consent forms (ICFs) for clinical trials have become increasingly complex, often hindering participant comprehension and engagement due to legal jargon and lengthy content. Recent advances in large language models (LLMs) present an opportunity to streamline the ICF creation process while improving readability, understandability, and actionability.
Objective:
This study aims to evaluate the performance of the Mistral 8x22B LLM in generating informed consent forms that are readable, understandable, and actionable while maintaining accuracy and completeness.
Methods:
We processed four clinical trial protocols from the institutional review board (IRB) of UMass Chan Medical School using the Mistral 8x22B model to generate the key information sections of ICFs. A multidisciplinary team of eight evaluators, including clinical researchers and health informaticians, assessed the generated ICFs against human-generated counterparts for completeness, accuracy, readability, understandability, and actionability. The Readability, Understandability, and Actionability of Key Information (RUAKI) indicators, comprising 18 binary-scored items, were used to evaluate these aspects, with higher scores indicating more accessible, comprehensible, and actionable information. Statistical analyses, including Wilcoxon rank-sum tests and intraclass correlation coefficient (ICC) calculations, were used to compare the outputs.
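The abstract does not state which ICC model was used. As one plausible choice for a fixed panel of evaluators rating the same forms, a two-way random-effects, absolute-agreement, single-measure ICC(2,1) can be computed from the standard mean-squares decomposition; the sketch below (function name and toy data are illustrative, not from the study) shows the calculation in plain Python:

```python
def icc2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    ratings: list of subjects, each a list of k rater scores.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]

    # Sum-of-squares decomposition for a two-way layout without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_r = ss_rows / (n - 1)              # between-subjects mean square
    ms_c = ss_cols / (k - 1)              # between-raters mean square
    ms_e = ss_err / ((n - 1) * (k - 1))   # residual mean square

    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)
```

With perfectly agreeing raters (e.g. `[[1, 1], [2, 2], [3, 3]]`) the function returns 1.0; disagreement between raters pulls the value below 1.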
Results:
LLM-generated ICFs demonstrated comparable performance to human-generated versions across key sections, with no significant differences in accuracy and completeness (P > .10). The LLM outperformed human-generated ICFs in readability (RUAKI score of 76.39% vs. 66.67%, Flesch-Kincaid Grade Level of 7.95 vs. 8.38) and understandability (90.63% vs. 67.19%, P = .02). The LLM-generated content achieved a perfect score in actionability compared with the human-generated version (100% vs. 0%, P < .001). ICC for evaluator consistency was high at 0.83 (95% CI [0.64, 1.03]), indicating good reliability across assessments.
Conclusions:
The Mistral 8x22B LLM significantly improves the readability, understandability, and actionability of ICFs without sacrificing accuracy or completeness. LLMs present a scalable, efficient solution for ICF generation, potentially enhancing participant comprehension and consent in clinical trials.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have granted JMIR Publications an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be published under a CC-BY license, at this stage the authors and publisher expressly prohibit redistribution of this draft other than for review purposes.