Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Jul 16, 2024
Open Peer Review Period: Jul 22, 2024 - Sep 16, 2024
Date Accepted: Jan 2, 2025
Retrieval Augmented Therapy Suggestion for Molecular Tumor Boards
ABSTRACT
Background:
Molecular Tumor Boards (MTBs) require intensive manual investigation to generate optimal treatment recommendations for patients. Large Language Models (LLMs) can accelerate the preparation of MTB recommendations, reduce human error, improve access to care, and make precision oncology more efficient.
Objective:
In this study, we investigate the efficacy of LLM-generated treatment recommendations for MTB patients. Specifically, we examine the ability of LLMs to generate evidence-based treatment recommendations supported by PubMed references.
Methods:
We built a Retrieval Augmented Generation (RAG) pipeline using PubMed data. We prompted the resulting RAG-enhanced LLM to generate treatment recommendations with PubMed references for a test set of patients from an MTB conference at a large comprehensive cancer center within a tertiary care institution. Members of the MTB manually assessed the relevance and correctness of the generated responses.
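The retrieve-then-prompt pattern behind a RAG pipeline can be illustrated with a minimal sketch. The abstract does not specify the retriever, the index, or the LLM, so everything below is an assumption made for illustration: the tiny in-memory corpus, the placeholder identifiers (`PMID-A` and so on), the bag-of-words cosine similarity, and the helper names `retrieve` and `build_prompt` are all hypothetical, and no LLM is actually called.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for indexed PubMed abstracts.
# The identifiers and texts are invented for illustration only.
CORPUS = {
    "PMID-A": "EGFR exon 19 deletion lung adenocarcinoma responds to osimertinib",
    "PMID-B": "BRAF V600E melanoma treated with dabrafenib and trametinib",
    "PMID-C": "Dietary fiber intake and cardiovascular outcomes in adults",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, corpus, k=2):
    """Return the k (id, text) pairs most similar to the query."""
    query_vec = Counter(tokenize(query))
    ranked = sorted(
        corpus.items(),
        key=lambda item: cosine(query_vec, Counter(tokenize(item[1]))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(case_summary, passages):
    """Assemble a prompt that restricts citations to the retrieved passages."""
    context = "\n".join(f"[{pid}] {text}" for pid, text in passages)
    return (
        "Use only the references below and cite them by identifier.\n"
        f"References:\n{context}\n\n"
        f"Patient case: {case_summary}\n"
        "Suggested treatment with citations:"
    )


case = "lung adenocarcinoma with EGFR exon 19 deletion"
passages = retrieve(case, CORPUS, k=2)
prompt = build_prompt(case, passages)
print(prompt)
```

In a sketch like this, the prompt would then be passed to an LLM, and any identifier cited in the response could be checked against the retrieved set before clinician review.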
Results:
Of the referenced articles, 75% were properly cited from PubMed, 17% were hallucinations, and the remainder were not properly cited from PubMed. In clinician evaluation, clinician-generated LLM queries achieved higher accuracy than automated queries, with clinicians labelling 25% of LLM responses as equal to their own recommendations and 37.5% as plausible alternative treatments.
Conclusions:
This study demonstrates that RAG-enhanced LLMs can be a powerful tool for accelerating MTB conferences, as LLMs can sometimes produce treatment recommendations equal to those of clinicians. However, further investigation is required to achieve stable results with zero hallucinations. LLMs offer a scalable solution to the time-intensive process of MTB investigations, but their current performance shows that they must be used under close clinician supervision and cannot yet fully automate the MTB pipeline.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.