Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Apr 19, 2025
Date Accepted: Aug 15, 2025
Automated Literature Screening for Hepatocellular Carcinoma Treatment: Integrating Three Large Language Models
ABSTRACT
Background:
Primary liver cancer (PLC), particularly hepatocellular carcinoma (HCC), poses significant clinical challenges due to late-stage diagnosis, tumor heterogeneity, and rapidly evolving therapeutic strategies. While systematic reviews and meta-analyses are essential for updating clinical guidelines, their labor-intensive nature limits timely evidence synthesis.
Objective:
This study proposes an automated literature screening workflow powered by large language models (LLMs) to accelerate evidence synthesis for HCC treatment guidelines.
Methods:
We developed a tripartite LLM framework integrating Doubao-1.5-pro-32k, Deepseek-v3, and Deepseek-R1-Distill-Qwen-7B to simulate collaborative decision-making for study inclusion and exclusion. The system was evaluated across nine reconstructed datasets derived from published HCC meta-analyses, with performance assessed using accuracy, agreement metrics (kappa and prevalence-adjusted bias-adjusted kappa [PABAK]), recall, precision, F1 scores, and computational efficiency parameters (processing time, cost).
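The evaluation metrics named above can all be derived from a 2x2 confusion matrix that compares the framework's include/exclude decisions with the reference decisions of the original meta-analysis. The following is a minimal sketch under that assumption, not the authors' evaluation code; the names `Confusion` and `screening_metrics` are hypothetical, and how the study weights these values across datasets is not specified here.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts of screening decisions versus the reference standard (hypothetical container)."""
    tp: int  # included by the framework, included in the reference
    fp: int  # included by the framework, excluded in the reference
    fn: int  # excluded by the framework, included in the reference
    tn: int  # excluded by the framework, excluded in the reference


def screening_metrics(c: Confusion) -> dict:
    """Compute accuracy, recall, precision, F1, Cohen's kappa, and PABAK."""
    n = c.tp + c.fp + c.fn + c.tn
    if n == 0:
        raise ValueError("confusion matrix is empty")

    p_o = (c.tp + c.tn) / n  # observed agreement, identical to accuracy
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Chance agreement expected from the marginal include/exclude rates.
    p_include = ((c.tp + c.fp) / n) * ((c.tp + c.fn) / n)
    p_exclude = ((c.fn + c.tn) / n) * ((c.fp + c.tn) / n)
    p_e = p_include + p_exclude
    # Kappa is undefined when chance agreement is already 1.
    kappa = (p_o - p_e) / (1 - p_e) if p_e < 1 else float("nan")

    # PABAK for two categories depends only on observed agreement.
    pabak = 2 * p_o - 1

    return {"accuracy": p_o, "recall": recall, "precision": precision,
            "f1": f1, "kappa": kappa, "pabak": pabak}


if __name__ == "__main__":
    # Made-up counts, used only to exercise the function.
    print(screening_metrics(Confusion(tp=20, fp=5, fn=10, tn=15)))
```

Because PABAK replaces the marginal-dependent chance term with a fixed 0.5, it is less sensitive than kappa to the heavy class imbalance typical of literature screening, where most retrieved records are excluded.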
Results:
The framework achieved a weighted accuracy of 0.96 and substantial agreement (PABAK = 0.91), with high weighted recall (0.90) but low weighted precision (0.15) and a low weighted F1 score (0.22). Computational efficiency varied across datasets (processing time: 248–5,850 seconds; cost: 0.14–3.68 USD per dataset).
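High accuracy alongside low precision is a typical pattern when relevant studies are rare among the screened records. The sketch below illustrates the arithmetic with invented counts (1,000 records, 10 of them truly relevant); these numbers are hypothetical and are not taken from the study's datasets, and the helper name `summarize` is likewise an assumption.

```python
def summarize(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Return screening metrics for one confusion matrix (illustrative helper)."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)
    pabak = 2 * accuracy - 1  # two-category PABAK from observed agreement
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1, "pabak": pabak}


if __name__ == "__main__":
    # Hypothetical dataset: 10 relevant records among 1,000 screened.
    # The screener finds 9 of the 10 but also flags 40 irrelevant records.
    for name, value in summarize(tp=9, fp=40, fn=1, tn=950).items():
        print(f"{name}: {value:.3f}")
```

In this invented case the 950 correctly excluded records dominate accuracy and PABAK, while the 40 false positives outnumber the 9 true positives and pull precision below 0.2. For a screening step, that trade-off leaves extra records for manual full-text review but misses few eligible studies.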
Conclusions:
This LLM-driven approach shows promise for accelerating evidence synthesis in HCC care by reducing screening time while maintaining methodological rigor. Key limitations related to clinical context sensitivity and error propagation highlight the need for reinforcement learning integration and domain-specific fine-tuning. LLM agent architectures with reinforcement learning offer a practical path for streamlining guideline updates, though further optimization is needed to improve specialization and reliability in complex clinical settings.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.