Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Apr 19, 2025
Date Accepted: Aug 15, 2025

The final, peer-reviewed published version of this preprint can be found here:

Automated Literature Screening for Hepatocellular Carcinoma Treatment Through Integration of 3 Large Language Models: Methodological Study

Pan C, Lu W, Chen B, Zhang G, Yang Z, Hao J

JMIR Med Inform 2025;13:e76252

DOI: 10.2196/76252

PMID: 40921065

PMCID: 12455167

Automated Literature Screening for Hepatocellular Carcinoma Treatment: Integrating Three Large Language Models

  • Chen Pan
  • Wei Lu
  • Bingliang Chen
  • Gang Zhang
  • Zhiming Yang
  • Jingcheng Hao

ABSTRACT

Background:

Primary liver cancer (PLC), particularly hepatocellular carcinoma (HCC), poses significant clinical challenges due to late-stage diagnosis, tumor heterogeneity, and rapidly evolving therapeutic strategies. While systematic reviews and meta-analyses are essential for updating clinical guidelines, their labor-intensive nature limits timely evidence synthesis.

Objective:

This study proposes an automated literature screening workflow powered by large language models (LLMs) to accelerate evidence synthesis for HCC treatment guidelines.

Methods:

We developed a tripartite LLM framework integrating Doubao-1.5-pro-32k, Deepseek-v3, and Deepseek-R1-Distill-Qwen-7B to simulate collaborative decision-making for study inclusion and exclusion. The system was evaluated across nine reconstructed datasets derived from published HCC meta-analyses, with performance assessed using accuracy, agreement metrics (kappa and prevalence-adjusted bias-adjusted kappa [PABAK]), recall, precision, F1 scores, and computational efficiency parameters (processing time, cost).
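The evaluation metrics named above can all be derived from a single 2x2 confusion matrix comparing the workflow's include/exclude decisions with those of the reference meta-analysis. The following is a minimal sketch under that assumption; the `Confusion` class, the `screening_metrics` function, and the example counts are hypothetical illustrations, not the authors' code or data.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Hypothetical 2x2 table: LLM workflow vs. reference review decisions."""
    tp: int  # included by both
    fp: int  # included by the workflow only
    fn: int  # included by the reference only
    tn: int  # excluded by both


def screening_metrics(c: Confusion) -> dict[str, float]:
    n = c.tp + c.fp + c.fn + c.tn
    if n == 0:
        raise ValueError("confusion matrix is empty")

    # Observed agreement (identical to accuracy for a binary decision).
    po = (c.tp + c.tn) / n

    # Agreement expected by chance, from the marginal include/exclude rates.
    pe = ((c.tp + c.fp) / n) * ((c.tp + c.fn) / n) \
        + ((c.tn + c.fn) / n) * ((c.tn + c.fp) / n)

    kappa = (po - pe) / (1 - pe) if pe < 1 else 0.0
    pabak = 2 * po - 1  # two-category form of PABAK

    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {
        "accuracy": po,
        "kappa": kappa,
        "pabak": pabak,
        "recall": recall,
        "precision": precision,
        "f1": f1,
    }


# Made-up counts for illustration only: few relevant records among many.
example = Confusion(tp=9, fp=51, fn=1, tn=939)
metrics = screening_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With heavily imbalanced counts like these, accuracy and PABAK stay high while kappa and precision are much lower, which is one reason screening studies often report PABAK alongside kappa.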

Results:

The framework achieved a weighted accuracy of 0.96 and substantial agreement (PABAK = 0.91), with high weighted recall (0.90) but modest weighted precision (0.15) and F1 score (0.22). Computational efficiency varied across datasets (processing time: 248–5,850 seconds; cost: 0.14–3.68 USD per dataset).
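As a rough consistency check on these figures, the standard two-category definitions of PABAK and F1 can be applied to the reported values. This is a sketch that assumes binary include/exclude decisions and that accuracy and PABAK were aggregated with the same weights; the abstract does not state either point.

```latex
\begin{align*}
\mathrm{PABAK} &= 2P_o - 1, & P_o = 0.96 &\;\Rightarrow\; \mathrm{PABAK} \approx 0.92 \\
F_1 &= \frac{2PR}{P + R}, & P = 0.15,\ R = 0.90 &\;\Rightarrow\; F_1 = \frac{0.27}{1.05} \approx 0.26
\end{align*}
```

The first value is consistent with the reported PABAK of 0.91 once rounding is taken into account. The second is higher than the reported F1 of 0.22, which would be expected if F1 was computed for each dataset and then averaged, because the average of per-dataset F1 scores generally differs from the F1 of the averaged precision and recall.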

Conclusions:

This LLM-driven approach shows promise for accelerating evidence synthesis in HCC care by reducing screening time while maintaining methodological rigor. Key limitations related to clinical context sensitivity and error propagation highlight the need for reinforcement learning integration and domain-specific fine-tuning. LLM agent architectures with reinforcement learning offer a practical path for streamlining guideline updates, though further optimization is needed to improve specialization and reliability in complex clinical settings.




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.