Accepted for/Published in: JMIR Bioinformatics and Biotechnology
Date Submitted: Dec 30, 2024
Date Accepted: Apr 27, 2025
Extracting Knowledge from Scientific Texts on Patient-Derived Cancer Models Using Large Language Models: Algorithm Development and Validation
ABSTRACT
Background:
Patient-derived cancer models (PDCMs) have emerged as indispensable tools in both cancer research and preclinical studies. The number of publications on PDCMs has increased significantly over the last decade. Developments in Artificial Intelligence (AI), particularly Large Language Models (LLMs), hold promise for extracting knowledge from scientific texts at scale.
Objective:
The goal of this work is to investigate LLM-based systems that automatically extract PDCM-related entities from scientific texts.
Methods:
We explore direct prompting and soft prompting with LLMs. For direct prompting, we manually create prompts that guide the LLMs to output PDCM-related entities from texts. Each prompt consists of an instruction, definitions of the entity types, gold examples, and a query. For soft prompting, a novel line of research in this domain, we automatically train soft prompts as continuous vectors using machine learning approaches. We experiment with state-of-the-art LLMs: the proprietary GPT-4o and a series of open LLaMA3-family models.
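The abstract does not reproduce the actual prompt; the following is a minimal sketch of how such a direct prompt could be assembled and sent to GPT-4o via the OpenAI Python SDK. The entity-type definitions, gold example, and helper names are illustrative placeholders, not the authors' prompt.

```python
# Illustrative sketch of a direct (manual) prompt for PDCM entity extraction.
# The entity types, definitions, and example are placeholders, not the
# authors' actual prompt; the client call assumes OPENAI_API_KEY is set.
from openai import OpenAI

ENTITY_DEFINITIONS = """\
model_type: the kind of patient-derived cancer model (e.g., PDX, organoid).
cancer_type: the cancer diagnosis studied (e.g., colorectal carcinoma).
...definitions for the remaining entity types...
"""

GOLD_EXAMPLE = """\
Text: We established patient-derived xenografts from colorectal carcinoma.
Entities: [{"type": "model_type", "span": "patient-derived xenografts"},
           {"type": "cancer_type", "span": "colorectal carcinoma"}]
"""

def build_prompt(abstract: str) -> str:
    """Assemble instruction + entity-type definitions + gold examples + query."""
    return (
        "Extract all PDCM-related entity mentions from the text below.\n\n"
        f"Entity type definitions:\n{ENTITY_DEFINITIONS}\n"
        f"Example:\n{GOLD_EXAMPLE}\n"
        f"Text: {abstract}\nEntities:"
    )

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": build_prompt("...abstract text...")}],
    temperature=0,
)
print(response.choices[0].message.content)
```

Similarly, a hedged sketch of soft prompting (prompt tuning) with the Hugging Face PEFT library, in which only a small set of continuous "virtual token" embeddings is trained while the base model stays frozen; the checkpoint name and hyperparameters below are assumptions, not values reported by the authors:

```python
# Illustrative soft-prompting (prompt-tuning) setup with Hugging Face PEFT.
# Only the virtual-token embeddings are trainable; LLaMA weights stay frozen.
from peft import PromptTuningConfig, PromptTuningInit, TaskType, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "meta-llama/Llama-3.2-3B"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    num_virtual_tokens=20,                     # length of the learned soft prompt (assumed)
    prompt_tuning_init=PromptTuningInit.TEXT,  # initialize from a text prompt
    prompt_tuning_init_text="Extract PDCM-related entities from the text:",
    tokenizer_name_or_path=base,
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the soft-prompt embeddings are updated
# ...standard fine-tuning loop (e.g., transformers.Trainer) on the training abstracts...
```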
Results:
We annotated 100 abstracts of PDCM-relevant papers, focusing on papers about PDCMs for which metadata and data were deposited to the CancerModels.Org platform, resulting in 3,313 entity mentions across 15 entity types. We used 60 abstracts (2,089 entities) for training, 20 abstracts (542 entities) to refine the prompts, and 20 abstracts (682 entities) for the final evaluation. We evaluated the output for exact and overlapping span matching in two settings: (1) direct prompting, where the prompts are manually created, and (2) soft prompting, where the prompts are automatically learned continuous vectors. Results are reported as precision/positive predictive value, recall/sensitivity, and F1 (the harmonic mean of precision and recall). GPT-4o with direct prompting achieved 50.48 F1 and 71.36 F1 in the exact and overlapping match evaluation settings, respectively. In both evaluation settings, we saw a performance improvement from applying soft prompting to LLaMA3 models. The F1 score of LLaMA3.2 3B with soft prompting increased from 7.06 to 46.68 in the exact match evaluation setting, and from 12.00 to 71.80 in the overlapping match evaluation setting, slightly higher than direct prompting with GPT-4o.
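For concreteness, a minimal sketch of span-level evaluation under the two matching criteria above; all names are illustrative and the authors' exact matching rules may differ. Precision P = TP/(TP+FP), recall R = TP/(TP+FN), and F1 = 2PR/(P+R), the harmonic mean reported in the results.

```python
# Sketch of span-level evaluation under the two matching criteria described
# above (names illustrative). A predicted (type, start, end) triple counts as
# correct if it equals a gold span exactly, or, in the overlapping setting,
# if it shares the entity type and at least one character offset with one.

def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0

def evaluate(pred, gold, overlap=False):
    """pred/gold: sets of (entity_type, start, end) tuples with half-open spans."""
    def match(a, b):
        if not overlap:
            return a == b
        return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]  # same type, spans intersect
    precision = sum(any(match(p, g) for g in gold) for p in pred) / len(pred) if pred else 0.0
    recall = sum(any(match(g, p) for p in pred) for g in gold) / len(gold) if gold else 0.0
    return precision, recall, f1(precision, recall)

pred = {("model_type", 15, 41), ("cancer_type", 47, 57)}
gold = {("model_type", 15, 41), ("cancer_type", 47, 67)}
print(evaluate(pred, gold, overlap=False))  # exact match: (0.5, 0.5, 0.5)
print(evaluate(pred, gold, overlap=True))   # overlapping match: (1.0, 1.0, 1.0)
```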
Conclusions:
In this work, we applied recent advances in LLMs to automatically extract PDCM-relevant entities from scientific texts. In our experiments, GPT-4o with direct prompts achieved competitive results. Soft prompting improved the performance of smaller open LLMs by a large margin. Our work shows that it is possible to match the performance of proprietary LLMs by training soft prompts on smaller open models. More broadly, our study contributes to the growing body of research into which tasks benefit from LLMs, as LLMs are unlikely to be the right technology for every single task.
Citation
Per the authors' request, the PDF is not available.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.