Currently submitted to: JMIR Bioinformatics and Biotechnology
Date Submitted: Jun 5, 2026
Open Peer Review Period: Jun 12, 2026 - Aug 7, 2026
(currently open for review)
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
ET-RAG: An Evidence-Temporal Retrieval-Augmented Generation Framework for Biomedical Literature Analysis
ABSTRACT
Background:
The biomedical literature is expanding at an unprecedented rate, with over 4,000 new articles indexed on PubMed each day. Clinicians and researchers frequently lack the time to review this volume before making decisions. Retrieval-Augmented Generation (RAG) systems attempt to bridge this gap by grounding language model responses in relevant documents, but standard implementations rank all retrieved passages solely by semantic similarity, treating a case report and a meta-analysis as equally authoritative.
Objective:
This study aimed to develop and pilot-evaluate an evidence- and temporally aware retrieval-augmented generation framework that integrates evidence quality and publication recency into retrieval scoring. Using Alzheimer’s disease literature as a test case, we assessed whether incorporating these signals improved biomedical question-answering quality relative to conventional cosine-similarity RAG and a full-context baseline.
Methods:
We developed ET-RAG (Evidence-Temporal Retrieval-Augmented Generation), a retrieval framework that ranks each retrieved text chunk using a weighted score integrating cosine similarity (50%), evidence quality based on the GRADE hierarchy (30%), and temporal recency (20%). We evaluated ET-RAG against two baselines: a full-context agent powered by Gemini 2.0 Flash and a standard cosine-similarity RAG agent using GPT-4o-mini. To assess performance, we constructed a benchmark of 40 questions derived from 10 peer-reviewed Alzheimer’s disease papers published between 2021 and 2025, including 10 single-choice, 10 multiple-choice, 10 short-answer, and 10 long-answer questions. Performance was evaluated using correctness for choice-based questions and completeness, accuracy, and relevance for open-ended questions, with scoring conducted using an LLM-as-a-judge framework.
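The weighted score described above can be illustrated with a minimal Python sketch. Only the 50/30/20 weights come from the text. Everything else is a hypothetical assumption for illustration and not the authors' implementation: the numeric evidence score given to each study design (a rough GRADE-style ordering), the exponential recency decay with a five-year half-life, and the names Chunk, EVIDENCE_SCORES, recency_score, et_score, and rerank. The sketch also assumes that cosine similarities already lie in [0, 1] and that the score is applied as a re-ranking step over candidates returned by an ordinary vector search.

```python
from dataclasses import dataclass

# Weights stated in the abstract: similarity 50%, evidence 30%, recency 20%.
W_SIM, W_EVID, W_REC = 0.5, 0.3, 0.2

# Hypothetical evidence scores in [0, 1], loosely ordered like a GRADE-style
# hierarchy. These values are assumptions, not taken from the paper.
EVIDENCE_SCORES = {
    "meta_analysis": 1.0,
    "systematic_review": 0.9,
    "rct": 0.8,
    "cohort": 0.6,
    "case_control": 0.5,
    "case_report": 0.2,
}


@dataclass
class Chunk:
    text: str
    cosine_sim: float  # query similarity, assumed already scaled to [0, 1]
    study_type: str    # key into EVIDENCE_SCORES
    pub_year: int


def recency_score(pub_year: int, current_year: int, half_life: float = 5.0) -> float:
    """Hypothetical exponential decay: a paper `half_life` years old scores 0.5."""
    age = max(0, current_year - pub_year)
    return 0.5 ** (age / half_life)


def et_score(chunk: Chunk, current_year: int) -> float:
    """Weighted sum of similarity, evidence quality, and recency."""
    evidence = EVIDENCE_SCORES.get(chunk.study_type, 0.0)  # unknown design -> 0
    recency = recency_score(chunk.pub_year, current_year)
    return W_SIM * chunk.cosine_sim + W_EVID * evidence + W_REC * recency


def rerank(chunks: list[Chunk], current_year: int, top_k: int = 5) -> list[Chunk]:
    """Re-order vector-search candidates by the combined score, highest first."""
    return sorted(chunks, key=lambda c: et_score(c, current_year), reverse=True)[:top_k]


# Usage: a slightly more similar case report vs. an older meta-analysis.
candidates = [
    Chunk("case report text", 0.85, "case_report", 2025),
    Chunk("meta-analysis text", 0.80, "meta_analysis", 2022),
]
ranked = rerank(candidates, current_year=2026, top_k=2)
print([c.study_type for c in ranked])  # ['meta_analysis', 'case_report']
```

Under these assumed values, the meta-analysis (about 0.81) outranks the more similar but lower-evidence case report (about 0.66), whereas pure cosine ranking would put the case report first. Because the re-scoring adds only a few arithmetic operations per candidate already returned by the vector search, this sketch is consistent with the claim of minimal overhead.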
Results:
ET-RAG achieved the highest scores across all four question categories: single choice (0.90), multiple choice (0.74), short answer (0.92), and long answer (0.89), with a combined average of 0.86. Cosine RAG scored 0.80, 0.48, 0.82, and 0.69, respectively (average 0.70), while the full-context agent scored 0.60, 0.59, 0.71, and 0.53 (average 0.61). Despite having access to the entire corpus through Gemini's large context window, the full-context agent struggled with consistent answer extraction and was prone to rate limiting under heavy query loads. A control question on forestry was correctly rejected by all three agents, suggesting no hallucination on this control item.
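The reported combined averages are consistent with an unweighted mean of the four category scores. The short Python check below reproduces them from the per-category values above; treating "combined average" as an unweighted mean is an assumption, although with 10 questions per category it coincides with the per-question mean.

```python
# Per-category scores from the Results:
# single choice, multiple choice, short answer, long answer.
scores = {
    "ET-RAG": [0.90, 0.74, 0.92, 0.89],
    "Cosine RAG": [0.80, 0.48, 0.82, 0.69],
    "Full context": [0.60, 0.59, 0.71, 0.53],
}

# Assumption: combined average = unweighted mean of the four categories,
# rounded to two decimals as in the abstract.
combined = {agent: round(sum(s) / len(s), 2) for agent, s in scores.items()}
print(combined)
```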
Conclusions:
In the Alzheimer's disease benchmark, incorporating evidence quality and recency into RAG retrieval improved answer quality relative to pure cosine-similarity retrieval and full-corpus prompting. The evidence-temporal scoring function is lightweight to implement and adds minimal computational overhead to existing vector search pipelines, but broader validation across domains, evidence levels, and stronger retrieval baselines is required before claims of generalizable biomedical reliability can be made.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.