Currently submitted to: JMIR AI
Date Submitted: Jun 2, 2026
Open Peer Review Period: Jun 3, 2026 - Jul 29, 2026
(currently open for review)
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Human-in-the-Loop Artificial Intelligence System for Systematic Literature Review: Methods and Validations for the AutoLit Review Software
ABSTRACT
Background:
While artificial intelligence (AI) tools have been applied to individual stages of the systematic literature review (SLR) process, no single tool has previously been shown to support every critical SLR step. In addition, responsible use of AI requires expert oversight to ensure the quality of SLR findings.
Objective:
To describe a complete methodology for using our AI SLR tool with human-in-the-loop curation workflows, and to report AI validations, time savings, and approaches for ensuring compliance with best review practices.
Methods:
SLRs require searching, screening, and extracting data from relevant studies, followed by qualitative synthesis, meta-analysis, and critical appraisal as appropriate. We present a full methodological framework for completing SLRs using our AutoLit® software (Nested Knowledge). This system integrates AI models into the central steps of an SLR: search strategy generation, dual screening of titles/abstracts and full texts, extraction of qualitative and quantitative evidence, critical appraisal, and insight drafting, along with fully automated network meta-analysis. We report validations comparing AI performance with that of expert reviewers and, where relevant, time savings within the SLR workflow.
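The abstract does not describe how Smart Search assembles its Boolean strings. As a generic illustration only, the sketch below shows the conventional structure of an SLR search string: synonyms for one concept are joined with OR, and separate concepts (for example, the population and intervention of a PICO question) are joined with AND. The function name `build_boolean_query` and the example terms are hypothetical and are not part of the AutoLit software; the output is database-agnostic and omits field tags and truncation.

```python
from typing import Dict, List


def build_boolean_query(concepts: Dict[str, List[str]]) -> str:
    """Join synonyms with OR inside each concept, and concepts with AND.

    `concepts` maps a concept label (e.g. a PICO element) to its search terms.
    Multi-word terms are wrapped in double quotes so they are searched as phrases.
    Concepts with no terms are skipped.
    """
    groups = []
    for terms in concepts.values():
        if not terms:
            continue  # an empty concept would produce "()" and break the query
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


# Hypothetical example question: thrombectomy for stroke.
query = build_boolean_query({
    "population": ["stroke", "cerebrovascular accident"],
    "intervention": ["thrombectomy"],
})
print(query)  # (stroke OR "cerebrovascular accident") AND (thrombectomy)
```

ORing within a concept broadens recall for that concept, while ANDing across concepts narrows results to records that address every element of the question.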
Results:
Across the SLR workflow, Nested Knowledge AI tools showed the strongest evidence for search development, screening support, structured tagging, and extraction efficiency. Smart Search translated research questions into Boolean strings with 76.8% and 79.6% recall across two validation sets. For title/abstract screening, the Inclusion Prediction Model achieved 82%–97.1% recall, while the criteria-based Smart Screener achieved approximately 95% accuracy, 80% time savings, and high recall in autonomous abstract/full-text screening. Core Smart Tags achieved F1 = 0.74 for PICO extraction and hierarchy building, with 74%, 78%, and 91% accuracy for study type, location, and study size, respectively. Adaptive Smart Tags demonstrated 68.4% expert overlap for version 1, while early version 2 evaluations showed 76% recall and 69% precision overall, with stronger performance for structured abstract-level concepts. Smart Meta-Analytical Extraction produced >95% time savings for quantitative extraction, and Smart Insights scored highest for citation integrity (4.9/5) and synthesis quality (4.4/5). Smart Critical Appraisal supports structured, framework-based appraisal, though validation evidence for it remains more limited.
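The recall, precision, F1, accuracy, and time-savings figures above follow standard definitions when AI decisions are compared against expert decisions treated as the reference standard. The minimal sketch below, assuming binary include/exclude labels and hypothetical function names (`screening_metrics`, `time_savings`), shows how these quantities are computed; it is not the authors' evaluation code, and the abstract does not specify their exact protocol.

```python
from typing import Dict, List


def screening_metrics(ai: List[bool], expert: List[bool]) -> Dict[str, float]:
    """Score AI include (True) / exclude (False) decisions against expert decisions."""
    if len(ai) != len(expert):
        raise ValueError("ai and expert must have the same length")
    pairs = list(zip(ai, expert))
    tp = sum(a and e for a, e in pairs)          # both include
    fp = sum(a and not e for a, e in pairs)      # AI includes, expert excludes
    fn = sum(e and not a for a, e in pairs)      # AI misses a relevant record
    tn = sum(not a and not e for a, e in pairs)  # both exclude
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"recall": recall, "precision": precision, "f1": f1, "accuracy": accuracy}


def time_savings(manual_minutes: float, assisted_minutes: float) -> float:
    """Fraction of reviewer time saved relative to a fully manual workflow."""
    if manual_minutes <= 0:
        raise ValueError("manual_minutes must be positive")
    return 1 - assisted_minutes / manual_minutes


# Hypothetical toy data: five records screened by the AI and by an expert.
m = screening_metrics(
    ai=[True, True, False, True, False],
    expert=[True, False, False, True, True],
)
print({k: round(v, 2) for k, v in m.items()})
# {'recall': 0.67, 'precision': 0.67, 'f1': 0.67, 'accuracy': 0.6}
print(f"{time_savings(100, 20):.0%}")  # 80%
```

Recall is typically weighted heavily in SLR screening because a missed relevant study (a false negative) is harder to recover than an extra record passed on to a human reviewer. As a simple arithmetic check on the published numbers, F1 computed from the version 2 Adaptive Smart Tags figures (76% recall, 69% precision) is 2 × 0.76 × 0.69 / (0.76 + 0.69) ≈ 0.72; the abstract itself does not report an F1 for version 2.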
Conclusions:
AI-enabled SLR tools with human-in-the-loop can reduce reviewer burden and accelerate evidence-review workflows without replacing expert judgment. Transparent methods, reproducible outputs, traceable evidence links, and human oversight are essential for ensuring that efficiency gains do not compromise review quality.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.