Currently submitted to: Journal of Medical Internet Research
Date Submitted: May 4, 2026
Open Peer Review Period: May 5, 2026 - Jun 30, 2026
(currently open for review)
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Evaluating the Feasibility of Using Agentic AI for Clinical Outcomes Research and Population Health Management Analyses with Large Administrative Databases: Generating Epidemiologic Estimates of Diseases
ABSTRACT
Background:
There is tremendous enthusiasm for the use of AI in health care because of its ability to analyze existing data for preventive, diagnostic, and treatment support. Agentic AI can make large real-world datasets accessible for generating real-world evidence for health care and clinical applications, even for health care providers, researchers, and administrators without access to large analytic programming resources.
Objective:
The objective of this study was to understand the feasibility of using agentic AI for clinical outcomes research and population health management. Specifically, this study used an agentic AI evidence generation platform to obtain epidemiologic estimates of several diverse medical conditions, and the results were evaluated against existing AI frameworks.
Methods:
The prevalence of six diverse conditions (amyotrophic lateral sclerosis (ALS), acute myeloid leukemia (AML), bladder cancer, Huntington’s disease (HD), elevated lipoprotein (a) (LP(a)), and Parkinson’s disease (PD)) was estimated using an agentic AI evidence generation platform applied to an administrative claims database with representation from every US state. Gender-specific rates were calculated within the following age categories: 0-17, 18-24, 25-34, 35-44, 45-54, 55-64, 65-74, and 75 years and older. Period prevalence was estimated for January 1, 2020, through June 30, 2025, and annual prevalence was estimated for each year from 2020 through 2024. Inclusion required 12 months of continuous enrollment during the study period. Source code generated by the platform as part of the analysis was reviewed by an independent programmer to validate the methods and programming. Results obtained throughout the process were evaluated against several existing AI application frameworks.
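The stratified prevalence calculation described above can be illustrated with a minimal sketch. This is not the platform's generated code, which is not reproduced here. The record fields (`enrolled_months`, `has_condition`), the function names, and the per-100,000 scaling are all assumptions made for illustration; the abstract does not state how age was assigned or which rate denominator was reported.

```python
from collections import defaultdict
from dataclasses import dataclass

# Age categories taken from the Methods text; None marks the open-ended top band.
AGE_BANDS = [(0, 17), (18, 24), (25, 34), (35, 44),
             (45, 54), (55, 64), (65, 74), (75, None)]


def age_band(age: int) -> str:
    """Map an age in years to its category label, e.g. 70 -> '65-74'."""
    for low, high in AGE_BANDS:
        if high is None:
            if age >= low:
                return f"{low}+"
        elif low <= age <= high:
            return f"{low}-{high}"
    raise ValueError(f"age out of range: {age}")


@dataclass
class Member:
    # Hypothetical record layout, not the schema of the claims database.
    member_id: str
    gender: str
    age: int
    enrolled_months: int   # assumed: longest continuous enrollment in the period
    has_condition: bool    # assumed: met the case definition in the period


def stratified_prevalence(members, min_months=12, per=100_000):
    """Prevalence per `per` eligible members, keyed by (gender, age band)."""
    denominator = defaultdict(int)
    numerator = defaultdict(int)
    for m in members:
        # Apply the continuous-enrollment inclusion criterion.
        if m.enrolled_months < min_months:
            continue
        key = (m.gender, age_band(m.age))
        denominator[key] += 1
        if m.has_condition:
            numerator[key] += 1
    return {key: per * numerator[key] / denominator[key] for key in denominator}
```

Calling `stratified_prevalence` on a list of `Member` records returns one rate per gender and age category that has at least one eligible member. Members with fewer than 12 enrolled months are excluded from both the numerator and the denominator.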
Results:
Accuracy: Epidemiologic estimates obtained using the agentic AI platform were consistent with published estimates for all six conditions as well as with estimates obtained from traditional programming methods. Rigor: The agentic AI platform conducted the analysis with rigor by confirming acceptable methods in published literature for the type of data source used. Code lists used for the analysis were confirmed against existing algorithms when available. Appropriate statistical methods were used to compare differences in prevalence rates by age and gender. Trust (explainability, transparency, replicability, traceability, and validation): The agentic AI platform generated all source code used for the analyses, which was reviewed and validated for accuracy and appropriateness. The analysis included a ‘human-in-the-loop’ to validate the research question, data extraction method, statistical analysis plan, and output plan prior to proceeding with each step.
Conclusions:
With specific design features that ensure responsible use, agentic AI can be invaluable for making large datasets accessible for applied clinical outcomes research and population health management analyses.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.