Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Aug 15, 2025
Date Accepted: Feb 19, 2026
Causal Discovery in Observational Medical Research: Scoping Review
ABSTRACT
Background:
Observational data are fundamental to medical research but present formidable challenges for causal inference. Machine learning-based causal discovery algorithms have emerged as a promising solution to identify causal structures directly from such data. However, the current literature is skewed towards theoretical and methodological innovations, with a critical gap in systematic assessments of real-world performance and a lack of practical guidance for clinicians and researchers on selecting and applying these algorithms in specific medical contexts.
Objective:
This scoping review aimed to systematically map and synthesize the application of causal discovery methods in observational medical research, detailing the methodologies used, their application domains, the robustness of the findings, and the practical challenges encountered.
Methods:
Following the PRISMA-ScR guidelines, we conducted a systematic search of Scopus, Web of Science, PubMed, MEDLINE, Embase, and CINAHL from inception through May 2025. We included studies that applied any causal discovery algorithm to real-world observational medical data. Purely methodological papers and studies based solely on experimental data were excluded. Data were extracted and synthesized using a descriptive analysis focused on study characteristics, algorithm types, application domains, reported numerical results, and implementation challenges.
Results:
Out of 4844 identified publications, 72 (1.5%) met the inclusion criteria. Our synthesis revealed three key themes: 1) Methodological Landscape: Constraint-based algorithms were the most prevalent (52.8%, 38/72), with the FCI (13.9%, 10/72) and PC (12.5%, 9/72) algorithms being most common. Score-based (25.0%, 18/72) and hybrid (20.8%, 15/72) methods represented significant and growing segments. 2) Application Domains and Findings: The majority of studies (75.0%, 54/72) were in clinical research, with a strong focus on mental health (26.4%, 19/72; e.g., identifying symptom networks in schizophrenia and PTSD) and chronic diseases (26.4%, 19/72; e.g., elucidating progression pathways in Alzheimer's and diabetes). Etiological research was the primary objective (38.9%, 28/72). Public health applications (25.0%, 18/72) frequently assessed the causal impacts of behavioral interventions. 3) Implementation Challenges and Innovations: Common challenges included pervasive unmeasured confounding, limited sample sizes (noted in over 20% of studies), and reliance on unvalidated causal assumptions. Emerging innovations focused on longitudinal data frameworks and the integration of multimodal data sources to strengthen causal claims.
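The proportions reported above can be recomputed from the stated counts. The following is a minimal sketch for that arithmetic check only; the counts and percentages are copied from the Results paragraph, while the names `REPORTED`, `percent`, and `find_mismatches` are illustrative assumptions and not part of the review's methodology.

```python
# Sanity check of the percentages reported in the Results paragraph.
# All counts and percentages are taken from the abstract; the helper
# names below are hypothetical and used only for illustration.

TOTAL_IDENTIFIED = 4844  # publications identified by the search
TOTAL_INCLUDED = 72      # studies meeting the inclusion criteria

# label -> (count out of 72, percentage as reported)
REPORTED = {
    "constraint-based": (38, 52.8),
    "FCI": (10, 13.9),
    "PC": (9, 12.5),
    "score-based": (18, 25.0),
    "hybrid": (15, 20.8),
    "clinical research": (54, 75.0),
    "mental health": (19, 26.4),
    "chronic diseases": (19, 26.4),
    "etiological research": (28, 38.9),
    "public health": (18, 25.0),
}


def percent(count, total):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


def find_mismatches(reported, total, tolerance=0.05):
    """Return labels whose reported percentage is not within rounding
    distance of the value recomputed from the count."""
    mismatches = []
    for label, (count, reported_pct) in reported.items():
        exact = 100 * count / total
        if abs(exact - reported_pct) > tolerance + 1e-9:
            mismatches.append(label)
    return mismatches


mismatches = find_mismatches(REPORTED, TOTAL_INCLUDED)
inclusion_rate = percent(TOTAL_INCLUDED, TOTAL_IDENTIFIED)

print("Inclusion rate (%):", inclusion_rate)
print("Mismatched labels:", mismatches)
```

The clinical research and public health counts (54 and 18) sum to the 72 included studies, whereas the three algorithm families (38, 18, and 15) sum to 71, so the algorithm categories as listed do not account for every included study.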
Conclusions:
This review underscores the growing application of causal discovery algorithms in medical research, while also highlighting challenges such as the lack of standardized validation frameworks and persistent confounding. Future efforts must focus on developing evaluation standards and fostering interdisciplinary collaboration to translate these powerful computational techniques into reliable tools for medical research and practice.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.