Accepted for/Published in: JMIR Mental Health
Date Submitted: Aug 19, 2022
Date Accepted: Nov 20, 2022
Methodological and Quality Flaws in the Use of Artificial Intelligence in Mental Health Research: A Systematic Review
ABSTRACT
Background:
Artificial intelligence (AI) is fueling a revolution in medicine and health care. Mental health conditions are highly prevalent in many countries, and the COVID-19 pandemic has increased the risk of further erosion of mental well-being in the population. It is therefore timely to assess the current state of AI applications in mental health research and to identify trends, gaps, opportunities, and challenges.
Objective:
To perform a systematic overview of AI applications in mental health research in terms of methodologies, data, outcomes, performance, and quality.
Methods:
A systematic search of the PubMed, Scopus, IEEE Xplore, and Cochrane databases was conducted to collect records of AI use cases in mental health disorder studies published from January 2016 to November 2021. Records were eligible if they described a practical implementation of AI in a clinical trial involving mental health conditions. Included AI study cases were evaluated and categorized according to the International Classification of Diseases, 11th Revision (ICD-11). Data on trial settings, collection methodology, features, outcomes, and model development and evaluation were extracted following the CHARMS (Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies) checklist. Risk of bias was also assessed.
Results:
A total of 454 nonduplicate records were retrieved, of which 129 were included (24 of them added manually). The application of AI in mental health was unevenly distributed across ICD-11 mental health disorders: the predominant categories were depressive disorders (N=70) and schizophrenia or other primary psychotic disorders (N=26). International collaboration was rare (N=17), and data and developed models mostly remained private (N=126). The most common trial design was the randomized controlled trial (N=62), followed, among observational studies, by the prospective cohort (N=24). AI was typically applied to evaluate treatment quality (N=44) or to stratify patients into subgroups and clusters (N=31). Models usually combined questionnaires and symptom severity scales with electronic health records (N=49) or used medical images (N=33). Methodological flaws involved the statistical aspects of AI application and the data preprocessing pipelines: more than one-third of the studies (N=56, 37%) did not report any preprocessing. One-fifth of the models were developed by comparing several methods (N=35) without first assessing their suitability, and only a small proportion reported any external validation (N=21).
Conclusions:
No model updating and only one second validation report were found. Risk of bias and reporting transparency were discouraging: hyperparameters, coefficients, and insights into model explainability were rarely reported. These significant shortcomings may indicate an overly accelerated promotion of new AI models without assessment of their real-world viability.
Citation
Per the author's request the PDF is not available.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.