Accepted for/Published in: JMIR Formative Research
Date Submitted: Jun 26, 2023
Date Accepted: Oct 8, 2023
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Artificial neural network for classifying schizophrenia cases using Japanese online survey data
ABSTRACT
Background:
In Japan, challenges have been reported in accurately estimating the prevalence of schizophrenia in the general population. Using data from a large-scale online survey, we previously confirmed that schizophrenia is associated with multiple factors. Machine learning has had a positive impact on many fields, including epidemiology, owing to its high-precision modeling capabilities.
Objective:
To construct a model that can accurately classify schizophrenia cases from large-scale Japanese online survey data and verify the generalizability of the model.
Methods:
Data were obtained from a large Japanese Internet research pooled panel (Rakuten Insight, Inc.) in 2021. A total of 223 subjects with schizophrenia, aged 20–75 years, and 1776 healthy controls were included. Answers to the questions in an online survey were formatted as one response variable (diagnosed with schizophrenia) and multiple feature variables (demographic, health-related backgrounds, physical comorbidities, psychiatric comorbidities, and social comorbidities). An artificial neural network (ANN) was applied to construct a model for classifying schizophrenia cases. Logistic regression (LR) was used as a reference. The performances of the models and algorithms were then compared.
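The class sizes reported above imply a strongly imbalanced sample, which affects how accuracy should be read. The following minimal sketch works through that arithmetic. Only the two counts come from the Methods; the function name and dictionary keys are illustrative, and the sketch assumes the evaluation data had the same class balance as the full sample, which the abstract does not state.

```python
# Counts reported in the Methods; all names below are illustrative.
N_CASES = 223      # respondents with schizophrenia
N_CONTROLS = 1776  # healthy controls


def class_balance(n_pos: int, n_neg: int) -> dict:
    """Summarize class balance and the accuracy of a trivial majority-class rule."""
    total = n_pos + n_neg
    return {
        "total": total,
        "case_fraction": n_pos / total,
        # Accuracy obtained by labelling every respondent as the larger class.
        "majority_baseline_accuracy": max(n_pos, n_neg) / total,
    }


balance = class_balance(N_CASES, N_CONTROLS)
print(balance)
```

Under that assumption, cases make up about 11% of the sample, and a rule that labels everyone as a control would already reach an accuracy of about 0.89. Measures such as AUC, sensitivity, and specificity are therefore more informative than accuracy alone.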
Results:
The model trained by ANN performed better than the model trained by LR in terms of area under the receiver operating characteristic curve (AUC, 0.86 vs 0.78), accuracy (0.93 vs 0.91), and specificity (0.96 vs 0.94), while the model trained by LR showed better sensitivity (0.63 vs 0.56). In the comparison of the algorithms, ANN achieved a higher AUC (bootstrapping: 0.847 vs 0.773; cross-validation: 0.81 vs 0.72), whereas LR achieved higher accuracy (0.894 vs 0.856). Sleep medication use, age, household income, and employment type were the top four variables in terms of importance.
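For readers less familiar with the metrics reported above, the following sketch shows how accuracy, sensitivity, specificity, and AUC can be computed from true labels and predicted scores. It is not the authors' code: the toy labels, the scores, and the 0.5 decision threshold are assumptions made purely for illustration, and AUC is computed through its rank interpretation (the probability that a randomly chosen case scores higher than a randomly chosen control).

```python
from typing import Sequence


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict:
    """Accuracy, sensitivity, and specificity; assumes both classes are present."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of cases correctly flagged
        "specificity": tn / (tn + fp),  # share of controls correctly cleared
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of case/control pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = case, 0 = control.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

metrics = confusion_metrics(y_true, y_pred)
auc_value = auc(y_true, scores)
print(metrics, auc_value)
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC does not. This is one reason a model can have a higher AUC and specificity while showing lower sensitivity at a particular operating point.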
Conclusions:
This study constructed an ANN model to classify schizophrenia cases using online survey data. The model achieved high discriminative performance (AUC 0.86), which may provide evidence for future epidemiological studies on schizophrenia.
Citation