Previously submitted to: JMIR Public Health and Surveillance (no longer under consideration since Feb 02, 2024)
Date Submitted: Apr 18, 2023
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Standardizing the Human-Trust Driven, Multi-Sourcing Data Integration, and Deep Learning Approach to Predicting Health Threats: The Roles of Residential Environments and Sociodemographics in COVID-19’s Early Outbreaks and Epidemics
ABSTRACT
Background:
Predicting early outbreaks and epidemics with artificial intelligence (AI) requires a standardized approach in which human trust is explicitly characterized and maximized. Data science and analytical methods have proved useful in predicting health threats. However, systemic thinking is still lacking on how to integrate these methods into a transparent, explainable design in which human trust is characterized.
Objective:
We aimed to formulate a standardized three-component (input-analytics-output) approach to predicting health threats such as COVID-19. The approach should be built on a theoretical framework that humans can trust cognitively and emotionally, with each component designed to maximize information load, performance, transparency, and human trust. As an example, we applied the standardized approach to predicting Hong Kong’s early outbreaks and the later epidemic of COVID-19.
Methods:
We adopted the socio-ecological system (SES) as the theoretical framework to map the cumulative numbers and time-series data of outbreaks in residential buildings, together with their influencing factors, into a hierarchical structure. Multiple sources of non-health data were analyzed and integrated into the SES hierarchy at the individual, household, and community levels. Analytics included modeling with deep neural networks (DNNs) built on the SES-informed structure, cross-validation for flagging buildings at higher infection risk, and prospective validation for forecasting daily case increases in all studied buildings over 3-, 7-, and 14-day horizons.
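The prospective validation step can be pictured with a minimal sketch. It assumes, purely for illustration, that a building counts as a positive on day t if any new case is reported within the following h days, and that forecasts are scored with the rank-based AUC. The function names `horizon_labels` and `roc_auc` and the labeling rule are hypothetical and are not taken from the manuscript.

```python
from typing import List, Sequence


def horizon_labels(daily_cases: Sequence[int], horizon: int) -> List[int]:
    """Label day t as 1 if any new case is reported in days t+1 .. t+horizon.

    Assumed labeling rule for illustration only. Days without a full
    look-ahead window are left out.
    """
    labels = []
    for t in range(len(daily_cases) - horizon):
        window = daily_cases[t + 1 : t + 1 + horizon]
        labels.append(1 if sum(window) > 0 else 0)
    return labels


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

Calling `horizon_labels` with `horizon` set to 3, 7, or 14 on one building's daily new-case series gives the targets for each forecasting horizon, and `roc_auc` compares them with any model's risk scores for the same days.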
Results:
After studying 345 residential buildings in three adjacent districts, we found that the roles of sociodemographics and residential environments in the early COVID-19 outbreaks (before the epidemic: 2020-01-23 to 2021-12-23) differed from those in the epidemic (2021-12-24 to 2022-05-21). Our DNN accurately classified the high-risk cluster in the epidemic (AUCs=0.99, 0.98, and 0.95). Shapley values extracted from the DNN enabled accurate building-level outbreak forecasting between 2022-05-22 and 2022-07-23 (average AUC>0.95 at the 7-day forecasting horizon). Residential environment factors played more significant roles than sociodemographics in the COVID-19 epidemic. Based on the identified influencing factors (e.g., work hours, monthly household income, number of households, numbers of non-working people and children, floor plan, and numbers of floors, flats, and corridors), implications for monitoring early outbreaks and preventing their evolution into epidemics were discussed.
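For readers unfamiliar with Shapley values, the sketch below computes them exactly for a toy risk score by averaging each factor's marginal contribution over all subsets of the other factors. The scoring function `toy_risk`, its weights, and the three factor names are hypothetical, chosen only to echo the kinds of factors listed above. This is not the DNN or the attribution procedure used in the study, and exact enumeration is only practical for a handful of factors.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, List


def shapley_values(
    features: List[str],
    value: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Exact Shapley values by enumerating every subset of the other features."""
    n = len(features)
    result = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))
        result[f] = total
    return result


def toy_risk(present: FrozenSet[str]) -> float:
    """Hypothetical risk score with one interaction term (illustration only)."""
    score = 0.0
    if "floors" in present:
        score += 0.2
    if "households" in present:
        score += 0.1
    if "income" in present:
        score += 0.05
    if "floors" in present and "households" in present:
        score += 0.1  # interaction, split equally between the two factors
    return score


phi = shapley_values(["floors", "households", "income"], toy_risk)
```

In this toy case the attributions sum to the score with all factors present minus the score with none, which is the property that makes Shapley values convenient for comparing the roles of different factors.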
Conclusions:
We proposed a human-trust-driven AI design for predicting health threats and demonstrated its capability to flag high-risk subjects from cross-sectional data and to forecast future outbreaks daily at the subject level. The standardized approach provides a systematic logic for designing new AI and verifying existing AI designs for predicting and understanding public health threats such as COVID-19. It is also extensible to more data sources, flexible enough to accommodate more or less complex modeling at low maintenance cost, and generalizable to other medical research problems using SES or other theoretical frameworks.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.