Currently submitted to: JMIR Medical Informatics
Date Submitted: Feb 3, 2026
Open Peer Review Period: Feb 10, 2026 - Apr 7, 2026
(currently open for review)
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Building the Foundation: A Cloud-Native Infrastructure for Non-Medical Health Factors (NMHF) Data to Enable Reproducible Geospatial Analytics, Education, and Dashboards
ABSTRACT
Background:
Non-medical health factors (NMHF), including education, income, housing, transportation, and neighborhood infrastructure, are crucial to understanding health outcomes and health equity. However, integrating these factors into research and teaching has been hindered by fragmented data sources, heterogeneous data schemas, and inconsistent geographic units.
Objective:
To design and evaluate a cloud-native, geospatially standardized NMHF data infrastructure that supports end-to-end data acquisition, harmonization, analytics, and visualization for research and education.
Methods:
We implemented a serverless architecture on Google Cloud Platform, centered on BigQuery for scalable storage and geospatial analytics and incorporating an improved Extract–Transform–Load (ETL) pipeline for data collection and storage. The architecture also integrated Tableau for live interactive dashboards. Reproducible SQL pipelines standardized schemas and harmonized geographies via population-weighted crosswalks between ZIP Code Tabulation Areas (ZCTAs), census tracts, counties, and states. Users accessed the platform through parameterized SQL queries, Python notebooks, or optional serverless APIs. We evaluated the platform's data coverage, query performance, user adoption, and educational utility.
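As an illustration of the population-weighted crosswalk idea mentioned in the Methods, the following is a minimal Python sketch, not the platform's actual SQL pipeline. It assumes a rate-type indicator (for example, a poverty rate) and a crosswalk table listing the population in each overlap between a source unit and a target unit. The function name `crosswalk_weighted_mean`, the unit identifiers, and all numbers are hypothetical.

```python
from collections import defaultdict


def crosswalk_weighted_mean(source_values, crosswalk):
    """Re-aggregate a rate-type indicator from source units to target units.

    source_values: dict mapping source unit id -> indicator value.
    crosswalk: iterable of (source_id, target_id, population) rows, where
        population is the number of residents in the source/target overlap.
    Returns a dict mapping target unit id -> population-weighted mean.
    """
    weighted_sum = defaultdict(float)
    total_pop = defaultdict(float)
    for source_id, target_id, pop in crosswalk:
        value = source_values.get(source_id)
        # Skip overlaps with no indicator value or no residents.
        if value is None or pop <= 0:
            continue
        weighted_sum[target_id] += value * pop
        total_pop[target_id] += pop
    return {t: weighted_sum[t] / total_pop[t] for t in total_pop}


# Hypothetical tract-level poverty rates and tract-to-ZCTA overlap populations.
tract_poverty = {"T1": 0.10, "T2": 0.30, "T3": 0.20}
tract_to_zcta = [
    ("T1", "Z1", 1000),
    ("T2", "Z1", 3000),
    ("T2", "Z2", 1000),
    ("T3", "Z2", 3000),
]

zcta_poverty = crosswalk_weighted_mean(tract_poverty, tract_to_zcta)
for zcta, rate in sorted(zcta_poverty.items()):
    print(zcta, round(rate, 4))
```

Weighting by overlap population rather than land area keeps the re-aggregated value representative of where people actually live. Count-type indicators (for example, number of households) would instead need to be apportioned and summed, which this sketch does not cover.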
Results:
The platform harmonized data for over 40 NMHF databases across deprivation, vulnerability, opportunity, instability, demographics, and outcomes from widely used public sources at the census tract and ZCTA levels. Over 50 users, including students participating in courses, capstone projects, and workshops, actively engaged with the platform’s notebooks and dashboards. The publicly accessible dashboards accrued over 1,000 unique views. The platform demonstrated support for exploratory analyses linking NMHF indicators with health outcomes, illustrating its value for hypothesis generation and geospatial storytelling.
Conclusions:
This geospatially standardized, education-oriented NMHF infrastructure minimizes operational friction and shortens time-to-insight for students and researchers. It provides a pragmatic foundation for future efforts in clinical integration of social risk data, scalable federated analytics, and fairness-aware health modeling.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.