Kuo NIH, Perez-Concha O, Hanly M, Mnatzaganian E, Hao B, Di Sipio M, Yu G, Vanjara J, Valerie IC, de Oliveira Costa J, Churches T, Lujic S, Hegarty J, Jorm L, Barbieri S
Enriching Data Science and Health Care Education: Application and Impact of Synthetic Data Sets Through the Health Gym Project
Nicholas I-Hsien Kuo;
Oscar Perez-Concha;
Mark Hanly;
Emmanuel Mnatzaganian;
Brandon Hao;
Marcus Di Sipio;
Guolin Yu;
Jash Vanjara;
Ivy Cerelia Valerie;
Juliana de Oliveira Costa;
Tim Churches;
Sanja Lujic;
Jo Hegarty;
Louisa Jorm;
Sebastiano Barbieri
ABSTRACT
Large-scale medical datasets are vital for hands-on education in health data science but are often inaccessible due to privacy concerns. To address this gap, we developed the Health Gym project, a free and open-source platform designed to generate synthetic health datasets applicable to various areas of data science education, including machine learning, data visualisation, and traditional statistical models. Initially, we generated three synthetic datasets, covering sepsis, acute hypotension, and antiretroviral therapy for human immunodeficiency virus (HIV).
This paper discusses the educational applications of the Health Gym's synthetic datasets. We illustrate these applications through the datasets' use in postgraduate health data science courses delivered by the University of New South Wales (UNSW), Australia, and in a Datathon event involving academics, students, clinicians, and local health district professionals. We also include adaptable worked examples using our synthetic datasets, designed to enrich hands-on tutorial and workshop experiences. While we highlight the potential of these datasets in advancing data science education and healthcare AI, we also emphasise the need for continued research into the inherent limitations of synthetic data.
Citation
Please cite as:
Kuo NIH, Perez-Concha O, Hanly M, Mnatzaganian E, Hao B, Di Sipio M, Yu G, Vanjara J, Valerie IC, de Oliveira Costa J, Churches T, Lujic S, Hegarty J, Jorm L, Barbieri S
Enriching Data Science and Health Care Education: Application and Impact of Synthetic Data Sets Through the Health Gym Project