Accepted for/Published in: JMIR Medical Education

Date Submitted: Jul 30, 2023
Open Peer Review Period: Jul 30, 2023 - Sep 24, 2023
Date Accepted: Nov 8, 2023

The final, peer-reviewed published version of this preprint can be found here:

Enriching Data Science and Health Care Education: Application and Impact of Synthetic Data Sets Through the Health Gym Project

Kuo NIH, Perez-Concha O, Hanly M, Mnatzaganian E, Hao B, Di Sipio M, Yu G, Vanjara J, Valerie IC, de Oliveira Costa J, Churches T, Lujic S, Hegarty J, Jorm L, Barbieri S

JMIR Med Educ 2024;10:e51388

DOI: 10.2196/51388

PMID: 38227356

PMCID: 10828942

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Enriching Data Science and Healthcare Education: Application and Impact of Synthetic Datasets through the Health Gym Project

  • Nicholas I-Hsien Kuo
  • Oscar Perez-Concha
  • Mark Hanly
  • Emmanuel Mnatzaganian
  • Brandon Hao
  • Marcus Di Sipio
  • Guolin Yu
  • Jash Vanjara
  • Ivy Cerelia Valerie
  • Juliana de Oliveira Costa
  • Tim Churches
  • Sanja Lujic
  • Jo Hegarty
  • Louisa Jorm
  • Sebastiano Barbieri

ABSTRACT

Large-scale medical datasets are vital for hands-on education in health data science, but are often inaccessible due to privacy concerns. Addressing this gap, we developed the Health Gym project, a free and open-source platform designed to generate synthetic health datasets applicable to various areas of data science education, including machine learning, data visualisation, and traditional statistical models. Initially, we generated three synthetic datasets for sepsis, acute hypotension, and antiretroviral therapy for human immunodeficiency virus. This paper discusses the educational applications of the Health Gym's synthetic datasets. We illustrate this through their use in postgraduate health data science courses delivered by the University of New South Wales (UNSW), Australia, and a Datathon event, involving academics, students, clinicians, and local health district professionals. We also include adaptable worked examples using our synthetic datasets, designed to enrich hands-on tutorial and workshop experiences. While we highlight the potential of these datasets in advancing data science education and healthcare AI, we also emphasise the need for continued research into the inherent limitations of synthetic data.


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.