Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Jan 27, 2025
Date Accepted: Oct 23, 2025

The final, peer-reviewed published version of this preprint can be found here:

Balancing Privacy and Utility in Child and Adolescent Mental Health Services Research: Retrospective Cohort Study on Synthetic Data Generation

Haizoune M, Leventhal B, Pant D, Nytrø Ø, Koochakpour K, Koposov RA, Øhlckers LR, Skokauskas N

Balancing Privacy and Utility in Child and Adolescent Mental Health Services Research: Retrospective Cohort Study on Synthetic Data Generation

JMIR Med Inform 2026;14:e71819

DOI: 10.2196/71819

PMID: 41747250

PMCID: 12982954

Balancing Privacy and Utility in Child and Adolescent Mental Health Services Research: A Retrospective Cohort Study on Synthetic Data Generation

  • Mounir Haizoune;
  • Bennett Leventhal;
  • Dipendra Pant;
  • Øystein Nytrø;
  • Kaban Koochakpour;
  • Roman A Koposov;
  • Lars Ravn Øhlckers;
  • Norbert Skokauskas

ABSTRACT

Background:

High-quality, large-scale healthcare research, especially research using medical records, faces significant technical and confidentiality challenges. As a result, critical research questions about patient evaluation and treatment remain unanswered. Moreover, stigma and heightened sensitivity surrounding mental health issues have significantly delayed research progress, particularly in Child and Adolescent Mental Health Services (CAMHS).

Methods:

A CAMHS dataset from Stavanger University Hospital in Norway was divided into two cohorts: a training cohort (n = 6,184 referrals, 58,524 episodes of care) and an independent test set (n = 1,564 referrals, 14,610 episodes of care). A hierarchical synthetic data generation model was used to create synthetic referral periods and associated episodes of care based on real-world CAMHS data. The utility, quality, and privacy risk of the generated synthetic data were then evaluated and reported.
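The reported counts correspond to roughly an 80/20 division of referrals (6,184 versus 1,564). The abstract does not say how the division was made, so the following is only a minimal sketch of one common approach for hierarchical data: assign referrals to a cohort, then route every episode of care to the cohort of its parent referral, so that no referral period straddles both sets. The function name, the `referral_id` field, the random assignment, and the toy data are all assumptions for illustration, not the authors' procedure.

```python
import random


def split_by_referral(referral_ids, episodes, test_fraction=0.2, seed=0):
    """Hypothetical helper: split at the referral level, keep episodes with their referral."""
    ids = sorted(referral_ids)
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    rng.shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    test_ids = set(ids[:n_test])
    train_ids = set(ids[n_test:])
    # Each episode follows its parent referral into the same cohort.
    train_eps = [e for e in episodes if e["referral_id"] in train_ids]
    test_eps = [e for e in episodes if e["referral_id"] in test_ids]
    return train_ids, test_ids, train_eps, test_eps


# Toy data (assumed): 10 referrals with 3 episodes of care each.
referral_ids = list(range(10))
episodes = [{"referral_id": r, "episode_no": k} for r in referral_ids for k in range(3)]
train_ids, test_ids, train_eps, test_eps = split_by_referral(referral_ids, episodes)
```

Splitting at the parent level rather than the episode level avoids leaking information about a single referral period from the training cohort into the test set.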

Results:

The study used a CAMHS cohort of 6,924 patients from Stavanger University Hospital, Norway. A hierarchical synthetic data generation model created reproducible synthetic CAMHS data with properties similar to the real-world data (KS/TVD Complement score = 0.92, CS score = 0.77, CS (Inter-table) score = 0.75, CSS score = 0.92) while demonstrating low privacy risk (average singling-out score: univariate = 0.17, multivariate = 0.04; linkability risk = 2.5; inference risk = 0.7). A predictive model trained on synthetic data performed comparably to a model trained on real data in classifying the intensity of care required by patients, while maintaining feature interpretability (for n = 656, 1,546, 3,092, and 6,184, average PR_AUC = 0.32, 0.33, 0.34, and 0.40, respectively, compared with PR_AUC = 0.43 using 6,184 real data records).
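The abstract does not define the "KS/TVD Complement" score. Assuming it refers to the usual column-level similarity measures (one minus the Kolmogorov-Smirnov statistic for numeric columns, and one minus the total variation distance for categorical columns), the calculation can be sketched as follows. The function names and example values are illustrative assumptions; this is not the authors' evaluation code.

```python
from bisect import bisect_right
from collections import Counter


def ks_complement(real, synth):
    """1 minus the largest gap between the two empirical CDFs (numeric column)."""
    r, s = sorted(real), sorted(synth)
    points = sorted(set(r) | set(s))
    gap = max(
        abs(bisect_right(r, x) / len(r) - bisect_right(s, x) / len(s))
        for x in points
    )
    return 1.0 - gap


def tv_complement(real, synth):
    """1 minus the total variation distance between category frequencies."""
    pr, ps = Counter(real), Counter(synth)
    cats = set(pr) | set(ps)
    tvd = 0.5 * sum(abs(pr[c] / len(real) - ps[c] / len(synth)) for c in cats)
    return 1.0 - tvd


# Made-up columns: identical numeric samples, slightly shifted categorical ones.
numeric_score = ks_complement([1, 2, 3, 4], [1, 2, 3, 4])
categorical_score = tv_complement(["a", "a", "b", "b"], ["a", "a", "a", "b"])
```

Both scores range from 0 to 1, with 1 meaning the real and synthetic marginal distributions match exactly, which is the sense in which a value such as 0.92 indicates close agreement.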

Conclusions:

By offering access to extensive and representative samples with a low risk of patient identification, synthetic CAMHS data balances data utility with fairness and privacy protection. This approach not only encourages data sharing but also expands the breadth of research while safeguarding patient privacy. Additionally, it fosters innovation by providing researchers with high-quality data that can be used to develop new treatments and interventions. Furthermore, the use of synthetic data can help overcome barriers related to data access and regulatory constraints, making it easier for researchers to collaborate and share findings across institutions.

