
Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Dec 15, 2021
Date Accepted: Feb 13, 2022

The final, peer-reviewed published version of this preprint can be found here:

El Emam K, Mosquera L, Fang X, El-Hussuna A. Utility Metrics for Evaluating Synthetic Health Data Generation Methods: Validation Study. JMIR Med Inform 2022;10(4):e35734. DOI: 10.2196/35734. PMID: 35389366. PMCID: 9030990.

Utility Metrics for Evaluating Synthetic Health Data Generation Methods: A Validation Study

  • Khaled El Emam; 
  • Lucy Mosquera; 
  • Xi Fang; 
  • Alaa El-Hussuna

ABSTRACT

Background:

A common task for developers and users of synthetic data generation (SDG) methods is to evaluate and compare the utility of the data these methods produce. Multiple utility metrics have been proposed and used to evaluate synthetic data. However, these metrics have not been validated in general, nor for the specific task of comparing SDG methods.

Objective:

The objective of this study was to evaluate the ability of common utility metrics to discriminate among SDG methods. The specific workload of interest is the use of synthetic data to build logistic regression binary prediction models.

Methods:

We evaluated six common utility metrics on 28 different health datasets and two SDG methods: a generative adversarial network (GAN) and sequential tree synthesis. Each metric was computed by averaging across 20 synthetic datasets generated from the same generative model. The metrics were then tested on their ability to select the SDG method whose synthetic data yielded area under the receiver operating characteristic curve (AUROC) values for logistic regression binary prediction models most similar to those obtained on the real data. The best-performing metrics were then applied to two separate colon cancer study datasets to further narrow them down in terms of their ability to select the best SDG method.
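To make the evaluation workload concrete, the following is an illustrative sketch (not the authors' code) of the AUROC comparison described above: train a logistic regression binary classifier on real data and on synthetic data, and compare the two test-set AUROC values. The noisy copy standing in for SDG output is an assumption for illustration only.

```python
# Sketch of the utility workload: how close is the AUROC of a model
# trained on synthetic data to one trained on the real data?
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)

# Stand-in "real" dataset; in the study this would be a health dataset.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Crude stand-in for synthetic data: real training data plus noise
# (a real comparison would use output from a GAN or tree synthesizer).
X_synth = X_train + rng.normal(scale=0.3, size=X_train.shape)
y_synth = y_train

model_real = LogisticRegression(max_iter=1000).fit(X_train, y_train)
model_synth = LogisticRegression(max_iter=1000).fit(X_synth, y_synth)

auc_real = roc_auc_score(y_test, model_real.predict_proba(X_test)[:, 1])
auc_synth = roc_auc_score(y_test, model_synth.predict_proba(X_test)[:, 1])

# Utility is judged by how small this gap is.
auc_gap = abs(auc_real - auc_synth)
```

A utility metric passes the validation test if, across datasets, it tends to rank the SDG method with the smaller AUROC gap as the better one.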

Results:

The utility metrics best able to discriminate between the two SDG methods were the Hellinger distance and the propensity mean squared error (P=.00006; both significant at a Bonferroni-adjusted alpha level of .05). On the validation datasets, the propensity mean squared error metric demonstrated greater sensitivity.
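For readers unfamiliar with the two winning metrics, here is a minimal sketch using their standard textbook definitions (assumed here; this is not the authors' implementation): the Hellinger distance between two discrete distributions, and the propensity mean squared error (pMSE), which trains a classifier to distinguish real from synthetic records and measures how far its propensity scores deviate from 0.5.

```python
# Standard definitions of the two best-discriminating utility metrics.
import numpy as np
from sklearn.linear_model import LogisticRegression

def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)) / np.sqrt(2)

def pmse(real, synth):
    """Propensity MSE: fit a classifier to distinguish real records from
    synthetic ones; pMSE is the mean squared deviation of the propensity
    scores from 0.5 (indistinguishable data gives a pMSE near 0)."""
    X = np.vstack([real, synth])
    y = np.r_[np.zeros(len(real)), np.ones(len(synth))]
    prop = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1]
    return float(np.mean((prop - 0.5) ** 2))

rng = np.random.RandomState(0)
real = rng.normal(size=(500, 5))
good_synth = real + rng.normal(scale=0.1, size=real.shape)  # close to real
bad_synth = rng.normal(loc=2.0, size=(500, 5))              # far from real

pmse_good = pmse(real, good_synth)
pmse_bad = pmse(real, bad_synth)
```

A lower pMSE indicates higher-utility synthetic data; in this toy setup the near-copy scores much lower than the shifted data.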

Conclusions:

This study validated a generative model utility metric, the propensity mean squared error, which can be used to select among competing SDG methods applied to the same dataset.




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.