
Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Dec 15, 2021
Date Accepted: Feb 13, 2022

The final, peer-reviewed published version of this preprint can be found here:

El Emam K, Mosquera L, Fang X, El-Hussuna A. Utility Metrics for Evaluating Synthetic Health Data Generation Methods: Validation Study. JMIR Med Inform 2022;10(4):e35734. DOI: 10.2196/35734. PMID: 35389366. PMCID: 9030990.

Utility Metrics for Evaluating Synthetic Health Data Generation Methods: A Validation Study

  • Khaled El Emam; 
  • Lucy Mosquera; 
  • Xi Fang; 
  • Alaa El-Hussuna

ABSTRACT

Background:

A common task for developers and users of synthetic data generation (SDG) methods is to evaluate and compare the utility of the data these methods produce. Multiple utility metrics have been proposed and used to evaluate synthetic data. However, these metrics have not been validated in general, nor for the specific task of comparing SDG methods.

Objective:

The objective of this study was to evaluate the ability of common utility metrics to discriminate among SDG methods. The specific workload of interest is the use of synthetic data to build logistic regression binary prediction models.

Methods:

We evaluated six common utility metrics on 28 different health datasets and two SDG methods: a generative adversarial network (GAN) and sequential tree synthesis. Each metric was computed by averaging across 20 synthetic datasets generated from the same generative model. The metrics were then tested on their ability to select the SDG method whose synthetic data yielded area under the receiver operating characteristic curve (AUROC) values for logistic regression binary prediction models most similar to those obtained on the real data. The best-performing metrics were then applied to two separate colon cancer study datasets to further narrow them down in terms of their ability to select the best SDG method.
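To make the evaluation workload concrete, the following is an illustrative sketch (not the authors' code) of the AUROC comparison described above: train a logistic regression binary classifier on real data and on synthetic data, and compare the two test-set AUROC values. The noisy copy standing in for SDG output is an assumption for illustration only.

```python
# Sketch of the utility workload: how close is the AUROC of a model
# trained on synthetic data to one trained on the real data?
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)

# Stand-in "real" dataset; in the study this would be a health dataset.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Crude stand-in for synthetic data: real training data plus noise
# (a real comparison would use output from a GAN or tree synthesizer).
X_synth = X_train + rng.normal(scale=0.3, size=X_train.shape)
y_synth = y_train

model_real = LogisticRegression(max_iter=1000).fit(X_train, y_train)
model_synth = LogisticRegression(max_iter=1000).fit(X_synth, y_synth)

auc_real = roc_auc_score(y_test, model_real.predict_proba(X_test)[:, 1])
auc_synth = roc_auc_score(y_test, model_synth.predict_proba(X_test)[:, 1])

# Utility is judged by how small this gap is.
auc_gap = abs(auc_real - auc_synth)
```

A utility metric passes the validation test if, across datasets, it tends to rank the SDG method with the smaller AUROC gap as the better one.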

Results:

The utility metrics best able to discriminate between the two SDG methods were the Hellinger distance and the propensity mean squared error (P=.00006; both significant at a Bonferroni-adjusted alpha level of .05). On the validation datasets, the propensity mean squared error metric demonstrated greater sensitivity.
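For readers unfamiliar with the two winning metrics, here is a minimal sketch using their standard textbook definitions (assumed here; this is not the authors' implementation): the Hellinger distance between two discrete distributions, and the propensity mean squared error (pMSE), which trains a classifier to distinguish real from synthetic records and measures how far its propensity scores deviate from 0.5.

```python
# Standard definitions of the two best-discriminating utility metrics.
import numpy as np
from sklearn.linear_model import LogisticRegression

def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)) / np.sqrt(2)

def pmse(real, synth):
    """Propensity MSE: fit a classifier to distinguish real records from
    synthetic ones; pMSE is the mean squared deviation of the propensity
    scores from 0.5 (indistinguishable data gives a pMSE near 0)."""
    X = np.vstack([real, synth])
    y = np.r_[np.zeros(len(real)), np.ones(len(synth))]
    prop = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1]
    return float(np.mean((prop - 0.5) ** 2))

rng = np.random.RandomState(0)
real = rng.normal(size=(500, 5))
good_synth = real + rng.normal(scale=0.1, size=real.shape)  # close to real
bad_synth = rng.normal(loc=2.0, size=(500, 5))              # far from real

pmse_good = pmse(real, good_synth)
pmse_bad = pmse(real, bad_synth)
```

A lower pMSE indicates higher-utility synthetic data; in this toy setup the near-copy scores much lower than the shifted data.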

Conclusions:

This study validated a generative model utility metric, the propensity mean squared error, which can be used to select among competing SDG methods applied to the same dataset.




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.