Accepted for/Published in: JMIR Medical Informatics

Date Submitted: May 17, 2022
Date Accepted: Nov 29, 2022

The final, peer-reviewed published version of this preprint can be found here:

A Privacy-Preserving Distributed Medical Data Integration Security System for Accuracy Assessment of Cancer Screening: Development Study of Novel Data Integration System

Miyaji A, Watanabe K, Takano Y, Nakasho K, Nakamura S, Wang Y, Narimatsu H

JMIR Med Inform 2022;10(12):e38922

DOI: 10.2196/38922

PMID: 36583931

PMCID: 9840098

Privacy-Preserving Distributed Medical Data Integration Security System for Accuracy Assessment of Cancer Screening: Development Study of Novel Data Integration System

  • Atsuko Miyaji
  • Kaname Watanabe
  • Yuuki Takano
  • Kazuhisa Nakasho
  • Sho Nakamura
  • Yuntao Wang
  • Hiroto Narimatsu

ABSTRACT

Background:

Linking records that belong to the same individual across databases managed by different institutions yields large datasets that are useful for epidemiological research. Such linkage must protect private information while still matching records efficiently and with high accuracy.

Objective:

Privacy-Preserving Distributed Data Integration (PDDI) is a technology that enables data matching between multiple databases without moving private information. Because matching keys can contain errors, we conducted a basic matching experiment using a model for assessing the accuracy of cancer screening.

Methods:

We created a dataset that mimics cancer screening and cancer registration data in Japan and conducted a matching experiment using a PDDI system between geographically distant institutions. Errors similar to those found empirically in datasets recorded in Japanese were artificially introduced into the dataset. The matching-key error rate of the data common to both datasets was deliberately set well above the rate expected in actual databases: 85.0% and 59.0% for the data simulating colorectal and breast cancer, respectively. Various combinations of name, gender, date of birth, and address were used as the matching key. To evaluate matching accuracy, the matching sensitivity and specificity were calculated based on the number of cancer screening data points, and the effect of matching accuracy on the sensitivity and specificity of cancer screening was estimated from the obtained values. To evaluate performance, we measured CPU usage, memory usage, and network traffic.
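As a rough illustration of the evaluation step described above, the following sketch computes matching sensitivity and specificity for two key combinations on a toy dataset. The record layout, the field names, the `person_id` ground-truth column, and the `match_metrics` helper are all assumptions made for illustration. It is a plaintext join, not the PDDI protocol, and it does not reproduce the study's data or figures.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def make_key(rec: Record, fields: Tuple[str, ...]) -> Tuple[str, ...]:
    """Build a normalized composite matching key from the chosen fields."""
    return tuple(rec[f].strip().lower() for f in fields)


def match_metrics(
    screening: List[Record], registry: List[Record], fields: Tuple[str, ...]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of key-based matching.

    A screening record is 'predicted matched' if its key occurs in the
    registry, and 'actually matched' if its person_id occurs there.
    person_id is ground truth that exists only in a synthetic dataset.
    """
    registry_keys = {make_key(r, fields) for r in registry}
    registry_ids = {r["person_id"] for r in registry}
    tp = fp = tn = fn = 0
    for rec in screening:
        predicted = make_key(rec, fields) in registry_keys
        actual = rec["person_id"] in registry_ids
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical toy data: the registry copy of person 2 has a first-name typo.
screening = [
    {"person_id": "1", "dob": "1960-01-01", "first": "Hanako", "family": "Sato"},
    {"person_id": "2", "dob": "1955-05-05", "first": "Taro", "family": "Suzuki"},
    {"person_id": "3", "dob": "1970-07-07", "first": "Yuki", "family": "Tanaka"},
    {"person_id": "4", "dob": "1960-01-01", "first": "Hanako", "family": "Ito"},
]
registry = [
    {"person_id": "1", "dob": "1960-01-01", "first": "Hanako", "family": "Sato"},
    {"person_id": "2", "dob": "1955-05-05", "first": "Taroo", "family": "Suzuki"},
]

by_first = match_metrics(screening, registry, ("dob", "first"))
by_family = match_metrics(screening, registry, ("dob", "family"))
print("dob + first name :", by_first)
print("dob + family name:", by_family)
```

In this toy example, the typo in a first name causes a missed match, and a shared date of birth and first name causes a false match, so the choice of key fields changes both metrics.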

Results:

Among key combinations with a matching specificity of 99% or higher and high sensitivity, date of birth and first name were used for the data simulating colorectal cancer, giving a matching sensitivity and specificity of 55.00% and 99.85%, respectively. For the data simulating breast cancer, date of birth and family name were used, giving a matching sensitivity and specificity of 88.71% and 99.98%, respectively. Assuming that the sensitivity and specificity of cancer screening are both 90%, the apparent values decreased to 74.90% and 89.93%, respectively. A trial calculation was also performed on the same dataset using a key combination with a matching specificity of 100%. When the matching sensitivity was 82.26%, the apparent screening sensitivity was maintained at 90% and the apparent screening specificity dropped to 89.89%, a small error from the original value. For 2^14 (16,384) data points, the execution time was 82 minutes and 26 seconds without parallelization and 11 minutes and 38 seconds with parallelization; 19.33% of the calculation time was attributable to the data-holding institutions. Memory usage was 3.4 GB for the PDDI server and 2.7 GB for the data-holding institutions.
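The reported execution times imply a speedup from parallelization that the abstract does not state directly. The short calculation below derives it from the two reported times and checks that 16,384 equals 2 to the 14th power. The variable names are illustrative, and the speedup is a derived figure, not one reported by the authors.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a minutes-and-seconds duration to whole seconds."""
    return minutes * 60 + seconds


n_datapoints = 2 ** 14            # 16,384 data points, as reported
serial_s = to_seconds(82, 26)     # execution time without parallelization
parallel_s = to_seconds(11, 38)   # execution time with parallelization

# Ratio of serial to parallel execution time.
speedup = serial_s / parallel_s

print(f"data points: {n_datapoints}")
print(f"serial: {serial_s} s, parallel: {parallel_s} s")
print(f"speedup: {speedup:.2f}x")
```

Under these figures, parallelization reduced the execution time by a factor of roughly 7.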

Conclusions:

We demonstrated the rudimentary feasibility of introducing a PDDI system for cancer screening accuracy assessment. We plan to carry out matching experiments based on actual data and comparisons with existing methods.



© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.