Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: May 2, 2023
Open Peer Review Period: May 2, 2023 - Jun 27, 2023
Date Accepted: Jun 22, 2024

The final, peer-reviewed published version of this preprint can be found here:

Effective Privacy Protection Strategies for Pregnancy and Gestation Information From Electronic Medical Records: Retrospective Study in a National Health Care Data Network in China

Liu C, Jiao Y, Su L, Liu W, Zhang H, Nie S, Gong M

J Med Internet Res 2024;26:e46455

DOI: 10.2196/46455

PMID: 39163593

PMCID: 11372317

Extraction of Pregnancy and Gestation Information from Electronic Medical Records: Effective Privacy Protection Strategies in a National Healthcare Data Network in China

  • Chao Liu
  • Yuanshi Jiao
  • Licong Su
  • Wenna Liu
  • Haiping Zhang
  • Sheng Nie
  • Mengchun Gong

ABSTRACT

Background:

Pregnancy and gestation information is routinely recorded in electronic medical record (EMR) systems in China across various datasets. Combined, the two data items (the number of pregnancies and the number of gestations) can reveal the occurrence of abortion and other pregnancy-related issues, which makes them important both for clinical decision-making and for personal privacy protection. Where this information sits within an EMR varies because the IT structures of EMR systems are inconsistent, and the potential exposure of this sensitive information has never been quantitatively evaluated at a large scale.

Objective:

We aimed to perform the first nationwide quantitative analysis of the identification sites and exposure frequency of sensitive pregnancy and gestation information, and to propose strategies for effective information extraction and privacy protection related to women’s health.

Methods:

This data extraction study was performed in a national healthcare data network. A committee of experts developed rule-based protocols for extracting pregnancy and gestation information. Six sub-datasets of EMRs were used as the schema for data analysis and strategy proposal. The identification sites and the frequency of identification in each sub-dataset were calculated. Two independent groups of reviewers then manually inspected the quality of extraction on 1000 randomly selected records. Based on these statistics, strategies for effective information extraction and privacy protection were proposed.
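
As an illustration only, rule-based extraction and a per-sub-dataset identification frequency might be sketched as follows. The abstract does not give the committee's actual rules, so the pattern (a "G2P1" or "孕2产1" style shorthand), the function names, the record layout, and the choice of denominator (patients identified in at least one sub-dataset) are all assumptions made for this sketch.

```python
import re

# Hypothetical rule: two counts written as "G2P1" or "孕2产1".
# The study's real extraction rules are not described in the abstract.
PATTERN = re.compile(r"G\s*(\d+)\s*P\s*(\d+)|孕\s*(\d+)\s*产\s*(\d+)")


def extract(text):
    """Return the two counts as a tuple of ints, or None if no rule matches."""
    match = PATTERN.search(text)
    if match is None:
        return None
    first = match.group(1) or match.group(3)
    second = match.group(2) or match.group(4)
    return int(first), int(second)


def identification_frequency(records):
    """Share of identified patients found in each sub-dataset.

    records: iterable of (patient_id, sub_dataset, text) tuples.
    The denominator is the set of patients identified in any sub-dataset
    (an assumption; the abstract does not define the denominator).
    """
    hits = {}
    identified = set()
    for patient_id, sub_dataset, text in records:
        if extract(text) is not None:
            hits.setdefault(sub_dataset, set()).add(patient_id)
            identified.add(patient_id)
    return {name: len(ids) / len(identified) for name, ids in hits.items()}
```

A sub-dataset with a high value under this measure is one where the sensitive fields surface for most identified patients, which is the kind of signal the Results use to single out admission and nursing records.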

Results:

The data network covers hospitalized patients from 19 hospitals in 9 provinces of China, totaling 7,084,339 over a time span of 10 years (2010-2020). A total of 688,268 female patients with sensitive reproductive information (SRI) were identified. The identification frequency varied across sub-datasets and was highest for the marriage history in admission medical records, at 62.74%. Surprisingly, more than 50% of female patients had pregnancy and gestation history identified in nursing records, which are not generally considered a sub-dataset rich in reproductive information. In the manual curation and review process, 500 cases were selected randomly. The precision and recall of the information extraction method both exceeded 99.5%. The privacy protection strategies were designed with clear technical directions.
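
For readers less familiar with the two metrics, the sketch below shows how precision and recall could be computed from a manually reviewed sample. The function name and the boolean-list input format are assumptions for illustration, not the study's evaluation code.

```python
def precision_recall(predicted, reference):
    """Compute precision and recall from paired boolean labels.

    predicted: True where the extraction method flagged a record.
    reference: True where manual review found the information present.
    Returns (precision, recall); a metric is None when its denominator is zero.
    """
    pairs = list(zip(predicted, reference))
    true_pos = sum(1 for p, r in pairs if p and r)
    false_pos = sum(1 for p, r in pairs if p and not r)
    false_neg = sum(1 for p, r in pairs if not p and r)
    flagged = true_pos + false_pos
    present = true_pos + false_neg
    precision = true_pos / flagged if flagged else None
    recall = true_pos / present if present else None
    return precision, recall
```

Precision is the share of flagged records that reviewers confirmed, and recall is the share of reviewer-confirmed records that the method flagged.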

Conclusions:

Critical information related to women’s health is recorded in large volumes in routine Chinese EMR systems and is distributed across different parts of the records at different frequencies. A thorough protocol is therefore required to extract and protect this information, and this study demonstrated that such a protocol is technically feasible. Implementing a data-based strategy will help enforce the protection of women’s privacy and improve the accessibility of healthcare services.

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.