Accepted for/Published in: JMIR AI

Date Submitted: Feb 28, 2024
Open Peer Review Period: Mar 4, 2024 - Apr 29, 2024
Date Accepted: Oct 23, 2024

The final, peer-reviewed published version of this preprint can be found here:

Harnessing Moderate-Sized Language Models for Reliable Patient Data Deidentification in Emergency Department Records: Algorithm Development, Validation, and Implementation Study

Dorémus O, Russon D, Contrand B, Guerra-Adames A, Avalos-Fernandez M, Gil-Jardiné C, Lagarde E

JMIR AI 2025;4:e57828

DOI: 10.2196/57828

PMID: 40605780

PMCID: 12223680

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Harnessing Moderate-Sized Language Models for Reliable Patient Data De-identification in Emergency Department Records: An Evaluation of Strategies and Performance

  • Océane Dorémus; 
  • Dylan Russon; 
  • Benjamin Contrand; 
  • Ariel Guerra-Adames; 
  • Marta Avalos-Fernandez; 
  • Cédric Gil-Jardiné; 
  • Emmanuel Lagarde

ABSTRACT

Background:

The digitization of healthcare, facilitated by the adoption of electronic health record (EHR) systems, has revolutionized data-driven medical research and patient care. While this digital transformation offers substantial benefits in healthcare efficiency and accessibility, it also raises significant concerns over privacy and data security. Early approaches to de-identifying patient data relied on rule-based systems, which later gave way to mixed approaches that include machine learning. More recently, the emergence of large language models (LLMs) has opened a further opportunity in this domain, with the potential to improve the accuracy of context-sensitive de-identification. However, the deployment of the most advanced models in hospital environments is frequently hindered by data security issues and the extensive hardware resources required.

Objective:

The objective of our study is to design, implement, and evaluate de-identification algorithms by fine-tuning moderate-sized open-source language models that remain suitable for production inference on personal computers.

Methods:

We aimed to replace personally identifiable information (PII) with generic placeholders, or to label texts containing no PII as 'ANONYMOUS', ensuring privacy while preserving textual integrity. Our dataset was derived from over 425,000 clinical notes from the adult emergency department of the Bordeaux University Hospital in France. To create a reference for model validation, 3,000 randomly selected clinical notes underwent independent double annotation by two experts. Three open-source language models of manageable size were selected for their feasibility in hospital settings: Llama 2 7B, Mistral 7B, and Mixtral 8x7B. Fine-tuning used the quantized Low-Rank Adaptation (QLoRA) technique. Evaluation focused on PII-level metrics (recall, precision, and F1-score) and clinical note-level metrics (recall and BLEU), assessing de-identification effectiveness and content preservation.
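To make the PII-level metrics concrete, here is a minimal Python sketch of placeholder replacement and of precision, recall, and F1 computed over annotated spans. It is an illustration under stated assumptions, not the authors' pipeline: the `Span` type, the `<NAME>` and `<DATE>` placeholder names, the example note, and the exact-match scoring rule are all hypothetical, and the 'ANONYMOUS' labeling of PII-free notes is not modeled.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int   # character offset where the PII begins
    end: int     # character offset just past the PII
    label: str   # hypothetical placeholder category, e.g. "NAME"


def replace_pii(text: str, spans: list[Span]) -> str:
    """Swap each (non-overlapping) span for a generic placeholder like <NAME>."""
    pieces, cursor = [], 0
    for span in sorted(spans):  # NamedTuples sort by start offset first
        pieces.append(text[cursor:span.start])
        pieces.append(f"<{span.label}>")
        cursor = span.end
    pieces.append(text[cursor:])
    return "".join(pieces)


def pii_scores(gold: set[Span], pred: set[Span]) -> tuple[float, float, float]:
    """Precision, recall, F1 with exact span matching (an assumed scoring rule)."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented example note; offsets are looked up rather than hard-coded.
note = "Patient seen by Dr Martin on 12/03/2021."
name = Span(note.index("Martin"), note.index("Martin") + len("Martin"), "NAME")
date = Span(note.index("12/03/2021"),
            note.index("12/03/2021") + len("12/03/2021"), "DATE")

deidentified = replace_pii(note, [name, date])

# Suppose a model found the name but missed the date.
precision, recall, f1 = pii_scores(gold={name, date}, pred={name})
print(deidentified)
print(precision, recall, round(f1, 4))
```

In this toy case the model makes no false detections (precision 1.0) but misses one of two PII items (recall 0.5), which is the kind of error that matters most for de-identification.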

Results:

The generative model Mistral 7B demonstrated the highest performance, with an overall F1-score of 0.9673 (vs. 0.8750 for Llama 2 and 0.8686 for Mixtral 8x7B). At the clinical note level, the same model achieved an overall recall of 0.9326 (vs. 0.6888 for Llama 2 and 0.6417 for Mixtral 8x7B). This rate increased to 0.9915 for the anonymization of names with Mistral 7B. Four of the 3,000 notes failed to be fully anonymized for names: in one case, the non-anonymized name belonged to a patient, while in the other three cases, it belonged to medical staff. Beyond the fifth epoch, the BLEU score consistently exceeded 0.9864, indicating no significant text alteration due to the process.
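A note-level recall is stricter than a PII-level one, because a single missed item makes the whole note count as a failure. The Python sketch below illustrates one plausible definition: the share of PII-containing notes in which every annotated item was detected. This definition, the function name `note_level_recall`, and the toy data are assumptions for illustration; the abstract does not spell out the exact formula.

```python
def note_level_recall(gold_by_note: dict[str, set[str]],
                      pred_by_note: dict[str, set[str]]) -> float:
    """Fraction of PII-containing notes whose gold PII items were all detected.

    Assumed definition: notes without any gold PII are excluded from the
    denominator, and a note succeeds only if its gold set is fully covered.
    """
    notes_with_pii = [note_id for note_id, gold in gold_by_note.items() if gold]
    if not notes_with_pii:
        return 0.0
    fully_deidentified = sum(
        1 for note_id in notes_with_pii
        if gold_by_note[note_id] <= pred_by_note.get(note_id, set())
    )
    return fully_deidentified / len(notes_with_pii)


# Toy data: PII items are represented as plain strings for brevity.
gold = {
    "note-1": {"Martin", "12/03/2021"},
    "note-2": {"Dupont"},
    "note-3": set(),              # no PII, excluded from the denominator
}
pred = {
    "note-1": {"Martin"},         # date missed, so the note fails
    "note-2": {"Dupont"},         # fully covered
}

score = note_level_recall(gold, pred)
print(score)
```

Here two of three gold PII items are found (PII-level recall of about 0.67), yet only one of the two PII-containing notes is fully de-identified, which shows why note-level figures can sit below PII-level ones.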

Conclusions:

Our research shows that generative language models can de-identify clinical texts effectively, with Mistral 7B performing best among the models evaluated. Mistral 7B achieved these results without requiring high-end computational resources. These methods pave the way for a broader availability of anonymized clinical texts, enabling their use for research purposes and the optimization of the healthcare system.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.