Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Dec 8, 2020
Open Peer Review Period: Dec 8, 2020 - Feb 2, 2021
Date Accepted: Jan 8, 2022

The final, peer-reviewed published version of this preprint can be found here:

Neural Translation and Automated Recognition of ICD-10 Medical Entities From Natural Language: Model Development and Performance Assessment

Falissard L, Morgand C, Ghosn W, Imbaud C, Bounebache K, Rey G

JMIR Med Inform 2022;10(4):e26353

DOI: 10.2196/26353

PMID: 35404262

PMCID: 9039820

Neural translation and automated recognition of ICD-10 medical entities from natural language: Algorithm Development and Validation

  • Louis Falissard
  • Claire Morgand
  • Walid Ghosn
  • Claire Imbaud
  • Karim Bounebache
  • Grégoire Rey

ABSTRACT

Background:

The recognition of medical entities from natural language is a ubiquitous problem in the medical field, with applications ranging from medical act coding to the analysis of electronic health data for public health. It is, however, a complex task that usually requires human expert intervention, making it expensive and time-consuming. Recent advances in artificial intelligence, specifically the rise of deep learning methods, have enabled computers to make efficient decisions on a number of complex problems, a notable example being neural sequence models and their powerful applications in natural language processing. These models, however, require a considerable amount of data to learn from, which is typically their main limiting factor. The CépiDc stores an exhaustive database of death certificates at the French national scale, amounting to several million natural language examples, each provided with its associated human-coded medical entities and available to the machine learning practitioner.

Objective:

This article investigates the application of deep neural sequence models to the problem of recognizing medical entities from natural language.

Methods:

The investigated dataset is based on every French death certificate from 2011 to 2016, containing information such as the subject’s age, gender, and the chain of events leading to his or her death, both in French and encoded as ICD-10 medical entities, for a total of around 3 million observations. The task of automatically recognizing ICD-10 medical entities from the French natural-language chain of events is formulated as a sequence-to-sequence modelling problem, a type of predictive modelling in which both the input and the output are sequences. A deep neural network model known as the Transformer is then slightly adapted and fit to the dataset. Its performance is assessed on an external dataset and compared with the current state of the art. Confidence intervals for the derived measurements are estimated via bootstrap.
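The abstract does not spell out how the F-measure or its bootstrap interval is computed, so the following is only a minimal sketch under stated assumptions: a micro-averaged F-measure over sets of ICD-10 codes per certificate, and a percentile bootstrap that resamples certificates with replacement. The function names, the toy certificates, and the 95% level (`alpha=0.05`) are illustrative assumptions, not the authors' implementation.

```python
import random


def f_measure(pairs):
    """Micro-averaged F-measure over (predicted_codes, reference_codes) set pairs.

    Assumption: codes are compared as unordered sets per certificate.
    """
    true_pos = sum(len(pred & ref) for pred, ref in pairs)
    n_pred = sum(len(pred) for pred, _ in pairs)
    n_ref = sum(len(ref) for _, ref in pairs)
    if true_pos == 0:
        # Also covers the case where n_pred or n_ref is zero.
        return 0.0
    precision = true_pos / n_pred
    recall = true_pos / n_ref
    return 2 * precision * recall / (precision + recall)


def bootstrap_ci(pairs, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the F-measure (assumed procedure)."""
    rng = random.Random(seed)
    n = len(pairs)
    scores = sorted(
        # Resample certificates with replacement, then recompute the metric.
        f_measure([pairs[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lower = scores[int((alpha / 2) * n_resamples)]
    upper = scores[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


# Hypothetical toy data: (model output, human coding) for four certificates.
toy_pairs = [
    ({"I219"}, {"I219"}),
    ({"C349", "J189"}, {"C349"}),
    ({"E149"}, {"E119"}),
    ({"I64"}, {"I64"}),
]

point_estimate = f_measure(toy_pairs)
ci_lower, ci_upper = bootstrap_ci(toy_pairs)
print(f"F-measure {point_estimate:.3f} [{ci_lower:.3f}, {ci_upper:.3f}]")
```

With only four toy certificates the interval is very wide; on a test set of realistic size the same procedure would yield a much tighter interval.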

Results:

The proposed approach resulted in a test F-measure of .952 [.946, .957], which constitutes a significant improvement over the current state of the art and its previously reported F-measure of .825, assessed on a comparable dataset. Such an improvement opens a whole field of new applications, from nosologist-level automated coding to temporal harmonization of death statistics.

Conclusions:

This article shows that deep artificial neural networks can learn complex relationships between natural language and medical entities directly from voluminous datasets, without any explicit prior knowledge. Although not entirely free from mistakes, the derived model constitutes a powerful tool for the automated coding of medical entities from medical language, with promising potential applications.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.