Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Sep 1, 2022
Open Peer Review Period: Sep 1, 2022 - Oct 27, 2022
Date Accepted: Oct 22, 2022

The final, peer-reviewed published version of this preprint can be found here:

Construction of Cohorts of Similar Patients From Automatic Extraction of Medical Concepts: Phenotype Extraction Study

Gérardin C, Mageau A, Mékinian A, Tannier X, Carrat F

JMIR Med Inform 2022;10(12):e42379

DOI: 10.2196/42379

PMID: 36534446

PMCID: 9808583

Construction of cohorts of similar patients from automatic extraction of medical concepts

  • Christel Gérardin
  • Arthur Mageau
  • Arsène Mékinian
  • Xavier Tannier
  • Fabrice Carrat

ABSTRACT

Background:

Reliable and interpretable automatic extraction of clinical phenotypes from large electronic medical records databases remains a challenge, especially in a language other than English.

Objective:

We aimed to provide an automated end-to-end extraction of cohorts of similar patients from electronic health records for systemic diseases.

Methods:

Our multistep algorithm comprises a named-entity recognition step, a multilabel classification using the Medical Subject Headings (MeSH) ontology, and the computation of patient similarity. Cohorts of similar patients were selected for a priori annotated phenotypes. Six phenotypes were chosen for their clinical significance: P1-osteoporosis, P2-nephritis in systemic lupus erythematosus, P3-interstitial lung disease in systemic sclerosis, P4-lung infection, P5-obstetric antiphospholipid syndrome, and P6-Takayasu stroke. We used a training set of 151 clinical notes and an independent validation set of 256 clinical notes, both with annotated phenotypes and both extracted from the Assistance Publique-Hôpitaux de Paris data warehouse. For each phenotype, we evaluated the 3 patients closest to the index patient using precision-at-3, and we also computed recall and average precision.
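As a rough illustration of the ranking metrics named above, the following Python sketch ranks patients by similarity to an index patient and computes precision-at-3 and average precision. It is not the authors' implementation: the abstract does not specify the similarity measure, so cosine similarity is an assumption, and the patient identifiers, the 4-dimensional concept vectors, and all function names are hypothetical. Recall is omitted because the abstract does not state how the retrieved set was delimited.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_neighbors(index_id, vectors):
    """Return all other patient ids, most similar to the index patient first."""
    scores = {
        pid: cosine_similarity(vectors[index_id], vec)
        for pid, vec in vectors.items()
        if pid != index_id
    }
    return sorted(scores, key=scores.get, reverse=True)


def precision_at_k(ranked, positives, k=3):
    """Fraction of the k closest patients that carry the phenotype."""
    return sum(1 for pid in ranked[:k] if pid in positives) / k


def average_precision(ranked, positives):
    """Mean of the precision values at each rank where a positive appears."""
    if not positives:
        return 0.0
    hits = 0
    total = 0.0
    for rank, pid in enumerate(ranked, start=1):
        if pid in positives:
            hits += 1
            total += hits / rank
    return total / len(positives)


# Hypothetical toy data: each vector stands in for per-patient concept labels.
vectors = {
    "index": [1, 1, 0, 0],
    "a": [1, 1, 0, 0],
    "b": [1, 0, 0, 0],
    "c": [0, 0, 1, 1],
    "d": [1, 1, 1, 0],
}
positives = {"a", "b"}  # patients annotated with the index patient's phenotype

ranked = rank_neighbors("index", vectors)
p_at_3 = precision_at_k(ranked, positives, k=3)
avg_prec = average_precision(ranked, positives)
```

In this toy case the ranking is a, d, b, c, so two of the three closest patients share the phenotype. In a real evaluation these scores would be averaged over many index patients per phenotype.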

Results:

For P1-P4, the precision-at-3 ranged from 0.85 [0.75, 0.95] to 0.99 [0.98, 1], the recall ranged from 0.53 [0.50, 0.55] to 0.83 [0.81, 0.84], and the average precision ranged from 0.58 [0.54, 0.62] to 0.88 [0.85, 0.90]. The P5 and P6 phenotypes could not be analysed due to the limited number of occurrences of these phenotypes.

Conclusions:

Using a method close to clinical reasoning, we built a scalable and interpretable end-to-end algorithm to extract cohorts of similar patients.


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.