JMIR Preprints #42379: Construction of cohorts of similar patients from automatic extraction of medical concepts

Construction of cohorts of similar patients from automatic extraction of medical concepts

Christel Gérardin;
Arthur Mageau;
Arsène Mékinian;
Xavier Tannier;
Fabrice Carrat

ABSTRACT

Background:

Reliable and interpretable automatic extraction of clinical phenotypes from large electronic medical record databases remains a challenge, especially in languages other than English.

Objective:

We aimed to provide automated, end-to-end extraction of cohorts of similar patients with systemic diseases from electronic health records.

Methods:

Our multistep algorithm comprises a named-entity recognition step, a multilabel classification step using the Medical Subject Headings (MeSH) ontology, and the computation of patient similarity. Cohorts of similar patients were then selected for a priori annotated phenotypes. Six phenotypes were selected for their clinical significance: P1-osteoporosis, P2-nephritis in systemic lupus erythematosus, P3-interstitial lung disease in systemic sclerosis, P4-lung infection, P5-obstetric antiphospholipid syndrome, and P6-Takayasu stroke. We used a training set of 151 clinical notes and an independent validation set of 256 clinical notes, both with annotated phenotypes and both extracted from the Assistance Publique-Hôpitaux de Paris data warehouse. For each phenotype, we evaluated the precision of the 3 patients closest to the index patient using precision-at-3, along with recall and average precision.
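The abstract names a patient-similarity step but does not say how similarity is computed. As a minimal sketch, assuming each patient is represented by counts of the MeSH descriptors predicted for their notes and that cosine similarity is the measure (both are assumptions, not necessarily the authors' choices), retrieving the 3 patients closest to an index patient could look like this. The function names `cosine` and `nearest_patients` and the toy data are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse concept-count vectors."""
    # Only concepts present in both patients contribute to the dot product.
    dot = sum(a[c] * b[c] for c in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    # Guard against empty vectors (a patient with no extracted concepts).
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_patients(index_id: str, patients: dict, k: int = 3) -> list:
    """Return the k patients most similar to the index patient, best first.

    Ties are broken by patient identifier so the ranking is deterministic.
    """
    query = patients[index_id]
    scored = [
        (pid, cosine(query, vec))
        for pid, vec in patients.items()
        if pid != index_id  # never return the index patient itself
    ]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical toy patients: MeSH descriptor -> number of mentions.
    patients = {
        "p1": Counter({"Lupus Nephritis": 2, "Lupus Erythematosus, Systemic": 3}),
        "p2": Counter({"Lupus Nephritis": 1, "Lupus Erythematosus, Systemic": 2}),
        "p3": Counter({"Osteoporosis": 2}),
        "p4": Counter({"Lupus Erythematosus, Systemic": 1, "Pneumonia": 1}),
    }
    for pid, score in nearest_patients("p1", patients):
        print(pid, round(score, 3))  # p2 0.992, then p4 0.588, then p3 0.0
```

Cosine similarity normalizes for the number of concepts per patient, so a patient with a long record is not automatically close to everyone; a weighted representation (for example, TF-IDF over concepts) would be a natural variant.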

Results:

For P1-P4, the precision-at-3 ranged from 0.85 [0.75, 0.95] to 0.99 [0.98, 1], the recall from 0.53 [0.50, 0.55] to 0.83 [0.81, 0.84], and the average precision from 0.58 [0.54, 0.62] to 0.88 [0.85, 0.90]. Phenotypes P5 and P6 could not be analysed because of the limited number of patients with these phenotypes.
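The three reported metrics can be made concrete with a small sketch. It assumes a ranked list of retrieved patients and a set of patients annotated with the same phenotype as the index patient; the cutoff used for recall is not stated in the abstract, so `k` is left as a parameter. The function names and data are illustrative, not the authors' code.

```python
def precision_at_k(ranked: list, relevant: set, k: int = 3) -> float:
    """Fraction of the top-k retrieved patients that share the phenotype."""
    return sum(1 for pid in ranked[:k] if pid in relevant) / k


def recall_at_k(ranked: list, relevant: set, k: int) -> float:
    """Fraction of all relevant patients found in the top k (cutoff assumed)."""
    if not relevant:
        return 0.0
    return sum(1 for pid in ranked[:k] if pid in relevant) / len(relevant)


def average_precision(ranked: list, relevant: set) -> float:
    """Mean of precision@i over the ranks i where a relevant patient appears.

    Relevant patients that are never retrieved contribute 0, because the sum
    is divided by the total number of relevant patients.
    """
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for i, pid in enumerate(ranked, start=1):
        if pid in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant)


if __name__ == "__main__":
    ranked = ["a", "b", "c", "d", "e"]  # hypothetical retrieval order
    relevant = {"a", "c", "e", "f"}     # hypothetical annotated matches
    print(round(precision_at_k(ranked, relevant, 3), 3))  # 0.667
    print(round(recall_at_k(ranked, relevant, 3), 3))     # 0.5
    print(round(average_precision(ranked, relevant), 3))  # 0.567
```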

Conclusions:

Using a method close to clinical reasoning, we built a scalable and interpretable end-to-end algorithm to extract cohorts of similar patients.

Citation

Please cite as:

Gérardin C, Mageau A, Mékinian A, Tannier X, Carrat F

Construction of Cohorts of Similar Patients From Automatic Extraction of Medical Concepts: Phenotype Extraction Study

JMIR Med Inform 2022;10(12):e42379

DOI: 10.2196/42379

PMID: 36534446

PMCID: 9808583


Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Sep 1, 2022

Open Peer Review Period: Sep 1, 2022 - Oct 27, 2022

Date Accepted: Oct 22, 2022
