Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Sep 1, 2022
Open Peer Review Period: Sep 1, 2022 - Oct 27, 2022
Date Accepted: Oct 22, 2022
Construction of cohorts of similar patients from automatic extraction of medical concepts
ABSTRACT
Background:
Reliable and interpretable automatic extraction of clinical phenotypes from large electronic medical record databases remains a challenge, especially in languages other than English.
Objective:
We aimed to provide an automated end-to-end extraction of cohorts of similar patients from electronic health records for systemic diseases.
Methods:
Our multistep algorithm includes a named-entity recognition step, a multilabel classification using the Medical Subject Headings ontology, and the computation of patient similarity. Cohorts of similar patients were then selected on the basis of phenotypes annotated a priori. Six phenotypes were selected for their clinical significance: P1-osteoporosis, P2-nephritis in systemic lupus erythematosus, P3-interstitial lung disease in systemic sclerosis, P4-lung infection, P5-obstetric antiphospholipid syndrome, and P6-Takayasu stroke. We used a training set of 151 clinical notes and an independent validation set of 256 clinical notes, with annotated phenotypes, both extracted from the Assistance Publique-Hôpitaux de Paris data warehouse. For each phenotype, we evaluated the 3 patients closest to the index patient using the precision-at-3, and we also computed the recall and the average precision.
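The ranking metrics named above can be illustrated with a minimal sketch. The abstract does not state which similarity measure or patient representation was used, so the cosine similarity, the binary label vectors, the patient identifiers, and all function names below are assumptions made purely for illustration, not the authors' implementation. Recall is omitted because the abstract does not state the cut-off at which it was computed.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_neighbours(index_id, vectors):
    """Other patient ids, sorted from most to least similar to the index patient."""
    others = [pid for pid in vectors if pid != index_id]
    return sorted(others,
                  key=lambda pid: cosine(vectors[index_id], vectors[pid]),
                  reverse=True)

def precision_at_k(ranked, relevant, k=3):
    """Fraction of the k top-ranked patients that share the index phenotype."""
    return sum(pid in relevant for pid in ranked[:k]) / k

def average_precision(ranked, relevant):
    """Mean of the precision values taken at the rank of each relevant patient."""
    hits, total = 0, 0.0
    for rank, pid in enumerate(ranked, start=1):
        if pid in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

# Hypothetical toy data: each patient is a binary vector over three
# made-up concept labels (assumed representation, not from the paper).
vectors = {
    "A": [1, 1, 0],  # index patient
    "B": [1, 1, 0],
    "C": [1, 0, 0],
    "D": [0, 0, 1],
    "E": [1, 1, 1],
}
relevant = {"B", "C"}  # patients annotated with the same phenotype as A

ranked = rank_neighbours("A", vectors)
p_at_3 = precision_at_k(ranked, relevant, k=3)
avg_prec = average_precision(ranked, relevant)
print(ranked, round(p_at_3, 3), round(avg_prec, 3))
```

On this toy data the three nearest neighbours of A are B, E, and C, of which two share the phenotype, so the precision-at-3 is 2/3; the average precision additionally rewards relevant patients that appear early in the ranking.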
Results:
For P1-P4, the precision-at-3 ranged from 0.85 [0.75, 0.95] to 0.99 [0.98, 1], the recall ranged from 0.53 [0.50, 0.55] to 0.83 [0.81, 0.84], and the average precision ranged from 0.58 [0.54, 0.62] to 0.88 [0.85, 0.90]. The P5 and P6 phenotypes could not be analysed because the number of occurrences of these phenotypes was too limited.
Conclusions:
Using a method close to clinical reasoning, we built a scalable and interpretable end-to-end algorithm to extract cohorts of similar patients.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.