Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Nov 12, 2024
Open Peer Review Period: Nov 25, 2024 - Jan 20, 2025
Date Accepted: Jan 25, 2025
Improving phenotyping of patients with immune-mediated inflammatory diseases through automated processing of discharge summaries: a multicenter cohort study
ABSTRACT
Background:
Valuable insights gathered by clinicians during their inquiries and documented in textual reports are often unavailable in the structured data recorded in the electronic health records (EHRs).
Objective:
This study aimed to show that mining unstructured textual data with natural language processing (NLP) techniques complements the available structured data and enables more comprehensive patient phenotyping.
Methods:
We collected EHRs available in the clinical data warehouse of the Greater Paris University Hospitals from 2012 to 2021 for patients hospitalized and diagnosed with one of four immune-mediated inflammatory diseases: systemic lupus erythematosus (SLE), systemic sclerosis, antiphospholipid syndrome, and Takayasu's arteritis. Then, we built, trained, and validated NLP algorithms on 103 discharge summaries selected from the cohort and annotated by a clinician. Finally, all discharge summaries in the cohort were processed with the algorithms and the extracted data were compared with the structured data.
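The comparison step described above can be illustrated with a minimal sketch. It assumes that each source (structured data, text-derived data) has been reduced to a set of patient identifiers with a given finding. The function name `compare_sources` and the patient identifiers are hypothetical and do not come from the study; the abstract does not specify how the comparison was implemented.

```python
def compare_sources(cohort, structured_pos, text_pos):
    """Compare patient-level detection of one finding across two sources.

    cohort         -- iterable of all patient IDs in the cohort
    structured_pos -- patient IDs with the finding in structured data
    text_pos       -- patient IDs with the finding extracted from text
    (All names are illustrative assumptions, not the study's code.)
    """
    cohort = set(cohort)
    if not cohort:
        raise ValueError("cohort must not be empty")
    # Ignore identifiers that are not part of the cohort.
    s = set(structured_pos) & cohort
    t = set(text_pos) & cohort
    n = len(cohort)
    return {
        "structured_only": len(s - t),
        "text_only": len(t - s),
        "both": len(s & t),
        "pct_structured": 100 * len(s) / n,     # detection using structured data alone
        "pct_combined": 100 * len(s | t) / n,   # detection using either source
    }


# Toy example with made-up patient IDs.
summary = compare_sources(
    cohort=["p1", "p2", "p3", "p4", "p5"],
    structured_pos=["p1", "p2"],
    text_pos=["p2", "p3", "p4"],
)
```

Reporting the structured-only and combined percentages side by side makes the contribution of the text source explicit for each finding.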
Results:
Named entity recognition followed by normalization yielded F1-scores (95% confidence interval) of 71.1 (63.6-77.8) for laboratory tests and 89.3 (85.9-91.6) for drugs. Application of the algorithms to 18,604 EHRs increased the detection of antibody results and drug treatments (e.g., +53.6% of the patients in the SLE cohort with positive antinuclear antibodies), making the results more consistent with the literature.
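As an illustration of how an F1-score with a 95% confidence interval can be obtained, the sketch below uses a percentile bootstrap over annotated documents. This is an assumption: the abstract does not state which method was used to compute the intervals. The function names and the per-document counts are hypothetical.

```python
import random


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when all counts are zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_f1_ci(doc_counts, n_resamples=2000, alpha=0.05, seed=0):
    """Micro-averaged F1 with a percentile bootstrap confidence interval.

    doc_counts -- list of (tp, fp, fn) tuples, one per annotated document.
    Documents are resampled with replacement (an assumed design choice).
    """
    if not doc_counts:
        raise ValueError("doc_counts must not be empty")
    rng = random.Random(seed)
    n = len(doc_counts)
    scores = []
    for _ in range(n_resamples):
        sample = [doc_counts[rng.randrange(n)] for _ in range(n)]
        tp = sum(c[0] for c in sample)
        fp = sum(c[1] for c in sample)
        fn = sum(c[2] for c in sample)
        scores.append(f1_score(tp, fp, fn))
    scores.sort()
    lower = scores[int((alpha / 2) * n_resamples)]
    upper = scores[int((1 - alpha / 2) * n_resamples) - 1]
    # Point estimate on the full, non-resampled set of documents.
    point = f1_score(*(sum(col) for col in zip(*doc_counts)))
    return point, lower, upper


# Made-up counts for three documents, for illustration only.
point, lower, upper = bootstrap_f1_ci([(8, 2, 2), (5, 1, 0), (3, 0, 4)])
```

Resampling whole documents rather than individual entities keeps entities from the same discharge summary together, which is one common way to account for within-document correlation.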
Conclusions:
While challenges remain in standardizing laboratory tests, particularly with abbreviations, this work, based on secondary use of clinical data, demonstrates that automatic processing of discharge summaries enriches the information available in structured data and facilitates more comprehensive patient profiling.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.