Extracting Critical Information from Clinicians’ Notes: A Rule-Based Approach to Identify Severity of Dementia from Unstructured Data in Electronic Health Records
ABSTRACT
Background:
The severity of Alzheimer’s disease and related dementias (ADRD) is rarely documented in structured data fields in electronic health records (EHR). Although this information is important for clinical monitoring and decision-making, it is often undocumented or “hidden” in unstructured text fields and not readily available for clinicians to act upon.
Objective:
This study assessed the feasibility of, and potential bias in, using keywords and rule-based matching to obtain information about ADRD severity from EHR data.
Methods:
We used EHR data from a large academic healthcare system for patients with a primary discharge diagnosis of ADRD, based on ICD-9/10 codes, between 2014 and 2019. We first assessed whether ADRD severity information was present in the EHR and then identified the documented severity. Clinicians’ notes were used to determine ADRD severity based on two criteria: (1) scores from the Mini-Mental State Examination (MMSE) and Montreal Cognitive Assessment (MoCA), and (2) explicit terms for ADRD severity (e.g., “mild dementia,” “advanced AD”). We compiled a list of common ADRD symptoms, cognitive test names, and disease severity terms, refining it iteratively based on prior literature and clinical expertise. We then applied rule-based matching in Python 3.8, using the spaCy and pandas libraries, to identify the context in which specific words or phrases were mentioned. We evaluated the prevalence of documented ADRD severity and assessed the performance of our rule-based algorithm.
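To make the two criteria concrete, the following is a minimal sketch of keyword and rule-based matching over a single note. It is not the study's implementation: it uses Python's standard `re` module as a stand-in for spaCy's matcher, and the term list, the negation cues, the `MILD_CUTOFF` score thresholds, and the `classify_note` function are all hypothetical placeholders chosen for illustration, since the abstract does not report them.

```python
import re

# Illustrative term list only; the study's full keyword list is not given here.
TERM_RE = re.compile(
    r"\b(mild|moderate|severe|advanced)\s+(?:dementia|AD|ADRD)\b", re.IGNORECASE
)
# Matches e.g. "MMSE score of 18/30" or "MoCA 24/30".
SCORE_RE = re.compile(r"\b(MMSE|MoCA)\b\D{0,20}?(\d{1,2})\s*/\s*30", re.IGNORECASE)
# Hypothetical negation cues appearing shortly before a term in the same sentence.
NEGATION_RE = re.compile(r"\b(?:no|not|denies|without)\b[^.]{0,30}$", re.IGNORECASE)

# Placeholder cut-offs (assumption, not clinical guidance and not the study's
# values): scores at or above the cut-off are labeled mild.
MILD_CUTOFF = {"mmse": 20, "moca": 18}


def classify_note(text):
    """Return 'mild', 'moderate_or_severe', or 'undocumented' for one note."""
    for match in TERM_RE.finditer(text):
        # Context rule: skip a severity term preceded by a negation cue.
        if NEGATION_RE.search(text[: match.start()]):
            continue
        term = match.group(1).lower()
        return "mild" if term == "mild" else "moderate_or_severe"
    score = SCORE_RE.search(text)
    if score:
        test, value = score.group(1).lower(), int(score.group(2))
        return "mild" if value >= MILD_CUTOFF[test] else "moderate_or_severe"
    return "undocumented"


if __name__ == "__main__":
    for note in [
        "Impression: mild dementia, stable.",
        "MMSE score of 18/30 today.",
        "Follow-up for memory loss.",
    ]:
        print(classify_note(note), "<-", note)
```

In this sketch, explicit severity terms take precedence over test scores, and a note with neither is labeled undocumented. A real pipeline would also need to reconcile multiple notes per patient, which this example does not attempt.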
Results:
The study included 9,115 eligible patients with more than 65,000 provider notes. Overall, 22.93% (N=2,090) of patients had documented mild ADRD, 20.87% (N=1,902) had documented moderate or severe ADRD, and 56.20% (N=5,123) had no documentation of ADRD severity. For determining whether ADRD severity was documented, our algorithm achieved accuracy >95%, specificity >95%, sensitivity >90%, and an F1 score >83%. For identifying the severity of ADRD, it achieved accuracy >91%, specificity >80%, sensitivity >88%, and an F1 score >92%. Compared with patients with mild ADRD, those with more advanced ADRD tended to be older, were more likely to be female and Black, and were more likely to have received their diagnoses in primary care or in-hospital settings. Relative to patients with undocumented ADRD severity, those with documented severity had a similar distribution of sex, race, and rural/urban residence.
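For readers less familiar with the reported metrics, the sketch below shows how accuracy, specificity, sensitivity, and F1 score are conventionally derived from binary labels. It assumes a reference label per patient (for example, from manual chart review, which the abstract does not describe); the function name `binary_metrics` and the toy labels are hypothetical and do not reproduce the study's data.

```python
def binary_metrics(y_true, y_pred):
    """Compute standard binary classification metrics from paired labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives

    def ratio(num, den):
        # Return 0.0 rather than raising when a denominator is empty.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)  # also called recall
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "specificity": ratio(tn, tn + fp),
        "sensitivity": sensitivity,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


if __name__ == "__main__":
    # Toy labels: 1 = severity documented, 0 = not documented.
    reference = [1, 1, 1, 0, 0, 0, 0, 1]
    predicted = [1, 1, 0, 0, 0, 1, 0, 1]
    print(binary_metrics(reference, predicted))
```

Because F1 combines precision and sensitivity, it can fall below both accuracy and sensitivity when false positives are relatively common among the predicted positives.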
Conclusions:
Our study demonstrates the feasibility of using a rule-based matching algorithm to identify ADRD severity from unstructured EHR data. However, potential biases arising from differences in documentation practices across healthcare systems must be acknowledged.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.