Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Jun 30, 2023
Open Peer Review Period: Jun 30, 2023 - Aug 25, 2023
Date Accepted: Apr 23, 2024
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Examining Linguistic Differences in Electronic Health Records for Diverse Patients with Diabetes
ABSTRACT
Background:
Individuals from minoritized racial and ethnic backgrounds suffer from pernicious and pervasive health disparities that have emerged, in part, from clinician bias.
Objective:
We used a natural language processing approach to examine whether linguistic markers in electronic health record (EHR) notes differ based on the race and ethnicity of the patient. To validate this approach, we also assessed the extent to which clinicians perceive linguistic markers to be indicative of bias.
Methods:
In this cross-sectional study, we extracted EHR notes for patients 18 years of age or older who were diagnosed with type 2 diabetes and received care from family physicians, general internists, or endocrinologists practicing in an urban, academic network of clinics between 2006 and 2015. Race and ethnicity of patients were defined as ‘White Non-Hispanic,’ ‘Black Non-Hispanic,’ or ‘Hispanic/Latino’. We hypothesized that SEANCE (Sentiment Analysis and Social Cognition Engine) components (i.e., negative adjectives, positive adjectives, joy, fear and disgust, politics, respect, trust verbs, well-being) and mean word count would be indicators of bias if racial differences emerged. We performed linear mixed effects analyses to examine the relationship between the outcomes of interest (the SEANCE components and word count) and patient race and ethnicity, controlling for patient age. To validate this approach, we asked clinicians to indicate the extent to which (on a scale of 1 to 10, with 10 being extremely indicative of bias) they thought variation in the use of SEANCE language domains for different racial and ethnic groups was reflective of bias in EHR notes.
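To make the idea of a per-note linguistic marker concrete, the following is a minimal sketch of a dictionary-based score, assuming a note is scored by the share of its tokens that fall in a word list. It is not the SEANCE tool and does not reproduce how SEANCE derives its components; the lexicons, the `tokenize` and `score_note` helpers, and the example note are all hypothetical and exist only for illustration.

```python
import re
from collections import Counter

# Hypothetical mini-lexicons for illustration only.
# These are NOT the SEANCE word lists or component definitions.
LEXICONS = {
    "negative_adjectives": {"noncompliant", "difficult", "poor", "hostile"},
    "positive_adjectives": {"pleasant", "cooperative", "good", "motivated"},
}


def tokenize(text):
    """Lowercase the note and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def score_note(text):
    """Return the word count and, per lexicon, the share of tokens found in it."""
    tokens = tokenize(text)
    n = len(tokens)
    counts = Counter(tokens)
    scores = {"word_count": n}
    for name, words in LEXICONS.items():
        hits = sum(counts[w] for w in words)
        # Guard against empty notes to avoid division by zero.
        scores[name] = hits / n if n else 0.0
    return scores


# Invented example note, not taken from any patient record.
note = "Patient is pleasant and cooperative but has poor glycemic control."
result = score_note(note)
```

Scores of this kind, computed once per note, would then serve as the outcome variables in a model that compares notes across patient groups while adjusting for age.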
Results:
We examined EHR notes (n = 12,905) of Black Non-Hispanic, White Non-Hispanic, and Hispanic/Latino patients (n = 1,562), who were seen by 281 physicians. Twenty-seven clinicians participated in the validation study. Participants rated negative adjectives as 8.63 (SD=2.06), fear and disgust as 8.11 (SD=2.15), and positive adjectives as 7.93 (SD=2.46). Notes for Black Non-Hispanic patients contained significantly more negative adjectives (coeff=0.07, SE=0.02) and significantly more fear and disgust words (coeff=0.007, SE=0.002) compared to the notes for White Non-Hispanic patients. The notes for Hispanic/Latino patients included significantly fewer positive adjectives (coeff=-0.02, SE=0.007), trust verbs (coeff=-0.009, SE=0.004), and joy words (coeff=-0.03, SE=0.01) compared to the notes for White Non-Hispanic patients.
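As a reading aid, each reported coefficient and standard error can be converted into an approximate Wald statistic and 95% confidence interval. The sketch below assumes a standard normal reference distribution, which may differ from the test the authors' software applied (for example, a t-test with approximated degrees of freedom), and it works from the rounded values printed above, so the outputs are only approximate. The helper names `wald_z`, `two_sided_p`, and `ci95` are illustrative.

```python
import math

# Coefficients and standard errors as reported in the Results (rounded values).
# The reference group in every comparison is White Non-Hispanic patients.
estimates = {
    "Black Non-Hispanic: negative adjectives": (0.07, 0.02),
    "Black Non-Hispanic: fear and disgust": (0.007, 0.002),
    "Hispanic/Latino: positive adjectives": (-0.02, 0.007),
    "Hispanic/Latino: trust verbs": (-0.009, 0.004),
    "Hispanic/Latino: joy": (-0.03, 0.01),
}


def wald_z(coef, se):
    """Wald statistic: the estimate divided by its standard error."""
    return coef / se


def two_sided_p(z):
    """Two-sided p-value under a standard normal approximation (an assumption)."""
    return math.erfc(abs(z) / math.sqrt(2))


def ci95(coef, se):
    """Approximate 95% confidence interval using the normal quantile 1.96."""
    return (coef - 1.96 * se, coef + 1.96 * se)


for label, (coef, se) in estimates.items():
    z = wald_z(coef, se)
    low, high = ci95(coef, se)
    print(f"{label}: z={z:.2f}, p={two_sided_p(z):.4f}, 95% CI=({low:.4f}, {high:.4f})")
```

Under this approximation, an interval that excludes zero corresponds to a two-sided p-value below 0.05.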
Conclusions:
If validated, this approach may enable physicians and researchers to identify and mitigate bias in medical interactions, with the goal of reducing health disparities stemming from bias.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.