JMIR Preprints #50428: Examining Linguistic Differences in Electronic Health Records for Diverse Patients with Diabetes

Examining Linguistic Differences in Electronic Health Records for Diverse Patients with Diabetes

Isabel Bilotta;
Scott Tonidandel;
Winston R. Liaw;
Eden King;
Diana Carvajal;
Ayana Taylor;
Julie Thamby;
Yang Xiang;
Cui Tao;
Michael Hansen

ABSTRACT

Background:

Individuals from minoritized racial and ethnic backgrounds suffer from pernicious and pervasive health disparities that have emerged, in part, from clinician bias.

Objective:

We used a natural language processing approach to examine whether linguistic markers in electronic health record (EHR) notes differ based on the race and ethnicity of the patient. To validate this approach, we also assessed the extent to which clinicians perceive linguistic markers to be indicative of bias.

Methods:

In this cross-sectional study, we extracted EHR notes for patients 18 years of age or older who were diagnosed with type 2 diabetes and received care from family physicians, general internists, or endocrinologists practicing in an urban, academic network of clinics between 2006 and 2015. Race and ethnicity of patients were defined as ‘White Non-Hispanic,’ ‘Black Non-Hispanic,’ or ‘Hispanic/Latino.’ We hypothesized that SEANCE (Sentiment Analysis and Social Cognition Engine) components (i.e., negative adjectives, positive adjectives, joy, fear and disgust, politics, respect, trust verbs, well-being) and mean word count would be indicators of bias if racial differences emerged. We performed linear mixed effects analyses to examine the relationship between the outcomes of interest (the SEANCE components and word count) and patient race and ethnicity, controlling for patient age. To validate this approach, we asked clinicians to indicate the extent to which (on a scale of 1 to 10, with 10 being extremely indicative of bias) they thought variation in the use of SEANCE language domains for different racial and ethnic groups was reflective of bias in EHR notes.
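As a reading aid, one plausible form of the linear mixed effects model described above is sketched below. The abstract does not state the random-effects structure, so the single random intercept for a grouping unit g (for example, the physician or the patient) is an assumption, as are the symbol names. White Non-Hispanic is taken as the reference category because the results are reported relative to that group.

```latex
% Illustrative sketch only: the grouping unit g and the random-intercept
% structure are assumptions, not taken from the abstract.
% y_{ig}: SEANCE component score (or word count) for note i in group g.
\begin{align}
  y_{ig} &= \beta_0
          + \beta_1\,\mathrm{BlackNonHispanic}_{ig}
          + \beta_2\,\mathrm{HispanicLatino}_{ig}
          + \beta_3\,\mathrm{age}_{ig}
          + u_g + \varepsilon_{ig}, \\
  u_g &\sim \mathcal{N}(0, \sigma_u^2), \qquad
  \varepsilon_{ig} \sim \mathcal{N}(0, \sigma^2).
\end{align}
```

Under this sketch, \beta_1 and \beta_2 are the age-adjusted differences from notes for White Non-Hispanic patients, which is how the coefficients in the Results are phrased.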

Results:

We examined EHR notes (n = 12,905) of Black Non-Hispanic, White Non-Hispanic, and Hispanic/Latino patients (n = 1,562), who were seen by 281 physicians. Twenty-seven clinicians participated in the validation study. Participants rated negative adjectives as 8.63 (SD=2.06), fear and disgust as 8.11 (SD=2.15), and positive adjectives as 7.93 (SD=2.46). Notes for Black Non-Hispanic patients contained significantly more negative adjectives (coeff=0.07, SE=0.02) and significantly more fear and disgust words (coeff=0.007, SE=0.002) compared to the notes for White Non-Hispanic patients. The notes for Hispanic/Latino patients included significantly fewer positive adjectives (coeff=-0.02, SE=0.007), trust verbs (coeff=-0.009, SE=0.004), and joy words (coeff=-0.03, SE=0.01) compared to the notes for White Non-Hispanic patients.
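The coefficients and standard errors above can be turned into approximate test statistics and confidence intervals. The sketch below uses a normal (Wald) approximation, which is an assumption: the abstract does not say how significance was assessed, and mixed-model software often uses t-based tests instead. The function name `wald_summary` and the dictionary labels are hypothetical.

```python
from statistics import NormalDist

def wald_summary(coeff, se, level=0.95):
    """Approximate z statistic, two-sided p value, and Wald interval.

    Assumes the estimate is roughly normal; this is an illustration,
    not the authors' stated inference method.
    """
    std_normal = NormalDist()
    z = coeff / se
    crit = std_normal.inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    p = 2 * (1 - std_normal.cdf(abs(z)))
    return {"z": z, "p": p, "ci": (coeff - crit * se, coeff + crit * se)}

# (coefficient, standard error) pairs as reported in the Results;
# each contrast is against notes for White Non-Hispanic patients.
reported = {
    "Black Non-Hispanic: negative adjectives": (0.07, 0.02),
    "Black Non-Hispanic: fear and disgust": (0.007, 0.002),
    "Hispanic/Latino: positive adjectives": (-0.02, 0.007),
    "Hispanic/Latino: trust verbs": (-0.009, 0.004),
    "Hispanic/Latino: joy": (-0.03, 0.01),
}

results = {name: wald_summary(b, se) for name, (b, se) in reported.items()}

for name, r in results.items():
    low, high = r["ci"]
    print(f"{name}: z={r['z']:.2f}, p={r['p']:.4f}, 95% CI=({low:.4f}, {high:.4f})")
```

Under this approximation, none of the five intervals includes zero, which is consistent with the abstract describing each difference as significant.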

Conclusions:

If validated, this approach may enable physicians and researchers to identify and mitigate bias in medical interactions, with the goal of reducing health disparities stemming from bias.

Citation

Please cite as:

Bilotta I, Tonidandel S, Liaw WR, King E, Carvajal D, Taylor A, Thamby J, Xiang Y, Tao C, Hansen M

Examining Linguistic Differences in Electronic Health Records for Diverse Patients With Diabetes: Natural Language Processing Analysis

JMIR Med Inform 2024;12:e50428

DOI: 10.2196/50428

PMID: 38787295

PMCID: 11137426

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Jun 30, 2023

Open Peer Review Period: Jun 30, 2023 - Aug 25, 2023

Date Accepted: Apr 23, 2024
