Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Jun 30, 2023
Open Peer Review Period: Jun 30, 2023 - Aug 25, 2023
Date Accepted: Apr 23, 2024

The final, peer-reviewed published version of this preprint can be found here:

Bilotta I, Tonidandel S, Liaw WR, King E, Carvajal D, Taylor A, Thamby J, Xiang Y, Tao C, Hansen M

Examining Linguistic Differences in Electronic Health Records for Diverse Patients With Diabetes: Natural Language Processing Analysis

JMIR Med Inform 2024;12:e50428

DOI: 10.2196/50428

PMID: 38787295

PMCID: 11137426

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Examining Linguistic Differences in Electronic Health Records for Diverse Patients with Diabetes

  • Isabel Bilotta
  • Scott Tonidandel
  • Winston R. Liaw
  • Eden King
  • Diana Carvajal
  • Ayana Taylor
  • Julie Thamby
  • Yang Xiang
  • Cui Tao
  • Michael Hansen

ABSTRACT

Background:

Individuals from minoritized racial and ethnic backgrounds suffer from pernicious and pervasive health disparities that have emerged, in part, from clinician bias.

Objective:

We used a natural language processing approach to examine whether linguistic markers in electronic health record (EHR) notes differ based on the race and ethnicity of the patient. To validate this approach, we also assessed the extent to which clinicians perceive linguistic markers to be indicative of bias.

Methods:

In this cross-sectional study, we extracted EHR notes for patients 18 years of age or older who were diagnosed with type 2 diabetes and received care from family physicians, general internists, or endocrinologists practicing in an urban, academic network of clinics between 2006 and 2015. Patient race and ethnicity were categorized as ‘White Non-Hispanic,’ ‘Black Non-Hispanic,’ or ‘Hispanic/Latino.’ We hypothesized that Sentiment Analysis and Social Cognition Engine (SEANCE) components (i.e., negative adjectives, positive adjectives, joy, fear and disgust, politics, respect, trust verbs, and well-being) and mean word count would be indicators of bias if racial differences emerged. We performed linear mixed effects analyses to examine the relationship between the outcomes of interest (the SEANCE components and word count) and patient race and ethnicity, controlling for patient age. To validate this approach, we asked clinicians to rate, on a scale of 1 to 10 (with 10 being extremely indicative of bias), the extent to which they thought variation in the use of SEANCE language domains across racial and ethnic groups was reflective of bias in EHR notes.
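
As a rough illustration of the kind of per-note linguistic feature the Methods describe, the sketch below computes a word count and the share of words that fall in a word list. It is a simplified stand-in, not the SEANCE tool or the authors' pipeline: the word list `NEGATIVE_ADJECTIVES`, the helper names `tokenize` and `note_features`, and the example note are all hypothetical, and SEANCE's actual component scores are derived differently.

```python
import re

# Hypothetical mini word list for illustration only.
# These are NOT the SEANCE lexicons or components.
NEGATIVE_ADJECTIVES = {"noncompliant", "difficult", "poor", "uncooperative"}


def tokenize(note: str) -> list:
    """Lowercase the note and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", note.lower())


def note_features(note: str, lexicon: set) -> dict:
    """Return the word count and the proportion of tokens found in the lexicon."""
    tokens = tokenize(note)
    word_count = len(tokens)
    hits = sum(1 for token in tokens if token in lexicon)
    # Guard against empty notes to avoid division by zero.
    proportion = hits / word_count if word_count else 0.0
    return {
        "word_count": word_count,
        "lexicon_hits": hits,
        "lexicon_proportion": proportion,
    }


# Invented example note; not taken from any real record.
example = "Patient is noncompliant with diet. Poor glycemic control."
features = note_features(example, NEGATIVE_ADJECTIVES)
print(features)
```

Features of this kind, computed per note, could then serve as outcomes in a model that accounts for notes being clustered within patients and physicians, which is the role the linear mixed effects analyses play in the study.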

Results:

We examined EHR notes (n = 12,905) of Black Non-Hispanic, White Non-Hispanic, and Hispanic/Latino patients (n = 1,562), who were seen by 281 physicians. Twenty-seven clinicians participated in the validation study. Participants rated negative adjectives as 8.63 (SD=2.06), fear and disgust as 8.11 (SD=2.15), and positive adjectives as 7.93 (SD=2.46). Notes for Black Non-Hispanic patients contained significantly more negative adjectives (coeff=0.07, SE=0.02) and significantly more fear and disgust words (coeff=0.007, SE=0.002) compared to the notes for White Non-Hispanic patients. The notes for Hispanic/Latino patients included significantly fewer positive adjectives (coeff=-0.02, SE=0.007), trust verbs (coeff=-0.009, SE=0.004), and joy words (coeff=-0.03, SE=0.01) compared to the notes for White Non-Hispanic patients.

Conclusions:

If validated, this approach may enable physicians and researchers to identify and mitigate bias in medical interactions, with the goal of reducing health disparities stemming from bias.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.