
Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Feb 9, 2024
Date Accepted: Jan 16, 2025

The final, peer-reviewed published version of this preprint can be found here:

Assessing Racial and Ethnic Bias in Text Generation by Large Language Models for Health Care–Related Tasks: Cross-Sectional Study

Hanna J, Wakene AD, Lehmann CU, Medford RJ

Assessing Racial and Ethnic Bias in Text Generation by Large Language Models for Health Care–Related Tasks: Cross-Sectional Study

J Med Internet Res 2025;27:e57257

DOI: 10.2196/57257

PMID: 40080818

PMCID: 11950697

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Assessing Racial and Ethnic Bias in Text Generation for Healthcare-Related Tasks by GPT-3.5-turbo: Cross-Sectional Study

  • John Hanna; 
  • Abdi D Wakene; 
  • Christoph U Lehmann; 
  • Richard J Medford

ABSTRACT

Background:

Racial and ethnic bias in large language models (LLMs) used for healthcare tasks is a growing concern, as it may contribute to health disparities. In response, LLM operators have implemented safeguards against prompts that overtly seek biased output.

Objective:

This study investigates potential racial and ethnic bias in GPT-3.5-turbo, a popular LLM, when generating consumer-directed healthcare text in the absence of overtly biased queries.

Methods:

In this cross-sectional study, GPT-3.5-turbo was prompted to generate discharge instructions for patients with human immunodeficiency virus (HIV). De-identified metadata for each patient encounter, including race/ethnicity, were passed to the model in table format through a prompt four times, altering only the race/ethnicity value (African American, Asian, Hispanic White, Non-Hispanic White) while keeping all other information constant. The prompt instructed the model to write discharge instructions for each encounter without explicitly mentioning race, ethnicity, or insurance type. The LLM-generated instructions were analyzed for sentiment, subjectivity, reading ease, and word usage by race/ethnicity and insurance type.
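The prompting protocol above can be sketched as follows. This is a minimal illustration only: the metadata field names and prompt wording are assumptions, since the abstract does not give the authors' exact prompt or table schema.

```python
# Sketch: build the four prompt variants per encounter, holding all
# metadata constant except race/ethnicity (as described in Methods).
RACE_ETHNICITY_GROUPS = [
    "African American",
    "Asian",
    "Hispanic White",
    "Non-Hispanic White",
]

def build_prompts(encounter: dict) -> list[str]:
    """Return one prompt per race/ethnicity group, varying only that field."""
    prompts = []
    for group in RACE_ETHNICITY_GROUPS:
        # Copy the constant metadata and swap in the current group.
        meta = {**encounter, "race_ethnicity": group}
        table = "\n".join(f"{k}: {v}" for k, v in meta.items())
        prompts.append(
            "Write discharge instructions for the patient described below. "
            "Do not explicitly mention race, ethnicity, or insurance type.\n\n"
            + table
        )
    return prompts

# Hypothetical encounter metadata for illustration.
variants = build_prompts({"age": 42, "diagnosis": "HIV", "insurance": "commercial"})
```

Each of the four prompts would then be sent to GPT-3.5-turbo, and the generated instructions collected for the downstream linguistic analysis.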

Results:

The average polarity of the patient instructions generated by GPT-3.5-turbo was comparable across racial/ethnic groups, ranging from 0.14 to 0.15, with an average subjectivity of 0.46 for all groups. Differences in polarity and subjectivity across racial/ethnic groups were not statistically significant. However, word frequency varied across racial/ethnic groups, and subjectivity differed across insurance types, with commercial insurance eliciting the most subjective responses.
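One of the measures compared above, reading ease, can be illustrated with a minimal Flesch Reading Ease implementation. The syllable heuristic below is a crude approximation for illustration; the abstract does not specify which tooling the authors actually used.

```python
# Sketch: Flesch Reading Ease = 206.835 - 1.015*(words/sentences)
#                                       - 84.6*(syllables/word).
import re

def count_syllables(word: str) -> int:
    # Count contiguous vowel groups as a rough syllable estimate (minimum 1).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (n_words / sentences) - 84.6 * (syllables / n_words)
```

Higher scores indicate easier text; scores for the generated instructions could then be compared across race/ethnicity and insurance groups.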

Conclusions:

GPT-3.5-turbo was relatively invariant to race/ethnicity and insurance type in terms of linguistic and readability measures. Further studies are needed to validate these results and assess their implications.




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.