Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Nov 20, 2024
Date Accepted: Jun 30, 2025
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Efficient Detection of Stigmatizing Language in Electronic Health Records via In-Context Learning: A Comparative Analysis and Validation Study
ABSTRACT
Background:
The presence of stigmatizing language within Electronic Health Records (EHRs) poses significant risks to patient care by perpetuating biases, disrupting therapeutic relationships, and diminishing treatment adherence. Previous studies on detecting stigmatizing language have predominantly employed supervised machine learning techniques, which demand resource-intensive annotated datasets. In-context learning (ICL), an approach where a pre-trained large language model adapts to tasks based on provided instructions and examples, has emerged as a promising alternative to supervised learning, enabling effective task performance with minimal reliance on labeled data.
Objective:
This study aims to investigate the efficacy of ICL in detecting stigmatizing language within EHRs under data-scarce conditions.
Methods:
This study utilized a dataset comprising 5,043 EHR sentences from the emergency department at Beth Israel Deaconess Medical Center in Boston, Massachusetts, United States. The performance of the ICL approach was compared with established zero-shot and few-shot approaches, namely textual entailment and SetFit, as well as a supervised fine-tuning approach. The ICL approach employed four distinct prompting strategies: Generic, Chain of Thought, Clue and Reasoning Prompting, and a novel strategy we propose in this work termed the Stigma Detection Heuristic Prompt. We assessed the models’ fairness using the equality of opportunity criterion, focusing on true positive rate (TPR) disparities across protected attributes, including sex, age, and race. We reported the largest absolute TPR disparity for each demographic attribute across different models.
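The in-context learning setup described above conditions a pre-trained model on an instruction plus a handful of labeled demonstrations. A minimal sketch of assembling such a few-shot prompt is shown below; the wording and function name are illustrative only and are not the paper's actual prompting strategies:

```python
def build_icl_prompt(examples, sentence):
    """Assemble a few-shot in-context-learning prompt for stigma detection.

    `examples` is a list of (sentence, label) demonstration pairs.
    The instruction text here is a generic illustration, not the
    Generic / Chain of Thought / CARP / Stigma Detection Heuristic
    prompts evaluated in the study.
    """
    header = (
        "Decide whether the following EHR sentence contains stigmatizing "
        "language about the patient. Answer 'stigmatizing' or "
        "'non-stigmatizing'.\n\n"
    )
    demos = "".join(
        f"Sentence: {s}\nAnswer: {lab}\n\n" for s, lab in examples
    )
    # The model is expected to complete the final "Answer:" slot.
    return header + demos + f"Sentence: {sentence}\nAnswer:"

demo = [("Patient is pleasant and cooperative.", "non-stigmatizing")]
print(build_icl_prompt(demo, "Patient claims to have taken medication."))
```

In the zero-shot setting, `examples` would simply be empty, leaving only the instruction and the target sentence.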
Results:
In the zero-shot setting, the best-performing ICL model, GEMMA-2 with the Stigma Detection Heuristic Prompt, achieved an F1 score of 0.856, a 19.4% improvement over the best textual entailment model, ROBERTA-M, which had an F1 score of 0.717. In the few-shot setting, the best-performing ICL model, LLAMA-3, with the same prompting strategy, showed F1 score improvements of 22.3%, 20.6%, and 12.6% over the leading SetFit models with 4, 8, and 16 annotations per class, respectively. Using only 32 labeled instances, the best ICL model achieved an F1 score of 0.902, 2.5% lower than the F1 score of 0.925 obtained by ROBERTA, the best model in the fully supervised fine-tuning approach trained on 3,543 labeled instances. The fairness evaluation revealed that supervised fine-tuning models exhibited greater bias, with the largest absolute TPR disparities of 0.062, 0.136, and 0.055 for sex-, age-, and race-differentiated subgroups, respectively, suggesting potential omission of 6.2%, 13.6%, and 5.5% of stigmatizing language for certain subgroups.
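The fairness quantity reported above, the largest absolute true positive rate disparity across subgroups of a protected attribute, can be computed as in the following sketch (function names are illustrative, not from the study):

```python
from collections import defaultdict

def true_positive_rate(y_true, y_pred):
    """TPR = TP / (TP + FN), computed over the positive (stigmatizing) instances."""
    positives = [(t, p) for t, p in zip(y_true, y_pred) if t == 1]
    if not positives:
        return None  # subgroup has no positive instances
    return sum(p for _, p in positives) / len(positives)

def largest_tpr_disparity(y_true, y_pred, groups):
    """Equality-of-opportunity gap: max |TPR_a - TPR_b| over subgroups.

    `groups` holds the protected-attribute value (e.g. sex, age band,
    race) for each instance; the largest pairwise absolute difference
    equals max(TPR) - min(TPR).
    """
    by_group = defaultdict(lambda: ([], []))
    for t, p, g in zip(y_true, y_pred, groups):
        by_group[g][0].append(t)
        by_group[g][1].append(p)
    tprs = [true_positive_rate(t, p) for t, p in by_group.values()]
    tprs = [r for r in tprs if r is not None]
    return max(tprs) - min(tprs)

# Toy example: TPR is 3/4 for group "F" and 1/2 for group "M",
# so the largest absolute disparity is 0.25.
y_true = [1, 1, 1, 1, 0, 0, 1, 1]
y_pred = [1, 1, 0, 1, 0, 1, 1, 0]
groups = ["F", "F", "F", "F", "F", "M", "M", "M"]
print(largest_tpr_disparity(y_true, y_pred, groups))  # 0.25
```

A disparity of, say, 0.136 under this metric means the model detects stigmatizing sentences at a rate 13.6 percentage points lower for the worst-served subgroup than for the best-served one, which is the sense in which the Results interpret the disparities as omitted stigmatizing language.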
Conclusions:
The study demonstrates that the ICL approach effectively detects stigmatizing language within EHRs, significantly outperforming established zero-shot and few-shot approaches. The proposed Stigma Detection Heuristic Prompt further enhances ICL's detection capabilities. In summary, ICL emerges as a robust and flexible solution for detecting stigmatizing language in EHRs, offering a more data-efficient, effective, and equitable alternative to conventional machine learning approaches.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.