
Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Jun 20, 2024
Date Accepted: Sep 25, 2024

The final, peer-reviewed published version of this preprint can be found here:


Ralevski A, Taiyab N, Nossal M, Mico L, Piekos S, Hadlock J

Using Large Language Models to Abstract Complex Social Determinants of Health From Original and Deidentified Medical Notes: Development and Validation Study

J Med Internet Res 2024;26:e63445

DOI: 10.2196/63445

PMID: 39561354

PMCID: 11615547

Using Large Language Models to Annotate Complex Cases of Social Determinants of Health in Longitudinal Clinical Records

  • Alexandra Ralevski; 
  • Nadaa Taiyab; 
  • Michael Nossal; 
  • Lindsay Mico; 
  • Samantha Piekos; 
  • Jennifer Hadlock

ABSTRACT

Background:

Social Determinants of Health (SDoH) such as housing insecurity are known to be intricately linked to patients’ health status. Large language models (LLMs) developed from generative pre-trained transformers (GPTs) have shown potential for performing complex annotation tasks on unstructured clinical notes.

Objective:

Here, we assess the performance of GPTs in identifying temporal aspects of housing insecurity and compare results between original and de-identified notes.

Methods:

We compared the ability of GPT-3.5 and GPT-4 to identify instances of both current and past housing instability, as well as general housing status, across 25,217 notes from 795 pregnant women. Results were compared with manual annotation, a named entity recognition (NER) model, and regular expressions (RegEx).

Results:

GPT-4 outperformed both GPT-3.5 and the NER model. In identifying patients experiencing current or past housing instability, GPT-4 achieved much higher recall (0.924) than human annotators (0.702), although its precision (0.850) was lower than theirs (0.971). On de-identified versions of the same notes, GPT-4 precision improved slightly (0.936 original vs 0.939 de-identified), while recall dropped (0.781 original vs 0.704 de-identified).
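As a minimal sketch of how the reported metrics are derived, the snippet below computes precision and recall from true-positive, false-positive, and false-negative counts. The counts are invented for illustration (chosen to reproduce the reported 0.850/0.924 values); they are not the study's actual data.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Compute precision and recall from confusion-matrix counts."""
    precision = tp / (tp + fp)  # fraction of flagged cases that are correct
    recall = tp / (tp + fn)     # fraction of true cases that are found
    return precision, recall

# Illustrative counts mirroring the GPT-4 pattern: high recall (few missed
# cases) at the cost of some false positives (lower precision).
p, r = precision_recall(tp=85, fp=15, fn=7)
print(f"precision={p:.3f}, recall={r:.3f}")  # precision=0.850, recall=0.924
```

This trade-off is why the abstract frames LLM annotation as complementary to manual review: a high-recall screen can surface candidate cases for a human annotator to confirm.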

Conclusions:

This work demonstrates that, while manual annotation is likely to yield slightly more accurate results overall, LLMs provide a scalable, cost-effective solution with the advantage of greater recall. More efficient methods for obtaining structured SDoH data can help accelerate inclusion of exposome variables in biomedical research, and support healthcare systems in identifying patients who could benefit from proactive outreach.




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.