Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Jun 20, 2024
Date Accepted: Sep 25, 2024
Using Large Language Models to Annotate Complex Cases of Social Determinants of Health in Longitudinal Clinical Records
ABSTRACT
Background:
Social Determinants of Health (SDoH) such as housing insecurity are known to be intricately linked to patients’ health status. Large language models (LLMs) developed from generative pre-trained transformers (GPTs) have shown potential for performing complex annotation tasks on unstructured clinical notes.
Objective:
Here we assess the performance of GPTs in identifying temporal aspects of housing insecurity and compare results between original and de-identified notes.
Methods:
We compared the ability of GPT-3.5 and GPT-4 to identify instances of both current and past housing instability, as well as general housing status, in 25,217 notes from 795 pregnant women. Results were compared with manual annotation, a named entity recognition (NER) model, and regular expressions (RegEx).
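As a rough illustration of what a RegEx baseline for this kind of task can look like, the following Python sketch labels a note as mentioning current or past housing instability. The patterns, the function name label_note, and the example sentences are all hypothetical and invented for illustration; they are not the expressions or data used in the study.

```python
import re

# Hypothetical patterns, invented for illustration only.
# They are NOT the regular expressions used in the study.
CURRENT_PATTERN = re.compile(
    r"\b(?:is|currently)\s+(?:homeless|living in (?:a )?shelter)\b",
    re.IGNORECASE,
)
PAST_PATTERN = re.compile(
    r"\b(?:history of|previously|formerly|was)\s+(?:homeless(?:ness)?|evicted)\b",
    re.IGNORECASE,
)


def label_note(text):
    """Return the set of temporal labels ('current', 'past') matched in a note."""
    labels = set()
    if CURRENT_PATTERN.search(text):
        labels.add("current")
    if PAST_PATTERN.search(text):
        labels.add("past")
    return labels


# Invented example sentences, not real clinical text.
current_example = label_note("Patient is currently homeless and staying with friends.")
past_example = label_note("History of homelessness in 2019; now stably housed.")
neutral_example = label_note("Lives with partner in an apartment.")
```

Fixed patterns like these only match wording they anticipate, which is one reason to compare such a baseline against a language model on the same notes.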
Results:
GPT-4 outperformed both GPT-3.5 and the NER model, and it had a much higher recall (0.924) than human annotators (0.702) in identifying patients experiencing current or past housing instability, although its precision (0.850) was lower than that of human annotators (0.971). GPT-4 precision improved slightly (0.936 original, 0.939 de-identified) on de-identified versions of the same notes, while recall dropped (0.781 original, 0.704 de-identified).
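For readers less familiar with the metrics reported above, precision is the fraction of flagged patients who truly have the condition, and recall is the fraction of patients with the condition who were flagged. The sketch below computes both for sets of patient identifiers. The function name, the identifiers, and the toy data are assumptions made for illustration and do not reproduce the study's evaluation procedure or data.

```python
def precision_recall(predicted, reference):
    """Return (precision, recall) for two collections of patient identifiers.

    predicted: patients flagged by an annotator or model
    reference: patients who have the condition according to the reference standard
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Toy data, invented for illustration only.
reference = {"p1", "p2", "p3", "p4"}   # four patients with housing instability
predicted = {"p1", "p2", "p5"}         # three flagged, two of them correctly

precision, recall = precision_recall(predicted, reference)
print(f"precision={precision:.3f}, recall={recall:.3f}")
```

In this toy case the annotator misses half of the affected patients (lower recall) while one of its three flags is wrong (lower precision), which is the trade-off the reported figures describe.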
Conclusions:
This work demonstrates that, while manual annotation is likely to yield slightly more accurate results overall, LLMs provide a scalable, cost-effective solution with the advantage of greater recall. More efficient methods for obtaining structured SDoH data can help accelerate inclusion of exposome variables in biomedical research, and support healthcare systems in identifying patients who could benefit from proactive outreach.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.