Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Feb 25, 2024
Date Accepted: Jul 4, 2024
Large language models can match junior clinicians in discharge letter writing: a single-blinded study
ABSTRACT
Background:
Discharge letters are a critical component of continuity of care between specialists and primary care providers, but they are time-consuming to write, under-prioritized relative to direct clinical care, and often delegated to junior doctors. Prior studies assessing the quality of discharge summaries written for inpatient hospital admissions show inadequacies in many domains. Large language models such as GPT can summarize large volumes of unstructured free text, such as electronic medical records, and have the potential to automate such tasks, providing time savings and consistency in quality.
Objective:
To assess the performance of GPT-4 in generating discharge letters written from Urology specialist outpatient clinics to primary care providers, and compare their quality against letters written by junior clinicians.
Methods:
Fictional electronic records were written by physicians, simulating five common Urology outpatient cases with long-term follow-up. Records comprised simulated consultation notes, referral letters and replies, and relevant discharge summaries from inpatient admissions. GPT-4 was tasked to write discharge letters for these cases, with a specified target audience of primary care providers who would be continuing the patient’s care. Prompts were written for safety, content, and style. Concurrently, junior clinicians were provided with the same case records and instructional prompts. GPT-4 output was assessed by the study team for instances of hallucination. A blinded panel of primary care physicians then evaluated the letters using a standardized questionnaire tool.
Results:
GPT-4 outperformed human counterparts in information provision but was less concise. GPT-4 had no instances of hallucination. There were no statistically significant differences in clarity, collegiality, follow-up recommendations, or overall satisfaction between letters generated by humans and by GPT-4.
Conclusions:
Discharge letters written by GPT-4 had equivalent quality to those written by junior clinicians, without any hallucinations. This study demonstrates proof of concept that LLMs can be useful and safe tools in clinical documentation.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.