Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Sep 9, 2023
Open Peer Review Period: Sep 9, 2023 - Nov 4, 2023
Date Accepted: Oct 20, 2024
The final, peer-reviewed published version of this preprint can be found here:

Large Language Models and Empathy: Systematic Review

Sorin V, Brin D, Barash Y, Konen E, Charney A, Nadkarni G, Klang E

Large Language Models and Empathy: Systematic Review

J Med Internet Res 2024;26:e52597

DOI: 10.2196/52597

PMID: 39661968

PMCID: 11669866

Large Language Models and Empathy: Systematic Review

  • Vera Sorin
  • Dana Brin
  • Yiftach Barash
  • Eli Konen
  • Alexander Charney
  • Girish Nadkarni
  • Eyal Klang

ABSTRACT

Background:

Empathy, a fundamental aspect of human interaction, is characterized as the ability to experience the emotions of another being within oneself. In health care, empathy is a cornerstone of the interaction between health care professionals and patients. It is a quality unique to humans that large language models (LLMs) are believed to lack.

Objective:

Our study aims to review the literature on the capacity of LLMs to demonstrate empathy.

Methods:

We conducted a literature search on MEDLINE, Google Scholar, PsyArXiv, medRxiv, and arXiv between December 2022 and February 2024. We included English-language, full-length publications that evaluated empathy in LLM outputs. We excluded papers that evaluated other aspects of emotional intelligence rather than empathy specifically. We summarized the results of the included studies, including the LLMs used, their performance in empathy tasks, and the limitations of the models, along with the studies' metadata.

Results:

Twelve studies, published in 2023, met the inclusion criteria. ChatGPT-3.5 by OpenAI was evaluated in all studies, with six comparing it to other LLMs such as GPT-4, LLaMA, and fine-tuned chatbots. Seven studies focused on empathy within a medical context. The studies reported that LLMs exhibit elements of empathy, including emotion recognition and emotional support in diverse contexts. In some cases, LLMs were observed to outperform humans in empathy-related tasks such as responding to patient questions from social media. Limitations were noted, including repetitive use of empathic phrases, difficulty following initial instructions, overly lengthy responses, sensitivity to prompts, and subjective evaluation metrics influenced by the evaluator's background.

Conclusions:

LLMs exhibit elements of cognitive empathy, being able to recognize emotions and provide emotionally supportive responses in various contexts. Given that social skills are an integral part of intelligence, these advancements bring LLMs closer to human-like interactions and expand their potential use in applications requiring emotional intelligence. However, there remains room for improvement in both the performance of these models and the evaluation strategies used for assessing soft skills.


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.