
Accepted for/Published in: JMIR Mental Health

Date Submitted: Feb 12, 2024
Date Accepted: May 23, 2024

The final, peer-reviewed published version of this preprint can be found here:

Exploring the Efficacy of Large Language Models in Summarizing Mental Health Counseling Sessions: Benchmark Study

Adhikary PK, Srivastava A, Kumar S, Singh SM, Manuja P, Gopinath JK, Krishnan V, Kedia S, Deb KS, Chakraborty T

Exploring the Efficacy of Large Language Models in Summarizing Mental Health Counseling Sessions: Benchmark Study

JMIR Ment Health 2024;11:e57306

DOI: 10.2196/57306

PMID: 39042893

PMCID: 11303879

Exploring the Efficacy of Large Language Models in Summarizing Mental Health Counseling Sessions: A Benchmark Study

  • Prottay Kumar Adhikary; 
  • Aseem Srivastava; 
  • Shivani Kumar; 
  • Salam Michael Singh; 
  • Puneet Manuja; 
  • Jini K Gopinath; 
  • Vijay Krishnan; 
  • Swati Kedia; 
  • Koushik Sinha Deb; 
  • Tanmoy Chakraborty

ABSTRACT

Background:

Comprehensive session summaries enable effective continuity in mental health counseling and facilitate informed therapy planning. Yet manual summarization presents a significant challenge, diverting experts' attention from the core counseling process. Automatic summarization addresses this issue by streamlining the summarization of lengthy therapy sessions, offering mental health professionals greater accessibility and efficiency. However, existing approaches often overlook the nuanced intricacies inherent in counseling interactions.

Objective:

This study evaluates the effectiveness of state-of-the-art Large Language Models (LLMs) in selectively summarizing various components of therapy sessions through aspect-based summarization, aiming to benchmark their performance.

Methods:

We introduce MentalCLOUDS, a counseling-component-guided summarization dataset. This benchmarking dataset consists of 191 counseling sessions with summaries focused on 3 distinct counseling components (also known as counseling aspects). Additionally, we assess the capabilities of 11 state-of-the-art LLMs on the task of component-guided summarization in counseling. The generated summaries are evaluated quantitatively using standard summarization metrics and verified qualitatively by mental health professionals.
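The component-guided setup described above can be sketched as a simple prompt builder. Note that the aspect names and instruction texts below are illustrative assumptions; the abstract does not name the 3 MentalCLOUDS components, so the actual dataset's aspects and wording may differ.

```python
# Minimal sketch of aspect-based (component-guided) summarization prompting.
# The aspect keys and instruction strings are hypothetical placeholders,
# not the actual MentalCLOUDS component definitions.
ASPECT_INSTRUCTIONS = {
    "symptoms": "Summarize only the symptoms and history the patient reports.",
    "discovery": "Summarize only what the patient discovers about their situation.",
    "reflection": "Summarize only the counselor's reflective statements.",
}

def build_aspect_prompt(transcript: str, aspect: str) -> str:
    """Build an aspect-focused summarization prompt for an LLM."""
    if aspect not in ASPECT_INSTRUCTIONS:
        raise ValueError(f"unknown aspect: {aspect}")
    return (
        "You are summarizing a mental health counseling session.\n"
        f"{ASPECT_INSTRUCTIONS[aspect]}\n\n"
        f"Session transcript:\n{transcript}\n\nSummary:"
    )
```

The same transcript is paired with a different instruction per component, so each LLM produces one summary per counseling aspect rather than a single generic summary.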

Results:

Our findings demonstrate the superior performance of task-specific LLMs such as MentalLlama, Mistral, and MentalBART on standard quantitative metrics (ROUGE-1, ROUGE-2, ROUGE-L, and BERTScore) across all counseling components. Further, expert evaluation reveals that Mistral outperforms both MentalLlama and MentalBART on six parameters: affective attitude, burden, ethicality, coherence, opportunity costs, and perceived effectiveness. However, these models share a common weakness, leaving room for improvement on the opportunity costs and perceived effectiveness parameters.
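The ROUGE scores reported above are normally computed with standard packages; as a self-contained sketch of what they measure, ROUGE-N F1 reduces to n-gram overlap and ROUGE-L F1 to the longest common subsequence of tokens. This is a simplified version (whitespace tokenization, no stemming), so its values may differ slightly from reference implementations.

```python
from collections import Counter

def _ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(reference: str, candidate: str, n: int = 1) -> float:
    """ROUGE-N F1: n-gram overlap between reference and candidate."""
    ref = _ngrams(reference.lower().split(), n)
    cand = _ngrams(candidate.lower().split(), n)
    if not ref or not cand:
        return 0.0
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

def rouge_l(reference: str, candidate: str) -> float:
    """ROUGE-L F1 based on the longest common subsequence (LCS) of tokens."""
    a, b = reference.lower().split(), candidate.lower().split()
    # Dynamic-programming table for LCS length.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    lcs = dp[-1][-1]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(b), lcs / len(a)
    return 2 * precision * recall / (precision + recall)
```

A perfect match scores 1.0 on every variant; partially overlapping summaries score between 0 and 1, which is how the models' generated summaries are compared against the gold summaries.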

Conclusions:

While LLMs fine-tuned specifically in the mental health domain exhibit better performance based on automatic evaluation scores, expert assessments indicate that these models are not yet reliable for clinical applications. Further refinement and validation are necessary before their implementation in practice.




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.