
Accepted for/Published in: JMIR Formative Research

Date Submitted: Mar 19, 2024
Date Accepted: Oct 3, 2024

The final, peer-reviewed published version of this preprint can be found here:

Aligning Large Language Models for Enhancing Psychiatric Interviews Through Symptom Delineation and Summarization: Pilot Study

So JH, Chang J, Kim E, Na J, Choi J, Sohn JY, Kim BH, Chu SH

Aligning Large Language Models for Enhancing Psychiatric Interviews Through Symptom Delineation and Summarization: Pilot Study

JMIR Form Res 2024;8:e58418

DOI: 10.2196/58418

PMID: 39447159

PMCID: 11544339

Aligning Large Language Models for Enhancing Psychiatric Interviews Through Symptom Delineation and Summarization: Pilot Study

  • Jae-hee So; 
  • Joonhwan Chang; 
  • Eunji Kim; 
  • Junho Na; 
  • JiYeon Choi; 
  • Jy-yong Sohn; 
  • Byung-Hoon Kim; 
  • Sang Hui Chu

ABSTRACT

Background:

Recent advancements in large language models (LLMs) have accelerated their use in various domains. Because psychiatric interviews are goal-oriented, structured dialogues between a professional interviewer and an interviewee, they constitute an underexplored area where LLMs could contribute substantial value. Here, we explore the use of LLMs for enhancing psychiatric interviews by analyzing counseling data from North Korean defectors with traumatic events and mental health issues.

Objective:

We investigate whether LLMs can (1) delineate the parts of the conversation that suggest psychiatric symptoms and name those symptoms, and (2) summarize stressors and symptoms, based on the interview dialogue transcript.

Methods:

Given the interview transcripts, we align the LLMs to perform three tasks: (1) extracting stressors from the transcript, (2) delineating symptoms and the sections indicative of them, and (3) writing a summary of the patient from the extracted stressors and symptoms. These three tasks address the two objectives: symptom delineation draws on the output of the second task, and interview summarization draws on the output of all three tasks. The transcript data were labeled by mental health experts for the training and evaluation of the LLMs.
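As a rough illustration of how the three tasks chain together (a sketch, not the authors' code), the pipeline can be wired as below. The prompt wording and the injected `llm` callable are hypothetical placeholders; the study uses aligned (prompted or fine-tuned) GPT-4 Turbo models.

```python
# Sketch of the three-task pipeline: stressor extraction -> symptom
# delineation -> summarization. The `llm` argument is any callable that
# maps a prompt string to a reply string (a hypothetical stand-in for
# the aligned model); prompt texts here are illustrative only.

def extract_stressors(transcript: str, llm) -> list[str]:
    """Task 1: extract stressors mentioned in the transcript."""
    reply = llm("List the stressors mentioned in this interview, "
                "one per line:\n" + transcript)
    return [line.strip() for line in reply.splitlines() if line.strip()]

def delineate_symptoms(transcript: str, llm) -> list[tuple[str, str]]:
    """Task 2: name each symptom and quote its indicative section."""
    reply = llm("For each psychiatric symptom in this interview, output "
                "'symptom<TAB>indicative quote' on its own line:\n" + transcript)
    pairs = []
    for line in reply.splitlines():
        if "\t" in line:
            symptom, quote = line.split("\t", 1)
            pairs.append((symptom.strip(), quote.strip()))
    return pairs

def summarize(stressors: list[str],
              symptom_pairs: list[tuple[str, str]], llm) -> str:
    """Task 3: summarize the patient from the outputs of tasks 1 and 2."""
    symptoms = ", ".join(s for s, _ in symptom_pairs)
    return llm("Write a brief patient summary.\n"
               f"Stressors: {', '.join(stressors)}\n"
               f"Symptoms: {symptoms}")
```

Because the model is injected as a callable, the same scaffolding works for zero-shot prompting, few-shot prompting, or a fine-tuned endpoint.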

Results:

First, we report the performance of LLMs in estimating (1) the transcript sections related to psychiatric symptoms and (2) the names of the corresponding symptoms, tested on 102 transcript segments. For estimating the symptom-related sections, 74 of the 102 segments exhibited a recall mid-token distance of d≤0.2 in the zero-shot inference setting with the GPT-4 Turbo model. For estimating the names of the corresponding symptoms, fine-tuning offered a performance advantage over zero-shot inference with the GPT-4 Turbo model: on average, the fine-tuned model achieved an accuracy of 0.817, a precision of 0.828, a recall of 0.818, and an F1-measure of 0.821. Second, we used the transcripts to generate a summary for each interviewee with LLMs, evaluating the summaries with metrics such as G-Eval and BERTScore. Summaries generated by the GPT-4 Turbo model using both symptom and stressor information achieved high average G-Eval scores: coherence 4.66, consistency 4.73, fluency 2.16, and relevance 4.67. Furthermore, the use of retrieval-augmented generation (RAG) did not yield a significant performance improvement.
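For reference, the accuracy, precision, recall, and F1 figures reported for symptom-name prediction can be reproduced in form (macro-averaging is assumed here; the paper does not state the averaging scheme in the abstract) by a short plain-Python sketch:

```python
# Accuracy plus macro-averaged precision, recall, and F1 for a
# multi-class labeling task such as symptom-name prediction.
# Macro averaging is an assumption for illustration.
from collections import Counter

def classification_metrics(y_true, y_pred):
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for label t
        else:
            fp[p] += 1          # p predicted where it was not the truth
            fn[t] += 1          # true label t was missed
    precisions, recalls, f1s = [], [], []
    for lab in labels:
        prec = tp[lab] / (tp[lab] + fp[lab]) if tp[lab] + fp[lab] else 0.0
        rec = tp[lab] / (tp[lab] + fn[lab]) if tp[lab] + fn[lab] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n
```

The label names passed in are arbitrary strings, so expert-annotated symptom labels can be compared against model outputs directly.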

Conclusions:

LLMs with either (1) appropriate prompting techniques or (2) fine-tuning on data labeled by mental health experts can achieve high performance on both the symptom delineation task and the summarization task. This research contributes to the nascent field of applying LLMs to psychiatric interviews and demonstrates their potential effectiveness in aiding mental health practitioners.


 Citation

Please cite as:

So JH, Chang J, Kim E, Na J, Choi J, Sohn JY, Kim BH, Chu SH

Aligning Large Language Models for Enhancing Psychiatric Interviews Through Symptom Delineation and Summarization: Pilot Study

JMIR Form Res 2024;8:e58418

DOI: 10.2196/58418

PMID: 39447159

PMCID: 11544339


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.