Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Dec 10, 2024
Date Accepted: Jun 4, 2025

The final, peer-reviewed published version of this preprint can be found here:

Performance of Open-Source Large Language Models in Psychiatry: Usability Study Through Comparative Analysis of Non-English Records and English Translations

Kim MG, Hwang G, Jang J, Chang S, Roh HW, Park RW

J Med Internet Res 2025;27:e69857

DOI: 10.2196/69857

PMID: 40825309

PMCID: 12360790

Performance of Open-Source Large Language Models in Psychiatry: A Comparative Analysis of Non-English Records and English Translations

  • Min-Gyu Kim
  • Gyubeom Hwang
  • Junhyuk Jang
  • Seheon Chang
  • Hyun Woong Roh
  • Rae Woong Park

ABSTRACT

Background:

Inequalities in access to psychiatric care remain a persistent issue. Large language models offer potential solutions, but closed models such as ChatGPT have limitations, including privacy concerns. Open-source models offer advantages such as enhanced data security and the ability to operate effectively in resource-limited settings. However, the effectiveness of open-source models in non-English psychiatric contexts remains underexplored.

Objective:

We aimed to evaluate the feasibility of an open-source large language model in Korean and English for psychiatric applications and to explore its potential to improve access to mental healthcare for non-English-speaking populations in resource-limited settings.

Methods:

The openbuddy-mistral-7b-v13.1 model, fine-tuned from Mistral 7B to enable conversational capabilities in Korean, was selected. A total of 200 psychiatric interview notes consisting of 50 cases each of schizophrenia, bipolar disorder, depressive disorder, and anxiety disorder were analyzed. The model generated English translations from the Korean interview notes. From both the original Korean notes and their English translations, the model was instructed to extract clinically meaningful clues and identify the possible diagnoses. Additionally, the model's performance on the psychiatry section of the Korean Medical Licensing Examination was evaluated using a similar approach.
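The two-arm design described above (the same extraction task run once on the Korean note and once on the model's own English translation) can be sketched as follows. This is a minimal illustration only: the prompt wording, the `run_both_arms` helper, and the `fake_generate` stand-in are hypothetical and are not taken from the study, and the placeholder note is not real clinical data.

```python
from typing import Callable, Dict

# Hypothetical prompt templates; the study's actual prompts are not given in this abstract.
TRANSLATE_PROMPT = (
    "Translate the following Korean psychiatric interview note into English:\n{note}"
)
EXTRACT_PROMPT = (
    "List the clinically meaningful clues in this interview note, "
    "then the possible diagnoses:\n{note}"
)


def run_both_arms(korean_note: str, generate: Callable[[str], str]) -> Dict[str, str]:
    """Run one extraction prompt on a Korean note and on its model-made English translation."""
    english_note = generate(TRANSLATE_PROMPT.format(note=korean_note))
    return {
        "english_note": english_note,
        "korean_output": generate(EXTRACT_PROMPT.format(note=korean_note)),
        "english_output": generate(EXTRACT_PROMPT.format(note=english_note)),
    }


def fake_generate(prompt: str) -> str:
    """Stand-in for a real model call (assumption); echoes the prompt's first line."""
    return "[model output for] " + prompt.splitlines()[0]


# Placeholder text meaning "example interview note"; not an actual record.
result = run_both_arms("예시 면담 기록", fake_generate)
for arm, text in result.items():
    print(arm, "->", text)
```

Passing the model as a plain callable keeps the sketch independent of any particular inference library; a real run would swap `fake_generate` for a function that queries the chosen model.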

Results:

The model generated 997 clues from Korean interview notes and 1,003 clues from English-translated notes. Hallucinations were more frequent with Korean input (30.2%) than with English input (13.4%). Clinical reasoning was superior for English input, with 42.8% of clues showing diagnostic relevance, compared to 34.2% for Korean input. The top-1 diagnostic accuracy was also higher for English input (74.5%) than for Korean input (59%). In the psychiatry section of the medical licensing examination, the model performed better in English, achieving an accuracy of 46.1% compared to 32.2% in Korean.
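For readers unfamiliar with the metric, top-1 diagnostic accuracy is the fraction of cases in which the model's first-ranked diagnosis matches the reference diagnosis. The sketch below illustrates that calculation on four invented toy cases; the `top1_accuracy` function and the toy data are assumptions for illustration and do not reproduce the study's data or scoring code.

```python
from typing import List, Sequence


def top1_accuracy(true_labels: Sequence[str], ranked_predictions: Sequence[List[str]]) -> float:
    """Fraction of cases whose first-ranked predicted diagnosis equals the reference diagnosis."""
    if len(true_labels) != len(ranked_predictions):
        raise ValueError("true_labels and ranked_predictions must have the same length")
    if not true_labels:
        raise ValueError("at least one case is required")
    hits = sum(
        1
        for truth, ranked in zip(true_labels, ranked_predictions)
        if ranked and ranked[0] == truth  # an empty prediction list counts as a miss
    )
    return hits / len(true_labels)


# Toy data (invented for illustration): one case per diagnostic category.
truth = ["schizophrenia", "bipolar disorder", "depressive disorder", "anxiety disorder"]
preds = [
    ["schizophrenia", "bipolar disorder"],
    ["depressive disorder", "bipolar disorder"],  # correct label ranked second: a top-1 miss
    ["depressive disorder"],
    ["anxiety disorder", "depressive disorder"],
]

accuracy = top1_accuracy(truth, preds)
print(f"{accuracy:.1%}")  # prints 75.0%
```

The second toy case shows why top-1 is a strict measure: the correct diagnosis appears in the list but not in first position, so it does not count as a hit.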

Conclusions:

These findings suggest that the performance of open-source LLMs in psychiatry may vary by language, especially in resource-limited settings. Addressing this gap may require collaborative efforts, such as the development of psychiatric datasets in the respective languages. Continued work is needed to create multilingual open-source LLMs capable of supporting psychiatric applications, thereby improving access to mental healthcare.

