Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Jan 23, 2025
Date Accepted: Apr 7, 2025
Evaluating ChatGPT in Qualitative Thematic Analysis: A Comparative Study with Human Researchers in the Japanese Clinical Context and Its Cultural Interpretation Challenges
ABSTRACT
Background:
Qualitative research is crucial for understanding the values and beliefs underlying individual experiences, emotions, and behaviors, particularly in social sciences and healthcare. Traditionally reliant on manual analysis by experienced researchers, this methodology requires significant time and effort. The advent of artificial intelligence (AI) technologies, especially large language models, such as ChatGPT, holds promise for enhancing qualitative data analysis. However, their development has primarily been based on English-language data, leaving their applicability to non-English languages such as Japanese insufficiently explored.
Objective:
This study aimed to evaluate the utility and limitations of ChatGPT-4 in analyzing Japanese interview data by comparing its performance to that of experienced human researchers.
Methods:
A comparative qualitative study was conducted to assess the performance of ChatGPT-4 and human researchers in analyzing transcribed Japanese semi-structured interviews. This study analyzed transcripts from 30 semi-structured interviews conducted between February and March 2024 at an urban community hospital (Hospital A) and a rural university hospital (Hospital B) in Japan. Interviews centered on the theme of “sacred moments” and involved healthcare providers and patients. Transcripts were digitized and managed using NVivo Ver. 14 and analyzed using ChatGPT-4 with iterative prompts for thematic analysis. The results were compared with those of a reflexive thematic analysis performed by experienced human researchers, focusing on agreement rates and depth of interpretation, particularly for descriptive and culturally nuanced themes. Furthermore, Charmaz’s grounded theory and Pope’s five-step framework approach were applied to evaluate the analytical flexibility and consistency of ChatGPT.
Results:
ChatGPT-4 demonstrated high agreement rates (> 80%) with human researchers for descriptive themes such as “personal experience of a sacred moment” and “building relationships.” However, its performance declined for culturally complex and emotionally nuanced themes, including “difficult to answer, no experience of sacred moments” and “fate,” with agreement rates decreasing to approximately 30%, highlighting its limitations in cultural and contextual interpretation.
Conclusions:
ChatGPT-4 shows promise in extracting descriptive themes from Japanese qualitative data, but exhibits significant limitations in interpreting cultural and emotional nuances. These findings highlight the potential of AI-assisted qualitative research, while underscoring the indispensable role of human researchers. Future research should evaluate the applicability of AI across diverse languages and cultural contexts, assess emerging AI models, and address ethical and legal considerations in AI-driven qualitative analyses.
Clinical Trial: None
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.