Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Mar 31, 2024
Date Accepted: Jun 18, 2024
ChatGPT for Automated Qualitative Content Analysis: Intercoder Reliability
ABSTRACT
Background:
Analysis of web-based data can provide insights into a wide range of topics, from the mechanisms of behavior change to attitudes toward treatment. However, data analysis approaches such as qualitative content analysis are notoriously time- and labor-intensive because a large amount of data must be detected, assessed, and coded. Tools such as ChatGPT may have tremendous potential for automating at least some of the analysis.
Objective:
The aim of this study was to explore the utility of ChatGPT in conducting qualitative content analysis through the analysis of forum posts from people sharing their experiences on reducing their sugar consumption.
Methods:
Inductive and deductive content analyses were performed on 537 forum posts to detect mechanisms of behavior change. Thorough prompt engineering provided appropriate instructions for ChatGPT to execute data analysis tasks. Data identification involved extracting change mechanisms from a subset of forum posts. Precision of the extracted data was assessed by comparison with human coding. Based on the identified change mechanisms, coding schemes were developed with ChatGPT using data-driven (inductive) and theory-driven (deductive) content analysis approaches. The deductive approach was informed by the Theoretical Domains Framework, using both an unconstrained coding scheme and a structured coding matrix. Ten coding schemes were created from a subset of data, and each was then applied to the full dataset in 10 new conversations, resulting in 100 conversations each for the inductive and the unconstrained deductive analyses. Ten further conversations coded the full dataset into the structured coding matrix. Intercoder agreement was evaluated across and within coding schemes. ChatGPT output was also evaluated by the researchers to assess whether it reflected prompt instructions.
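As an illustration of the agreement statistic referred to above, the following is a minimal sketch of Cohen's kappa for two coders assigning one category per item. The abstract does not state which kappa variant was used (with 10 conversations per scheme, a multi-rater statistic may have been applied instead), so the two-coder form, the function name `cohen_kappa`, and the example labels are assumptions made for illustration only.

```python
from collections import Counter


def cohen_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who each assign one category per item.

    Illustrative sketch only: the two-coder setup is an assumption,
    not the procedure described in the abstract.
    """
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must label the same non-empty set of items")

    n = len(coder_a)

    # Observed agreement: share of items given the same category by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n

    # Chance agreement: from each coder's marginal category frequencies.
    counts_a = Counter(coder_a)
    counts_b = Counter(coder_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both coders used a single identical category throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical category labels, not taken from the study's coding schemes.
human = ["planning", "planning", "support", "support"]
model = ["planning", "support", "support", "support"]
kappa = cohen_kappa(human, model)
```

Here observed agreement is 3/4 and chance agreement is 0.5, so kappa is 0.5.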
Results:
The precision of detecting change mechanisms in the data subset ranged from 66% to 88%. Overall kappa scores for intercoder agreement ranged from 0.72 to 0.82 across the inductive coding schemes and from 0.58 to 0.73 across the unconstrained coding schemes and the structured coding matrix. Coding into the best-performing coding scheme resulted in category-specific kappa scores ranging from 0.67 to 0.95 for the inductive approach and from 0.13 to 0.87 for the deductive approaches. ChatGPT largely followed prompt instructions in producing a description of each coding scheme, although the wording for the inductively developed coding schemes was lengthier than specified.
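One common way to read kappa values such as those reported above is the set of descriptive bands proposed by Landis and Koch (1977). The sketch below maps a kappa value to those bands; whether the authors used this particular scale is an assumption, and the function name `agreement_label` is hypothetical.

```python
def agreement_label(kappa):
    """Map a kappa value to the Landis and Koch (1977) descriptive bands.

    Assumption: this scale is used here for illustration; the abstract
    does not say which interpretation scale the authors applied.
    """
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Endpoints of the category-specific kappa ranges reported in the Results.
labels = {k: agreement_label(k) for k in (0.13, 0.67, 0.87, 0.95)}
```

Under this scale, the category-specific range for the deductive approaches runs from slight agreement at its lowest value to almost perfect agreement at its highest.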
Conclusions:
ChatGPT appears fairly reliable in assisting with qualitative content analysis. ChatGPT performed better in developing an inductive coding scheme that emerged from the data than in adapting an existing framework into an unconstrained coding scheme or coding directly into a structured matrix. The potential for ChatGPT to act as a second coder also appears promising, with almost perfect agreement in at least one coding scheme. The findings suggest that ChatGPT could prove useful as a tool to assist in each phase of qualitative content analysis, but multiple iterations are required to determine the reliability of each stage of analysis.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.