Currently submitted to: Journal of Medical Internet Research

Date Submitted: Apr 15, 2026
Open Peer Review Period: Apr 15, 2026 - Jun 10, 2026
(closed for review but you can still tweet)

NOTE: This is an unreviewed Preprint

Warning: This is an unreviewed preprint. Readers are warned that the document has not been peer-reviewed by expert/patient reviewers or an academic editor, may contain misleading claims, and is likely to undergo changes before final publication, if accepted, or may have been rejected/withdrawn (a note "no longer under consideration" will appear above).

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Comparison of GPT-5, Gemini 2.5 Pro, and Human Coding for the Qualitative Analysis of Dutch Health Services Data: Comparative Study

  • Bram Van den Berkmortel; 
  • Daan Westra; 
  • Rachel Gifford; 
  • Frank Van de Baan

ABSTRACT

Background:

Large language models (LLMs) are increasingly used in qualitative research, but their reliability compared to human analysis, especially on large, non-English datasets, is unclear. Previous studies of older models (such as GPT-4) report limitations in capturing nuance and in token capacity.

Objective:

This study compares the qualitative analysis capabilities of OpenAI's GPT-5 and Google's Gemini 2.5 Pro with a traditional human analysis. The study uses a large dataset of 317 Dutch newspaper articles (860 pages) published from January 1, 2020, to December 31, 2023, and investigates the sentiment towards nurses during the COVID-19 pandemic.

Methods:

The study employed a two-part methodology. First, a thematic comparison was conducted where the human researcher, GPT-5, and Gemini independently generated inductive coding trees from the entire corpus. Second, a comparative test was performed where all three coded a 10% (31/317) random sample using a predefined codebook. This process was iterative, requiring a second round of AI analysis with refined prompts and an article-by-article approach to ensure a valid comparison.
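The sampling step described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the abstract does not say how the random sample was drawn, so the article IDs, the seed, and the function name `draw_sample` are all assumptions.

```python
import random


def draw_sample(article_ids, fraction=0.10, seed=2020):
    """Return a reproducible random subset of article_ids.

    The seed and the rounding rule (round down) are assumptions made
    for illustration; the abstract only reports 31 of 317 articles.
    """
    k = int(len(article_ids) * fraction)  # 317 * 0.10 = 31.7, rounded down to 31
    rng = random.Random(seed)             # local generator, so the draw is repeatable
    return sorted(rng.sample(article_ids, k))


# Hypothetical article identifiers 1..317 standing in for the corpus.
article_ids = list(range(1, 318))
sample = draw_sample(article_ids)
print(len(sample))
```

Fixing the seed means the human coder and both models can be given exactly the same 31 articles on every rerun, which matters when the comparison is repeated with refined prompts.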

Results:

The results show that both AI models identified third-order themes (e.g., "Healthcare Heroes") that were highly consistent with the data. In the practical application, however, both AI models "over-coded", identifying more quotations than the human coder (approximately 180 vs 136).
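To put the reported counts in proportion, the over-coding can be expressed as a ratio. The sketch below uses only the approximate figures given above (about 180 AI-coded quotations vs 136 human-coded); the function name `overcoding_ratio` is hypothetical, and the abstract does not report this statistic itself.

```python
def overcoding_ratio(ai_count, human_count):
    """Ratio of AI-coded quotations to human-coded quotations."""
    return ai_count / human_count


# Approximate counts taken from the abstract; exact per-model values are not given.
ratio = overcoding_ratio(180, 136)
excess_pct = (ratio - 1) * 100
print(f"{ratio:.2f}x, {excess_pct:.0f}% more")  # prints "1.32x, 32% more"
```

Under these figures, each model flagged roughly a third more quotations than the human coder.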

Conclusions:

This study reveals a fundamental divergence in analytical logic: whereas human coders prioritize interpretive significance (contextual weight), LLMs default to semantic presence (literal frequency), leading to systematic over-coding. Consequently, this article argues that LLMs should not be viewed as autonomous researchers but as high-sensitivity filtering instruments requiring human calibration. This study concludes that AI serves as a valuable assistant for qualitative researchers. Still, it requires a rigorous, iterative, and human-in-th


 Citation

Please cite as:

Van den Berkmortel B, Westra D, Gifford R, Van de Baan F

Comparison of GPT-5, Gemini 2.5 Pro, and Human Coding for the Qualitative Analysis of Dutch Health Services Data: Comparative Study

JMIR Preprints. 15/04/2026:98374

DOI: 10.2196/preprints.98374

URL: https://preprints.jmir.org/preprint/98374

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.