Currently submitted to: Journal of Medical Internet Research

Date Submitted: Mar 8, 2026
Open Peer Review Period: Mar 9, 2026 - May 4, 2026
(currently open for review)

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Beyond Single Topics: Quantifying Information Loss by Comparing GPT-Based Aspect Sentiment Analysis With LDA in Hospital Reviews

  • Jung-Tang Hsueh
  • Sheng-Hsun Hsu
  • Shwu-Fen Chiu

ABSTRACT

Background:

Healthcare service quality is inherently multidimensional, yet document-level text analysis methods such as Latent Dirichlet Allocation (LDA), as commonly applied, reduce each patient review to a single dominant topic. This simplification may systematically discard evaluative information when patients discuss multiple service dimensions, with different sentiments, in the same review.

Objective:

This study compared document-level topic modeling (LDA) with GPT-based aspect-level sentiment analysis (ABSA) to address three research questions: (1) How much information is lost when collapsing multi-aspect reviews to single topics? (2) How prevalent are mixed-sentiment reviews, and what quality tensions do they reveal—both cross-aspect trade-offs and within-aspect ambivalence? (3) Do positive and negative reviews exhibit different structural patterns in aspect co-occurrence?

Methods:

We analyzed Google Reviews posted in 2024 for 24 medical centers in Taiwan. Both LDA (K=7 topics) and GPT-based ABSA were applied to the same 5,467 reviews, so that the two methods were compared on identical data. The ABSA design used structured prompts to extract aspects belonging to seven predefined quality dimensions. Agreement between the ABSA output and human annotation reached Cohen κ=.82. Mixed-sentiment reviews were defined as reviews containing both positive and negative aspect evaluations, and cross-polarity couplings (a praised aspect paired with a criticized aspect in the same review) were analyzed to identify recurring trade-off patterns. A rating-stratified network analysis used Jaccard similarity to compare aspect co-occurrence patterns between positive and negative reviews.
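The mixed-sentiment, coupling, and stratified co-occurrence steps described in the Methods can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the review structure, the function names, every aspect name except Professional Quality, and the star-rating cutoffs for the positive and negative strata (4-5 vs 1-2 stars) are hypothetical. Co-occurrence strength is computed here as the Jaccard index of the sets of reviews mentioning each aspect, which is one plausible reading of the Jaccard-based comparison.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: each review carries its star rating and the
# (aspect, polarity) pairs a structured ABSA prompt might return. Only
# "Professional Quality" comes from the study; other names are placeholders.
reviews = [
    {"stars": 5, "aspects": [("Professional Quality", "positive"), ("Staff Attitude", "positive")]},
    {"stars": 2, "aspects": [("Professional Quality", "positive"), ("Waiting Time", "negative")]},
    {"stars": 1, "aspects": [("Waiting Time", "negative"), ("Billing", "negative")]},
    {"stars": 4, "aspects": [("Professional Quality", "positive")]},
]


def is_mixed(review):
    """A review is mixed if it has at least one positive and one negative aspect."""
    polarities = {p for _, p in review["aspects"]}
    return {"positive", "negative"} <= polarities


def cross_polarity_couplings(reviews):
    """Count (praised aspect, criticized aspect) pairs over mixed reviews."""
    counts = Counter()
    for r in filter(is_mixed, reviews):
        praised = {a for a, p in r["aspects"] if p == "positive"}
        criticized = {a for a, p in r["aspects"] if p == "negative"}
        for a in praised:
            for b in criticized:
                if a != b:  # a == b would be within-aspect ambivalence instead
                    counts[(a, b)] += 1
    return counts


def technical_functional_share(reviews, clinical="Professional Quality"):
    """Share of mixed reviews praising `clinical` while criticizing another aspect.

    Simplification (assumption): every aspect other than `clinical` is
    treated as a functional dimension.
    """
    mixed = [r for r in reviews if is_mixed(r)]
    hits = sum(
        any(a == clinical and p == "positive" for a, p in r["aspects"])
        and any(a != clinical and p == "negative" for a, p in r["aspects"])
        for r in mixed
    )
    return hits / len(mixed) if mixed else 0.0


def stratify(reviews, pos_min=4, neg_max=2):
    """Split reviews by star rating; the 4-5 vs 1-2 cutoffs are an assumption."""
    positive = [r for r in reviews if r["stars"] >= pos_min]
    negative = [r for r in reviews if r["stars"] <= neg_max]
    return positive, negative


def aspect_jaccard(reviews):
    """Jaccard co-occurrence |R_a ∩ R_b| / |R_a ∪ R_b| for each aspect pair,
    where R_a is the set of reviews (by index) mentioning aspect a."""
    mentions = {}
    for i, r in enumerate(reviews):
        for a, _ in r["aspects"]:
            mentions.setdefault(a, set()).add(i)
    return {
        (a, b): len(mentions[a] & mentions[b]) / len(mentions[a] | mentions[b])
        for a, b in combinations(sorted(mentions), 2)
    }


positive, negative = stratify(reviews)
print(cross_polarity_couplings(reviews))
print(technical_functional_share(reviews))
print(aspect_jaccard(positive))
print(aspect_jaccard(negative))
```

Comparing the two Jaccard dictionaries pair by pair is one way to check whether a given aspect bundle is tighter in negative or in positive reviews; a network view would then treat each pair's score as an edge weight.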

Results:

Reviews discussed an average of 2.05 distinct aspects (SD=0.97), implying a 51.2% information loss under LDA's single-topic assignment. Among multi-aspect reviews, 11.0% exhibited cross-aspect mixed sentiment. Technical–Functional Divergence, in which a review praises Professional Quality while criticizing functional dimensions, appeared in 49.9% of these mixed-sentiment cases. Network analysis revealed differential bundling: operational dimensions co-occurred more strongly in negative reviews, whereas clinical dimensions co-occurred more strongly in positive reviews.
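The abstract does not spell out the information-loss formula. One definition consistent with the reported figures is the share of aspect mentions not represented when each review keeps a single topic, pooled over all reviews: 1 - (number of reviews) / (total aspects), which equals 1 - 1/mean. With a mean of 2.05 aspects this gives 1 - 1/2.05 ≈ 0.512, matching the reported 51.2%. The sketch below assumes that pooled definition (the function name is hypothetical); averaging 1 - 1/n per review instead would generally give a different value.

```python
def information_loss(aspect_counts):
    """Share of aspect mentions not represented when each review keeps one topic.

    Assumed definition: 1 - (reviews) / (total distinct aspects), which equals
    1 - 1 / mean(aspects per review). Reviews with no extracted aspect are
    skipped (an assumption), since a single topic cannot drop anything there.
    """
    counts = [c for c in aspect_counts if c > 0]
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1 - len(counts) / total


# Toy counts of distinct aspects per review (not study data): 1 - 3/6.
print(information_loss([1, 2, 3]))  # → 0.5
# Plugging in the reported mean of 2.05 aspects per review:
print(round(1 - 1 / 2.05, 3))  # → 0.512
```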

Conclusions:

Document-level topic modeling with single-topic assignment discards more than half of the evaluative information patients provide. Our findings indicate that patients evaluate clinical competence separately from service delivery, since Technical–Functional Divergence appeared in about half of mixed-sentiment cases, and that positive and negative reviews organize quality dimensions differently. We recommend a complementary approach: topic modeling for exploratory discovery and ABSA for diagnostic assessment. For healthcare quality improvement, hospitals should report clinical and operational signals separately in feedback dashboards.


Citation

Please cite as:

Hsueh JT, Hsu SH, Chiu SF

Beyond Single Topics: Quantifying Information Loss by Comparing GPT-Based Aspect Sentiment Analysis With LDA in Hospital Reviews

JMIR Preprints. 08/03/2026:92325

DOI: 10.2196/preprints.92325

URL: https://preprints.jmir.org/preprint/92325


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.