Currently submitted to: Journal of Medical Internet Research
Date Submitted: Mar 8, 2026
Open Peer Review Period: Mar 9, 2026 - May 4, 2026
(currently open for review)
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Beyond Single Topics: Quantifying Information Loss by Comparing GPT-Based Aspect Sentiment Analysis With LDA in Hospital Reviews
ABSTRACT
Background:
Healthcare service quality is inherently multidimensional, yet document-level text analysis methods such as Latent Dirichlet Allocation (LDA) force patient reviews into single dominant topics. This simplification may systematically discard evaluative information when patients discuss multiple service dimensions with varying sentiments within the same review.
Objective:
This study compared document-level topic modeling (LDA) with GPT-based aspect-level sentiment analysis (ABSA) to address three research questions: (1) How much information is lost when collapsing multi-aspect reviews to single topics? (2) How prevalent are mixed-sentiment reviews, and what quality tensions do they reveal—both cross-aspect trade-offs and within-aspect ambivalence? (3) Do positive and negative reviews exhibit different structural patterns in aspect co-occurrence?
Methods:
We analyzed Google Reviews posted in 2024 for 24 medical centers in Taiwan. Both LDA (K=7 topics) and GPT-based ABSA were applied to the same 5,467 reviews, ensuring a fair comparison on identical data. The ABSA design used structured prompts to extract aspects from seven predefined quality dimensions. Quality validation achieved Cohen κ=.82 against human annotation. Mixed-sentiment reviews were identified as those containing both positive and negative aspect evaluations, and cross-polarity couplings were analyzed to identify recurring trade-off patterns. Rating-stratified network analysis compared aspect co-occurrence patterns between positive and negative reviews using Jaccard similarity.
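The two analytic steps named in the Methods, flagging mixed-sentiment reviews and comparing aspect co-occurrence by rating stratum, might look roughly like the sketch below. It is illustrative only and not the authors' pipeline: the review records, the aspect labels other than Professional Quality, the rating cutoffs, and the use of Jaccard similarity as a pairwise co-occurrence weight are all assumptions.

```python
from itertools import combinations

# Hypothetical ABSA output. Only "Professional Quality" is named in the
# abstract; the other aspect labels and all ratings are invented.
reviews = [
    {"rating": 5, "aspects": {"Professional Quality": "positive", "Communication": "positive"}},
    {"rating": 4, "aspects": {"Professional Quality": "positive", "Waiting Time": "negative"}},
    {"rating": 1, "aspects": {"Waiting Time": "negative", "Administration": "negative"}},
    {"rating": 2, "aspects": {"Waiting Time": "negative", "Administration": "negative",
                              "Communication": "negative"}},
]

def is_mixed(aspects):
    """True if a review carries both positive and negative aspect evaluations."""
    polarities = set(aspects.values())
    return {"positive", "negative"} <= polarities

def aspect_jaccard(subset):
    """Jaccard similarity (intersection over union) for each aspect pair,
    computed on the sets of reviews that mention each aspect."""
    mentions = {}
    for i, review in enumerate(subset):
        for aspect in review["aspects"]:
            mentions.setdefault(aspect, set()).add(i)
    return {
        (a, b): len(mentions[a] & mentions[b]) / len(mentions[a] | mentions[b])
        for a, b in combinations(sorted(mentions), 2)
    }

mixed = [r for r in reviews if is_mixed(r["aspects"])]
# Assumed cutoffs: 4 to 5 stars counts as positive, 1 to 2 stars as negative.
positive = aspect_jaccard([r for r in reviews if r["rating"] >= 4])
negative = aspect_jaccard([r for r in reviews if r["rating"] <= 2])
```

Because each review maps an aspect to a single polarity, this sketch captures only cross-aspect mixed sentiment; representing within-aspect ambivalence would need a list of evaluations per aspect.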
Results:
Reviews discussed an average of 2.05 distinct aspects (SD=0.97), producing 51.2% information loss under LDA's single-topic assignment. Among multi-aspect reviews, 11.0% exhibited cross-aspect mixed sentiment, with Technical–Functional Divergence—praising Professional Quality while criticizing functional dimensions—appearing in 49.9% of these mixed-sentiment cases. Network analysis revealed differential bundling: operational dimensions co-occurred more strongly in negative reviews, whereas clinical dimensions co-occurred more strongly in positive reviews.
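The abstract does not define the information-loss metric. One reading that is consistent with the reported figures treats it as the share of aspect mentions dropped when each review keeps only one topic, which reduces to 1 − 1/(mean aspects per review). The sketch below illustrates that reading; the function name and the per-review counts are hypothetical.

```python
def information_loss(aspect_counts):
    """Share of aspect mentions dropped when each review keeps a single topic.

    Assumed definition, not confirmed by the source:
    (total aspects - one retained per review) / total aspects.
    """
    total = sum(aspect_counts)
    kept = sum(1 for n in aspect_counts if n > 0)  # one aspect survives per review
    return (total - kept) / total

# Hypothetical counts of distinct aspects in eight reviews (mean = 2.0).
counts = [1, 2, 2, 3, 2, 1, 3, 2]
toy_loss = information_loss(counts)

# Closed form applied to the mean reported in the abstract (2.05 aspects).
reported_loss = round(1 - 1 / 2.05, 3)
```

Under this definition the reported mean of 2.05 aspects gives a loss of about 0.512, matching the 51.2% stated above.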
Conclusions:
Document-level topic modeling discards more than half of the evaluative information patients provide. Our findings reveal that patients cognitively decouple clinical competence from service delivery—Technical–Functional Divergence appeared in half of mixed-sentiment cases—and that positive and negative reviews organize quality dimensions differently. We recommend a complementary approach: topic modeling for exploratory discovery and ABSA for diagnostic assessment. For healthcare quality improvement, hospitals should separate clinical signals from operational signals in feedback dashboards.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.