Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Oct 20, 2025
Date Accepted: Apr 2, 2026
Thematic Mapping and Evolution of Social Media Mining in Health Research: A Hybrid Bibliometric Synthesis
ABSTRACT
Background:
Social media platforms, used by billions of people worldwide, provide a vast source of data. Mining social media data enables real-time monitoring of user-reported health information and serves as a supplement to traditional health data analytics. Yet the rapid proliferation of this literature has produced fragmentation, and the field still lacks a comprehensive knowledge map of Social Media Mining.
Objective:
This study pursues two goals: (1) to outline key thematic clusters in health-related social media mining and map their dynamic evolution; and (2) to demonstrate, methodologically, how machine learning-based bibliometric analysis can strengthen the robustness, transparency, and foresight capacity of evidence synthesis.
Methods:
We designed a fully automated, reproducible bibliometric analysis method. First, we retrieved 250 PubMed publications from 2015 to 2025 and analysed the 189 records that had abstracts and keywords. We then cleaned and standardized titles, abstracts, author keywords, and MeSH terms. An exploratory descriptive analysis provided preliminary insights into publication patterns, including publication trends, countries, collaborative networks, and keyword statistics. Subsequently, we used SPECTER2 and PubMedBERT embeddings of keywords and abstracts to construct a hybrid similarity matrix. Based on this matrix, we employed the UMAP–HDBSCAN algorithm for thematic clustering and visualized the results in a three-dimensional strategic coordinate system (maturity, influence, novelty) to identify hotspots and emerging frontiers. We also applied time-slice analysis to track thematic evolution trajectories. To ensure robustness, we implemented multi-level validation: internal consistency metrics, external citation metrics, and micro-evidence mapping based on representative literature.
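The abstract does not state how the two embedding spaces are combined into the hybrid similarity matrix. As a minimal sketch only, assuming the matrix is a weighted average of per-model cosine similarities, the idea could look like the following. The function names, the `weight_a` parameter, and the toy vectors are hypothetical and stand in for real SPECTER2 and PubMedBERT outputs; they are not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """Pairwise cosine similarity matrix for a list of document vectors."""
    n = len(vectors)
    return [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]


def hybrid_similarity(emb_a, emb_b, weight_a=0.5):
    """Blend two similarity matrices computed from two embedding models.

    ASSUMPTION: a convex combination with weight `weight_a`; the source
    does not specify the actual combination rule.
    """
    if len(emb_a) != len(emb_b):
        raise ValueError("both embedding sets must cover the same documents")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    sim_a = similarity_matrix(emb_a)
    sim_b = similarity_matrix(emb_b)
    n = len(emb_a)
    return [
        [weight_a * sim_a[i][j] + (1.0 - weight_a) * sim_b[i][j] for j in range(n)]
        for i in range(n)
    ]


# Toy stand-ins for the two models' document embeddings (hypothetical values).
specter_like = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pubmedbert_like = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

hybrid = hybrid_similarity(specter_like, pubmedbert_like, weight_a=0.5)
```

In a pipeline of the kind described, a matrix like `hybrid` would typically be converted to distances (for example, one minus similarity) before dimensionality reduction and density-based clustering; that conversion is likewise an assumption here.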
Results:
We identified six thematic clusters: Cluster 1 - Emerging Peripheral and Heterogeneous Research, Cluster 2 - Computational Methods in Health Informatics, Cluster 3 - Public Attitudes and Socio-Psychological Determinants, Cluster 4 - Infodemiology and COVID-19 Information Ecosystems, Cluster 5 - Health Communication and Public Health Engagement, and Cluster 6 - Social Media Analytics and Network Methods. Strategic 3D mapping revealed that methodological clusters (Clusters 2 and 6) occupy high-maturity and high-influence positions, while application-driven themes (Clusters 3 and 4) concentrate in high-influence and high-recency quadrants, representing rapidly expanding frontiers. Clusters 5 and 1 demonstrate strong potential for further growth. Temporal slicing confirmed a trajectory from methodological consolidation, through thematic diversification, to a renewed convergence on problem-solving. Validation showed strong semantic coherence and robustness of both methods and findings.
Conclusions:
This study delivers both substantive and methodological contributions. Substantively, it not only maps the trajectory of Social Media Mining in healthcare but also provides strategic guidance for future research and practice. With the involvement of multiple databases and more robust methodologies, Social Media Mining is poised to play an increasingly important role in improving health equity and healthcare delivery. Methodologically, it proposes a hybrid analysis with dual validation that enriches knowledge mapping tools with strategic insights.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.