Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Feb 20, 2025
Open Peer Review Period: Feb 20, 2025 - Apr 17, 2025
Date Accepted: Mar 24, 2025
Decoding Digital Discourse: Multimodal Text and Image Machine Learning Models to Classify Sentiment, Hate, and Anti-Hate of Race and LGBTQIA+ Related Posts on Social Media
ABSTRACT
Background:
A major challenge in sentiment analysis on social media is the increasing prevalence of image-based content, particularly memes, which integrate text and visuals to convey nuanced messages. Traditional text-based approaches have been widely used to assess public attitudes and beliefs; however, they often fail to fully capture the meaning of multimodal content, where cultural, contextual, and visual elements play a significant role.
Objective:
This study aims to provide practical guidance for collecting, processing, and analyzing social media data using multimodal machine learning models. Specifically, it focuses on training and fine-tuning models to classify sentiment (positive or negative) and to detect hate speech (hateful or anti-hate content).
Methods:
Social media data were collected from Facebook and Instagram using CrowdTangle, a now-discontinued public insights tool from Meta, and from X (formerly Twitter) via its Academic API. The dataset was filtered to include only race- and LGBTQIA+-related posts with image attachments, ensuring a focus on multimodal content. Human annotators labeled 13,000 posts into four categories: negative sentiment, positive sentiment, hate, or anti-hate. We evaluated unimodal models (BERT for text, VGG-16 for images) and multimodal models (CLIP, VisualBERT, and an intermediate fusion model). To enhance model performance, the Synthetic Minority Oversampling Technique (SMOTE) was applied to address class imbalance, and Latent Dirichlet Allocation (LDA) was used to improve semantic representations.
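The intermediate fusion approach described above can be illustrated with a minimal sketch: unimodal feature vectors (e.g., a BERT text embedding and a VGG-16 image embedding) are concatenated into one joint representation before a shared classification layer. All dimensions, weights, and names below are hypothetical stand-ins, not the authors' trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pre-extracted embeddings for one post:
# a 768-d text vector (e.g., BERT's [CLS] representation) and a
# 4096-d image vector (e.g., VGG-16's penultimate layer).
text_emb = rng.standard_normal(768)
image_emb = rng.standard_normal(4096)

# Intermediate fusion: concatenate unimodal features into a single
# joint representation before classification.
fused = np.concatenate([text_emb, image_emb])  # shape (4864,)

# Toy linear classifier over the fused vector (random weights stand in
# for a trained layer); 4 classes: negative sentiment, positive
# sentiment, hate, anti-hate.
W = rng.standard_normal((4, fused.size)) * 0.01
logits = W @ fused

# Softmax over class logits.
probs = np.exp(logits - logits.max())
probs /= probs.sum()
predicted_class = int(np.argmax(probs))
```

In practice the fused vector would feed a trainable projection and be optimized end to end; this sketch only shows where the modalities are joined.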
Results:
Our findings highlight key differences in model performance. Among unimodal models, BERT outperformed VGG-16, achieving higher accuracy and macro F1 scores across all tasks. Among multimodal models, CLIP achieved the highest accuracy (0.86) in negative sentiment detection, followed by VisualBERT (0.84). For positive sentiment, VisualBERT outperformed the other models with the highest accuracy (0.76). In hate speech detection, the intermediate fusion model achieved the highest accuracy (0.91) with a macro F1 score of 0.64, reflecting balanced performance. VisualBERT performed best in anti-hate classification, achieving an accuracy of 0.78. Applying LDA and SMOTE improved minority class detection, particularly for anti-hate content. Overall, the intermediate fusion model provided the most balanced performance across tasks, while CLIP excelled in accuracy-driven classifications. Although VisualBERT performed well in certain areas, it struggled to maintain a precision-recall balance. These results underscore the effectiveness of multimodal approaches over unimodal models in analyzing social media sentiment.
Conclusions:
This study contributes to the growing research on multimodal machine learning by demonstrating how advanced models, data augmentation techniques, and diverse datasets can enhance the analysis of social media content. The findings offer valuable insights for researchers, policymakers, and public health professionals seeking to leverage AI for social media monitoring and addressing broader societal challenges.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.