Accepted for/Published in: JMIR Formative Research
Date Submitted: May 9, 2022
Open Peer Review Period: May 9, 2022 - May 17, 2022
Date Accepted: Aug 26, 2022
Studies of Online Cohorts for Internalizing symptoms and Language (SOCIAL) I and II: Triangulating surveys and social media data
ABSTRACT
Background:
Internalizing, externalizing, and somatoform disorders are the most common and disabling forms of psychopathology. Our understanding of these clinical problems is limited by a reliance on self-report and on research using small samples. Social media has emerged as an exciting avenue for collecting large samples of longitudinal data from individuals to study psychopathology. Nonetheless, there are concerns about whether people who share their social media data for research differ significantly from people who do not.
Objective:
We report the results of two large ongoing studies in which we collect Twitter data and self-reported clinical screening scales, the Studies of Online Cohorts for Internalizing symptoms and Language (SOCIAL). We categorized individuals based on whether they were deemed to have given a valid Twitter account. We described differences in sociodemographic features, clinical symptoms, and aspects of social media use by whether or not individuals gave valid accounts.
Methods:
Participants were a nationally representative sample of Twitter-using adults (SOCIAL-I: N = 1,121) as well as a sample of college students in the Midwest (SOCIAL-II: N = 2,015), of whom 61% were Twitter users. For all participants who were Twitter users, we asked for access to their Twitter handle, which we analyzed with BotOMeter, an online application that rates the likelihood that an account belongs to a bot. We divided participants into four groups: 1) Twitter users who did not allow access to their account (“No handle”), 2) those who denied being Twitter users (“No Twitter,” only available for SOCIAL-II), 3) Twitter users who gave their handles but whose accounts had a high BotOMeter score (“Bot-like”), and 4) Twitter users who provided their handles and had low BotOMeter scores (“Valid”).
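The four-group assignment described above can be sketched in code. This is a minimal illustration only, not the authors' implementation: the function name `assign_group`, its parameters, and in particular the cutoff value `BOT_SCORE_CUTOFF` are assumptions, since the abstract does not state the BotOMeter threshold that separates "high" from "low" scores.

```python
from typing import Optional

# Hypothetical cutoff: the abstract does not report the BotOMeter
# threshold used to separate "Bot-like" from "Valid" accounts.
BOT_SCORE_CUTOFF = 0.5


def assign_group(
    uses_twitter: bool,
    handle: Optional[str],
    bot_score: Optional[float],
    cutoff: float = BOT_SCORE_CUTOFF,
) -> str:
    """Return one of the four group labels named in the Methods."""
    if not uses_twitter:
        # Only possible in SOCIAL-II; SOCIAL-I sampled Twitter users only.
        return "No Twitter"
    if not handle:
        # Twitter user who did not share a handle.
        return "No handle"
    if bot_score is None:
        # The abstract does not say how unscored handles were treated,
        # so this sketch refuses to guess.
        raise ValueError("a shared handle needs a bot score to be classified")
    # Assumption: scores at or above the cutoff count as "high".
    return "Bot-like" if bot_score >= cutoff else "Valid"
```

For example, under the assumed cutoff, `assign_group(True, "@example", 0.1)` yields "Valid", while `assign_group(True, None, None)` yields "No handle".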
Results:
In SOCIAL-I, most individuals were classified as valid (n = 580), few were deemed bot-like (n = 190), and 351 gave no handle. In SOCIAL-II, many individuals were not Twitter users (n = 760). Of the Twitter users in SOCIAL-II (n = 1,455), most were classified as either invalid (n = 515) or valid (n = 484), with a smaller fraction deemed bot-like (n = 229). Participants reported high rates of mental health diagnoses as well as high levels of symptoms, especially in SOCIAL-II. In general, differences between individuals who provided or did not provide their social media handle were small and not statistically significant.
Conclusions:
Triangulating passively acquired social media data and self-reported questionnaires offers avenues for large-scale assessment and evaluation of vulnerability to mental disorders. The propensity of participants to share social media handles is not likely to be a source of sample bias in subsequent social media analytics.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.