Accepted for/Published in: JMIR Infodemiology
Date Submitted: Aug 21, 2024
Date Accepted: Jan 25, 2025
Using Natural Language Processing methods to build the Hypersexuality in Bipolar Reddit Corpus (HiB-RC): an explorative study
ABSTRACT
Background:
Bipolar is a severe mental health condition which is thought to affect at least 2% of the global population, and clinical observations suggest that individuals living with the illness may have a high propensity for engaging in risk-taking behaviours such as hypersexuality. Hypersexuality has historically been stigmatised in society and in healthcare provision, which makes it more difficult for service users to talk about their behaviours. Greater understanding of hypersexuality is needed to develop better, evidence-based treatments and support, as well as training for health professionals.
Objective:
This research presents a dataset of Reddit posts written about hypersexual experiences by people who have self-reported a diagnosis of bipolar on the site. Reddit and other social media websites provide a place for people to talk anonymously about their personal experiences. Natural Language Processing (NLP) methods enable researchers to understand hypersexuality from a lived experience perspective and identify salient topics of discussion within this domain. This exploratory analysis provides a holistic overview of hypersexuality as experienced by Redditors and signposts future areas of research.
Methods:
A toolbox of computational linguistic methods was used to create a corpus (the Hypersexuality in Bipolar Reddit Corpus (HiB-RC)), infer demographic variables for the Redditors in the dataset, measure the key psychological domains in the corpus using Linguistic Inquiry and Word Count (LIWC), and build a topic model to identify salient language clusters within the dataset. The research also presents qualitative (paraphrased) excerpts from Reddit posts which are representative of nine identified topics, and discusses key ethical considerations when undertaking this type of analysis.
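As a rough illustration of the dictionary-based measurement step, the sketch below computes the percentage of words in a text that belong to each psychological category, which is the general idea behind LIWC-style scoring. It is not the LIWC software and not the authors' pipeline: the function name `category_rates`, the tokenisation rule, and the tiny `TOY_LEXICON` are all assumptions made for illustration, and the real LIWC dictionaries are proprietary and much larger.

```python
import re
from collections import Counter

# Hypothetical toy lexicon for illustration only; not taken from LIWC.
TOY_LEXICON = {
    "anxiety": {"worried", "afraid", "nervous", "panic"},
    "wellness": {"healthy", "sleep", "exercise", "yoga"},
}


def category_rates(text, lexicon=TOY_LEXICON):
    """Return, for each category, the percentage of tokens in `text` it covers."""
    # Simple lowercase word tokenisation (an assumption, not LIWC's tokeniser).
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    counts = Counter()
    for token in tokens:
        for category, words in lexicon.items():
            if token in words:
                counts[category] += 1
    # Report percentages so texts of different lengths are comparable.
    return {
        category: (100.0 * counts[category] / total if total else 0.0)
        for category in lexicon
    }


if __name__ == "__main__":
    print(category_rates("I was worried and afraid"))
```

Comparing such per-category percentages between a target corpus and a control corpus is one way to surface differences like those reported for anxiety and wellness language.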
Results:
The HiB-RC is a corpus of posts from 816 Redditors totalling 2146 posts. The results demonstrate that between 2012 and 2021 there was a 91.65% average yearly increase in posts in the HiB-RC compared to 48.14% in the TABoRC, and an 86.97% average yearly increase in users compared to 27.17% in the TABoRC. These statistics suggest that posting activity related to hypersexuality grew faster than general Reddit use over the same period. A number of key psychological domains were identified in the HiB-RC, including significantly more authentic language, more negative tone, more anxiety and less discussion of wellness compared to the control corpus. Finally, BERTopic was used to identify 9 key topics from the dataset: (1) Mania, hypomania and depression, (2) Sexuality, (3) Relationships, (4) Medication, (5) Mind and mood, (6) Trauma and abuse, (7) Monogamy and polygamy, (8) Diagnosis and ‘disorder’, and (9) Therapy.
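The abstract does not state how the average yearly increase was calculated. One common reading, assumed here, is the arithmetic mean of the year-over-year percentage changes. The sketch below shows that calculation; the function name `average_yearly_increase` and the example counts are hypothetical and are not data from the study.

```python
def average_yearly_increase(counts_by_year):
    """Mean of year-over-year percentage changes (assumed definition)."""
    years = sorted(counts_by_year)
    if len(years) < 2:
        raise ValueError("need counts for at least two years")
    changes = []
    for prev, curr in zip(years, years[1:]):
        base = counts_by_year[prev]
        if base == 0:
            raise ValueError(f"cannot compute a percentage change from zero in {prev}")
        # Percentage change relative to the previous year's count.
        changes.append(100.0 * (counts_by_year[curr] - base) / base)
    return sum(changes) / len(changes)


if __name__ == "__main__":
    # Hypothetical post counts, for illustration only.
    example_counts = {2012: 10, 2013: 20, 2014: 30}
    print(average_yearly_increase(example_counts))
```

Under this definition, early years with small counts can dominate the average, so a compound annual growth rate would give a different figure for the same data.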
Conclusions:
Hypersexuality is an important symptom that is discussed by people living with bipolar on Reddit and needs to be systematically recognised as a symptom of bipolar. This study yields significant insights into a topic that has often been stigmatised and for which there is a lack of qualitative data. Furthermore, this research demonstrates the utility of computational linguistic methods for large-scale language analysis. The study offers a high-level overview of hypersexuality in bipolar, providing empirical evidence which paves the way for a deeper understanding of hypersexuality from a lived experience perspective, as well as providing a novel framework for the collection and analysis of data related to hypersexuality.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.