Accepted for/Published in: JMIR Infodemiology

Date Submitted: Aug 21, 2024
Date Accepted: Jan 25, 2025

The final, peer-reviewed published version of this preprint can be found here:

Harvey D, Rayson P, Lobban F, Palmier-Claus J, Dolman C, Chataigné A, Jones S

Using Natural Language Processing Methods to Build the Hypersexuality in Bipolar Reddit Corpus: Infodemiology Study of Reddit

JMIR Infodemiology 2025;5:e65632

DOI: 10.2196/65632

PMID: 40053804

PMCID: 11926447

Using Natural Language Processing methods to build the Hypersexuality in Bipolar Reddit Corpus (HiB-RC): an explorative study

  • Daisy Harvey
  • Paul Rayson
  • Fiona Lobban
  • Jasper Palmier-Claus
  • Clare Dolman
  • Anne Chataigné
  • Steven Jones

ABSTRACT

Background:

Bipolar is a severe mental health condition that is thought to affect at least 2% of the global population, and clinical observations suggest that individuals living with the illness may have a high propensity for engaging in risk-taking behaviours such as hypersexuality. Hypersexuality has historically been stigmatised in society and in healthcare provision, which makes it more difficult for service users to talk about their behaviours. A greater understanding of hypersexuality is needed to develop better, evidence-based treatments and support, as well as training for health professionals.

Objective:

This research presents a dataset of Reddit posts written about hypersexual experiences by people who have self-reported a diagnosis of bipolar on the site. Reddit and other social media websites provide a place for people to talk anonymously about their personal experiences. Natural Language Processing (NLP) methods enable researchers to understand hypersexuality from a lived experience perspective and identify salient topics of discussion within this domain. This exploratory analysis provides a holistic overview of hypersexuality as experienced by Redditors and signposts to future areas of research.

Methods:

A toolbox of computational linguistic methods was used to create a corpus, the Hypersexuality in Bipolar Reddit Corpus (HiB-RC); infer demographic variables for the Redditors in the dataset; measure the key psychological domains in the corpus using Linguistic Inquiry and Word Count (LIWC); and build a topic model to identify salient language clusters within the dataset. The research also presents qualitative (paraphrased) excerpts from Reddit posts that are representative of nine identified topics, and discusses key ethical considerations when undertaking this type of analysis.
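As a rough illustration of what dictionary-based measurement of psychological domains involves, the sketch below scores a text by the percentage of its words that fall into each category. It is not the LIWC software: the category names, word lists, and example sentence are invented for illustration, and the real LIWC dictionaries are proprietary and far larger.

```python
import re
from collections import Counter

# Hypothetical toy dictionary, NOT the LIWC dictionary.
TOY_CATEGORIES = {
    "anxiety": {"worried", "afraid", "nervous", "anxious"},
    "wellness": {"healthy", "exercise", "sleep", "yoga"},
}


def category_percentages(text, categories=TOY_CATEGORIES):
    """Return, per category, the percentage of tokens that match its word list."""
    # Lowercase the text and keep runs of letters/apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    counts = Counter()
    for token in tokens:
        for name, words in categories.items():
            if token in words:
                counts[name] += 1
    # Avoid division by zero for empty input.
    return {
        name: (100.0 * counts[name] / total if total else 0.0)
        for name in categories
    }


# Invented example sentence (not taken from any Reddit post).
example = "I felt anxious and afraid, and I could not sleep."
scores = category_percentages(example)
print(scores)
```

Scores of this kind can be computed for a target corpus and a control corpus and then compared, which is the general shape of the comparison the study reports.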

Results:

The HiB-RC is a corpus of 2146 posts from 816 Redditors. Between 2012 and 2021 there was a 91.65% average yearly increase in posts in the HiB-RC compared to 48.14% in the TABoRC, and an 86.97% average yearly increase in users compared to 27.17% in the TABoRC. These statistics suggest that posting activity related to hypersexuality increased faster than general Reddit use over the same period. A number of key psychological domains were identified in the HiB-RC, including significantly more authentic language, more negative tone, more anxiety and less discussion of wellness compared to the control corpus. Finally, BERTopic was used to identify 9 key topics from the dataset: (1) Mania, hypomania and depression, (2) Sexuality, (3) Relationships, (4) Medication, (5) Mind and mood, (6) Trauma and abuse, (7) Monogamy and polygamy, (8) Diagnosis and ‘disorder’, and (9) Therapy.
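The abstract does not state how the average yearly increase was calculated. One plausible reading is the arithmetic mean of the year-over-year percentage changes, sketched below. The function name and the yearly counts are hypothetical and are not taken from the HiB-RC or the TABoRC.

```python
def average_yearly_increase(counts_by_year):
    """Mean of year-over-year percentage changes (assumed definition)."""
    years = sorted(counts_by_year)
    if len(years) < 2:
        raise ValueError("need at least two years of counts")
    changes = []
    for prev, curr in zip(years, years[1:]):
        before = counts_by_year[prev]
        after = counts_by_year[curr]
        if before == 0:
            raise ValueError(f"cannot compute a percentage change from zero ({prev})")
        changes.append(100.0 * (after - before) / before)
    return sum(changes) / len(changes)


# Made-up counts for illustration only.
toy_counts = {2012: 10, 2013: 20, 2014: 30}
result = average_yearly_increase(toy_counts)
print(result)  # 75.0: the mean of +100% and +50%
```

A compound annual growth rate would give a different figure for the same counts, so the definition matters when comparing corpora.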

Conclusions:

Hypersexuality is an important symptom that people living with bipolar discuss on Reddit, and it needs to be systematically recognised as a symptom of bipolar. This study yields significant insights into a topic that has often been stigmatised and for which there is a lack of qualitative data. Furthermore, this research demonstrates the utility of computational linguistic methods for large-scale language analysis. The study offers a high-level overview of hypersexuality in bipolar, providing empirical evidence that paves the way for a deeper understanding of hypersexuality from a lived experience perspective, as well as a novel framework for the collection and analysis of data related to hypersexuality.



© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.