Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Nov 22, 2018
Date Accepted: May 21, 2019

The final, peer-reviewed published version of this preprint can be found here:

Foufi V, Timakum T, Gaudet-Blavignac C, Lovis C, Song M

Mining of Textual Health Information from Reddit: Analysis of Chronic Diseases With Extracted Entities and Their Relations

J Med Internet Res 2019;21(6):e12876

DOI: 10.2196/12876

PMID: 31199327

PMCID: 6595941

Health social network analytics: utilizing social media to detect the outcome of chronic diseases

  • Vasiliki Foufi
  • Tatsawan Timakum
  • Christophe Gaudet-Blavignac
  • Christian Lovis
  • Min Song

ABSTRACT

Background:

Social media constitutes a valuable resource for text mining tasks. In the healthcare domain, multiple forums and blogs have been created where people share their personal experiences and seek other people’s knowledge and advice.

Objective:

This paper reports a study of entities related to chronic diseases, and of the relationships among them, in user-generated content on social media. The major focus of our study is on understanding the characteristics of disease entities and their relations from the user’s perspective.

Methods:

We collected a corpus of 17,624 text posts from disease-specific subreddits of the internet community Reddit.com. For entity and relation extraction from these data, we employed the PKDE4J tool, a text mining system that integrates dictionary-based entity extraction and rule-based relation extraction in a highly flexible and extensible framework.
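
To illustrate the general idea of combining dictionary-based entity extraction with rule-based relation extraction, the following is a minimal Python sketch. It is not PKDE4J and does not use its API: the dictionary entries, trigger verbs, relation labels, and function names are all hypothetical, and the single rule shown (two adjacent dictionary entities with a trigger verb between them) is a simplifying assumption chosen for brevity.

```python
import re

# Hypothetical toy dictionary mapping surface terms to entity types.
# A real system would load much larger curated biomedical dictionaries.
ENTITY_DICT = {
    "diabetes": "DISEASE",
    "asthma": "DISEASE",
    "insulin": "DRUG",
    "metformin": "DRUG",
    "fatigue": "SYMPTOM",
}

# Hypothetical trigger verbs mapped to relation labels.
RELATION_VERBS = {
    "treats": "TREATS",
    "causes": "CAUSES",
}


def extract_entities(sentence):
    """Dictionary lookup: return (offset, term, type) for each known term."""
    found = []
    for match in re.finditer(r"[A-Za-z]+", sentence):
        token = match.group().lower()
        if token in ENTITY_DICT:
            found.append((match.start(), token, ENTITY_DICT[token]))
    return found


def extract_relations(sentence):
    """Rule: two adjacent entities with a trigger verb between them."""
    entities = extract_entities(sentence)
    relations = []
    for (start_a, a, _), (start_b, b, _) in zip(entities, entities[1:]):
        # Text strictly between the end of the first entity and the start of the second.
        between = sentence[start_a + len(a):start_b].lower()
        for verb, label in RELATION_VERBS.items():
            if re.search(rf"\b{verb}\b", between):
                relations.append((a, label, b))
    return relations


post = "Metformin treats diabetes but insulin causes fatigue for me."
for subject, relation, obj in extract_relations(post):
    print(subject, relation, obj)
```

Each extracted triple has a subject-predicate-object shape, which loosely mirrors the kind of relation pair described in this abstract; the actual rules and dictionaries used in the study are not specified here.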

Results:

Using PKDE4J, we extracted two types of entities and relations: biomedical entities and relationships, and subject-predicate-object entity relationships. In total, 82,138 entities and 30,341 relation pairs were extracted from the Reddit dataset.

Conclusions:

This study paves the way for making user-generated content on health-oriented social media available to scientists working on the development of patient treatments. These data may not be available in the literature or from laboratory experiments. The results reported in this paper are promising, and indicate the need for more in-depth studies on the best way to respond to users’ medical needs and concerns as expressed on social media.

