Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Mar 17, 2020
Date Accepted: Nov 11, 2020
Ontology-based analysis of social media data to understand consumers' information needs and emotions regarding cancer
ABSTRACT
Background:
Social media posts are a valuable source for identifying consumers' health information needs in disease management and their emotional states related to disease. Semantic analysis of such data requires an ontology.
Objective:
This study was performed to develop a cancer ontology with consumer terms and to analyze social media data to identify health information needs and emotions related to cancer.
Methods:
We developed a cancer ontology based on Noy and McGuinness's Ontology Development 101. Social media data on cancer were collected with a crawler from online communities and blogs in South Korea between January 1, 2014, and June 30, 2017. The relative frequencies of posts containing ontology concepts were counted and compared by cancer type.
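A minimal sketch of the counting step described above, under stated assumptions: each concept is matched through its consumer synonyms, and a concept's relative frequency for a cancer type is assumed to be the share of that type's posts that mention the concept at least once. The `SYNONYMS` table and the functions `concepts_in` and `relative_frequency` are hypothetical illustrations, not the authors' implementation, and the English terms stand in for the Korean consumer vocabulary. Real Korean text would also need morphological analysis rather than the naive substring matching used here.

```python
from collections import defaultdict

# Hypothetical toy ontology: each class concept maps to consumer synonyms.
# The real ontology (213 classes, 4,061 synonyms) was built for Korean text;
# these English terms are illustrative only.
SYNONYMS = {
    "fatigue": ["fatigue", "tired", "exhausted"],
    "chemotherapy": ["chemotherapy", "chemo"],
    "anxiety": ["anxiety", "worried", "scared"],
}


def concepts_in(text):
    """Return the set of concepts whose synonyms occur in the post.

    Naive case-insensitive substring matching; it can also fire inside
    unrelated longer words, which a real pipeline would need to handle.
    """
    lowered = text.lower()
    return {
        concept
        for concept, terms in SYNONYMS.items()
        if any(term in lowered for term in terms)
    }


def relative_frequency(posts):
    """posts: iterable of (cancer_type, text) pairs.

    Returns {cancer_type: {concept: share of that type's posts mentioning it}}.
    """
    totals = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))
    for cancer_type, text in posts:
        totals[cancer_type] += 1
        # concepts_in returns a set, so each post counts at most once per concept
        for concept in concepts_in(text):
            hits[cancer_type][concept] += 1
    return {
        ct: {concept: n / totals[ct] for concept, n in hits[ct].items()}
        for ct in totals
    }


if __name__ == "__main__":
    sample = [
        ("breast", "Started chemo today, so tired"),
        ("breast", "Worried about my scan"),
        ("lung", "Chemotherapy round 2"),
    ]
    print(relative_frequency(sample))
```

Normalizing by the number of posts per cancer type, rather than reporting raw counts, keeps cancer types with very different post volumes comparable.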
Results:
The developed ontology has nine superclasses, 213 class concepts, and 4,061 synonyms. Ontology-driven natural language processing (NLP) was performed on the text of 754,744 cancer-related posts collected from blogs and online communities in Korea. Colon, breast, stomach, cervical, lung, liver, brain, pancreatic, and prostate cancer and leukemia appeared most frequently in these posts. At the superclass level, risk factor was the most frequently mentioned, followed by emotions, symptoms, treatments, and dealing with cancer.
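The superclass-level ranking reported above can be pictured as rolling class-level matches up the ontology hierarchy. The sketch below assumes that each class belongs to exactly one superclass and that a superclass is counted at most once per post. The five superclass names come from the abstract, but the class-to-superclass mapping `SUPERCLASS_OF` and the function `superclass_ranking` are hypothetical illustrations, not the authors' method.

```python
from collections import Counter

# Hypothetical fragment of a class -> superclass mapping. The superclass
# names follow those reported in the abstract; the classes are illustrative.
SUPERCLASS_OF = {
    "smoking": "risk factor",
    "obesity": "risk factor",
    "anxiety": "emotions",
    "fatigue": "symptoms",
    "chemotherapy": "treatments",
    "diet": "dealing with cancer",
}


def superclass_ranking(post_concepts):
    """post_concepts: list of sets of class concepts found in each post.

    Counts each superclass at most once per post, then returns
    (superclass, post_count) pairs sorted from most to least frequent.
    """
    counts = Counter()
    for concepts in post_concepts:
        # Building a set first means two risk-factor classes in one post count once
        counts.update({SUPERCLASS_OF[c] for c in concepts if c in SUPERCLASS_OF})
    return counts.most_common()


if __name__ == "__main__":
    posts = [{"smoking", "obesity"}, {"smoking", "anxiety"}, {"fatigue"}]
    for superclass, n in superclass_ranking(posts):
        print(superclass, n)
```

Counting posts rather than raw mentions prevents a single post that lists many risk factors from inflating the superclass total.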
Conclusions:
Information needs and emotions differed according to cancer type. These findings could be used to provide consumers with tailored information according to cancer type and stage of the cancer care process. Cancer information should address the needs not only of patients but also of their families and members of the public who are interested in cancer.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.