Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Mar 9, 2022
Date Accepted: Jul 22, 2022

The final, peer-reviewed published version of this preprint can be found here:

Tong C, Margolin D, Chunara R, Niederdeppe J, Taylor T, Dunbar N, King AJ

Search Term Identification Methods for Computational Health Communication: Word Embedding and Network Approach for Health Content on YouTube

JMIR Med Inform 2022;10(8):e37862

DOI: 10.2196/37862

PMID: 36040760

PMCID: 9472050

Search Term Identification Methods for Computational Health Communication: A Word Embedding and Network Approach for Health Content on YouTube

  • Chau Tong
  • Drew Margolin
  • Rumi Chunara
  • Jeff Niederdeppe
  • Teairah Taylor
  • Natalie Dunbar
  • Andy J King

ABSTRACT

Background:

Some health communication content on social media platforms is likely to use colloquial language. Common methods for social media data retrieval in health communication contexts typically rely only on technical language and medical vocabulary that may be unfamiliar to the public. Methods that leverage colloquial language have been use-case specific, and there is no general process for expanding standard terminology to colloquial terms.

Objective:

Motivated by this challenge, we put forward a search term identification method to improve health communication social media content retrieval, using cancer screening as a subject, and YouTube as a platform case study.

Methods:

We developed an approach that leveraged word embeddings trained on topic-specific text data to identify terms that are semantically similar to formal medical concepts. Computational textual analysis and network analysis were used to examine the newly identified videos for content novelty and connections with videos from the original concepts.
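As a rough illustration of the similarity step, the sketch below ranks candidate terms by cosine similarity to a seed medical term. The vectors, the term list, and the function names (`cosine`, `nearest_terms`) are invented for this example and are not taken from the study; in practice the vectors would come from a word2vec model trained on topic-specific text.

```python
import math

# Toy 3-dimensional vectors, made up for illustration only.
# Real embeddings would be learned from topic-specific text.
TOY_VECTORS = {
    "mammogram": [0.9, 0.1, 0.0],
    "mammo": [0.8, 0.2, 0.1],
    "breast_xray": [0.7, 0.3, 0.2],
    "colonoscopy": [0.1, 0.9, 0.0],
    "recipe": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_terms(seed, vectors, top_n=2):
    """Return the top_n (term, similarity) pairs closest to the seed term."""
    seed_vec = vectors[seed]
    scored = [
        (term, cosine(seed_vec, vec))
        for term, vec in vectors.items()
        if term != seed  # the seed itself is not a candidate
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    for term, score in nearest_terms("mammogram", TOY_VECTORS):
        print(f"{term}\t{score:.3f}")
```

The highest-ranked neighbors would then serve as additional search terms alongside the formal concept.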

Results:

Terms with semantic similarities to cancer screening tests were identified via word2vec. These neighbor terms retrieved novel and contextually diverse content beyond the original content from the medical concepts, improving recall. Precision was maintained by calculating the network degrees of videos, which correlated with human judgment of whether the newly identified videos contained relevant content.
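One way to picture the degree-based check is the minimal sketch below. It assumes the network is given as undirected edges between video IDs and that newly identified videos with few connections are set aside. The edge definition, the video IDs, the `min_degree` cutoff, and the function names are hypothetical, since the abstract does not specify them.

```python
from collections import defaultdict


def video_degrees(edges):
    """Count distinct neighbors per video in an undirected network."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {video: len(linked) for video, linked in neighbors.items()}


def keep_connected(candidates, edges, min_degree=2):
    """Keep candidate videos whose degree meets a hypothetical cutoff."""
    degrees = video_degrees(edges)
    return [v for v in candidates if degrees.get(v, 0) >= min_degree]


if __name__ == "__main__":
    # Hypothetical edges: "seed*" videos come from the original medical
    # concepts, "new*" videos from the neighbor terms.
    edges = [("seed1", "new1"), ("seed2", "new1"), ("seed1", "new2")]
    print(keep_connected(["new1", "new2", "new3"], edges))
```

In this toy network only `new1` reaches the cutoff, so it is the only newly identified video retained for review.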

Conclusions:

We discussed the benefits of the technique regarding human coding resources and outlined suggestions to improve health-related content retrieval across social media platforms.


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.