
Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Oct 6, 2021
Date Accepted: Feb 11, 2022
Date Submitted to PubMed: Aug 30, 2022

The final, peer-reviewed published version of this preprint can be found here:

Predicting Emerging Themes in Rapidly Expanding COVID-19 Literature With Unsupervised Word Embeddings and Machine Learning: Evidence-Based Study

Pal R, Chopra H, Awasthi R, Bandhey H, Nagori A, Sethi T


J Med Internet Res 2022;24(11):e34067

DOI: 10.2196/34067

PMID: 36040993

PMCID: 9629347

Warning: This is an author submission that is not peer-reviewed or edited. Preprints, unless they show as "accepted", should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Predicting Emerging Themes in Rapidly Expanding COVID-19 Literature with Unsupervised Word Embeddings and Machine Learning

  • Ridam Pal; 
  • Harshita Chopra; 
  • Raghav Awasthi; 
  • Harsh Bandhey; 
  • Aditya Nagori; 
  • Tavpritesh Sethi

ABSTRACT

Background:

Evidence from peer-reviewed literature is the cornerstone for designing responses to global threats such as COVID-19. The knowledge collected in publications needs to be distilled into evidence, which can be done by leveraging natural language models and machine learning.

Objective:

We aim to show that new knowledge can be captured and tracked using the temporal change in the unsupervised word embeddings underlying the literature. Further, imminent themes can be predicted by applying machine learning to the evolving associations between words.

Methods:

Frequently occurring medical entities were extracted from the abstracts of more than 150,000 COVID-19 articles published in the WHO database, collected at monthly intervals starting in February 2020. Word embeddings trained on each month's literature were used to construct networks of entities with cosine similarities as edge weights. Topological features of the subsequent month's network were forecast based on prior patterns, and new links were predicted using supervised machine learning. Community detection and alluvial diagrams were used to track biomedical themes that evolved over the months.
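The network-construction step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes each month's entity embeddings are already available as plain vectors (for example, from a word2vec-style model), and the function names, the toy vectors and the similarity `threshold` of 0.5 are hypothetical choices. Entity pairs become edges weighted by cosine similarity, node strength (weighted degree) stands in for one topological feature that could be forecast, and pairs linked in the later month but not the earlier one are the kind of "new links" a supervised model would be trained to predict.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_entity_network(embeddings, threshold=0.5):
    """Map each entity pair (alphabetically ordered) to its cosine similarity,
    keeping only pairs at or above a hypothetical similarity threshold."""
    edges = {}
    for a, b in combinations(sorted(embeddings), 2):
        weight = cosine(embeddings[a], embeddings[b])
        if weight >= threshold:
            edges[(a, b)] = weight
    return edges


def node_strength(edges):
    """Weighted degree: the sum of edge weights incident to each entity."""
    strength = {}
    for (a, b), weight in edges.items():
        strength[a] = strength.get(a, 0.0) + weight
        strength[b] = strength.get(b, 0.0) + weight
    return strength


def new_links(prev_edges, next_edges):
    """Entity pairs linked in the later month's network but not in the earlier one."""
    return set(next_edges) - set(prev_edges)


if __name__ == "__main__":
    # Toy 2-D "embeddings" for two consecutive months (illustrative values only).
    month_1 = {"fever": [1.0, 0.0], "cough": [0.9, 0.1], "thrombosis": [0.0, 1.0]}
    month_2 = {"fever": [1.0, 0.0], "cough": [0.8, 0.6], "thrombosis": [0.0, 1.0]}
    net_1 = build_entity_network(month_1)
    net_2 = build_entity_network(month_2)
    print(node_strength(net_2))
    print(new_links(net_1, net_2))  # {('cough', 'thrombosis')}
```

In the toy data, "cough" drifts toward "thrombosis" between the two months, so that pair appears as a new link in the second network. Real embeddings would have hundreds of dimensions, and a vectorized library would replace the pure-Python loops.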

Results:

Thromboembolic complications were detected as an emerging theme as early as August 2020. A shift towards symptoms of Long COVID complications was observed in March 2021, and neurological complications gained significance in June 2021. Prospective validation of the link prediction models achieved an AUROC of 0.87. Based on patterns observed in previous months, predictive modelling revealed predisposing conditions, symptoms, cross-infection and neurological complications as dominant research themes in COVID-19 publications.
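The AUROC reported for the link prediction models measures how well the predicted scores rank entity pairs that did become linked above pairs that did not. A minimal sketch of the metric, using the Mann-Whitney formulation, is shown below; the labels and scores are made-up placeholders, where in a prospective setting a label of 1 would mark a candidate pair that appeared as a new link in a later, held-out month. In practice a library routine such as scikit-learn's `roc_auc_score` would normally be used instead of this quadratic loop.

```python
def auroc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney formulation:
    the fraction of (positive, negative) pairs in which the positive
    example receives the higher score, counting ties as one half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical candidate entity pairs: 1 = became a new link in the held-out month.
    labels = [1, 1, 0, 0, 1, 0]
    # Hypothetical model scores for the same pairs.
    scores = [0.9, 0.7, 0.6, 0.2, 0.4, 0.4]
    print(auroc(labels, scores))  # 7.5 of 9 pairs ranked correctly
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation of future links from non-links.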

Conclusions:

Machine learning-based prediction of emerging links can contribute towards steering research by capturing themes represented by groups of medical entities, based on patterns of semantic relationships over time.




© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.