Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Aug 27, 2020
Date Accepted: Nov 24, 2020
Date Submitted to PubMed: Mar 10, 2021

The final, peer-reviewed published version of this preprint can be found here:

Abd-Alrazaq A, Schneider J, Mifsud B, Alam T, Househ M, Hamdi M, Shah Z

A Comprehensive Overview of the COVID-19 Literature: Machine Learning–Based Bibliometric Analysis

J Med Internet Res 2021;23(3):e23703

DOI: 10.2196/23703

PMID: 33600346

PMCID: 7942394

A comprehensive overview of the COVID-19 literature: A machine learning-based bibliometric analysis

  • Alaa Abd-Alrazaq; 
  • Jens Schneider; 
  • Borbala Mifsud; 
  • Tanvir Alam; 
  • Mowafa Househ; 
  • Mounir Hamdi; 
  • Zubair Shah

ABSTRACT

Background:

Shortly after the emergence of the novel coronavirus disease (COVID-19), researchers rapidly mobilized to study numerous aspects of the disease, such as its evolution, clinical manifestations, effects, treatments, and vaccination. This led to a rapid increase in the number of COVID-19-related publications. Identifying trends and areas of interest using traditional review methods (e.g., scoping reviews and systematic reviews) is challenging for such a large domain.

Objective:

We aimed to conduct an extensive bibliometric analysis to provide a comprehensive overview of the COVID-19 literature.

Methods:

We used the COVID-19 Open Research Dataset (CORD-19), which consists of a large number of articles related to all coronaviruses. We used a machine learning method to analyze the most relevant COVID-19-related articles and extract the most prominent topics. Specifically, we used a clustering algorithm to group articles based on the similarity of their abstracts in order to identify research hotspots and current research directions. We have made our software accessible to the community via GitHub.
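As an illustration of grouping articles by abstract similarity, the following minimal sketch clusters a few toy abstracts. The abstract does not name the text representation or the clustering algorithm, so TF-IDF weighting, cosine similarity, and k-means are assumptions made here for illustration only; the function names (`tokenize`, `tfidf_vectors`, `kmeans`) and the sample abstracts are hypothetical and are not taken from the authors' GitHub software.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, keep alphabetic words longer than 2 chars."""
    return [w for w in text.lower().split() if w.isalpha() and len(w) > 2]


def tfidf_vectors(docs):
    """Return one L2-normalized sparse TF-IDF vector (dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        vectors.append({t: v / norm for t, v in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two L2-normalized sparse vectors (a dot product)."""
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def normalized_mean(vectors):
    """Average sparse vectors and rescale the result to unit length."""
    total = {}
    for vec in vectors:
        for t, v in vec.items():
            total[t] = total.get(t, 0.0) + v
    norm = math.sqrt(sum(v * v for v in total.values())) or 1.0
    return {t: v / norm for t, v in total.items()}


def kmeans(vectors, k, iterations=20):
    """Cosine-based k-means with a deterministic farthest-first initialization."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Pick the document least similar to every centroid chosen so far.
        centroids.append(min(vectors, key=lambda v: max(cosine(v, c) for c in centroids)))
    labels = None
    for _ in range(iterations):
        new_labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = normalized_mean(members)
    return labels


# Toy abstracts (hypothetical): two about vaccines, two about lockdown policy.
abstracts = [
    "vaccine trial shows immune response in patients",
    "lockdown policy reduces transmission across cities",
    "immune response after vaccine dose in adults",
    "transmission model evaluates lockdown policy effects",
]
labels = kmeans(tfidf_vectors(abstracts), k=2)
print(labels)
```

On a real corpus, the number of clusters and the text representation would need to be chosen and validated; this sketch only shows the general mechanics of similarity-based grouping.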

Results:

Of the 196,630 publications retrieved from the database, we included 28,904 in the analysis. The mean number of weekly publications was 990 (SD=789.3). The country that published the highest number of articles was China (n=2,950). The largest number of documents was published in bioRxiv. Lei Liu, affiliated with the Southern University of Science and Technology in China, published the highest number of documents (n=46). Based on titles and abstracts alone, we were able to identify 1,515 surveys, 733 systematic reviews, 512 cohort studies, 480 meta-analyses, and 362 randomized controlled trials. We identified 19 different topics addressed by the included studies. The most dominant topic was public health response, followed by clinical care practices during COVID-19, its clinical characteristics and risk factors, and epidemic models for its spread.
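One simple way to flag study types from titles and abstracts alone is keyword matching. The sketch below is an assumption made for illustration; the abstract does not say how study types were identified. The patterns, the `tag_study_types` function, and the sample records are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical keyword patterns, one per study type named in the Results.
STUDY_PATTERNS = {
    "systematic review": re.compile(r"\bsystematic(?:\s+literature)?\s+review\b", re.I),
    "meta-analysis": re.compile(r"\bmeta-?\s?analys[ie]s\b", re.I),
    "randomized controlled trial": re.compile(
        r"\brandomi[sz]ed\s+control(?:led)?\s+trial\b", re.I
    ),
    "cohort study": re.compile(r"\bcohort\s+(?:study|studies)\b", re.I),
    "survey": re.compile(r"\bsurvey(?:s|ed)?\b", re.I),
}


def tag_study_types(title, abstract):
    """Return every study type whose pattern appears in the title or abstract."""
    text = f"{title} {abstract}"
    return [name for name, pattern in STUDY_PATTERNS.items() if pattern.search(text)]


# Toy records (hypothetical), each a (title, abstract) pair.
records = [
    ("A systematic review and meta-analysis of fever prevalence",
     "We pooled results from 12 studies."),
    ("Online survey of health workers",
     "A cross-sectional questionnaire was distributed."),
    ("Viral genome sequencing",
     "We sequenced isolates from three hospitals."),
]

# Count how many records mention each study type.
counts = Counter(tag for title, abstract in records
                 for tag in tag_study_types(title, abstract))
print(dict(counts))
```

A record can match several types, as in the first toy record, so per-type counts obtained this way need not sum to the number of records.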

Conclusions:

We provided an overview of the COVID-19 literature and identified current hotspots and research directions. Our findings can help the research community prioritize research needs and recognize leading COVID-19 researchers, institutes, countries, and publishers. This study showed that an AI-based bibliometric analysis has the potential to rapidly explore large corpora of academic publications during a public health crisis. We believe that this work can be used to analyze other eHealth-related literature, helping clinicians, administrators, and policy makers gain a holistic view of the literature and categorize the topics of existing research for further analysis. It can be further scaled, for instance in time, to clinical summary documentation. Publishers should avoid noise in the data by developing a way to trace the evolution of individual publications and unique authors.


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.