Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Aug 27, 2020
Date Accepted: Nov 24, 2020
Date Submitted to PubMed: Mar 10, 2021
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
A comprehensive overview of the COVID-19 literature: A machine learning-based bibliometric analysis
ABSTRACT
Background:
Shortly after the emergence of the novel coronavirus disease (COVID-19), researchers rapidly mobilized to study numerous aspects of the disease such as its evolution, clinical manifestations, effects, treatments, and vaccination. This led to a rapid increase in the number of COVID-19-related publications. Identifying trends and areas of interest using traditional review methods (e.g., scoping reviews and systematic reviews) is challenging for such a large domain.
Objective:
We aimed to conduct an extensive bibliometric analysis to provide a comprehensive overview of the COVID-19 literature.
Methods:
We used the COVID-19 Open Research Dataset (CORD-19), which consists of a large number of articles related to all coronaviruses. We used a machine learning method to analyze the most relevant COVID-19-related articles and extract the most prominent topics. Specifically, we used a clustering algorithm to group articles based on the similarity of their abstracts in order to identify research hotspots and current research directions.
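As a rough illustration of grouping abstracts by similarity, the sketch below clusters a handful of toy abstracts. It is not the authors' pipeline: the abstract does not name the text representation or the clustering algorithm, so TF-IDF weighting, cosine similarity, k-means, the farthest-point seeding, and the four example abstracts are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one L2-normalized TF-IDF vector (as a dict) per document."""
    counts_per_doc = [Counter(tokenize(d)) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for counts in counts_per_doc for term in counts)
    vectors = []
    for counts in counts_per_doc:
        vec = {
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, tf in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({term: w / norm for term, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (a dot product)."""
    return sum(w * b.get(term, 0.0) for term, w in a.items())


def kmeans(vectors, k, max_iterations=20):
    """Cluster unit vectors with k-means under cosine similarity."""
    # Assumed seeding: start at document 0, then repeatedly add the
    # document that is least similar to the centroids chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        farthest = min(
            range(len(vectors)),
            key=lambda i: max(cosine(vectors[i], c) for c in centroids),
        )
        centroids.append(vectors[farthest])

    labels = None
    for _step in range(max_iterations):
        new_labels = [
            max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [v for v, label in zip(vectors, labels) if label == j]
            if not members:
                continue  # keep the old centroid for an empty cluster
            total = Counter()
            for v in members:
                total.update(v)  # adds the weights term by term
            norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
            centroids[j] = {term: w / norm for term, w in total.items()}
    return labels


# Hypothetical toy abstracts, not taken from CORD-19.
abstracts = [
    "vaccine trial immune response antibodies",
    "vaccine antibodies immune efficacy trial",
    "epidemic model transmission spread simulation",
    "transmission model epidemic spread forecast",
]
labels = kmeans(tfidf_vectors(abstracts), k=2)
```

On this toy input the two vaccine abstracts land in one cluster and the two epidemic-modeling abstracts in the other. A real corpus of tens of thousands of abstracts would call for an optimized library implementation and a principled choice of the number of clusters.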
Results:
Of the 196,630 publications retrieved from the database, we included 28,904 in the analysis. The mean number of weekly publications was 990 (SD=789.3). The country that published the highest number of articles was China (n=2,950). The largest number of documents was published in bioRxiv. Lei Liu, affiliated with the Southern University of Science and Technology in China, published the highest number of documents (n=46). Based on titles and abstracts alone, we were able to identify 1,515 surveys, 733 systematic reviews, 512 cohort studies, 480 meta-analyses, and 362 randomized controlled trials. We identified 19 different topics addressed by the included studies. The most dominant topic was public health response, followed by clinical care practices during COVID-19, its clinical characteristics and risk factors, and epidemic models for its spread.
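One simple way to identify study types from titles and abstracts alone is keyword matching, sketched below. The abstract does not state the matching rules that were actually used, so the patterns, the function name tag_study_types, and the example records are hypothetical.

```python
import re
from collections import Counter

# Assumed keyword patterns; the real matching rules are not given in the abstract.
STUDY_PATTERNS = {
    "survey": r"\bsurveys?\b",
    "systematic review": r"\bsystematic reviews?\b",
    "cohort study": r"\bcohort\b",
    "meta-analysis": r"\bmeta[- ]?analys[ie]s\b",
    "randomized controlled trial": r"\brandomi[sz]ed control(?:led)? trials?\b",
}


def tag_study_types(text):
    """Return the set of study-type labels whose pattern occurs in the text."""
    return {
        label
        for label, pattern in STUDY_PATTERNS.items()
        if re.search(pattern, text, flags=re.IGNORECASE)
    }


# Hypothetical title/abstract strings used only to exercise the function.
records = [
    "A systematic review and meta-analysis of mask use",
    "Remdesivir for COVID-19: a randomised controlled trial",
    "Clinical features of hospitalized patients",
]
counts = Counter(label for text in records for label in tag_study_types(text))
```

A record can match several labels at once, as the first example does, so per-type counts produced this way need not sum to the number of records.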
Conclusions:
We provided an overview of the COVID-19 literature and identified current hotspots and research directions. Our findings can help the research community prioritize research needs and recognize leading COVID-19 researchers, institutes, countries, and publishers. This study showed that an AI-based bibliometric analysis has the potential to rapidly explore large corpora of academic publications during a public health crisis. Publishers should reduce noise in the data by developing a way to trace the evolution of individual publications and to identify unique authors.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.