Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Aug 27, 2020
Date Accepted: Nov 24, 2020
Date Submitted to PubMed: Mar 10, 2021
A comprehensive overview of the COVID-19 literature: A machine learning-based bibliometric analysis
ABSTRACT
Background:
Shortly after the emergence of the novel coronavirus disease (COVID-19), researchers rapidly mobilized to study numerous aspects of the disease such as its evolution, clinical manifestations, effects, treatments, and vaccination. This led to a rapid increase in the number of COVID-19-related publications. Identifying trends and areas of interest using traditional review methods (e.g., scoping reviews and systematic reviews) is challenging for such a large domain.
Objective:
We aimed to conduct an extensive bibliometric analysis to provide a comprehensive overview of the COVID-19 literature.
Methods:
We used the COVID-19 Open Research Dataset (CORD-19), which consists of a large number of articles related to all coronaviruses. We used a machine learning method to analyze the most relevant COVID-19-related articles and extract the most prominent topics. Specifically, we used a clustering algorithm to group articles based on the similarity of their abstracts in order to identify research hotspots and current research directions. We have made our software accessible to the community via GitHub.
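The grouping of articles by abstract similarity can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm or the text representation, so everything below is an assumption made for illustration: TF-IDF term weights, cosine similarity, a k-means-style loop with a deterministic farthest-first start, and the hypothetical helper names `tfidf_vectors`, `kmeans_cosine`, and `top_terms`. It is not the authors' released software.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one L2-normalized sparse TF-IDF vector (a dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed inverse document frequency (an assumed weighting choice).
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two L2-normalized sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def kmeans_cosine(vectors, k, iterations=10):
    """Cluster unit vectors with a k-means-style loop using cosine similarity."""
    # Deterministic farthest-first start: begin with document 0, then
    # repeatedly add the document least similar to the seeds chosen so far.
    chosen = [0]
    while len(chosen) < k:
        candidates = [i for i in range(len(vectors)) if i not in chosen]
        nxt = min(
            candidates,
            key=lambda i: max(cosine(vectors[i], vectors[c]) for c in chosen),
        )
        chosen.append(nxt)
    centroids = [dict(vectors[i]) for i in chosen]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each document joins its most similar centroid.
        labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        # Update step: each centroid becomes the normalized sum of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if not members:
                continue
            total = Counter()
            for v in members:
                for t, w in v.items():
                    total[t] += w
            norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
            centroids[j] = {t: w / norm for t, w in total.items()}
    return labels, centroids


def top_terms(centroid, n=3):
    """Return the n highest-weighted terms of a centroid as a rough topic label."""
    return [t for t, _ in sorted(centroid.items(), key=lambda kv: -kv[1])[:n]]


if __name__ == "__main__":
    # Toy abstracts invented for this example, not taken from CORD-19.
    abstracts = [
        "Vaccine trial shows strong antibody response",
        "Epidemic model forecasts transmission under lockdown",
        "Antibody response after second vaccine dose",
        "Lockdown effects on transmission in an epidemic model",
    ]
    labels, centroids = kmeans_cosine(tfidf_vectors(abstracts), k=2)
    for label, abstract in zip(labels, abstracts):
        print(label, abstract)
    for j, centroid in enumerate(centroids):
        print("cluster", j, "top terms:", top_terms(centroid))
```

In a sketch like this, the highest-weighted centroid terms serve as a rough label for each cluster, which is one simple way a set of clusters could be read as research topics.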
Results:
Of the 196,630 publications retrieved from the database, we included 28,904 in the analysis. The mean number of weekly publications was 990 (SD=789.3). The country that published the highest number of articles was China (n=2,950). The largest number of documents was published in bioRxiv. Lei Liu, affiliated with the Southern University of Science and Technology in China, published the highest number of documents (n=46). Based on titles and abstracts alone, we were able to identify 1,515 surveys, 733 systematic reviews, 512 cohort studies, 480 meta-analyses, and 362 randomized controlled trials. We identified 19 different topics addressed by the included studies. The most dominant topic was public health response, followed by clinical care practices during COVID-19, its clinical characteristics and risk factors, and epidemic models for its spread.
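Identifying study designs from titles and abstracts alone can be approximated with keyword matching. The sketch below is an illustration under stated assumptions, not the authors' procedure: the regular expressions in `STUDY_PATTERNS` and the helper names `tag_study_types` and `count_study_types` are hypothetical, and the abstract does not say how the study types were actually detected.

```python
import re
from collections import Counter

# Hypothetical keyword patterns, one per study design named in the Results.
STUDY_PATTERNS = {
    "survey": re.compile(r"\b(survey|questionnaire)s?\b", re.I),
    "systematic review": re.compile(r"\bsystematic(?: literature)? reviews?\b", re.I),
    "cohort study": re.compile(r"\bcohort\b", re.I),
    "meta-analysis": re.compile(r"\bmeta-?\s?analys[ei]s\b", re.I),
    "randomized controlled trial": re.compile(
        r"\brandomi[sz]ed(?:,)? control(?:led)? trials?\b", re.I
    ),
}


def tag_study_types(title, abstract):
    """Return every study design whose pattern occurs in the title or abstract."""
    text = f"{title} {abstract}"
    return [name for name, pattern in STUDY_PATTERNS.items() if pattern.search(text)]


def count_study_types(records):
    """Count study designs over an iterable of (title, abstract) pairs."""
    counts = Counter()
    for title, abstract in records:
        counts.update(tag_study_types(title, abstract))
    return counts


if __name__ == "__main__":
    # Toy records invented for this example.
    records = [
        ("A systematic review and meta-analysis of COVID-19 symptoms", ""),
        ("Clinical characteristics of patients",
         "We describe a retrospective cohort of 120 patients."),
        ("Viral genome sequencing", "We sequenced isolates."),
    ]
    print(count_study_types(records))
```

A single article can match more than one pattern (for example, a combined systematic review and meta-analysis), so counts produced this way may overlap across categories.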
Conclusions:
We provided an overview of the COVID-19 literature and identified current hotspots and research directions. Our findings can help the research community prioritize research needs and recognize leading COVID-19 researchers, institutes, countries, and publishers. This study showed that an AI-based bibliometric analysis has the potential to rapidly explore large corpora of academic publications during a public health crisis. We believe that this work can be used to analyze other eHealth-related literature, giving clinicians, administrators, and policy makers a holistic view of the literature and allowing them to categorize the different topics of existing research for further analysis. It can be further scaled, for instance in time, to clinical summary documentation. Publishers should avoid noise in the data by developing a way to trace the evolution of individual publications and unique authors.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.