Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Oct 28, 2019
Open Peer Review Period: Oct 28, 2019 - Nov 19, 2019
Date Accepted: Dec 15, 2019
Systematic evaluation of research progress on natural language processing in medicine over the past 20 years: a bibliometric study
ABSTRACT
Background:
Natural language processing (NLP) is a long-established field of computer science, but its application in medical research has faced many challenges. With the extensive digitalization of medical information globally, and the growing importance of understanding and mining big data in the medical field, NLP is becoming increasingly important.
Objective:
To perform a systematic review of the use of NLP in medical research, with the aim of understanding global progress in NLP research outcomes, content, and methods, as well as the research groups involved.
Methods:
A systematic review was conducted using the PubMed database as the search platform. All studies on the application of NLP in medicine (except biomedicine) published in the 20 years from 1999 to 2018 were retrieved. The data obtained from these studies were cleaned and structured. Software such as Excel and VOSviewer was used to perform bibliometric analysis of publication trends, author order, countries, institutions, and collaboration relationships, as well as research hotspots, diseases studied, and research methods.
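As a rough illustration of the tallying step described above, the following Python sketch counts cleaned records by a chosen field and ranks the values by frequency. The record layout, the field names (`pmid`, `year`, `country`), and the sample rows are hypothetical and invented only for illustration; the study reports using Excel and VOSviewer, not this code.

```python
from collections import Counter

def tally(records, field):
    """Count records per value of `field`, most frequent first.

    Records with a missing or empty value for `field` are skipped.
    """
    counts = Counter(r[field] for r in records if r.get(field))
    return counts.most_common()

# Hypothetical cleaned records (not data from the study).
sample = [
    {"pmid": "1", "year": 2016, "country": "USA"},
    {"pmid": "2", "year": 2017, "country": "France"},
    {"pmid": "3", "year": 2017, "country": "USA"},
    {"pmid": "4", "year": 2018, "country": ""},  # empty country is skipped
]

by_country = tally(sample, "country")
by_year = tally(sample, "year")
```

The same function can be pointed at any structured field (institution, journal, disease category) once the records have been cleaned into a consistent form.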
Results:
A total of 3,498 articles were obtained during initial screening, and 2,336 articles were found to meet the study criteria after manual screening. The number of publications increased every year, with significant growth after 2012 (annual publications ranged from 148 to a maximum of 302). The United States has held the leading position since the inception of the field, contributing 63.0% (1,472/2,336) of all publications, followed by France (5.4%, 127/2,336) and the United Kingdom (3.5%, 82/2,336). The author with the largest number of published articles was Hongfang Liu (70), while Stéphane Meystre (17) and Hua Xu (33) published the largest number of articles as first and corresponding author, respectively. Among first authors' affiliated institutions, Columbia University published the largest number of articles, accounting for 4.5% (106/2,336) of the total. Among the departments to which the authors belonged, departments of biomedical informatics published the largest number of articles (14.3%, 334/2,336). Among the journals in which the articles were published, Studies in Health Technology and Informatics contained the largest number of articles (17.5%, 408/2,336). Nearly one-fifth (17.7%, 413/2,336) of the articles involved research on specific diseases, and the subject areas primarily focused on mental illness (16.0%, 68/413), breast cancer (5.8%, 24/413), and pneumonia (4.1%, 17/413).
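The country shares reported above follow from dividing each count by the 2,336 included articles. The short Python sketch below reproduces that arithmetic using only counts stated in this abstract; the function name `share` and the variable names are illustrative choices, not part of the study.

```python
def share(count, total):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)

TOTAL = 2336  # articles retained after manual screening

# Publication counts by country, as reported in the Results.
country_counts = {"USA": 1472, "France": 127, "United Kingdom": 82}
country_shares = {c: share(n, TOTAL) for c, n in country_counts.items()}
```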
Conclusions:
NLP is in a period of robust development in the medical field, with an average of approximately 100 publications annually. Electronic medical records were the most commonly used research material, but social media, such as Twitter, has become an important research material since 2015. Cancer (24.9%, 103/413) was the most common subject area in NLP-assisted medical research on diseases, with breast cancer (23.3%, 24/103) and lung cancer (14.6%, 15/103) accounting for the highest proportions of studies. Columbia University and the researchers trained there were the most active and prolific research force in NLP in the medical field.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.