JMIR Preprints #48115: Large-scale biomedical relation extraction across diverse types: model development, and usability study on COVID-19

Large-scale biomedical relation extraction across diverse types: model development, and usability study on COVID-19

Zeyu Zhang;
Meng Fang;
Rebecca Wu;
Hui Zong;
Honglian Huang;
Yuantao Tong;
Yujia Xie;
Shiyang Cheng;
Ziyi Wei;
M. James C. Crabbe;
Xiaoyan Zhang;
Ying Wang

ABSTRACT

Background:

The relations between biomedical entities are complex and diverse. Biomedical relation extraction (RE) supports downstream tasks, including the automatic construction of knowledge graphs (KGs), and thereby meets the application needs of knowledge discovery in the biomedical field.

Objective:

However, model exploration and application scenarios for large-scale data with complex relation categories remain underinvestigated, even though such data are of practical value for heavily studied topics with enormous amounts of literature, such as COVID-19. This paper aims to streamline and improve literature analysis through large-scale RE to optimize knowledge mining.

Methods:

Datasets containing entity semantic information at different levels were constructed from a large-scale RE dataset and the Unified Medical Language System (UMLS) to evaluate the effect of entity information on RE. We then analyzed the performance of different model architectures and domain models, and we proposed continued pre-training strategies and ensemble modeling to obtain the best RE performance and provide functional RE tools. We also applied RE to the COVID-19 corpus in several use cases to assess the applicability of our approach.
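As an illustration of the ensemble and evaluation steps mentioned above, the following is a minimal sketch and not the authors' implementation. The abstract does not state how the models are combined or how the F1 score is averaged, so the sketch assumes soft voting (averaging per-class probabilities across models) and a micro-averaged F1 that ignores a negative class. The function names, the relation labels, and the `no_relation` label are hypothetical.

```python
def soft_vote(prob_rows):
    """Average per-class probabilities from several models; return the top label.

    prob_rows: list of dicts, one per model, mapping relation label -> probability.
    Assumption: every model scores the same set of labels.
    """
    n_models = len(prob_rows)
    labels = list(prob_rows[0].keys())
    avg = {lab: sum(p[lab] for p in prob_rows) / n_models for lab in labels}
    return max(avg, key=avg.get)


def micro_f1(gold, pred, negative_label="no_relation"):
    """Micro-averaged F1 over all labels except the (hypothetical) negative class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == p and g != negative_label)
    pred_pos = sum(1 for p in pred if p != negative_label)
    gold_pos = sum(1 for g in gold if g != negative_label)
    if tp == 0:
        return 0.0
    precision = tp / pred_pos
    recall = tp / gold_pos
    return 2 * precision * recall / (precision + recall)


# Two hypothetical models scoring one entity pair.
model_outputs = [
    {"treats": 0.6, "causes": 0.4},
    {"treats": 0.3, "causes": 0.7},
]
ensemble_label = soft_vote(model_outputs)

# Toy gold and predicted labels for three entity pairs.
gold = ["treats", "causes", "no_relation"]
pred = ["treats", "treats", "no_relation"]
score = micro_f1(gold, pred)
```

Soft voting is only one of several ways to combine models; majority voting over predicted labels or weighting models by validation performance would fit the same interface.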

Results:

The performance analysis revealed that RE achieves the best performance when the detailed semantic type is provided. Among single models, PubMedBERT with our continued pre-training strategy performed best, with an F1 score of 0.8998, while the ensemble model outperformed all single models with an average F1 score of 0.9002. The COVID-19 use cases demonstrated the biological significance of RE: our model constructed a KG that revealed several novel drug paths. This study also retrieved drug sets for non-long COVID and long COVID separately and constructed relational triples between coronavirus-specific entities based on RE.

Conclusions:

Optimized RE models for diverse relation types were developed based on performance analysis. Our RE application provides a proof-of-concept demonstration of how large-scale literature mining can be leveraged to facilitate novel scientific research.

Citation

Please cite as:

Zhang Z, Fang M, Wu R, Zong H, Huang H, Tong Y, Xie Y, Cheng S, Wei Z, Crabbe MJC, Zhang X, Wang Y

Large-Scale Biomedical Relation Extraction Across Diverse Relation Types: Model Development and Usability Study on COVID-19

J Med Internet Res 2023;25:e48115

DOI: 10.2196/48115

PMID: 37632414

PMCID: 10551783

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Apr 11, 2023

Date Accepted: Aug 25, 2023

Date Submitted to PubMed: Aug 26, 2023
