Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Apr 11, 2023
Date Accepted: Aug 25, 2023
Date Submitted to PubMed: Aug 26, 2023

The final, peer-reviewed published version of this preprint can be found here:

Large-Scale Biomedical Relation Extraction Across Diverse Relation Types: Model Development and Usability Study on COVID-19

Zhang Z, Fang M, Wu R, Zong H, Huang H, Tong Y, Xie Y, Cheng S, Wei Z, Crabbe MJC, Zhang X, Wang Y

J Med Internet Res 2023;25:e48115

DOI: 10.2196/48115

PMID: 37632414

PMCID: 10551783

Large-scale biomedical relation extraction across diverse types: model development, and usability study on COVID-19

  • Zeyu Zhang
  • Meng Fang
  • Rebecca Wu
  • Hui Zong
  • Honglian Huang
  • Yuantao Tong
  • Yujia Xie
  • Shiyang Cheng
  • Ziyi Wei
  • M. James C. Crabbe
  • Xiaoyan Zhang
  • Ying Wang

ABSTRACT

Background:

The relations between biomedical entities are complex and diverse. Biomedical relation extraction (RE) supports downstream tasks such as the automatic construction of knowledge graphs (KGs), meeting the needs of knowledge discovery in the biomedical field.

Objective:

Model exploration and scenario application on large-scale data with complex relation categories remain underinvestigated, even though such work is practical for hot-button research topics with enormous amounts of literature, such as COVID-19. This paper aims to streamline and improve literature analysis through large-scale RE to optimize knowledge mining.

Methods:

We constructed datasets containing entity semantic information at different levels, based on a large-scale RE dataset and the Unified Medical Language System (UMLS), to evaluate the effect of entity information on RE. We then analyzed the performance of different model architectures and domain models, and proposed continued pre-training strategies and ensemble modeling to obtain the best RE performance and provide functional RE tools. Finally, we applied RE to a COVID-19 corpus in several use cases to assess the applicability of our approach.
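
As an illustration of what "entity semantic information at different levels" could look like in practice, the following Python sketch wraps the two entity mentions of a sentence in markers that carry no type, a coarse type, or a fine type. This is a minimal sketch under stated assumptions, not the authors' dataset construction: the marker format, the `Entity` class, the `mark_entities` function, and the three level names are all hypothetical, and the example labels are only illustrative of UMLS-style semantic groups and types.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    start: int   # character offset where the mention begins
    end: int     # character offset just past the mention
    coarse: str  # assumed coarse label (e.g. a semantic group)
    fine: str    # assumed fine label (e.g. a semantic type)


def mark_entities(text: str, head: Entity, tail: Entity, level: str) -> str:
    """Wrap both mentions in markers carrying type info at the chosen level.

    level is one of "none", "coarse", "fine" (hypothetical names).
    """
    def opening(ent: Entity, role: str) -> str:
        if level == "none":
            return f"[{role}]"
        if level == "coarse":
            return f"[{role}:{ent.coarse}]"
        if level == "fine":
            return f"[{role}:{ent.fine}]"
        raise ValueError(f"unknown level: {level}")

    # Insert markers from the rightmost mention to the leftmost,
    # so the offsets of the earlier mention stay valid.
    pairs = sorted([(head, "E1"), (tail, "E2")],
                   key=lambda p: p[0].start, reverse=True)
    for ent, role in pairs:
        mention = text[ent.start:ent.end]
        text = (text[:ent.start] + opening(ent, role) + " " + mention
                + f" [/{role}]" + text[ent.end:])
    return text


sentence = "Remdesivir inhibits SARS-CoV-2 replication."
drug = Entity(0, 10, coarse="Chemicals & Drugs", fine="Pharmacologic Substance")
virus = Entity(20, 30, coarse="Living Beings", fine="Virus")

for lvl in ("none", "coarse", "fine"):
    print(lvl, "->", mark_entities(sentence, drug, virus, lvl))
```

Training the same classifier on each variant would be one way to compare how much the level of entity detail contributes to RE performance.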

Results:

The performance analysis revealed that RE achieves the best performance when the detailed semantic type is provided. Among single models, PubMedBERT with our continued pre-training strategy performed best, with an F1 score of 0.8998, while the ensemble model outperformed all single models with an average F1 score of 0.9002. The COVID-19 use cases demonstrated the biological significance of RE, with our model constructing a KG that revealed several novel drug paths. This study also retrieved separate drug sets for non-long and long COVID and constructed relational triples between coronavirus-specific entities based on RE.
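
To make the ensemble and F1 figures above more concrete, here is a small Python sketch of one common way to combine classifiers (averaging per-class probabilities, often called soft voting) and to score the result with a micro-averaged F1. The abstract does not state how the ensemble was built or which F1 averaging was used, so both choices are assumptions; the function names, the toy probabilities, and the convention that class 0 means "no relation" are hypothetical.

```python
def soft_vote(prob_sets):
    """Average per-class probabilities across models; return one label per example.

    prob_sets: list of models, each a list of per-example probability lists.
    """
    n_models = len(prob_sets)
    preds = []
    for per_example in zip(*prob_sets):  # one tuple of model outputs per example
        n_classes = len(per_example[0])
        avg = [sum(p[c] for p in per_example) / n_models for c in range(n_classes)]
        preds.append(max(range(n_classes), key=avg.__getitem__))
    return preds


def micro_f1(gold, pred, negative=0):
    """Micro-averaged F1 over all classes except the assumed 'no relation' class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == p and g != negative)
    pred_pos = sum(1 for p in pred if p != negative)
    gold_pos = sum(1 for g in gold if g != negative)
    if tp == 0:
        return 0.0
    precision = tp / pred_pos
    recall = tp / gold_pos
    return 2 * precision * recall / (precision + recall)


# Toy outputs from two hypothetical models over three examples and three classes.
model_a = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.2, 0.7]]
model_b = [[0.2, 0.7, 0.1], [0.1, 0.6, 0.3], [0.5, 0.1, 0.4]]
gold = [1, 2, 2]

ensemble_pred = soft_vote([model_a, model_b])
print(ensemble_pred)
print(round(micro_f1(gold, ensemble_pred), 4))
```

In this toy case the two models disagree on the first and third examples, and averaging resolves each in favor of the more confident model.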

Conclusions:

Optimized RE models for diverse relation types were developed based on performance analysis. Our RE application provides a proof-of-concept demonstration of how large-scale literature mining can be leveraged to facilitate novel scientific research.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.