Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Aug 3, 2020
Date Accepted: Jan 16, 2021

The final, peer-reviewed published version of this preprint can be found here:

Using Machine Learning to Collect and Facilitate Remote Access to Biomedical Databases: Development of the Biomedical Database Inventory

Rosado E, Garcia-Remesal M Sr, Paraiso-Medina S Sr, Pazos A Sr, Maojo V Sr

JMIR Med Inform 2021;9(2):e22976

DOI: 10.2196/22976

PMID: 33629960

PMCID: 7952234

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

BiDI: Using Machine Learning to collect and facilitate remote access to biomedical databases

  • Eduardo Rosado
  • Miguel Garcia-Remesal Sr
  • Sergio Paraiso-Medina Sr
  • Alejandro Pazos Sr
  • Victor Maojo Sr

ABSTRACT

Background:

Existing biomedical literature repositories do not commonly provide users with specific means to locate and remotely access biomedical databases.

Objective:

To address this issue, we developed BiDI (Biomedical Database Inventory), a repository of links to biomedical databases that are automatically extracted from the scientific literature. BiDI provides an index of data resources and a seamless path to access them.

Methods:

We designed an ensemble of deep learning methods to extract database mentions. To train the system, we annotated a set of 1,242 articles that included mentions of database publications. This dataset was used, along with transfer learning techniques, to train an ensemble of deep learning NLP models for the task of database publication detection.
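
The abstract does not state how the outputs of the individual models are combined. As a minimal sketch, assuming each model returns a probability that an article describes a database and that the ensemble averages those probabilities (soft voting), the combination step could look as follows. The function name, the 0.5 threshold, and the example scores are illustrative assumptions, not details taken from the paper.

```python
from statistics import mean


def ensemble_predict(model_probs, threshold=0.5):
    """Average per-model probabilities (soft voting) and threshold the result.

    model_probs: list of floats in [0, 1], one per model (hypothetical outputs).
    Returns a tuple (is_database_article, averaged_probability).
    """
    if not model_probs:
        raise ValueError("at least one model probability is required")
    avg = mean(model_probs)
    return avg >= threshold, avg


# Hypothetical scores from three models for a single article.
label, score = ensemble_predict([0.9, 0.6, 0.3])
print(label, round(score, 3))
```

Averaging probabilities is only one option; majority voting over hard labels would be an equally plausible reading of "ensemble" here.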

Results:

The system obtained an F1 score of 0.929 on database detection, with high precision and recall. Applying this model to the PubMed and PubMed Central databases, we identified over 10,000 unique databases. The ensemble also extracts the web links to the reported databases and discards irrelevant links. For the extraction of web links, the model achieved a cross-validated F1 score of 0.908. We show two use cases, related to “omics” and the COVID-19 pandemic.
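
For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below computes it from raw counts; the counts in the example are hypothetical and are not the figures behind the scores reported above.

```python
def f1_from_counts(tp, fp, fn):
    """Return (precision, recall, f1) from true-positive, false-positive,
    and false-negative counts. Undefined ratios are reported as 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts, for illustration only.
precision, recall, f1 = f1_from_counts(tp=90, fp=10, fn=10)
print(precision, recall, f1)
```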

Conclusions:

BiDI enables access to biomedical resources over the Internet and facilitates data-driven research and other scientific initiatives. The repository is available at http://gib.fi.upm.es/bidi/ and will be regularly updated by an automatic text processing pipeline. The approach can be reused to create repositories of different types (biomedical and others).




© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.