Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Nov 6, 2019
Date Accepted: Feb 27, 2020

The final, peer-reviewed published version of this preprint can be found here:

Arguello Casteleiro M, Des Diz J, Maroto N, Fernandez Prieto MJ, Peters S, Wroe C, Sevillano Torrado C, Maseda Fernandez D, Stevens R

Semantic Deep Learning: Prior Knowledge and a Type of Four-Term Embedding Analogy to Acquire Treatments for Well-Known Diseases

JMIR Med Inform 2020;8(8):e16948

DOI: 10.2196/16948

PMID: 32759099

PMCID: 7441383

Semantic Deep Learning: prior knowledge and a type of four-term embedding analogies to acquire treatments for well-known diseases

  • Mercedes Arguello Casteleiro
  • Julio Des Diz
  • Nava Maroto
  • Maria Jesus Fernandez Prieto
  • Simon Peters
  • Chris Wroe
  • Carlos Sevillano Torrado
  • Diego Maseda Fernandez
  • Robert Stevens

ABSTRACT

Background:

How to treat a disease remains the commonest type of clinical question. Obtaining an answer that is available to both machines and humans from the evidence-based biomedical literature is difficult. Embedding analogies may extract such biomedical facts, although the state of the art focuses on pair-based proportional analogies (e.g. man:woman::king:queen).

Objective:

To extract human-readable and machine-processable disease-treatment statements, we develop a Semantic Deep Learning (SemDeep) approach to systematically acquire a type of four-term analogy exploiting commonalities in structural features.

Methods:

As preliminaries, we investigate CBOW embedding analogies in a common-English corpus with 5 lines of text and observe a type of four-term analogy, obtained by applying the 3CosAdd formula (used with pair-based proportional analogies), that relates the semantic fields person and death: “dagger = die - Romeo + died” (search query: -Romeo +die +died). Our SemDeep approach works with pre-existing items of knowledge (what is known) to make inferences sanctioned by a four-term analogy (search query: -x +z1 +z2) from embeddings created with PubMed/MEDLINE free text.

Stage 1: Knowledge acquisition (acquisition of domain-specific terms). A set of terms, the candidates y, is obtained from CBOW and Skip-gram embeddings using vector arithmetic. Some n-gram pairs obtained with the cosine (prior knowledge) are the input for 3CosAdd, which seeks a type of four-term analogy relating the semantic fields disease and treatment.

Stage 2: Knowledge organisation (explicit conceptualisation of the meaning of terms). The candidates sanctioned by the analogy that belong to the treatment field are identified and then mapped to UMLS Metathesaurus concepts with MetaMap. A concept pair is a brief disease-treatment statement (biomedical fact).

Stage 3: Knowledge validation (validating statements). The validation of the machine-processable biomedical facts potentially useful for clinicians is based on evidence.
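
The search query -x +z1 +z2 can be illustrated with a minimal Python sketch of 3CosAdd scoring, in which each candidate y is scored as cos(y, z1) + cos(y, z2) - cos(y, x). The vectors, the term names (disease_x, treatment_z1, treatment_z2, term_a to term_c) and the helper functions cosine and three_cos_add are hypothetical and hand-made for illustration only; they are not taken from the PubMed/MEDLINE embeddings or from the authors' implementation.

```python
import numpy as np

# Hand-made 3-dimensional toy vectors (assumption: purely illustrative,
# NOT real CBOW/Skip-gram embeddings and NOT data from the paper).
toy_embeddings = {
    "disease_x":    np.array([1.0, 0.0, 0.0]),
    "treatment_z1": np.array([0.6, 0.8, 0.0]),
    "treatment_z2": np.array([0.5, 0.85, 0.1]),
    "term_a":       np.array([0.1, 0.9, 0.1]),
    "term_b":       np.array([0.9, 0.1, 0.1]),
    "term_c":       np.array([0.2, 0.3, 0.9]),
}


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def three_cos_add(embeddings, x, z1, z2, topn=3):
    """Rank candidates y for the search query -x +z1 +z2.

    Each candidate is scored as cos(y, z1) + cos(y, z2) - cos(y, x);
    the three query terms themselves are excluded from the ranking.
    """
    query_terms = {x, z1, z2}
    scores = []
    for term, vec in embeddings.items():
        if term in query_terms:
            continue
        score = (cosine(vec, embeddings[z1])
                 + cosine(vec, embeddings[z2])
                 - cosine(vec, embeddings[x]))
        scores.append((term, score))
    # Highest score first
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]


ranked = three_cos_add(toy_embeddings,
                       x="disease_x", z1="treatment_z1", z2="treatment_z2")
for term, score in ranked:
    print(f"{term}\t{score:.3f}")
```

In this toy setting the candidate closest to both z terms and farthest from x ranks first. With real embeddings, gensim's KeyedVectors.most_similar(positive=[z1, z2], negative=[x]) offers a comparable query; whether the authors used it is not stated here.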

Results:

We perform a validation with 5352 n-gram pairs from the biomedical literature. The micro-averaged performance of MetaMap for those n-grams belonging to the semantic field treatment is F measure=80.00% (precision=77.00%, recall=83.25%). By visual inspection, we develop a heuristic with some predictive power for the clinical winners, i.e. search queries -x +z1 +z2 that bring candidates y with evidence of a therapeutic intent for the target disease x. An example is the search query -asthma +inhaled_corticosteroids +inhaled_corticosteroid, which finds eight evidence-based beneficial treatments for asthma.
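
As a quick consistency check on the reported figures, the sketch below recomputes the F measure from the stated precision and recall. It assumes the F measure is the balanced F1 (harmonic mean of precision and recall), which the abstract does not state explicitly; the function name f_measure and its beta parameter are illustrative.

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta=1.0 gives the harmonic mean of precision and recall."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported for MetaMap on the treatment n-grams
f1 = f_measure(precision=0.77, recall=0.8325)
print(f"F measure = {f1:.2%}")
```

Under that assumption the recomputed value agrees with the reported 80.00% to two decimal places.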

Conclusions:

Extracting treatments with therapeutic intent by analogical reasoning (without explicit representations of relations) from embeddings (a pool of 423K n-grams from PubMed/MEDLINE data) is an ambitious goal. Our SemDeep approach is knowledge-based, exploits analogical reasoning with prior knowledge and embeddings, and systematically acquires evidence-based statements about beneficial treatments for well-known diseases. The biomedical facts discovered by analogical reasoning are potentially useful for clinicians and are machine-processable as well as human-readable. Learning from deep learning models does not need a massive amount of data. Embedding analogies are not limited to the pair-based proportional type; hence, analogical reasoning with embeddings is underexploited.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.