Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Oct 23, 2018
Date Accepted: Mar 29, 2019

The final, peer-reviewed published version of this preprint can be found here:

Identifying Clinical Terms in Medical Text Using Ontology-Guided Machine Learning

Arbabi A, Adams DR, Fidler S, Brudno M

JMIR Med Inform 2019;7(2):e12596

DOI: 10.2196/12596

PMID: 31094361

PMCID: 6533869

Identifying clinical terms in free-text notes using ontology-guided machine learning

  • Aryan Arbabi
  • David R Adams
  • Sanja Fidler
  • Michael Brudno

ABSTRACT

Background:

Automatic recognition of medical concepts in unstructured text is an important component of many clinical and research applications and its accuracy has a large impact on electronic health record analysis. The mining of such terms is complicated by the broad use of synonyms and non-standard terms in medical documents.

Objective:

Here we present a machine learning model for concept recognition in large unstructured text that optimizes the use of ontological structures and can identify previously unobserved synonyms for concepts in the ontology.

Methods:

We present a neural dictionary model that can be used to predict whether a phrase is synonymous with a concept in a reference ontology. Our model uses a convolutional neural network, together with the taxonomy structure, to encode an input phrase into an embedding space, and then ranks medical concepts by their similarity to the phrase in that space. It also utilizes the biomedical ontology structure to optimize the embedding of various terms, and has fewer training constraints than previous methods. We train our model on two biomedical ontologies, the Human Phenotype Ontology (HPO) and SNOMED-CT. Our code is available (open source) at https://github.com/ccmbioinfo/NeuralCR.
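
As a rough illustration of the ranking step described above, the following Python sketch scores an already-encoded phrase against concept embeddings and sorts the concepts by similarity. It is not the NeuralCR implementation. The toy taxonomy, the vectors, and all function names are hypothetical, the convolutional phrase encoder is replaced by a hand-written vector, and summing a concept's vector with those of its ancestors is only one assumed way of letting the taxonomy shape the embeddings.

```python
import math

# Hypothetical toy taxonomy (child -> parents); not taken from HPO or SNOMED-CT.
PARENTS = {
    "abnormality": [],
    "heart_abnormality": ["abnormality"],
    "arrhythmia": ["heart_abnormality"],
    "kidney_abnormality": ["abnormality"],
}

# Hypothetical raw per-concept vectors; a trained model would learn these.
RAW = {
    "abnormality": [0.1, 0.1, 0.1],
    "heart_abnormality": [1.0, 0.0, 0.0],
    "arrhythmia": [0.0, 1.0, 0.0],
    "kidney_abnormality": [0.0, 0.0, 1.0],
}


def ancestors(concept):
    """Return the concept itself plus every ancestor reachable in the taxonomy."""
    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def concept_embedding(concept):
    """Assumed scheme: sum the raw vectors of the concept and all its ancestors."""
    total = [0.0] * len(RAW[concept])
    for name in ancestors(concept):
        for i, value in enumerate(RAW[name]):
            total[i] += value
    return total


def softmax(scores):
    """Convert raw similarity scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def rank_concepts(phrase_vec):
    """Rank concepts by dot-product similarity to an already-encoded phrase."""
    names = list(RAW)
    scores = [
        sum(p * c for p, c in zip(phrase_vec, concept_embedding(name)))
        for name in names
    ]
    probs = softmax(scores)
    return sorted(zip(names, probs), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Stand-in for the encoder output of a phrase such as "irregular heartbeat".
    for name, prob in rank_concepts([0.9, 1.0, 0.0]):
        print(f"{name}: {prob:.3f}")
```

With this toy input the top-ranked concept is `arrhythmia`, followed by its parent `heart_abnormality`. Because ancestors contribute to each concept's embedding, a phrase that matches a specific concept also scores well against that concept's more general parents.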

Results:

We tested our model trained on HPO on two different data sets: 288 annotated PubMed abstracts and 39 clinical reports. We also tested our model trained on SNOMED-CT on 2000 MIMIC-III ICU discharge summaries. The results of our experiments show the high accuracy of our model, as well as the value of utilizing the taxonomy structure of the ontology in concept recognition.

Conclusions:

While the application of machine learning methods to the identification of clinical terms in unstructured free text has been hampered by the lack of training data and the difficulty of identifying novel synonyms for terms in the ontology, our work utilizes machine learning approaches that allow for synonym identification and the use of orthogonal, unlabelled biomedical corpora. Without any custom training, our model performs as well as or better than state-of-the-art models custom built for specific ontologies.


Per the author's request the PDF is not available.