
Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Feb 25, 2021
Date Accepted: May 6, 2021
Date Submitted to PubMed: Aug 12, 2021

The final, peer-reviewed published version of this preprint can be found here:

A Fine-Tuned Bidirectional Encoder Representations From Transformers Model for Food Named-Entity Recognition: Algorithm Development and Validation

Stojanov R, Popovski G, Cenikj G, Koroušić Seljak B, Eftimov T

A Fine-Tuned Bidirectional Encoder Representations From Transformers Model for Food Named-Entity Recognition: Algorithm Development and Validation

J Med Internet Res 2021;23(8):e28229

DOI: 10.2196/28229

PMID: 34383671

PMCID: 8415558

FoodNER: a fine-tuned BERT for food Named-Entity Recognition

  • Riste Stojanov; 
  • Gorjan Popovski; 
  • Gjorgjina Cenikj; 
  • Barbara Koroušić Seljak; 
  • Tome Eftimov

ABSTRACT

Background:

Newly published scientific papers provide a rapidly growing amount of information about food, and many open research questions concern how food, as one of the main environmental factors, interacts with other health-related entities such as diseases, treatments, and drugs. In the last two decades, a large amount of work in Natural Language Processing (NLP) and Machine Learning (ML) has enabled biomedical Information Extraction. The food domain, however, remains low-resourced, which highlights the need for dedicated food Information Extraction methods. Only a few food semantic resources and a few rule-based food Information Extraction methods exist, and these often depend on external resources. In 2019, however, an annotated corpus of food entities was published, with the entities normalized against several food semantic resources.

Objective:

In this paper, we investigate how the recently published Bidirectional Encoder Representations from Transformers (BERT) model, which provides state-of-the-art results in Information Extraction, can be fine-tuned for food Information Extraction.

Methods:

We introduce FoodNER, a corpus-based food Named-Entity Recognition method. It consists of 15 models obtained by fine-tuning three pre-trained BERT models on five groups of semantic labels. The models cover the following predictive scenarios: food versus non-food entity distinction, two subsets of Hansard food semantic tags, FoodOn semantic tags, and SNOMED CT food semantic tags.
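As a hedged illustration of the task framing (not the authors' code, and the label names below are assumptions rather than their exact scheme), corpus-based NER is typically cast as token classification over BIO tags, which are then decoded into entity spans:

```python
# Minimal sketch of BIO-tag decoding for food NER.
# Label names ("FOOD") and the example sentence are illustrative assumptions.

def bio_decode(tokens, tags):
    """Group BIO-tagged tokens into (entity_text, label) spans."""
    spans, current = [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):          # a new entity begins
            if current:
                spans.append(current)
            current = ([tok], tag[2:])
        elif tag.startswith("I-") and current and tag[2:] == current[1]:
            current[0].append(tok)        # continue the current entity
        else:                             # "O" tag or inconsistent "I-" tag
            if current:
                spans.append(current)
            current = None
    if current:
        spans.append(current)
    return [(" ".join(words), label) for words, label in spans]

tokens = ["Add", "two", "cups", "of", "brown", "rice", "and", "soy", "sauce"]
tags   = ["O",   "O",   "O",    "O",  "B-FOOD", "I-FOOD", "O", "B-FOOD", "I-FOOD"]
print(bio_decode(tokens, tags))
# → [('brown rice', 'FOOD'), ('soy sauce', 'FOOD')]
```

In a fine-tuned BERT setup, the per-token tags would come from a token-classification head over the BERT encoder; the decoding step above is what turns those predictions into the extracted food entities.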

Results:

All BERT models yield very promising results, achieving macro F1 scores of 93.30%-94.31% in the food distinction scenario, which represents the new state of the art in food Information Extraction. In the scenarios where semantic tags are predicted, all BERT models again obtain very promising results, with macro F1 scores ranging from 73.39% to 78.96%.
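For readers unfamiliar with the metric: macro F1 averages per-class F1 scores with equal weight, so rare tag classes count as much as frequent ones. A minimal sketch of the computation (the counts below are made up for illustration, not taken from the paper):

```python
# Macro F1 from per-class (true positives, false positives, false negatives).
# Uses the identity F1 = 2*TP / (2*TP + FP + FN) per class, then averages.

def macro_f1(per_class):
    f1s = []
    for tp, fp, fn in per_class:
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)

# Two hypothetical classes: one common and well-predicted, one rarer and noisier.
print(round(macro_f1([(90, 10, 5), (40, 20, 10)]), 4))
# → 0.8252
```

Because each class contributes equally, a model that does well only on the dominant class is penalized, which is why macro F1 is a stricter summary than accuracy for imbalanced tag sets.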

Conclusions:

FoodNER can be used to extract and annotate food entities in five different scenarios: distinguishing between food and non-food entities, and distinguishing food entities at the level of food groups by using (i) the closest Hansard semantic tags, (ii) the parent Hansard semantic tags, (iii) the FoodOn semantic tags, or (iv) the SNOMED CT semantic tags.




© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.