
Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Feb 25, 2021
Date Accepted: May 6, 2021
Date Submitted to PubMed: Aug 12, 2021

The final, peer-reviewed published version of this preprint can be found here:

A Fine-Tuned Bidirectional Encoder Representations From Transformers Model for Food Named-Entity Recognition: Algorithm Development and Validation

Stojanov R, Popovski G, Cenikj G, Koroušić Seljak B, Eftimov T

A Fine-Tuned Bidirectional Encoder Representations From Transformers Model for Food Named-Entity Recognition: Algorithm Development and Validation

J Med Internet Res 2021;23(8):e28229

DOI: 10.2196/28229

PMID: 34383671

PMCID: 8415558

FoodNER: a fine-tuned BERT for food Named-Entity Recognition

  • Riste Stojanov; 
  • Gorjan Popovski; 
  • Gjorgjina Cenikj; 
  • Barbara Koroušić Seljak; 
  • Tome Eftimov

ABSTRACT

Background:

Newly published scientific papers provide a rapidly growing amount of information about food, and many open research questions concern how food, as one of the main environmental factors, interacts with other health-related entities such as diseases, treatments, and drugs. In the last two decades, a large amount of work in Natural Language Processing (NLP) and Machine Learning (ML) has enabled biomedical Information Extraction. The food domain, however, remains low-resourced, which highlights the need for dedicated food Information Extraction methods. Only a few food semantic resources and a few rule-based food Information Extraction methods exist, and these often depend on external resources. In 2019, however, an annotated corpus of food entities was published, with the entities normalized against several food semantic resources.

Objective:

In this paper, we investigate how the recently published Bidirectional Encoder Representations from Transformers (BERT) model, which provides state-of-the-art results in Information Extraction, can be fine-tuned for food Information Extraction.

Methods:

We introduce FoodNER, a corpus-based food Named-Entity Recognition method. It consists of 15 models obtained by fine-tuning three pre-trained BERT models on five groups of semantic labels. The models cover the following predictive scenarios: food versus non-food entity distinction, two subsets of Hansard food semantic tags, FoodOn semantic tags, and SNOMED CT food semantic tags.
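As a hedged illustration of the task framing (not the authors' code, and the label names below are assumptions rather than their exact scheme), corpus-based NER is typically cast as token classification over BIO tags, which are then decoded into entity spans:

```python
# Minimal sketch of BIO-tag decoding for food NER.
# Label names ("FOOD") and the example sentence are illustrative assumptions.

def bio_decode(tokens, tags):
    """Group BIO-tagged tokens into (entity_text, label) spans."""
    spans, current = [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):          # a new entity begins
            if current:
                spans.append(current)
            current = ([tok], tag[2:])
        elif tag.startswith("I-") and current and tag[2:] == current[1]:
            current[0].append(tok)        # continue the current entity
        else:                             # "O" tag or inconsistent "I-" tag
            if current:
                spans.append(current)
            current = None
    if current:
        spans.append(current)
    return [(" ".join(words), label) for words, label in spans]

tokens = ["Add", "two", "cups", "of", "brown", "rice", "and", "soy", "sauce"]
tags   = ["O",   "O",   "O",    "O",  "B-FOOD", "I-FOOD", "O", "B-FOOD", "I-FOOD"]
print(bio_decode(tokens, tags))
# → [('brown rice', 'FOOD'), ('soy sauce', 'FOOD')]
```

In a fine-tuned BERT setup, the per-token tags would come from a token-classification head over the BERT encoder; the decoding step above is what turns those predictions into the extracted food entities.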

Results:

All BERT models yield very promising results, achieving macro F1 scores of 93.30%-94.31% in the food distinction scenario, which represents the new state of the art in food Information Extraction. In the scenarios where semantic tags are predicted, all BERT models again obtain very promising results, with macro F1 scores ranging from 73.39% to 78.96%.
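For readers unfamiliar with the metric: macro F1 averages per-class F1 scores with equal weight, so rare tag classes count as much as frequent ones. A minimal sketch of the computation (the counts below are made up for illustration, not taken from the paper):

```python
# Macro F1 from per-class (true positives, false positives, false negatives).
# Uses the identity F1 = 2*TP / (2*TP + FP + FN) per class, then averages.

def macro_f1(per_class):
    f1s = []
    for tp, fp, fn in per_class:
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)

# Two hypothetical classes: one common and well-predicted, one rarer and noisier.
print(round(macro_f1([(90, 10, 5), (40, 20, 10)]), 4))
# → 0.8252
```

Because each class contributes equally, a model that does well only on the dominant class is penalized, which is why macro F1 is a stricter summary than accuracy for imbalanced tag sets.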

Conclusions:

FoodNER can be used to extract and annotate food entities in five different scenarios: distinguishing between food and non-food entities, and distinguishing food entities at the level of food groups by using (i) the closest Hansard semantic tags, (ii) the parent Hansard semantic tags, (iii) the FoodOn semantic tags, or (iv) the SNOMED CT semantic tags.




© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.