Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Feb 25, 2021
Date Accepted: May 6, 2021
Date Submitted to PubMed: Aug 12, 2021
FoodNER: a fine-tuned BERT for food Named-Entity Recognition
ABSTRACT
Background:
New scientific publications are producing information about food at a rapid pace. As a result, many open research questions concern the interactions between food, as one of the main environmental factors, and other health-related entities such as diseases, treatments, and drugs. Over the last two decades, a large amount of work in Natural Language Processing (NLP) and Machine Learning (ML) has enabled biomedical Information Extraction. The food domain, by contrast, remains low-resourced, which draws attention to the problem of developing methods for food Information Extraction. Only a few food semantic resources and a few rule-based methods for food Information Extraction exist, and the latter often depend on external resources. However, an annotated corpus of food entities, normalized against several food semantic resources, was published in 2019.
Objective:
In this paper, we investigate how the recently published Bidirectional Encoder Representations from Transformers (BERT) model, which provides state-of-the-art results in Information Extraction, can be fine-tuned for food Information Extraction.
Methods:
We introduce FoodNER, a corpus-based food Named-Entity Recognition method. It consists of 15 models, obtained by fine-tuning each of three pre-trained BERT models on five groups of semantic resources. The models are trained for the following predictive scenarios: food entity distinction, each of two subsets of Hansard food semantic tags, FoodOn semantic tags, and SNOMED CT food semantic tags.
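The count of 15 models follows from pairing each pre-trained model with each predictive scenario. The sketch below only enumerates that grid. The model names are placeholders, because the abstract does not name the three pre-trained BERT models, and the scenario identifiers are illustrative labels rather than names used by the authors.

```python
from itertools import product

# Placeholder names (assumption): the abstract does not say which three
# pre-trained BERT models were fine-tuned.
PRETRAINED_MODELS = ["bert_model_a", "bert_model_b", "bert_model_c"]

# Illustrative identifiers for the five predictive scenarios in the Methods.
SCENARIOS = [
    "food_entity_distinction",
    "hansard_closest_tags",
    "hansard_parent_tags",
    "foodon_tags",
    "snomed_ct_tags",
]


def build_model_grid(models, scenarios):
    """Return one (model, scenario) pair per fine-tuned model."""
    return list(product(models, scenarios))


MODEL_GRID = build_model_grid(PRETRAINED_MODELS, SCENARIOS)

if __name__ == "__main__":
    for model_name, scenario in MODEL_GRID:
        print(f"{model_name} fine-tuned for {scenario}")
```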
Results:
All BERT models achieve macro F1 scores of 93.30% to 94.31% in the food entity distinction scenario, which represents a new state of the art in food Information Extraction. In the scenarios where semantic tags are predicted, the macro F1 scores of the BERT models range from 73.39% to 78.96%.
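For readers unfamiliar with the metric, the macro F1 score is the unweighted mean of the per-class F1 scores, so rare classes count as much as frequent ones. The following is a minimal sketch that assumes one label per token; the abstract does not state whether evaluation was done per token or per entity, and the labels `FOOD` and `O` are illustrative only.

```python
def macro_f1(true_labels, predicted_labels):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")

    pairs = list(zip(true_labels, predicted_labels))
    labels = sorted(set(true_labels) | set(predicted_labels))
    if not labels:
        raise ValueError("label sequences must not be empty")

    per_class_f1 = []
    for label in labels:
        true_pos = sum(1 for t, p in pairs if t == label and p == label)
        false_pos = sum(1 for t, p in pairs if t != label and p == label)
        false_neg = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        denominator = 2 * true_pos + false_pos + false_neg
        per_class_f1.append(2 * true_pos / denominator if denominator else 0.0)

    return sum(per_class_f1) / len(per_class_f1)


# Toy example (illustrative data, not from the paper).
gold = ["FOOD", "O", "O", "FOOD"]
pred = ["FOOD", "O", "FOOD", "FOOD"]
score = macro_f1(gold, pred)
```

In the toy example, the F1 score is 0.8 for `FOOD` and 2/3 for `O`, so the macro average is 11/15, or about 0.733.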
Conclusions:
FoodNER can be used to extract and annotate food entities in five different scenarios: distinguishing between food and non-food entities, and distinguishing food entities at the level of food groups by using (i) the closest Hansard semantic tags, (ii) the parent Hansard semantic tags, (iii) the FoodOn semantic tags, or (iv) the SNOMED CT semantic tags.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.