JMIR Preprints #47434: A Deep Learning Model for the Normalization of Institution Names by Multi-Source Literature Feature Fusion


A Deep Learning Model for the Normalization of Institution Names by Multi-Source Literature Feature Fusion

Yifei Chen;
Xiaoying Li;
Aihua Li;
Yongjie Li;
Xuemei Yang;
Ziluo Lin;
Shirui Yu;
Xiaoli Tang

ABSTRACT

Background:

The normalization of institution names is of great importance for literature retrieval, statistics of academic achievements, and evaluation of the competitiveness of research institutions. Differences in authors' writing habits, together with spelling mistakes, produce variant names for the same institution, which distorts the analysis of publication data. With the development of deep learning models and the increasing maturity of natural language processing methods, a deep learning-based normalization model can improve the accuracy of institution name normalization at the semantic level.

Objective:

This study aimed to train a deep learning-based model for institution name normalization based on the multi-source literature feature fusion of institutional address data. The model should normalize institution name variants with the help of authority files and achieve high normalization accuracy after several rounds of training and optimization.

Methods:

In this study, an institution name normalization model was built on Bidirectional Encoder Representations from Transformers (BERT) and other deep learning models. It mainly comprises an institution classification model, a hierarchical relation extraction model, and an institution matching and merging model. The model was trained to learn institutional features automatically through pre-training and fine-tuning, and institution names were extracted from the affiliation data of 3 databases (Dimensions, Web of Science, and Scopus) to complete the normalization process.

Results:

The trained model achieved at least three functions. First, it could identify an institution name that is consistent with the authority files and associate the name with the files through the unique institution ID. Second, it could identify non-standard institution name variants, such as singular/plural changes and abbreviations, and update the authority files accordingly. Third, it could identify unregistered institutions and add them to the authority files, so that when such an institution appeared again, the model could recognize it and treat it as a registered institution. Moreover, testing showed that the accuracy of the normalization model reached 93.79%, indicating promising performance in the normalization of institution names.
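The three functions described above can be illustrated with a minimal sketch of an authority file. This is not the authors' implementation: `AuthorityFile`, `normalize_key`, `plural_insensitive_matcher`, the `INST0001`-style IDs, and the institution names are all hypothetical, and a simple string rule stands in for the learned matching and merging model. Abbreviation handling is deliberately left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def normalize_key(name: str) -> str:
    """Crude surface normalization: lowercase, drop punctuation, collapse spaces."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(cleaned.split())


@dataclass
class AuthorityFile:
    """Hypothetical authority file: canonical names plus known surface variants."""
    canonical: dict = field(default_factory=dict)  # institution ID -> canonical name
    variants: dict = field(default_factory=dict)   # normalized surface form -> institution ID
    next_id: int = 1

    def register(self, name: str) -> str:
        """Add a previously unseen institution and return its new ID."""
        inst_id = f"INST{self.next_id:04d}"
        self.next_id += 1
        self.canonical[inst_id] = name
        self.variants[normalize_key(name)] = inst_id
        return inst_id

    def resolve(self, name: str, matcher=None) -> tuple:
        """Return (institution ID, status) for a raw affiliation name."""
        key = normalize_key(name)
        # Function 1: the name already agrees with the authority file.
        if key in self.variants:
            return self.variants[key], "registered"
        # Function 2: a matcher recognizes a variant; record it for next time.
        if matcher is not None:
            inst_id = matcher(name, self)
            if inst_id is not None:
                self.variants[key] = inst_id
                return inst_id, "variant"
        # Function 3: unregistered institution; add it to the authority file.
        return self.register(name), "new"


def plural_insensitive_matcher(name: str, authority: AuthorityFile):
    """Stand-in for a learned matcher: ignores trailing 's' on each token."""
    def stem(key: str) -> str:
        return " ".join(tok.rstrip("s") for tok in key.split())

    target = stem(normalize_key(name))
    for key, inst_id in authority.variants.items():
        if stem(key) == target:
            return inst_id
    return None


authority = AuthorityFile()
first_id = authority.register("Example Institute of Medical Sciences")
exact = authority.resolve("Example Institute of Medical Sciences")
variant = authority.resolve("Example Institute of Medical Science", plural_insensitive_matcher)
new = authority.resolve("Sample University Hospital", plural_insensitive_matcher)
again = authority.resolve("Sample University Hospital")
```

In this sketch, a resolved variant is written back into the authority file, so a later lookup of the same string succeeds without the matcher. That mirrors the update behavior described in the results, though the study's model makes the matching decision with learned features, not a string rule.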

Conclusions:

The deep learning-based institution name normalization model trained in this study exhibits high accuracy. Therefore, it could be widely applied in the evaluation of the competitiveness of research institutions, the analysis of institutions' research fields, and the construction of inter-institutional cooperation networks, showing high application value.

Citation

Please cite as:

Chen Y, Li X, Li A, Li Y, Yang X, Lin Z, Yu S, Tang X

A Deep Learning Model for the Normalization of Institution Names by Multisource Literature Feature Fusion: Algorithm Development Study

JMIR Form Res 2023;7:e47434

DOI: 10.2196/47434

PMID: 37594844

PMCID: 10474509


Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Formative Research

Date Submitted: Mar 21, 2023

Date Accepted: Jun 16, 2023
