JMIR Preprints #62924: Use of SNOMED CT in Large Language Models: A Scoping Review


Use of SNOMED CT in Large Language Models: A Scoping Review

Eunsuk Chang;
Sumi Sung

ABSTRACT

Background:

SNOMED CT is a widely adopted standardized terminology in electronic health records and common data models, and it has attracted attention for secondary applications as a biomedical knowledge source. Because large language models (LLMs) commonly suffer from "hallucination," integrating SNOMED CT with LLMs as a knowledge base has been proposed to improve natural language understanding and generation in the biomedical domain.

Objective:

We aimed to review the state-of-the-art methodologies for incorporating SNOMED CT into LLMs to enhance biomedical natural language understanding and generation tasks.

Methods:

We conducted a comprehensive review of SNOMED CT integration in language models by searching the ACM Digital Library, ACL Anthology, IEEE Xplore, PubMed, and Embase for publications from 2018 to 2023. Thirty-seven papers were selected for the final review.

Results:

BERT and its fine-tuned variants were the predominant baseline language models in the reviewed literature. Most studies (n=28) incorporated SNOMED CT content, such as descriptions, relations, and entity types (classes), into the inputs or training corpora of large language models. Other approaches added SNOMED CT through additional fusion modules in the language models or retrieved knowledge from SNOMED CT at inference time. SNOMED CT-integrated large language models were applied predominantly to natural language understanding tasks (n=30), such as entity typing, classification, and, most notably, medical concept normalization. They were also applied to natural language generation tasks (n=9), such as translation, summarization, and question answering. However, only a few studies reported performance differences before and after SNOMED CT integration.
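The abstract does not describe a specific integration pipeline. To illustrate roughly the most common approach it reports, which is adding SNOMED CT descriptions, relations, and entity types to model inputs, the Python sketch below appends matched concept knowledge to a clinical sentence before that sentence would be tokenized for a BERT-style model. Every element here is a hypothetical assumption for this example and not the method of any reviewed study: the `Concept` record, the `find_concepts` and `augment_input` helpers, the hand-written `TOY_KB` excerpt, and the bracketed serialization format. The SCTIDs and is-a parents are included for illustration only and should be checked against a current SNOMED CT release.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A hand-written, abbreviated stand-in for one SNOMED CT concept (hypothetical)."""
    sctid: str
    preferred_term: str
    semantic_tag: str  # entity type (class), e.g. "disorder"
    synonyms: list[str] = field(default_factory=list)
    is_a: list[str] = field(default_factory=list)  # preferred terms of is-a parents


# Illustrative excerpt only; a real pipeline would load concepts from an official
# release rather than a hand-written list. Verify IDs and parents before reuse.
TOY_KB = [
    Concept("22298006", "Myocardial infarction", "disorder",
            synonyms=["Heart attack"], is_a=["Ischemic heart disease"]),
    Concept("73211009", "Diabetes mellitus", "disorder",
            is_a=["Disorder of glucose metabolism"]),
]


def find_concepts(text: str, kb: list[Concept]) -> list[Concept]:
    """Return concepts whose preferred term or any synonym occurs in the text.

    Naive case-insensitive substring matching keeps the sketch short; it is not
    a substitute for real entity linking or concept normalization.
    """
    lowered = text.lower()
    hits = []
    for concept in kb:
        surface_forms = [concept.preferred_term, *concept.synonyms]
        if any(form.lower() in lowered for form in surface_forms):
            hits.append(concept)
    return hits


def augment_input(text: str, kb: list[Concept]) -> str:
    """Append SNOMED CT descriptions, entity types, and is-a relations to the input text."""
    parts = [text]
    for c in find_concepts(text, kb):
        knowledge = f"{c.preferred_term} ({c.semantic_tag})"
        if c.synonyms:
            knowledge += "; synonyms: " + ", ".join(c.synonyms)
        if c.is_a:
            knowledge += "; is a: " + ", ".join(c.is_a)
        parts.append(f"[SNOMED CT {c.sctid}] {knowledge}")
    return " ".join(parts)


if __name__ == "__main__":
    note = "Patient admitted after a heart attack; history of diabetes mellitus."
    # The augmented string would then be tokenized and passed to a language model.
    print(augment_input(note, TOY_KB))
```

This sketch serializes knowledge as plain text so that an unmodified encoder can consume it. The fusion-module and retrieval approaches mentioned in the abstract would instead inject SNOMED CT knowledge inside the model or at inference time, which this example does not attempt.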

Conclusions:

As the use of SNOMED CT as a reliable knowledge source becomes more feasible, SNOMED CT-integrated language models have the potential to strengthen model accountability, and they have shown advances in natural language understanding and generation for downstream tasks in the biomedical domain. Future research is expected to pay greater attention to the advantages of incorporating SNOMED CT into large language models.

Citation

Please cite as:

Chang E, Sung S

Use of SNOMED CT in Large Language Models: Scoping Review

JMIR Med Inform 2024;12:e62924

DOI: 10.2196/62924

PMID: 39374057

PMCID: 11494256


Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.


Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Jun 4, 2024

Date Accepted: Sep 15, 2024
