Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Aug 20, 2025
Date Accepted: Jan 23, 2026
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Development and Evaluation of SNOMED CT Automated Mapping Tool: Advancing Terminology Standardization and Semantic Interoperability
ABSTRACT
Background:
Kakao Healthcare has built the Healthcare data Research Suite (HRS) to enable multi‑institutional research, where semantic interoperability depends on clinical terminology standardization. Manual mapping of local terms to Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) is labor‑intensive, inconsistent across sites, and poorly suited to bilingual data and institution‑specific granularity.
Objective:
We designed a large language model (LLM)‑based tool to support multi‑site research by standardizing terminology, combining automated SNOMED CT mapping and new‑concept authoring via post‑coordination in one process.
Methods:
The automated mapping pipeline included preprocessing local terms, syntactic and vector similarity mapping leveraging LLM-based embeddings, and iterative enrichment based on validated results. Translation and semantic representation used GPT-4o and Gemini. New concepts were authored through a structured post-coordination process. Performance was evaluated using diagnostic and surgical procedural terms from four major hospital networks (nine university hospitals) in South Korea, with clinical terminologists providing usability feedback on system results.
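The candidate-ranking step described above can be illustrated with a minimal sketch. It assumes a blended score of syntactic and vector similarity; the abstract does not specify how the two are combined, so the `weight` parameter, the function names, and the concept IDs (`C001` etc., not real SNOMED CT identifiers) are all hypothetical. A character-trigram vector stands in for the LLM-based embeddings so that the example runs without external services.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def preprocess(term: str) -> str:
    """Lowercase and collapse whitespace (a stand-in for the real preprocessing)."""
    return " ".join(term.lower().split())


def toy_embedding(term: str) -> Counter:
    """Character-trigram counts; a placeholder for an LLM-based embedding."""
    padded = f"  {term} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(local_term: str, concepts: dict, k: int = 5, weight: float = 0.5) -> list:
    """Return the k concept IDs with the highest blended similarity score."""
    query = preprocess(local_term)
    query_vec = toy_embedding(query)
    scored = []
    for concept_id, description in concepts.items():
        desc = preprocess(description)
        syntactic = SequenceMatcher(None, query, desc).ratio()
        vector = cosine(query_vec, toy_embedding(desc))
        # Assumed blend: the weighting scheme is illustrative, not the authors'.
        scored.append((weight * syntactic + (1 - weight) * vector, concept_id))
    scored.sort(reverse=True)
    return [concept_id for _, concept_id in scored[:k]]


# Hypothetical target descriptions with placeholder IDs.
concepts = {
    "C001": "Acute myocardial infarction",
    "C002": "Old myocardial infarction",
    "C003": "Acute appendicitis",
    "C004": "Type 2 diabetes mellitus",
}
top = rank_candidates("Acute  Myocardial Infarction", concepts, k=3)
print(top)
```

In a real pipeline the trigram vectors would be replaced by embeddings from the chosen model, and validated mappings would be fed back into the candidate set as part of the iterative enrichment step.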
Results:
Using reference terms, Top-5 accuracy for diagnostic mapping reached 98.67%, 89.69%, 98.52%, and 92.78% across the four institutions, and 99.19%, 82.62%, 98.73%, and 84.69% for surgical procedural mapping. Implementation of the tool reduced manual mapping rates by 30% and overall manual workload by up to 90%. Average mapping and validation times per item decreased by 75–90%.
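For readers unfamiliar with the metric, Top-5 accuracy is conventionally the fraction of local terms whose reference concept appears among the five highest-ranked candidates. The sketch below assumes that standard definition; the function name and the toy data are hypothetical and are not drawn from the study.

```python
def top_k_accuracy(ranked: dict, reference: dict, k: int = 5) -> float:
    """Fraction of terms whose reference concept is among the first k candidates."""
    if not ranked:
        raise ValueError("ranked must contain at least one term")
    hits = sum(
        1 for term, candidates in ranked.items()
        if reference[term] in candidates[:k]
    )
    return hits / len(ranked)


# Toy example with placeholder concept IDs (not study data).
ranked = {
    "term_a": ["C001", "C002", "C003"],
    "term_b": ["C010", "C011", "C012"],
    "term_c": ["C020", "C021", "C022"],
}
reference = {"term_a": "C002", "term_b": "C099", "term_c": "C020"}

accuracy = top_k_accuracy(ranked, reference, k=5)
print(f"Top-5 accuracy: {accuracy:.2%}")
```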
Conclusions:
The tool improves mapping accuracy, efficiency, and scalability in terminology standardization, but challenges remain with ambiguous terms, granularity gaps, and abbreviation resolution. Future work will focus on integrating external abbreviation databases, Retrieval-Augmented Generation techniques, and ontology-driven modeling to further advance autonomous mapping performance and quality assurance.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.