JMIR Preprints #91880: Natural Language Processing for Electronic Health Records in Scandinavian Languages: Norwegian, Swedish, and Danish


Natural Language Processing for Electronic Health Records in Scandinavian Languages: Norwegian, Swedish, and Danish

Ashenafi Zebene Woldaregay;
Jørgen Aarmo Lund;
Phuong Dinh Ngo;
Mariyam Tayefi;
Joel Burman;
Stine Hansen;
Martin Hylleholt Sillesen;
Hercules Dalianis;
Robert Jenssen;
Rolf Ole Lindsetmo;
Karl Øyvind Mikalsen

ABSTRACT

Background:

Clinical natural language processing (NLP) refers to computational methods for extracting, processing, and analyzing unstructured clinical text data, and holds great potential to transform healthcare. The advancement of deep learning, augmented by the recent emergence of transformers, has been pivotal to the success of NLP across various domains. This success is largely attributed to the end-to-end training capabilities of deep learning systems. Further, advances in instruction tuning have enabled Large Language Models (LLMs) like OpenAI's GPT to perform tasks described in natural language. While these advancements have dramatically improved capabilities in processing languages like English, these benefits are not always equally transferable to under-resourced languages. In this regard, this review aims to provide a comprehensive assessment of state-of-the-art NLP methods for mainland Scandinavian clinical text, thereby providing an insightful overview of the landscape for clinical NLP within the region.

Objective:

This study aims to systematically review and analyze state-of-the-art NLP methods for the Scandinavian clinical domain, providing an overview of the landscape of clinical language processing for the Scandinavian languages of Norway, Denmark, and Sweden. More broadly, the review aims to outline the available modeling options, opportunities, and challenges or limitations, giving a clear picture of existing methodologies and potential avenues for future research and development.

Methods:

A literature search was conducted in various online databases, including PubMed, ScienceDirect, Google Scholar, ACM Digital Library, and IEEE Xplore, between December 2022 and March 2024. The search considered peer-reviewed journal articles, preprints, and conference proceedings. Relevant articles were initially identified by scanning titles, abstracts, and keywords, which served as a preliminary filter in conjunction with the inclusion and exclusion criteria, and were then screened through a full-text eligibility assessment. Data were extracted according to predefined categories, established from prior studies and further refined through brainstorming sessions among the authors.

Results:

The initial search yielded 217 articles. The full-text eligibility assessment was independently carried out by five of the authors and resulted in 118 included studies, which were critically analyzed. Any disagreements among the authors were resolved through discussion. Of the 118 articles, 17.8% (n=21) focus on Norwegian clinical text, 61.0% (n=72) on Swedish, 13.6% (n=16) on Danish, and 7.6% (n=9) on more than one language. Overall, the review identified positive developments across the region despite some observable gaps and disparities between the languages. There are substantial disparities between the languages in the level of adoption of transformer-based models. In essential tasks such as de-identification, there is significantly less research activity on Norwegian and Danish text than on Swedish text. Further, the review identified a low level of resource sharing (for example, data, experimentation code, and pretrained models) and a low rate of adaptation and transfer learning in the region.
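The per-language breakdown above is simple proportion arithmetic. As a minimal sketch, assuming the denominator is the full set of 118 included studies (the `counts` mapping below is a hypothetical illustration built from the reported n values), the shares can be recomputed from the counts:

```python
# Hypothetical mapping of the reported per-language study counts.
counts = {
    "Norwegian": 21,
    "Swedish": 72,
    "Danish": 16,
    "More than one language": 9,
}

# Denominator: all included studies (assumed to be the 118 reported).
total = sum(counts.values())

# Share of included studies per category, rounded to one decimal place.
shares = {lang: round(100 * n / total, 1) for lang, n in counts.items()}

for lang, pct in shares.items():
    print(f"{lang}: {pct}% (n={counts[lang]})")
```

Rounding each share independently can, in general, make the rounded shares sum to slightly more or less than 100%. For these counts the rounded shares sum to 100.0%.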

Conclusions:

The review presents a comprehensive assessment of state-of-the-art clinical NLP in the mainland Scandinavian languages and sheds light on potential barriers and challenges. The review identified a lack of shared resources (for example, datasets and pretrained models), inadequate research infrastructure, and insufficient collaboration as the most significant barriers that require careful consideration in future research. The review highlights the need for future research in resource development, core NLP tasks, and de-identification. Overall, we anticipate that these findings will help shape future research directions by highlighting the areas that require further attention for the rapid advancement of the field in the region.

Citation

Please cite as:

Woldaregay AZ, Lund JA, Ngo PD, Tayefi M, Burman J, Hansen S, Sillesen MH, Dalianis H, Jenssen R, Lindsetmo RO, Mikalsen KØ

Natural Language Processing for Electronic Health Records in Scandinavian Languages: Norwegian, Swedish, and Danish

JMIR Preprints. 21/01/2026:91880

DOI: 10.2196/preprints.91880

URL: https://preprints.jmir.org/preprint/91880


Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Currently submitted to: Journal of Medical Internet Research

Date Submitted: Jan 21, 2026

Open Peer Review Period: Jan 22, 2026 - Mar 19, 2026

(closed for review)

NOTE: This is an unreviewed Preprint
