Currently submitted to: Journal of Medical Internet Research
Date Submitted: Jan 21, 2026
Open Peer Review Period: Jan 22, 2026 - Mar 19, 2026
NOTE: This is an unreviewed Preprint
Natural Language Processing for Electronic Health Records in Scandinavian Languages: Norwegian, Swedish, and Danish
ABSTRACT
Background:
Clinical natural language processing (NLP) refers to computational methods for extracting, processing, and analyzing unstructured clinical text, and holds great potential to transform healthcare. Advances in deep learning, and more recently the emergence of transformers, have been pivotal to the success of NLP across domains. This success is largely attributed to the end-to-end training capabilities of deep learning systems. Further, advances in instruction tuning have enabled large language models (LLMs) such as OpenAI’s GPT to perform tasks described in natural language. While these advances have dramatically improved the processing of languages such as English, the benefits do not always transfer equally to under-resourced languages. This review therefore assesses state-of-the-art NLP methods for clinical text in the mainland Scandinavian languages and provides an overview of the clinical NLP landscape in the region.
Objective:
This study systematically reviews and analyzes state-of-the-art NLP methods for the Scandinavian clinical domain, providing an overview of the landscape for clinical language processing in the Scandinavian languages of Norway, Denmark, and Sweden. More broadly, the review aims to provide a practical outline of modeling options, opportunities, challenges, and limitations, along with a clear overview of existing methodologies and potential avenues for future research and development.
Methods:
A literature search was conducted between December 2022 and March 2024 in several online databases, including PubMed, ScienceDirect, Google Scholar, ACM Digital Library, and IEEE Xplore. The search covered peer-reviewed journal articles, preprints, and conference proceedings. Relevant articles were first identified by scanning titles, abstracts, and keywords against the inclusion and exclusion criteria, and were then screened through a full-text eligibility assessment. Data were extracted according to predefined categories, established from prior studies and refined through brainstorming sessions among the authors.
Results:
The initial search yielded 217 articles. The full-text eligibility assessment was carried out independently by five of the authors and resulted in 118 studies, which were critically analyzed. Any disagreements among the authors were resolved through discussion. Of the 118 articles, 17.8% (n=21) focus on Norwegian clinical text, 61.0% (n=72) on Swedish, 13.6% (n=16) on Danish, and 7.6% (n=9) on more than one language. Overall, the review identified positive developments across the region, despite observable gaps and disparities between the languages. There are substantial disparities in the adoption of transformer-based models. In essential tasks such as de-identification, there is considerably less research activity on Norwegian and Danish text than on Swedish text. Further, the review identified a low level of resource sharing (data, experiment code, and pre-trained models) and a low rate of adaptation and transfer learning in the region.
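As a quick arithmetic check, the per-language shares can be recomputed from the reported study counts. The sketch below is illustrative only: the dictionary and variable names are assumptions introduced here, not part of the review, and only the counts come from the text above.

```python
# Per-language study counts as reported in the Results paragraph.
counts = {
    "Norwegian": 21,
    "Swedish": 72,
    "Danish": 16,
    "More than one language": 9,
}

# Total number of included studies, derived from the counts.
total = sum(counts.values())

# Share of each category in percent, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

for name, n in counts.items():
    print(f"{name}: n={n}, {shares[name]}%")
```

The four counts sum to the 118 included studies, so the categories are mutually exclusive and cover the full set.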
Conclusions:
The review presents a comprehensive assessment of state-of-the-art clinical NLP in the mainland Scandinavian languages and sheds light on potential barriers and challenges. It identifies a lack of shared resources (e.g., datasets and pre-trained models), inadequate research infrastructure, and insufficient collaboration as the most significant barriers requiring careful consideration in future research. The review highlights the need for future research in resource development, core NLP tasks, and de-identification. We expect the findings to help shape future research directions by highlighting areas that require further attention for the rapid advancement of the field in the region.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.