JMIR Preprints #105453: Sovereign Language Models for Regional Health Research under the European Health Data Space

Sovereign Language Models for Regional Health Research under the European Health Data Space

Azrin Muslim;
Nur Atikah Mohd Asri;
Declan Lyons

ABSTRACT

Several major artificial intelligence (AI) policy frameworks have come into effect across Europe. The European Health Data Space (EHDS) Regulation is being implemented across member states, the EU AI Act is moving from text to enforcement, and national strategies on AI in healthcare are emerging alongside them. Each raises the same two questions: where does health data sit, and who controls it. Most published work on large language models (LLMs) in clinical research describes systems built inside well-resourced academic medical centres on commercial cloud infrastructure. Regional health services and resource-constrained academic settings are underrepresented in this literature, even though they hold the majority of European longitudinal clinical data, much of it unstructured text on which little patient-level analysis has been done. A further vulnerability has recently become visible. Reliance on externally controlled frontier models means that access to capability can be constrained or withdrawn by commercial and political decisions taken outside the institution. We argue that sovereign, on-premise LLM infrastructure offers a practical and realistic alternative. Sovereignty is defined not by a vendor or model, but by deployment characteristics: inference runs under institutional control, patient-level data remains within the originating organisation, and participation in wider research occurs through federation rather than data transfer. We describe an architecture combining local inference, OMOP standardisation, federated analytics through OHDSI and EHDEN, and a tiered governance framework. We examine the concerns commonly raised about LLM-assisted research, distinguishing those that sovereign deployment addresses directly, those it partially mitigates, and those it does not solve. 
We argue that the convergence of open-weight models, maturing federated research ecosystems, and European policy frameworks creates a distinctive opportunity for regional institutions to participate in modern AI-enabled research while preserving data sovereignty and continuity of access. The central question is no longer whether such systems can be built, but whether institutions, funders, and research networks are prepared to support their adoption.
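
To make the phrase "federation rather than data transfer" concrete, the following is a minimal sketch and not the authors' implementation or any OHDSI or EHDEN API. It assumes each site evaluates a cohort count locally and releases only an aggregate, withholding counts below a disclosure threshold. The names `local_count`, `pool`, `SiteResult`, the record layout, and the threshold of 5 are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

# Hypothetical disclosure-control threshold; real networks set their own value.
MIN_CELL_COUNT = 5


@dataclass(frozen=True)
class SiteResult:
    """The only object that leaves a site: a name and an aggregate count."""
    site: str
    count: Optional[int]  # None when the count was suppressed as too small


def local_count(site: str, records: Iterable[dict],
                predicate: Callable[[dict], bool]) -> SiteResult:
    """Runs inside the institution; patient-level records never leave it."""
    n = sum(1 for record in records if predicate(record))
    return SiteResult(site, n if n >= MIN_CELL_COUNT else None)


def pool(results: list) -> dict:
    """Runs at the coordinating centre; it only ever sees aggregates."""
    reported = [r for r in results if r.count is not None]
    return {
        "total": sum(r.count for r in reported),
        "sites_reporting": len(reported),
        "sites_suppressed": len(results) - len(reported),
    }


def has_t2dm(record: dict) -> bool:
    # Illustrative predicate over a made-up record layout.
    return record["condition"] == "T2DM"


# Synthetic records standing in for two sites' local data.
site_a = [{"condition": "T2DM"}] * 7 + [{"condition": "asthma"}] * 3
site_b = [{"condition": "T2DM"}] * 2 + [{"condition": "asthma"}] * 8

results = [
    local_count("site_a", site_a, has_t2dm),
    local_count("site_b", site_b, has_t2dm),
]
summary = pool(results)
```

In this sketch the second site's count falls below the threshold and is withheld, so the pooled total reflects only the first site. The design choice being illustrated is that the coordinating function has no access to `site_a` or `site_b`, only to the `SiteResult` objects.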

Citation

Please cite as:

Muslim A, Mohd Asri NA, Lyons D

Sovereign Language Models for Regional Health Research under the European Health Data Space

JMIR Preprints. 24/06/2026:105453

DOI: 10.2196/preprints.105453

URL: https://preprints.jmir.org/preprint/105453

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review or community review (or an accepted or rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Currently submitted to: Journal of Medical Internet Research

Date Submitted: Jun 24, 2026

Open Peer Review Period: Jun 25, 2026 - Aug 20, 2026

(currently open for review)
