Currently submitted to: Journal of Medical Internet Research
Date Submitted: Jun 24, 2026
Open Peer Review Period: Jun 25, 2026 - Aug 20, 2026
(currently open for review)
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Sovereign Language Models for Regional Health Research under the European Health Data Space
ABSTRACT
Several major artificial intelligence (AI) policy frameworks have come into effect across Europe. The European Health Data Space (EHDS) Regulation is being implemented across member states, the EU AI Act is moving from text to enforcement, and national strategies on AI in healthcare are emerging alongside them. Each raises the same two questions: where health data sits, and who controls it. Most published work on large language models (LLMs) in clinical research describes systems built inside well-resourced academic medical centres on commercial cloud infrastructure. Regional health services and resource-constrained academic settings are underrepresented in this literature, even though they hold the majority of European longitudinal clinical data, much of it unstructured text on which little patient-level analysis has been done. A further vulnerability has recently become visible: reliance on externally controlled frontier models means that access to capability can be constrained or withdrawn by commercial and political decisions taken outside the institution. We argue that sovereign, on-premise LLM infrastructure offers a practical and realistic alternative. Sovereignty is defined not by a vendor or model, but by deployment characteristics: inference runs under institutional control, patient-level data remains within the originating organisation, and participation in wider research occurs through federation rather than data transfer. We describe an architecture combining local inference, OMOP standardisation, federated analytics through OHDSI and EHDEN, and a tiered governance framework. We examine the concerns commonly raised about LLM-assisted research, distinguishing those that sovereign deployment addresses directly, those it partially mitigates, and those it does not solve.
We argue that the convergence of open-weight models, maturing federated research ecosystems, and European policy frameworks creates a distinctive opportunity for regional institutions to participate in modern AI-enabled research while preserving data sovereignty and continuity of access. The central question is no longer whether such systems can be built, but whether institutions, funders, and research networks are prepared to support their adoption.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.