Currently submitted to: Journal of Medical Internet Research
Date Submitted: Jan 29, 2026
Open Peer Review Period: Feb 2, 2026 - Mar 30, 2026
NOTE: This is an unreviewed Preprint
Enhancing Healthcare Interoperability Using Large Language Models: A Generative Proof-of-Concept Framework to Extract Medical Information from Unstructured Clinical Text
ABSTRACT
Background:
Unstructured clinical text remains a major barrier to interoperable data reuse and large-scale secondary analysis in healthcare. Large language models (LLMs) have the potential to automate the extraction of structured clinical information; however, their application is limited by the scarcity of high-quality annotated training data.
Objective:
To develop and evaluate a proof-of-concept framework that fine-tunes a large language model on synthetic discharge letters generated from structured FHIR data, enabling scalable extraction of interoperable medical information from unstructured clinical text.
Methods:
We evaluated an LLM-based pipeline for extracting structured clinical information from cancer-related discharge letters and mapping it to representations compatible with Fast Healthcare Interoperability Resources (FHIR). To enable large-scale supervised training, we developed a random sample generator that creates synthetic discharge letters with Qwen3 235B by randomly sampling and aggregating structured FHIR data from 41,175 cancer patients. The resulting synthetic discharge letters (n=75,000) were paired with their originating structured data, forming a large-scale dataset for fine-tuning MedGemma 27B. Evaluation was conducted on a synthetic test dataset (n=7,500), on real-world discharge letters (n=30) assessed by physicians and a medical student, and against a one-shot baseline using open-source models (Qwen3, LLaMA, and GPT-OSS).
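The core idea of the training-data pipeline, sampling structured FHIR records, prompting a generator LLM to verbalize them as a discharge letter, and keeping the structured record as the extraction target, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the patient pool, prompt template, and `sample_training_pair` helper are hypothetical stand-ins (the paper uses Qwen3 235B over FHIR data from 41,175 patients).

```python
import json
import random

# Hypothetical stand-in for the structured FHIR data pool; the real pipeline
# samples and aggregates resources from 41,175 cancer patients.
PATIENT_POOL = [
    {"diagnosis": {"code": "C50.9", "text": "Malignant neoplasm of breast"},
     "medication": {"atc": "L01CA04", "name": "Vinorelbine", "dose": "30 mg"}},
    {"diagnosis": {"code": "C18.7", "text": "Malignant neoplasm of sigmoid colon"},
     "medication": {"atc": "L01BC02", "name": "Fluorouracil", "dose": "400 mg"}},
]

# Illustrative prompt template; the actual prompt used with Qwen3 235B
# is not specified in the abstract.
PROMPT = ("Write a discharge letter for a patient with {dx}, "
          "treated with {drug} ({dose}).")

def sample_training_pair(pool, rng):
    """Randomly sample structured data and build a (prompt, target) pair.

    In the described pipeline, a generator LLM turns the prompt into
    free-text letter prose, and the originating structured record serves
    as the supervised extraction target for fine-tuning.
    """
    record = rng.choice(pool)
    prompt = PROMPT.format(dx=record["diagnosis"]["text"],
                           drug=record["medication"]["name"],
                           dose=record["medication"]["dose"])
    return {"prompt": prompt, "target": record}

rng = random.Random(0)  # seeded for reproducibility
pair = sample_training_pair(PATIENT_POOL, rng)
print(json.dumps(pair, indent=2))
```

Pairing each generated letter with its source record is what removes the need for manual annotation: the "gold" labels exist by construction.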
Results:
The fine-tuned model achieved high extraction performance across multiple clinical entities, including full ICD diagnosis codes (F1 = 0.84), tumor-related information (0.99), laboratory values (0.99), medication names and dosages (0.99), and ATC medication codes (0.94). Extraction of procedure-related information was more challenging but remained reliable, with F1 scores of 0.63 for OPS codes and 0.90 for procedure descriptions. In a one-shot comparison, the fine-tuned model consistently outperformed general-purpose LLMs in nearly all extraction categories. When applied to real-world discharge letters, performance remained robust, with F1 scores of 0.789 for ICD diagnoses, 0.861 for tumor-related information, 0.93 for medications, and 0.613 for procedures.
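For readers unfamiliar with how per-category F1 scores like those above are computed, a common set-based formulation is sketched below. The abstract does not specify the exact matching procedure (e.g., exact vs. partial match), so this is an assumed micro-averaged, exact-match variant for illustration.

```python
def entity_f1(predicted, gold):
    """Set-based F1 over extracted entities for one category (exact match).

    precision = true positives / predicted entities
    recall    = true positives / gold entities
    F1        = harmonic mean of precision and recall
    """
    tp = len(set(predicted) & set(gold))
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: three predicted ICD codes, three gold codes, two overlap,
# so precision = recall = F1 = 2/3.
score = entity_f1(["C50.9", "I10", "E11.9"], ["C50.9", "E11.9", "J45"])
print(round(score, 2))  # → 0.67
```

Because precision and recall are weighted equally, F1 penalizes both spurious extractions and missed entities, which is why it is the standard summary metric for extraction tasks like this one.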
Conclusions:
These results demonstrate that synthetic text generation from structured clinical data enables effective and scalable training of LLMs for extracting interoperable, multi-entity clinical information from unstructured documentation.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft other than for review purposes.