
Accepted for/Published in: JMIR AI

Date Submitted: May 8, 2025
Open Peer Review Period: May 8, 2025 - Jul 3, 2025
Date Accepted: Nov 26, 2025

The final, peer-reviewed published version of this preprint can be found here:

Leveraging Large Language Models to Improve the Readability of German Online Medical Texts: Evaluation Study

Miftaroski A, Zowalla R, Wiesner M, Pobiruchin M


JMIR AI 2026;5:e77149

DOI: 10.2196/77149

PMID: 41575871

PMCID: 12829587

Leveraging Large Language Models to Improve Readability of German Online Medical Texts: An Evaluation Study

  • Amela Miftaroski; 
  • Richard Zowalla; 
  • Martin Wiesner; 
  • Monika Pobiruchin

ABSTRACT

Background:

Patient education materials (PEMs) found online are often written at a complexity level too high for the average reader, which can hinder understanding and informed decision-making. Large Language Models (LLMs) may offer a solution by simplifying complex medical texts. To date, little is known about how well LLMs can handle simplification tasks for German-language PEMs.

Objective:

This study investigates whether LLMs can improve the readability of German online medical texts to a recommended level.

Methods:

A sample of 60 German texts originating from online medical resources was compiled. Four LLMs were selected and used to simplify these texts: ChatGPT-3.5, ChatGPT-4o, Microsoft Copilot, and Le Chat. Next, readability scores (Flesch Reading Ease [FRE] and the 4th Vienna Formula [Wiener Sachtextformel; WSTF]) were computed for the original texts and compared with the rephrased LLM versions. A paired-samples Student t test was used to test for a reduction in readability scores, ideally to or below the 8th-grade level.
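The paper does not include its scoring code; the two metrics it names, however, have published formulas (Amstad's German adaptation of the Flesch Reading Ease and the 4th Wiener Sachtextformel). A minimal sketch of both, assuming a naive vowel-group syllable counter as a stand-in for a proper German syllabifier, might look like:

```python
import re

VOWELS = "aeiouyäöü"

def count_syllables(word: str) -> int:
    """Naive German syllable estimate: count vowel groups (at least 1)."""
    groups = re.findall(rf"[{VOWELS}]+", word.lower())
    return max(1, len(groups))

def readability(text: str) -> tuple[float, float]:
    """Return (FRE_de, WSTF4) for a non-empty German text.

    FRE_de (Amstad): 180 - ASL - 58.5 * ASW
    WSTF4:           0.2656 * SL + 0.2744 * MS - 1.693
      ASL/SL = mean words per sentence
      ASW    = mean syllables per word
      MS     = percentage of words with >= 3 syllables
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-zÄÖÜäöüß]+", text)
    syllables = [count_syllables(w) for w in words]
    asl = len(words) / len(sentences)
    asw = sum(syllables) / len(words)
    ms = 100 * sum(1 for s in syllables if s >= 3) / len(words)
    fre = 180 - asl - 58.5 * asw
    wstf4 = 0.2656 * asl + 0.2744 * ms - 1.693
    return fre, wstf4
```

Note the opposite polarities: a higher FRE means easier text, while a lower WSTF (roughly a school-grade level) means easier text, which is why the study targets scores at or below grade 8.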

Results:

Most of the original texts were rated as “difficult” to “quite difficult” (mean WSTF=11.24, FRE=35.92). On average, the LLM-rephrased texts scored: (i) ChatGPT-3.5: WSTF=9.96, FRE=45.04; (ii) ChatGPT-4o: WSTF=10.6, FRE=39.23; (iii) Microsoft Copilot: WSTF=8.99, FRE=49.0; and (iv) Le Chat: WSTF=11.71, FRE=33.72. Although readability improved, the t test yielded no statistically significant result.
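The significance test above is a paired-samples t test on per-text score differences. On illustrative (hypothetical, not study) WSTF pairs, the statistic reduces to a few lines of standard-library Python:

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before: list[float], after: list[float]) -> float:
    """Paired-samples t statistic for per-text differences (before - after).

    A positive t indicates scores dropped after simplification; the p-value
    would then be looked up against the t distribution with n-1 df.
    """
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n))

# Hypothetical WSTF scores for four texts, before and after rephrasing:
t_stat = paired_t([11.0, 12.0, 10.0, 13.0], [10.0, 11.5, 9.0, 12.0])
```

With 60 texts per model, as in the study, the same pairing applies per text; only the resulting p-value decides significance.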

Conclusions:

LLMs can improve the readability of German-language PEMs. This moderate improvement can support patients reading PEMs online, and it demonstrates the potential of LLMs to make complex online medical texts accessible to a broader audience. To our knowledge, this is the first study to evaluate this for German online medical texts.


 Citation

Please cite as:

Miftaroski A, Zowalla R, Wiesner M, Pobiruchin M

Leveraging Large Language Models to Improve the Readability of German Online Medical Texts: Evaluation Study

JMIR AI 2026;5:e77149

DOI: 10.2196/77149

PMID: 41575871

PMCID: 12829587


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.