
Accepted for/Published in: JMIR AI

Date Submitted: May 8, 2025
Open Peer Review Period: May 8, 2025 - Jul 3, 2025
Date Accepted: Nov 26, 2025

The final, peer-reviewed published version of this preprint can be found here:

Leveraging Large Language Models to Improve the Readability of German Online Medical Texts: Evaluation Study

Miftaroski A, Zowalla R, Wiesner M, Pobiruchin M


JMIR AI 2026;5:e77149

DOI: 10.2196/77149

PMID: 41575871

PMCID: 12829587

Leveraging Large Language Models to Improve Readability of German Online Medical Texts: An Evaluation Study

  • Amela Miftaroski; 
  • Richard Zowalla; 
  • Martin Wiesner; 
  • Monika Pobiruchin

ABSTRACT

Background:

Patient education materials (PEMs) found online are often written at a complexity level too high for the average reader, which can hinder understanding and informed decision-making. Large Language Models (LLMs) may offer a solution by simplifying complex medical texts. To date, little is known about how well LLMs can handle simplification tasks for German-language PEMs.

Objective:

This study investigates whether LLMs can improve the readability of German online medical texts to a recommended level.

Methods:

A sample of 60 German texts originating from online medical resources was compiled. Four LLMs were selected and used to simplify these texts: ChatGPT-3.5, ChatGPT-4o, Microsoft Copilot, and Le Chat. Next, readability scores (Flesch Reading Ease [FRE] and the 4th Vienna Formula [Wiener Sachtextformel; WSTF]) were computed for the original texts and compared with the rephrased LLM versions. A paired-samples Student t test was used to test for a reduction in readability scores, ideally to or below the 8th-grade level.
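The paper does not include its scoring code; the two metrics it names, however, have published formulas (Amstad's German adaptation of the Flesch Reading Ease and the 4th Wiener Sachtextformel). A minimal sketch of both, assuming a naive vowel-group syllable counter as a stand-in for a proper German syllabifier, might look like:

```python
import re

VOWELS = "aeiouyäöü"

def count_syllables(word: str) -> int:
    """Naive German syllable estimate: count vowel groups (at least 1)."""
    groups = re.findall(rf"[{VOWELS}]+", word.lower())
    return max(1, len(groups))

def readability(text: str) -> tuple[float, float]:
    """Return (FRE_de, WSTF4) for a non-empty German text.

    FRE_de (Amstad): 180 - ASL - 58.5 * ASW
    WSTF4:           0.2656 * SL + 0.2744 * MS - 1.693
      ASL/SL = mean words per sentence
      ASW    = mean syllables per word
      MS     = percentage of words with >= 3 syllables
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-zÄÖÜäöüß]+", text)
    syllables = [count_syllables(w) for w in words]
    asl = len(words) / len(sentences)
    asw = sum(syllables) / len(words)
    ms = 100 * sum(1 for s in syllables if s >= 3) / len(words)
    fre = 180 - asl - 58.5 * asw
    wstf4 = 0.2656 * asl + 0.2744 * ms - 1.693
    return fre, wstf4
```

Note the opposite polarities: a higher FRE means easier text, while a lower WSTF (roughly a school-grade level) means easier text, which is why the study targets scores at or below grade 8.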

Results:

Most of the original texts were rated as “difficult” to “quite difficult” (mean WSTF=11.24, FRE=35.92). On average, the LLM-rephrased texts scored: (i) ChatGPT-3.5: WSTF=9.96, FRE=45.04; (ii) ChatGPT-4o: WSTF=10.6, FRE=39.23; (iii) Microsoft Copilot: WSTF=8.99, FRE=49.0; and (iv) Le Chat: WSTF=11.71, FRE=33.72. Although readability improved, the t test yielded no statistically significant result.
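The significance test above is a paired-samples t test on per-text score differences. On illustrative (hypothetical, not study) WSTF pairs, the statistic reduces to a few lines of standard-library Python:

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before: list[float], after: list[float]) -> float:
    """Paired-samples t statistic for per-text differences (before - after).

    A positive t indicates scores dropped after simplification; the p-value
    would then be looked up against the t distribution with n-1 df.
    """
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n))

# Hypothetical WSTF scores for four texts, before and after rephrasing:
t_stat = paired_t([11.0, 12.0, 10.0, 13.0], [10.0, 11.5, 9.0, 12.0])
```

With 60 texts per model, as in the study, the same pairing applies per text; only the resulting p-value decides significance.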

Conclusions:

LLMs can improve the readability of German-language PEMs. This moderate improvement can support patients reading PEMs online, and it demonstrates the potential of LLMs to make complex online medical texts accessible to a broader audience. To our knowledge, this is the first study to evaluate this for German online medical texts.


 Citation

Please cite as:

Miftaroski A, Zowalla R, Wiesner M, Pobiruchin M

Leveraging Large Language Models to Improve the Readability of German Online Medical Texts: Evaluation Study

JMIR AI 2026;5:e77149

DOI: 10.2196/77149

PMID: 41575871

PMCID: 12829587


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.