JMIR Preprints #84444: Large Language Model-Generated Patient Instructions for Prescriptions in Primary Health Care: A Preclinical Evaluation

Zilma Silveira Nogueira Reis;
Elisa Tuler Albergaria;
Adriana Silvina Pagano;
Eura Martins Lage;
Flávia Ribeiro de Oliveira;
Cristiane dos Santos Dias;
Juliana Almeida Oliveira;
Gláucia Miranda Varella Pereira;
Isaias Jose Ramos de Oliveira;
Érico Franco Mineiro;
Igor Carvalho Lima Oliveira;
Davi dos Reis de Jesus;
Antônio Pereira de Souza Júnior;
Igor de Carvalho Gomes;
Rodrigo André Cuevas Gaete;
Ricardo Cruz-Correia;
Leonardo Chaves Dutra da Rocha

ABSTRACT

Background:

Large Language Model-Generated Patient Instructions for Prescriptions in Primary Health Care: A Preclinical Evaluation

Objective:

We evaluated the performance of large language models (LLMs) in generating medication usage instructions to complement prescriptions in primary health care.

Methods:

This randomized, blinded experimental study used prescription-inducing scenarios, assigned to 62 health care professionals, to validate instructions generated by LLMs during e-prescribing. The instructions were generated by ChatGPT-4.0, Llama3.1-8B, and Llama3.1-8B-RAG, the last of which used retrieval-augmented generation (RAG) based on patient information leaflets. Performance metrics assessed Adequacy, Completeness, Clarity, Personalization, Usefulness, and errors in the generated instructions. Scores were analyzed overall and per metric, using all evaluations (n=198) and the consensus among evaluators per test (n=46).
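The retrieval step behind a RAG setup such as the one named above can be illustrated with a minimal sketch. It is not the authors' pipeline: the leaflet passages, the token-overlap scorer, and the names `retrieve` and `build_prompt` are all hypothetical, and the sketch stops at assembling the prompt rather than calling any model.

```python
import re
from collections import Counter

# Hypothetical leaflet passages for illustration; not from the study's corpus.
LEAFLET_PASSAGES = [
    "Take one tablet by mouth every 8 hours with a full glass of water.",
    "Store the medicine below 30 degrees Celsius and away from moisture.",
    "If you miss a dose, take it as soon as you remember unless the next dose is near.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query, passages, k=2):
    """Return the k passages sharing the most tokens with the query (toy scorer)."""
    query_tokens = Counter(tokenize(query))

    def score(passage):
        # Multiset intersection counts each shared token at most as often
        # as it appears in both the query and the passage.
        return sum((query_tokens & Counter(tokenize(passage))).values())

    return sorted(passages, key=score, reverse=True)[:k]


def build_prompt(prescription, passages, k=2):
    """Assemble a prompt that grounds the model in the retrieved excerpts."""
    context = "\n".join(f"- {p}" for p in retrieve(prescription, passages, k))
    return (
        "Use only the leaflet excerpts below to write patient instructions.\n"
        f"Leaflet excerpts:\n{context}\n"
        f"Prescription: {prescription}\n"
        "Instructions:"
    )
```

A production system would typically replace the token-overlap scorer with embedding-based search; the point here is only that the model receives leaflet text alongside the prescription.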

Results:

By consensus among evaluators (n=46 tests), the three models yielded similar scores for producing qualified instructions, with median (IQR) values of 89.3 (12.5) for ChatGPT-4.0, 79.5 (46.1) for Llama3.1-8B, and 85.7 (21.9) for Llama3.1-8B-RAG (P=.282). RAG rendered the Llama3.1-8B model equivalent to ChatGPT-4.0 regarding Adequacy, Completeness, Clarity, and Usefulness, and was associated with fewer errors in the generated instructions: ChatGPT-4.0 (n=5), Llama3.1-8B (n=11), and Llama3.1-8B-RAG (n=4), P=.040. Concerning specific criteria across all 198 evaluations, Llama3.1-8B-RAG received scores equivalent to those of ChatGPT-4.0 in Adequacy, with mean (SD) 6.24 (2.3) and 6.82 (2.1), respectively (P=.536); Completeness, with mean (SD) 5.94 (2.2) and 6.55 (1.8), respectively (P=.376); Clarity, with mean (SD) 5.77 (2.4) and 6.68 (1.9), respectively (P=.086); and Usefulness, with mean (SD) 5.42 (2.4) and 5.96 (2.2), respectively (P=.627). ChatGPT-4.0 received higher scores in the Personalization criterion, with mean (SD) 7.05 (1.5) compared with 5.44 (2.6) for Llama3.1-8B-RAG (P<.001).
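For readers less familiar with the "median (IQR)" notation used above, the following sketch shows one way such a summary can be computed. It assumes the IQR is reported as the width between the first and third quartiles and uses the "inclusive" quantile method; the abstract does not state which convention the authors applied, and the scores below are invented for illustration only.

```python
from statistics import median, quantiles


def median_iqr(scores):
    """Return (median, IQR width) for a list of scores.

    Assumption: the IQR is Q3 - Q1 under the 'inclusive' quantile method;
    other conventions give slightly different values.
    """
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return median(scores), q3 - q1


# Hypothetical scores, not data from the study.
hypothetical_scores = [60.0, 70.0, 80.0, 90.0, 100.0]
summary = median_iqr(hypothetical_scores)
print(f"{summary[0]:.1f} ({summary[1]:.1f})")  # prints "80.0 (20.0)"
```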

Conclusions:

The open-source LLM enhanced with external information presented performance similar to that of the closed-source model. LLM generation demonstrated potential for instructing patients on medication use. Nonetheless, introducing this innovation into the e-prescribing workflow demands prescriber validation and LLM performance governance.

Citation

Please cite as:

Silveira Nogueira Reis Z, Tuler Albergaria E, Silvina Pagano A, Martins Lage E, Ribeiro de Oliveira F, dos Santos Dias C, Almeida Oliveira J, Miranda Varella Pereira G, Jose Ramos de Oliveira I, Franco Mineiro É, Carvalho Lima Oliveira I, dos Reis de Jesus D, Pereira de Souza Júnior A, de Carvalho Gomes I, André Cuevas Gaete R, Cruz-Correia R, Chaves Dutra da Rocha L

Large Language Model–Generated Patient Instructions for Prescriptions in Primary Health Care: Preclinical Algorithm Validation

J Med Internet Res 2026;28:e84444

DOI: 10.2196/84444

PMID: 42190235

PMCID: 13250494

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Sep 19, 2025

Open Peer Review Period: Sep 19, 2025 - Nov 14, 2025

Date Accepted: Feb 27, 2026
