Accepted for/Published in: JMIR Biomedical Engineering
Date Submitted: Nov 18, 2025
Date Accepted: Mar 4, 2026
Increasing LLM Accuracy for Care-Seeking Advice Using Prompts Reflecting Human Reasoning Strategies in the Real World: A Validation Study
ABSTRACT
Background:
Current prompting techniques for large language models (LLMs) such as ChatGPT mainly focus on well-structured, low-uncertainty problems, yet many real-world tasks (e.g., care-seeking decisions) are ill-defined and involve high uncertainty. Naturalistic decision-making (NDM) specifically analyzes how humans make accurate decisions in such settings, but NDM concepts have not yet been applied to LLM prompt engineering and evaluated.
Objective:
This study aimed to determine whether prompting strategies inspired by NDM (specifically based on recognition-primed decision-making and the data/frame theory) can improve LLM performance in a real-world, high-uncertainty task such as making care-seeking decisions.
Methods:
We evaluated six ChatGPT models (GPT-4o, GPT-4.1, GPT-4.1 mini, o3, o4 mini, and o4 mini high) using three prompting strategies: a default prompt that only asked the LLMs to classify the case vignettes, a recognition-primed prompt tasking the models with reasoning according to recognition-primed decision-making, and a data/frame prompt tasking the models with applying the data/frame theory. The task was taken from a standardized and validated evaluation framework and instructed the LLMs to advise on the appropriate care-seeking action for 45 real patient case vignettes across three urgency levels (emergency, non-emergency, self-care). Each model-vignette-prompt combination was tested ten times to assess and account for output variability. Accuracy was analyzed using mixed-effects logistic regression. Additionally, we evaluated accuracy at each urgency level and examined output variability.
Results:
Both NDM-inspired prompts increased overall model accuracy (recognition-primed: 70.2%; data/frame: 70.1%) compared with the default prompt (64.7%). The greatest improvements were observed for self-care recommendations, where accuracy increased from 18.5% (default prompt) to 37.6% (recognition-primed prompt) and 33.3% (data/frame prompt). Performance on emergency and non-emergency cases remained high across all prompts. Notably, NDM-inspired prompts led non-reasoning models to begin giving self-care advice, which they rarely or never did with the default prompt. Output variability was similar across the three prompts.
Conclusions:
Using LLMs with prompts inspired by NDM, which are designed to reflect real-world human reasoning, improves the accuracy of LLMs in care-seeking tasks, particularly for self-care advice, without reducing performance on emergency or non-emergency cases. These findings indicate that NDM-inspired prompts can offer an advantage when LLMs are used for real-world decisions involving ambiguity and uncertainty. Future studies must evaluate how output reflecting real-world human reasoning affects users' decision-making.