Currently submitted to: JMIR AI
Date Submitted: Jan 30, 2026
Open Peer Review Period: Feb 17, 2026 - Apr 14, 2026
(currently open for review)
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Auditing Nutritional and Behavioral Risk in Large Language Model–Generated Dietary Recommendations: A Population-Aware Content Analysis
ABSTRACT
Background:
Large language models (LLMs) are increasingly embedded in digital health applications and consumer-facing dietary guidance systems. While these systems offer scalable and personalized nutrition support, inappropriate dietary recommendations may pose nutritional or behavioral risks, particularly for vulnerable populations with specific dietary constraints. However, systematic and scalable approaches for evaluating the safety of LLM-generated dietary recommendations remain limited.
Objective:
The objective of this study was to develop and evaluate a reproducible, population-aware auditing framework to quantify nutritional and behavioral risk in LLM-generated dietary recommendations across diverse user profiles, dietary goals, and response tones.
Methods:
We conducted a content-level audit of 2,464 dietary recommendations generated by a large language model using a full-factorial prompt design that varied user profiles, dietary goals, and response tones. Nutritional information, including daily energy intake and macronutrient distributions, was automatically extracted from generated texts. Population-specific nutritional thresholds derived from international guidelines were applied to assess nutritional risk. Behavioral risk was evaluated using a lexicon-based analysis of potentially unsafe dietary framings. Nutritional and behavioral components were integrated into a continuous composite risk score, enabling large-scale statistical analysis and subgroup comparisons.
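To make the scoring pipeline concrete, the sketch below shows one plausible way to combine population-specific nutritional threshold deviations with a lexicon-based behavioral check into a continuous composite score. This is an illustrative reconstruction, not the authors' code: the thresholds, lexicon terms, and weights are hypothetical placeholders standing in for the guideline-derived values described in the Methods.

```python
# Illustrative sketch of the composite risk score described in the Methods.
# All thresholds, lexicon terms, and weights below are hypothetical examples,
# not the values used in the study.

UNSAFE_LEXICON = {"skip meals", "fasting", "detox", "as little as possible"}

def nutritional_risk(energy_kcal, carb_pct, thresholds):
    """Mean relative deviation of each nutrient outside its allowed band."""
    deviations = []
    for value, (lo, hi) in [(energy_kcal, thresholds["energy_kcal"]),
                            (carb_pct, thresholds["carb_pct"])]:
        if value < lo:
            deviations.append((lo - value) / lo)   # below the lower bound
        elif value > hi:
            deviations.append((value - hi) / hi)   # above the upper bound
        else:
            deviations.append(0.0)                 # within the allowed band
    return sum(deviations) / len(deviations)

def behavioral_risk(text):
    """Fraction of unsafe-framing lexicon terms present in the text."""
    text = text.lower()
    return sum(term in text for term in UNSAFE_LEXICON) / len(UNSAFE_LEXICON)

def composite_risk(energy_kcal, carb_pct, text, thresholds,
                   w_nutri=0.7, w_behav=0.3):
    """Weighted combination of nutritional and behavioral components."""
    return (w_nutri * nutritional_risk(energy_kcal, carb_pct, thresholds)
            + w_behav * behavioral_risk(text))

# Example profile: pregnancy with glycemic control (hypothetical thresholds).
PREGNANCY_GDM = {"energy_kcal": (1800, 2400), "carb_pct": (40, 50)}
score = composite_risk(1500, 55, "Try fasting and skip meals to cut sugar.",
                       PREGNANCY_GDM)
```

Because both components are continuous fractions rather than binary flags, the resulting score supports the long-tail distributional analysis and subgroup comparisons reported in the Results.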
Results:
Across all 2,464 recommendations, composite risk scores were generally low (median 0.008; mean approximately 0.02), indicating broad alignment with evidence-based nutritional thresholds. However, a pronounced long-tail distribution was observed. Elevated risk scores occurred disproportionately in sensitive populations, particularly pregnant individuals requiring glycemic control, with maximum observed values reaching approximately 0.17. Increased risk was driven by both population-specific nutritional deviations and the presence of potentially unsafe behavioral framings. Permissive response tones were associated with slightly higher risk levels than neutral, evidence-based tones.
Conclusions:
Most LLM-generated dietary recommendations appear nutritionally safe for general populations, but systematic long-tail risks persist for vulnerable groups. The proposed population-aware auditing framework enables scalable safety evaluation of generative dietary guidance and provides continuous risk signals that can support benchmarking, red-teaming, and the development of adaptive safeguards in digital health applications.
Clinical Trial: Not applicable
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.