
Currently submitted to: JMIR AI

Date Submitted: Feb 4, 2026

Warning: This is an author submission that is not peer-reviewed or edited. Preprints (unless they show as "accepted") should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Diabetes-Specialized Large Language Models for Clinical Reasoning and Dietary Recommendation: Reflection- and Curriculum-Based Instruction Tuning Study

  • Jaesung Hwang; 
  • Deniise Liz Namayanja; 
  • Donghyeon Park

ABSTRACT

Background:

Effective diabetes management requires continuous interpretation of glycemic trends, personalized dietary guidance, and sustained patient education. Although large language models are increasingly explored for health-related applications, existing general-purpose and biomedical models often struggle with diabetes-specific reasoning and instruction-following, limiting their reliability for domain-focused tasks such as clinical question answering and dietary recommendation.

Objective:

The objective of this study was to develop and evaluate a diabetes-specialized large language model optimized for diabetes-specific reasoning, instruction-following, and dietary recommendation tasks.

Methods:

This was a model development and benchmark evaluation study. We propose a model-centric instruction refinement framework that integrates two reflection-based metrics, Instruction-Following Difficulty (IFD) and reversed Instruction-Following Difficulty (r-IFD), to identify and replace suboptimal instruction–response pairs during instruction tuning. To improve training stability and reduce catastrophic forgetting, curriculum instruction tuning was applied by sequencing instructions from lower to higher difficulty based on model sensitivity. The resulting diabetes-specialized large language model was evaluated across multiple diabetes-related natural language processing tasks, including question answering (QA), natural language inference (NLI), information extraction (IE), summarization, text generation, and dietary recommendation. Performance was compared with established biomedical large language models using benchmark datasets and simulation-based glycemic evaluation.
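The selection-and-ordering logic described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes IFD is computed as the ratio of the model's loss on a response conditioned on its instruction to the loss on the response alone (with r-IFD defined symmetrically for the instruction given the response), that pairs with IFD at or above 1 (instruction provides no predictive benefit) are candidates for replacement, and that the curriculum orders surviving pairs from easy to hard. The loss values shown are hypothetical.

```python
def ifd(loss_response_given_instruction: float, loss_response_alone: float) -> float:
    """Instruction-Following Difficulty: how much the instruction helps the
    model predict the response. Higher values mean the instruction
    contributes less, i.e. the pair is harder to follow."""
    return loss_response_given_instruction / loss_response_alone


def r_ifd(loss_instruction_given_response: float, loss_instruction_alone: float) -> float:
    """Reversed IFD: how well the response accounts for its instruction."""
    return loss_instruction_given_response / loss_instruction_alone


# Hypothetical instruction-response pairs with precomputed losses.
pairs = [
    {"id": "a", "score": ifd(2.1, 3.0)},  # instruction clearly helps
    {"id": "b", "score": ifd(2.9, 3.0)},  # marginal help
    {"id": "c", "score": ifd(3.3, 3.0)},  # instruction hurts: replace this pair
]

# Reflection-based filtering: drop pairs the instruction does not help.
keep = [p for p in pairs if p["score"] < 1.0]

# Curriculum ordering: train on easy pairs first, hard pairs later.
curriculum = sorted(keep, key=lambda p: p["score"])
print([p["id"] for p in curriculum])
```

In this sketch, pair "c" would be routed back for instruction replacement rather than discarded outright, matching the replacement step the framework describes.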

Results:

Across multiple diabetes-related benchmark tasks, the proposed model demonstrated improved accuracy in diabetes-focused QA and stronger instruction-following performance in generative tasks compared with established biomedical large language models. For dietary recommendation, the model generated meal plans with higher nutritional quality than those produced by GPT-4, achieving a 0.66% improvement in the Diet Quality Index-International (DQI-I) score. Simulation-based evaluation using the simglucose simulator further showed that meal plans generated by the proposed model resulted in reduced postprandial glycemic burden, as measured by a lower incremental area under the curve, compared with GPT-4-generated meal plans.
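The incremental area under the curve (iAUC) used as the glycemic outcome above is a standard postprandial measure: the area between the glucose trace and the pre-meal baseline, with dips below baseline clipped to zero. A minimal sketch of that computation, on hypothetical glucose traces (the actual study derives traces from the simglucose simulator):

```python
def incremental_auc(glucose: list[float], minutes: list[float]) -> float:
    """Incremental area under the glucose curve: trapezoidal area of the
    excursion above the baseline (first, pre-meal) reading, with negative
    excursions clipped to zero, as is standard for postprandial iAUC."""
    baseline = glucose[0]
    excursion = [max(g - baseline, 0.0) for g in glucose]
    area = 0.0
    for i in range(1, len(minutes)):
        dt = minutes[i] - minutes[i - 1]
        area += 0.5 * (excursion[i] + excursion[i - 1]) * dt
    return area


# Hypothetical postprandial traces sampled every 30 minutes (mg/dL).
t = [0.0, 30.0, 60.0, 90.0, 120.0]
meal_a = [100.0, 140.0, 160.0, 130.0, 110.0]  # sharper glucose spike
meal_b = [100.0, 120.0, 135.0, 115.0, 105.0]  # flatter response

print(incremental_auc(meal_a, t))  # larger glycemic burden
print(incremental_auc(meal_b, t))  # smaller glycemic burden
```

A lower iAUC for one meal plan versus another, as reported for the proposed model relative to GPT-4, indicates a smaller cumulative glucose excursion after eating.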

Conclusions:

This study demonstrates that domain-specific instruction tuning can effectively adapt general-purpose large language models for diabetes management. By combining reflection-based instruction replacement with curriculum instruction tuning, the proposed approach enhances instruction-following, reasoning capability, and dietary guidance for diabetes. The results highlight the potential of specialized LLMs to provide more reliable and clinically aligned support for diabetes-related decision-making and self-management, offering a promising direction for safe and effective AI deployment in chronic disease care.


Citation

Please cite as:

Hwang J, Namayanja DL, Park D

Diabetes-Specialized Large Language Models for Clinical Reasoning and Dietary Recommendation: Reflection- and Curriculum-Based Instruction Tuning Study

JMIR Preprints. 04/02/2026:92843

DOI: 10.2196/preprints.92843

URL: https://preprints.jmir.org/preprint/92843


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.