Maintenance Notice

Due to necessary scheduled maintenance, the JMIR Publications website will be unavailable from Wednesday, July 01, 2020 at 8:00 PM to 10:00 PM EST. We apologize in advance for any inconvenience this may cause you.

Who will be affected?

Accepted for/Published in: JMIR AI

Date Submitted: Mar 18, 2025
Date Accepted: Feb 27, 2026

The final, peer-reviewed published version of this preprint can be found here:

A Fine-Tuned Multimodal AI Chatbot for Dietary Health and Nutrition, Purrfessor: Development and Mixed Methods Evaluation

Lu L, Deng Y, Tian C, Yang S, Shah D

A Fine-Tuned Multimodal AI Chatbot for Dietary Health and Nutrition, Purrfessor: Development and Mixed Methods Evaluation

JMIR AI 2026;5:e74111

DOI: 10.2196/74111

PMID: 42061226

Purrfessor: A Fine-tuned Multimodal LLaVA Diet Health Chatbot

  • Linqi Lu; 
  • Yifan Deng; 
  • Chuan Tian; 
  • Sijia Yang; 
  • Dhavan Shah

ABSTRACT

Background:

The integration of Large Language-and-Vision Assistant (LLaVA) models with food and nutrition data enables multimodal meal analysis and contextual dietary guidance. However, little is known about how anthropomorphic chatbot design and AI-driven meal analysis influence user engagement and perception in health-related contexts.

Objective:

This study introduces Purrfessor, an innovative AI chatbot designed to provide personalized dietary guidance through interactive, multimodal engagement. The chatbot aims to deliver real-time, evidence-based support for food choices while examining the impact of anthropomorphism on user interaction and perception.

Methods:

The Purrfessor chatbot was trained using a combination of the FoodData Central database from the USDA, the Recipe2img dataset featuring food images and corresponding recipes, a curated human-annotated dataset derived from Recipe1M, and customized Q&A dialogue dataset. Two studies were conducted to evaluate chatbot performance and user experience. First, a simulation assessment using GPT-4 and human validation examined the accuracy and descriptive capabilities of the fine-tuned LLaVA model. Second, in-depth interviews (N = 10) were conducted to explore user perceptions of Purrfessor, focusing on its effectiveness, engagement, and usability.

Results:

The simulation study demonstrated that the fine-tuned LLaVA chatbot achieved a mean cosine similarity score of 0.78 (SD = 0.12) in semantic alignment with GPT-4 annotations, suggesting strong consistency in dietary image interpretation. Error analysis of low-scoring cases (n = 100) revealed current limitations, including ambiguity (25%), omissions (20%), and hallucinations (12%). Human validation scores indicated high chatbot performance across correctness (M = 7.87), relevance (M = 9.4), clarity (M = 9.6), and handling of edge cases (M = 9.0), with strong inter-rater reliability (Krippendorff’s α = 0.85–0.96). In-depth interviews identified three primary factors driving user engagement: responsiveness, personalization, and interaction guidance. Anthropomorphic cat persona applied in chatbot system can increase user interest and bonding, aligning with media equation theory and attachment theory in human-AI interaction.

Conclusions:

Findings highlight the role of anthropomorphic chatbot design and multimodal AI in improving user experience in diet health conversation. This study offers an example of AI-driven, evidence-based dietary guidance and underscores the potential of health chatbots to nudge informed health decision-making. Insights contribute to the development of digital health interventions and personalized health communication strategies, with implications for the design of engaging, user-centered AI health assistants.


 Citation

Please cite as:

Lu L, Deng Y, Tian C, Yang S, Shah D

A Fine-Tuned Multimodal AI Chatbot for Dietary Health and Nutrition, Purrfessor: Development and Mixed Methods Evaluation

JMIR AI 2026;5:e74111

DOI: 10.2196/74111

PMID: 42061226

Download PDF


Request queued. Please wait while the file is being generated. It may take some time.

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on it's website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressively prohibit redistribution of this draft paper other than for review purposes.