Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Purrfessor: A Fine-tuned Multimodal LLaVA Diet Health Chatbot
ABSTRACT
Background:
The integration of Large Language-and-Vision Assistant (LLaVA) models with food and nutrition data enables multimodal meal analysis and contextual dietary guidance. However, little is known about how anthropomorphic chatbot design and AI-driven meal analysis influence user engagement and perception in health-related contexts.
Objective:
This study introduces Purrfessor, an innovative AI chatbot designed to provide personalized dietary guidance through interactive, multimodal engagement. The chatbot aims to deliver real-time, evidence-based support for food choices while examining the impact of anthropomorphism on user interaction and perception.
Methods:
The Purrfessor chatbot was trained using a combination of the FoodData Central database from the USDA, the Recipe2img dataset featuring food images and corresponding recipes, a curated human-annotated dataset derived from Recipe1M, and a customized Q&A dialogue dataset. Two studies were conducted to evaluate chatbot performance and user experience. First, a simulation assessment using GPT-4 and human validation examined the accuracy and descriptive capabilities of the fine-tuned LLaVA model. Second, in-depth interviews (N = 10) were conducted to explore user perceptions of Purrfessor, focusing on its effectiveness, engagement, and usability.
Results:
The simulation study demonstrated that the fine-tuned LLaVA chatbot achieved a mean cosine similarity score of 0.78 (SD = 0.12) in semantic alignment with GPT-4 annotations, suggesting strong consistency in dietary image interpretation. Error analysis of low-scoring cases (n = 100) revealed current limitations, including ambiguity (25%), omissions (20%), and hallucinations (12%). Human validation scores indicated high chatbot performance across correctness (M = 7.87), relevance (M = 9.4), clarity (M = 9.6), and handling of edge cases (M = 9.0), with strong inter-rater reliability (Krippendorff’s α = 0.85–0.96). In-depth interviews identified three primary factors driving user engagement: responsiveness, personalization, and interaction guidance. Participants' accounts suggested that the anthropomorphic cat persona used in the chatbot system can increase user interest and bonding, consistent with media equation theory and attachment theory in human-AI interaction.
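The semantic-alignment figure above is a mean and standard deviation of cosine similarity scores. As a minimal sketch of how such a summary could be computed, assuming each chatbot response and each GPT-4 annotation has already been converted to a numeric embedding vector (the abstract does not name the embedding model, so the vectors below are hypothetical toy values, and the use of the sample standard deviation is also an assumption):

```python
import math
from statistics import mean, stdev


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def summarize_alignment(pairs):
    """Return (mean, sample SD) of cosine similarity over embedding pairs.

    `pairs` is a list of (model_embedding, reference_embedding) tuples and
    must contain at least two pairs for the sample SD to be defined.
    """
    scores = [cosine_similarity(u, v) for u, v in pairs]
    return mean(scores), stdev(scores)


# Hypothetical toy embeddings, for illustration only (not study data).
toy_pairs = [
    ([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]),
    ([1.0, 1.0, 0.0], [1.0, 0.0, 0.0]),
    ([0.0, 1.0, 0.0], [1.0, 0.0, 0.0]),
]
avg, sd = summarize_alignment(toy_pairs)
print(f"mean = {avg:.2f}, SD = {sd:.2f}")
```

In a setup like this, scores near 1 indicate that the two descriptions point in nearly the same direction in embedding space, while scores near 0 indicate little semantic overlap; the low-scoring tail is where an error analysis such as the one reported above would focus.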
Conclusions:
Findings highlight the role of anthropomorphic chatbot design and multimodal AI in improving user experience in conversations about diet and health. This study offers an example of AI-driven, evidence-based dietary guidance and underscores the potential of health chatbots to nudge users toward informed health decisions. These insights contribute to the development of digital health interventions and personalized health communication strategies, with implications for the design of engaging, user-centered AI health assistants.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.