Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Feb 1, 2025
Date Accepted: Apr 30, 2025
MenstLLaMA: A Specialized Large Language Model for Menstrual Health Education in India
ABSTRACT
Background:
The quality and accessibility of menstrual health education in developing nations, including India, remain inadequate due to challenges such as poverty, social stigma, and gender inequality. While community-driven initiatives aim to raise awareness, artificial intelligence (AI) offers a scalable solution for disseminating accurate information. However, existing general-purpose large language models (LLMs) are ill-suited for this task, suffering from low accuracy, cultural insensitivity, and overly complex responses. To address these limitations, we developed MenstLLaMA, a specialized LLM tailored to the Indian context, designed to deliver menstrual health education empathetically, supportively, and accessibly.
Objective:
To develop and evaluate MenstLLaMA, a specialized LLM tailored to deliver accurate, culturally sensitive menstrual health education, and to assess its effectiveness compared to existing general-purpose models.
Methods:
We curated a novel, domain-specific dataset and benchmarked state-of-the-art LLMs to develop MenstLLaMA, an empathic companion model. The evaluation employed an open-label benchmark design with a four-stage framework: (1) overlap with ground truth, (2) clinical relevance, (3) response diversity, and (4) user satisfaction. A panel of clinical experts (N=1,18) conducted expert evaluations, while participants (N=1,200) interacted with chatbots, including MenstLLaMA, in 15–20-minute randomized sessions for user satisfaction assessment.
Results:
MenstLLaMA was compared against state-of-the-art general-purpose LLMs such as GPT-4o, Claude-3, and Mistral using automated and human-based metrics. MenstLLaMA achieved the highest BLEU score (0.059) and BERTScore (0.911), outperforming competitors without requiring few-shot learning. Clinical experts consistently rated its responses superior to gold-standard answers. User case studies revealed high ratings in Understandability (4.7/5) and Relevance (4.3/5), with a moderate rating in Context Sensitivity (3.9/5).
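For readers unfamiliar with overlap metrics such as BLEU, the following is a minimal, illustrative sketch of a sentence-level BLEU computation (clipped n-gram precision combined with a brevity penalty). It is not the evaluation code used in this study: the whitespace tokenization, the `max_n` parameter, the absence of smoothing, and the example sentences are all assumptions made for illustration, and reported BLEU scores are usually computed at corpus level with a standard toolkit.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: each candidate n-gram is credited
    at most as many times as it occurs in the reference."""
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total


def bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU against a single reference."""
    if not candidate:
        return 0.0
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        # Without smoothing, one zero precision zeroes the whole score.
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    # Brevity penalty: penalize candidates shorter than the reference.
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)


# Hypothetical example sentences (not taken from the study's dataset).
reference = "periods are a normal process".split()
candidate = "periods are normal".split()
score = bleu(candidate, reference, max_n=2)
```

Because exact n-gram overlap is strict, fluent free-text answers that paraphrase the reference tend to score low on BLEU, which is one reason embedding-based metrics such as BERTScore are commonly reported alongside it.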
Conclusions:
MenstLLaMA demonstrates exceptional accuracy, empathy, and user satisfaction in menstrual health education, bridging critical gaps left by general-purpose LLMs. Its potential for integration into broader health education platforms positions it as a transformative tool for menstrual well-being. Future research may explore its long-term impact on public perception and menstrual hygiene practices.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.