Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Oct 30, 2025
Date Accepted: Feb 18, 2026
Date Submitted to PubMed: Feb 20, 2026
Understanding User Intent in Code-Mixed Sexual and Reproductive Health Queries in Urban India: A Hierarchical Classification Approach using LLMs
ABSTRACT
Background:
Access to sexual and reproductive health (SRH) information remains stigmatized and taboo in many parts of the world. In the Global South, information delivery is further complicated by linguistic and cultural diversity. For instance, in India (our study context), urban Hindi-speaking users frequently type text in Hinglish (code-mixed Hindi and English written in the Latin script) and use colloquial language to describe SRH concerns. Large language models (LLMs) could help answer SRH questions, but most are trained primarily on English and struggle with code-mixed text and cultural context. Our research addresses this gap by assessing how well current LLMs understand user intent in SRH queries in a low-resource, code-mixed language setting.
Objective:
This study evaluates the effectiveness of proprietary, multilingual open-weight, and Indic LLMs in zero-shot settings for identifying user intent in code-mixed Hinglish SRH queries. Our aim is to measure how well LLMs assign the correct label in a two-level hierarchical classification (topic then subtopic). We take a hierarchical approach because SRH queries are complex and context-dependent; flat labels could obscure clinically important distinctions and can lead to misdirected guidance. We also characterize common error types that drive misclassification.
Methods:
We analyzed 4,161 de-identified SRH questions in Hinglish (code-mixed Hindi and English written in the Latin script), collected by our partner nonprofit health organization (Myna Mahila Foundation) in an underserved community in urban Mumbai. Queries were annotated into 8 topics and 40 subtopics using a hierarchical framework that captured linguistic, cultural, and contextual variation. We compared the performance of proprietary, multilingual open-weight, and Indic-specific LLMs in zero-shot settings. Performance was measured using hierarchical F1 (hF1), exact match, and topic- and subtopic-level accuracy.
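The hF1 metric used above can be sketched as follows. This is a minimal illustration of the standard ancestor-set definition of hierarchical precision, recall, and F1 for a two-level topic-then-subtopic scheme; the label names and the helper functions are illustrative assumptions, not the study's actual taxonomy or code.

```python
# Hedged sketch of hierarchical F1 (hF1) for a two-level topic -> subtopic
# label scheme, using the ancestor-augmented set definition: each predicted
# (topic, subtopic) pair is expanded into {topic, (topic, subtopic)} so a
# correct topic with a wrong subtopic still earns partial credit.
# Label names below are illustrative, not taken from the paper.

def ancestors(label):
    """Expand a (topic, subtopic) pair into its ancestor set."""
    topic, subtopic = label
    return {topic, (topic, subtopic)}

def hierarchical_f1(gold, pred):
    """Micro-averaged hierarchical F1 over paired gold/predicted labels."""
    tp = pred_total = gold_total = 0
    for g, p in zip(gold, pred):
        g_set, p_set = ancestors(g), ancestors(p)
        tp += len(g_set & p_set)          # shared ancestors (partial credit)
        pred_total += len(p_set)
        gold_total += len(g_set)
    h_prec = tp / pred_total
    h_rec = tp / gold_total
    return 2 * h_prec * h_rec / (h_prec + h_rec) if h_prec + h_rec else 0.0

gold = [("menstruation", "irregular_cycle"), ("contraception", "pills")]
pred = [("menstruation", "pain"), ("contraception", "pills")]
# One query matches fully, one only at topic level: hP = hR = 3/4
print(hierarchical_f1(gold, pred))  # 0.75
```

Under this definition, a model that reliably identifies the broad topic but misses the subtopic still scores partial credit, which is why hF1 is reported alongside exact match and per-level accuracy.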
Results:
Proprietary models achieved the strongest results, with GPT-5 performing best overall (hF1 = 0.784). Among open-weight systems, Sarvam-M emerged as the top-performing Indic model (hF1 = 0.757), ranking just below proprietary models and even surpassing Claude-3.5-Sonnet (0.745) as well as large multilingual systems such as LLaMA-3.3-70B-Instruct (0.742) and Gemma-3-27B-IT (0.739). Other Indic models performed considerably lower (e.g., LLaMA3-Gaja-Hindi-8B, 0.596; Krutrim-2-Instruct, 0.558; Airavata, 0.404). Smaller multilingual open-weight models, including Mixtral-8x7B-Instruct (0.593), LLaMA-3.1-8B-Instruct (0.630), and Gemma-2-9B-IT (0.657), consistently outperformed them, showing that parameter size alone does not explain performance gaps. While models generally captured broad topical intent, they frequently failed at fine-grained intent recognition, especially with euphemisms, colloquial expressions, and locally and culturally situated questions.
Conclusions:
Hierarchical classification revealed persistent gaps in how LLMs handle code-mixed queries. Proprietary models performed best, but Sarvam-M shows that open-weight Indic systems can achieve near–state-of-the-art performance when supported by robust training data and cultural adaptation. Strengthening such localized fine-tuned models is essential for developing culturally informed, linguistically inclusive AI tools that can expand equitable access to SRH information in underserved populations globally.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.