JMIR Preprints #96839: Development and validation of explainable risk assessment system for LLM-driven emotional support chatbot (Emobot)

Development and validation of explainable risk assessment system for LLM-driven emotional support chatbot (Emobot)

Hin Chi Kwok;
Xinyu Feng;
Sam Kiu Lam;
Perlie Chung;
Deborah Lee;
Gregor Stiglic;
Shaowei Guan;
Vivian Hui

ABSTRACT

Background:

Youth mental health needs are substantial: globally, about 1 in 7 adolescents (10–19 years) experience a mental disorder, and suicide is the third leading cause of death among people aged 15–29. Despite this need, many young people delay or avoid care because of stigma, privacy concerns, cost, and difficulty accessing timely services. Chat-based support can lower the threshold for disclosure, but safety-critical statements require transparent risk stratification and clear pathways for clinician oversight. Language further affects safety in multilingual communities: Cantonese is comparatively under-resourced for natural language processing, and its colloquial orthography, idioms, and pervasive code-switching can distort the interpretation of risk cues when models are trained primarily on English and standard written Chinese.

Objective:

We developed EmoBot, an explainable three-tier (Tier 1–3) risk stratification pipeline with a nurse- and counsellor-facing dashboard for Cantonese-speaking youth, and evaluated its agreement with expert triage labels and its error patterns at tier boundaries.

Methods:

EmoBot uses a hybrid framework combining semantic exemplar retrieval and constrained large language model (LLM) reasoning. An expert-authored, de-identified Cantonese reference corpus is embedded with Sentence-BERT and indexed for nearest-neighbor retrieval (top-k=5). In parallel, the DeepSeek API chat model (deepseek-chat; DeepSeek-V3.2) generates a structured record (tier, category, cues, rationale) from the user message, the tier and category definitions, and the retrieved exemplars, which serve as annotated in-context examples. Inference settings were fixed (temperature=0.1; presence_penalty=0; max_tokens capped), and no fine-tuning was performed. If the retrieval-based and LLM-assigned tiers disagree, EmoBot outputs the higher tier while displaying provenance-linked evidence in the dashboard to support nurse-in-the-loop verification and escalation. Two sequential validation sets were used (set 1: n=50 pilot stress test; set 2: n=336 expanded validation; total n=386). Three mental health–trained experts rated each statement while blinded to model output; majority consensus (≥2 of 3 raters) defined the reference label.
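The retrieval step, the conservative "higher tier wins" rule, and the 2-of-3 consensus label described above can be sketched as follows. This is a minimal, dependency-free sketch under stated assumptions, not the authors' implementation. The embeddings are toy 3-dimensional vectors standing in for Sentence-BERT outputs, and retrieval is exact brute-force cosine search. The JSON field names of the structured record are assumed. The rule that turns the five retrieved exemplars into a single retrieval tier (majority vote, ties broken toward the higher tier) is hypothetical, because the abstract does not specify it. The DeepSeek API call itself is not shown; `parse_record` only validates a response string.

```python
import json
import math
from collections import Counter
from dataclasses import dataclass

TIERS = (1, 2, 3)  # Tier 1 = lowest risk, Tier 3 = highest risk
REQUIRED_FIELDS = ("tier", "category", "cues", "rationale")  # ASSUMED JSON keys


@dataclass(frozen=True)
class Exemplar:
    text: str
    tier: int
    embedding: tuple  # toy vector standing in for a Sentence-BERT embedding


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_emb, corpus, k=5):
    """Exact (brute-force) nearest-neighbour retrieval by cosine similarity."""
    return sorted(corpus, key=lambda ex: cosine(query_emb, ex.embedding), reverse=True)[:k]


def retrieval_tier(neighbours):
    """HYPOTHETICAL rule: majority tier among neighbours, ties go to the higher tier."""
    counts = Counter(ex.tier for ex in neighbours)
    return max(counts, key=lambda t: (counts[t], t))


def parse_record(raw):
    """Validate a structured LLM record; field names are assumptions, not the paper's schema."""
    record = json.loads(raw)
    missing = [f for f in REQUIRED_FIELDS if f not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if record["tier"] not in TIERS:
        raise ValueError(f"invalid tier: {record['tier']!r}")
    return record


def fuse(retrieval, llm):
    """Conservative fusion: on disagreement, keep the higher (riskier) tier."""
    return max(retrieval, llm)


def consensus(labels):
    """Reference label if at least 2 of 3 raters agree; otherwise None."""
    tier, votes = Counter(labels).most_common(1)[0]
    return tier if votes >= 2 else None


# Toy usage: six exemplars with 3-dimensional embeddings (illustrative only).
corpus = [
    Exemplar("exemplar A", 1, (1.0, 0.0, 0.0)),
    Exemplar("exemplar B", 2, (0.9, 0.1, 0.0)),
    Exemplar("exemplar C", 3, (0.0, 1.0, 0.0)),
    Exemplar("exemplar D", 3, (0.0, 0.9, 0.1)),
    Exemplar("exemplar E", 3, (0.1, 0.9, 0.0)),
    Exemplar("exemplar F", 1, (0.0, 0.0, 1.0)),
]
neighbours = top_k((0.05, 1.0, 0.0), corpus, k=5)
llm = parse_record('{"tier": 2, "category": "low mood", "cues": [], "rationale": "toy"}')
final_tier = fuse(retrieval_tier(neighbours), llm["tier"])
print(final_tier)  # 3: retrieval says Tier 3, the LLM says Tier 2, the higher tier wins
```

Taking the `max` of the two tiers is what makes the fusion conservative: a disagreement can only raise the displayed risk, never lower it.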

Results:

Human majority consensus was achieved for 382/386 statements (99.0%). On consensus-labeled items (n=382), EmoBot matched expert tiers with 95.8% accuracy (366/382; 95% CI 93.7–97.6) and 95.2% macro F1 (95% CI 92.7–97.4). Performance increased from set 1 (80.9% accuracy) to set 2 (97.9% accuracy). All misclassifications were adjacent-tier (±1), with no Tier 1↔Tier 3 confusions; Tier 3 detection remained high (F1 94.7% in set 1 and 98.6% in set 2).
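The metrics reported above (accuracy, macro F1 with 95% CIs, and the check that no error spans Tier 1↔Tier 3) can be computed from paired expert and model tiers as in the sketch below. The labels are made up for illustration and are not study data. The percentile bootstrap used for the interval is an assumption, since the abstract does not state how its CIs were derived.

```python
import random

TIERS = (1, 2, 3)


def accuracy(y_true, y_pred):
    """Share of items where the model tier equals the expert consensus tier."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels=TIERS):
    """Unweighted mean of per-tier F1, where F1 = 2TP / (2TP + FP + FN)."""
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def max_tier_distance(y_true, y_pred):
    """1 means every error is adjacent-tier; 2 would flag a Tier 1 <-> Tier 3 confusion."""
    return max((abs(t - p) for t, p in zip(y_true, y_pred)), default=0)


def bootstrap_ci(metric, y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI, resampling items with replacement (ASSUMED method)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lo_idx = int(alpha / 2 * n_boot)
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    return stats[lo_idx], stats[hi_idx]


# Made-up labels for illustration (not study data).
y_true = [1, 1, 2, 2, 2, 3, 3, 3, 1, 2]
y_pred = [1, 2, 2, 2, 1, 3, 3, 2, 1, 2]
acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.3f} macro_F1={f1:.3f} max_distance={max_tier_distance(y_true, y_pred)}")
# accuracy=0.700 macro_F1=0.711 max_distance=1
print("95% CI for accuracy:", bootstrap_ci(accuracy, y_true, y_pred))
```

Macro F1 weights each tier equally, so a rarer Tier 3 class influences the score as much as the more common tiers; this is why it is reported alongside overall accuracy.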

Conclusions:

Explainable, conservative hybrid triage with provenance-linked evidence can closely align with expert judgment for Cantonese youth help-seeking text while supporting nurse-in-the-loop review and escalation. The architecture (localized exemplar corpus + structured model outputs + auditable dashboard) is adaptable to other low-resource languages and multilingual settings where safe risk triage requires both linguistic localization and clinical accountability.

Clinical Trial:

Not applicable (nonrandomized expert validation study; no clinical trial registration required).

Citation

Please cite as:

Kwok HC, Feng X, Lam SK, Chung P, Lee D, Stiglic G, Guan S, Hui V

Development and validation of explainable risk assessment system for LLM-driven emotional support chatbot (Emobot)

JMIR Preprints. 01/04/2026:96839

DOI: 10.2196/preprints.96839

URL: https://preprints.jmir.org/preprint/96839

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Currently submitted to: Journal of Medical Internet Research

Date Submitted: Apr 1, 2026

Open Peer Review Period: Apr 1, 2026 - May 27, 2026

(currently open for review)
