Currently submitted to: JMIR AI
Date Submitted: Mar 10, 2026
Open Peer Review Period: Mar 12, 2026 - Apr 10, 2026
NOTE: This is an unreviewed Preprint
Warning: This is an unreviewed preprint. Readers are warned that the document has not been peer-reviewed by expert/patient reviewers or an academic editor, may contain misleading claims, and is likely to undergo changes before final publication, if accepted, or may have been rejected/withdrawn (a note "no longer under consideration" will appear above).
Automated Fidelity Monitoring of Lay-Delivered Mental Health Interventions Using Large Language Models: Development and Pilot Validation of shamiriAI
ABSTRACT
Background:
Task-shifting—the delivery of evidence-based mental health interventions by trained lay providers—has shown promise in closing the treatment gap in low- and middle-income countries. However, the effectiveness of task-shifted interventions depends critically on ongoing supervision and monitoring, and traditional supervision models are difficult to scale. Artificial intelligence (AI) tools that automatically process session recordings and generate structured fidelity feedback for supervisors could offer a scalable alternative, but no such system has been developed or validated for lay-delivered interventions in multilingual, low-resource settings.
Objective:
We developed and pilot-validated shamiriAI, an automated fidelity monitoring tool for lay-delivered mental health interventions, embedded within the Shamiri school-based mental health program in Kenya.
Methods:
We conducted a pilot validation study across six secondary schools in Ngong Hub, Kajiado County, Kenya (May–September 2025). shamiriAI follows a five-stage pipeline: audio ingestion and preprocessing, multilingual automatic speech recognition (ASR) with prosodic feature extraction, personally identifiable information (PII) scrubbing, large language model (LLM)-based fidelity inference, and structured feedback report delivery to supervisors. We pursued two pilot aims: (1) to evaluate ASR performance on a held-out test set of manually transcribed sessions; and (2) to assess interrater reliability between shamiriAI-generated fidelity ratings and independent human supervisor ratings across 52 recorded sessions, spanning six domains (Required Contents, Specifics, Thoroughness, Clarity, Skill, Purity) rated on a 1–7 scale. Reliability was assessed using intraclass correlation coefficients (ICC), Bland–Altman analysis, adjacent agreement rates, paired-sample t-tests with Holm–Bonferroni correction, and Gwet's AC2.
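Two of the reliability metrics above can be illustrated with a short sketch. This is not the authors' analysis code, and the paired ratings below are hypothetical, but it shows how an adjacent agreement rate (share of AI–human pairs differing by at most one point on the 1–7 scale) and a Bland–Altman bias (mean paired difference) are computed:

```python
# Hedged sketch (not the study's analysis code): adjacent agreement and
# Bland-Altman bias for paired AI vs. human fidelity ratings on a 1-7 scale.

def adjacent_agreement(ai, human, tolerance=1):
    """Share of paired ratings that differ by at most `tolerance` points."""
    assert len(ai) == len(human) and len(ai) > 0
    hits = sum(1 for a, h in zip(ai, human) if abs(a - h) <= tolerance)
    return hits / len(ai)

def bland_altman_bias(ai, human):
    """Mean paired difference (AI - human); negative means AI rates lower."""
    diffs = [a - h for a, h in zip(ai, human)]
    return sum(diffs) / len(diffs)

ai_scores = [5, 4, 6, 5, 4, 5, 6, 4]      # hypothetical AI ratings
human_scores = [6, 5, 6, 6, 5, 6, 7, 5]   # hypothetical supervisor ratings

print(adjacent_agreement(ai_scores, human_scores))  # 1.0 for these pairs
print(bland_altman_bias(ai_scores, human_scores))   # -0.875: AI rates lower
```

A negative bias here mirrors the study's reported pattern of AI scores falling below the human composite.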
Results:
The ASR model achieved a Character Error Rate of 0.19, Word Error Rate of 0.34, and cosine semantic similarity of 0.77, indicating strong meaning preservation despite surface-level transcription errors in code-switched speech. On fidelity ratings, AI scores were systematically lower than the human composite overall (M = 5.14, SD = 0.77 vs. M = 5.93, SD = 0.57; Δ = −0.79, 95% CI [−1.04, −0.53], d = −1.16, p < .001). Reliability varied markedly by dimension: ICCs ranged from −0.06 to 0.20 across all six domains. Three distinct patterns emerged: large systematic underrating on holistic interpretive dimensions (Required Contents d = −3.48; Clarity d = −1.56); a bidirectional medium-effect pattern on facilitation dimensions (Thoroughness d = −0.99; Skill d = +0.87); and no significant bias on structured detection dimensions (Specifics 78.8% adjacent agreement; Purity 73.1% adjacent agreement), where performance approached the human–human AC2 benchmark of 0.42–0.60.
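The word and character error rates reported above are standard edit-distance metrics. The sketch below (not the study's evaluation code; the example transcript is invented) shows how WER and CER are computed from a reference transcript and an ASR hypothesis:

```python
# Hedged sketch: WER and CER via Levenshtein edit distance between a
# reference transcript and an ASR hypothesis. Strings are illustrative only.

def levenshtein(ref, hyp):
    """Minimum insertions/deletions/substitutions to turn hyp into ref."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalized by reference length."""
    return levenshtein(ref_tokens, hyp_tokens) / len(ref_tokens)

ref = "tutaongea kuhusu growth mindset leo"   # made-up code-switched utterance
hyp = "tutaongea kuhusu growth mindset"       # made-up ASR output

wer = error_rate(ref.split(), hyp.split())                        # word level
cer = error_rate(list(ref.replace(" ", "")), list(hyp.replace(" ", "")))
print(round(wer, 2), round(cer, 2))  # 0.2 0.1
```

Because CER is normalized over many more tokens, it typically sits below WER for the same transcript pair, consistent with the 0.19 vs. 0.34 figures reported here.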
Conclusions:
shamiriAI demonstrates technically feasible multilingual ASR and a coherent, interpretable reliability profile. Underperformance was concentrated on holistic inferential dimensions — particularly Required Contents and Clarity — while structured detection tasks already approach operational utility. The underperformance pattern reflects diagnosable misalignments in rubric interpretation and prompt design, with clear engineering solutions identified. These findings provide the foundational validation evidence and dimension-specific diagnostics needed to guide the development of AI-augmented supervision for lay-delivered adolescent mental health programs in sub-Saharan Africa and other multilingual settings.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.