JMIR Preprints #91367: Development and Validation of the Automated Safety Testing and Reporting Application (ASTRA): A Tool for Conversational Safety Monitoring of Generative AI Tools for Mental Health


Development and Validation of the Automated Safety Testing and Reporting Application (ASTRA): A Tool for Conversational Safety Monitoring of Generative AI Tools for Mental Health

Daniel Szoke;
Ilana Hutzler;
Jerry Liu;
Samantha Addante;
Zuhaib Akhtar;
Dale L. Smith;
Kirsten Dickens;
Charles Small;
Sarah Pridgen;
Philip Held

ABSTRACT

Background:

AI-based conversational tools are rapidly expanding within mental health care as a means of increasing access and scalability. At the same time, these systems introduce distinct safety risks arising from both user disclosures (e.g., self-harm ideation) and inappropriate or inadequate AI responses.

Objective:

The objective of this study was to develop and evaluate the Automated Safety Testing and Reporting Application (ASTRA), an external system intended to identify clinically relevant risk-behaviors across entire AI-mediated mental health conversations.

Methods:

ASTRA was tested on a dataset of 100 synthetic therapeutic conversations written by licensed clinicians to reflect risk-behaviors and harmful responses between users and AI tools. Conversations varied in length and included both subtle and overt risk-behavior examples across eight predefined categories. Human coder consensus ratings served as the reference standard. ASTRA’s classifications were evaluated across two prompt iterations using standard diagnostic performance metrics and agreement statistics.

Results:

ASTRA demonstrated consistently high concordance with expert human ratings across all categories. Accuracy exceeded 0.90 for all risk-behavior categories examined, with specificity uniformly high and sensitivity varying by category (range 0.55-1.00). Agreement beyond chance between ASTRA and human raters was substantial to almost perfect (κ = 0.65-1.00). McNemar’s tests indicated no evidence of systematic bias toward false positives or false negatives. Detection of user self-harm indicators was particularly accurate, even in conversations where risk was expressed subtly.
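
As an illustration of how the statistics named above relate to one another, the following is a minimal sketch that derives sensitivity, specificity, accuracy, Cohen's κ, and a McNemar p-value from a single 2x2 table comparing an automated classifier against a human reference standard. The function names and the example counts are hypothetical and are not study data. The abstract does not state which variant of McNemar's test was used, so the exact binomial form shown here is an assumption.

```python
from math import comb


def agreement_metrics(tp, fp, fn, tn):
    """Diagnostic metrics for one risk-behavior category.

    tp/fp/fn/tn count conversations, with the human consensus rating
    as the reference standard. Assumes both classes are present, so
    no denominator is zero.
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / n

    # Cohen's kappa: observed agreement corrected for chance agreement,
    # where chance agreement comes from the two raters' marginal rates.
    p_observed = accuracy
    p_both_yes = ((tp + fp) / n) * ((tp + fn) / n)
    p_both_no = ((fn + tn) / n) * ((fp + tn) / n)
    p_expected = p_both_yes + p_both_no
    kappa = (p_observed - p_expected) / (1 - p_expected)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "kappa": kappa,
    }


def mcnemar_exact_p(fp, fn):
    """Two-sided exact McNemar test on the discordant pairs.

    Under the null hypothesis of no systematic bias, false positives
    and false negatives are equally likely, so the smaller discordant
    count follows Binomial(fp + fn, 0.5).
    """
    n_discordant = fp + fn
    if n_discordant == 0:
        return 1.0  # no disagreements, so no evidence of bias
    k = min(fp, fn)
    one_sided = sum(comb(n_discordant, i) for i in range(k + 1)) / 2 ** n_discordant
    return min(1.0, 2 * one_sided)


# Hypothetical counts for one category across 100 conversations.
metrics = agreement_metrics(tp=11, fp=2, fn=1, tn=86)
p_value = mcnemar_exact_p(fp=2, fn=1)
```

With few positive cases per category, a single missed conversation moves sensitivity far more than accuracy, which is one reason sensitivity can range widely while accuracy stays above 0.90.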

Conclusions:

In this initial validation study, ASTRA reliably identified multiple forms of mental health-related risk-behaviors at the conversation level. These findings support the feasibility of independent safety-monitoring systems as a complement to AI tools used in mental health contexts and underscore the need for further evaluation using larger and real-world datasets.

Citation

Please cite as:

Szoke D, Hutzler I, Liu J, Addante S, Akhtar Z, Smith DL, Dickens K, Small C, Pridgen S, Held P

Automated Safety Testing and Reporting Application for Conversational Safety Monitoring of Generative AI Tools for Mental Health: Development and Validation Study

JMIR Ment Health 2026;13:e91367

DOI: 10.2196/91367

PMID: 42155999

PMCID: 13231103


Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Mental Health

Date Submitted: Jan 13, 2026

Date Accepted: Apr 20, 2026
