Currently submitted to: Journal of Medical Internet Research

Date Submitted: Feb 10, 2026

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Applying Anomaly Detection Methods to Google Search Logs to Detect Periods Preceding Suicide Attempts

  • Xinyang Ren
  • Amanda H. Kerbrat
  • Payton Smythe
  • Ethan H. Kim
  • Courtney L. Bagge
  • Devon Sandel-Fernandez
  • Keyne Catherine Law
  • Nichole Sams
  • Patrick J. Heagerty
  • Patricia A. Areán
  • Katherine Anne Comtois
  • Trevor A. Cohen

ABSTRACT

Background:

Suicide is a major global public health concern and a leading cause of death worldwide. However, timely identification of individuals at elevated risk remains challenging. Traditional suicide risk assessment relies heavily on self-report and structured screening tools, which are often limited by infrequent monitoring, social stigma surrounding mental health issues, and restricted access to mental health services. As individuals increasingly rely on web search engines for everyday information seeking, personal search data offer a unique opportunity to capture signals related to suicide risk.

Objective:

This study aimed to quantitatively evaluate the utility of personal Google search logs for detecting periods preceding suicide attempts using anomaly detection methods.

Methods:

We analyzed a subset of personal Google search data collected in a retrospective-prospective study, focusing on 80 adult participants who reported suicide attempt dates (n=111) with sufficient detail to enable identification of high-risk periods. Neural language representation techniques were developed to calculate a proximity score between each search query and constructs representing known suicide warning signs, reflecting search behavior related to suicide risk. The semantic feature construction framework, which consists of initial proximity-based filtering on query-construct relatedness by a smaller-scale language model followed by adjudication by a state-of-the-art large language model (Llama 3.1), was feasible for application to large-scale web search data. Anomaly detection methods were applied to semantic and behavioral features of search activity to identify the 14-day periods preceding suicide attempts. Models were trained on search data from more than 6 months before the attempt and evaluated over the 180 days preceding and including the attempt date. The area under the receiver operating characteristic curve (ROC AUC) was used to evaluate how effectively the model distinguished attempt periods from non-attempt periods.
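The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not name the embedding model, the anomaly detector, or the feature set, so everything here is an assumption made for illustration: the two-dimensional toy "embeddings", the 0.5 proximity threshold, the two daily features (highest retained proximity and query count), and the max-absolute-z-score detector standing in for whichever anomaly detection method was used. The Llama 3.1 adjudication step is omitted.

```python
import math
from statistics import mean, pstdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def day_features(query_vecs, construct_vec, threshold=0.5):
    """One semantic and one behavioral feature for a day of queries.

    Semantic: highest query-construct proximity that passes the filter
    threshold (0.0 if none passes). Behavioral: number of queries.
    Both features and the threshold are hypothetical choices.
    """
    scores = [cosine(q, construct_vec) for q in query_vecs]
    kept = [s for s in scores if s >= threshold]
    return [max(kept, default=0.0), float(len(query_vecs))]


def fit_baseline(rows):
    """Per-feature mean and standard deviation from baseline (training) days."""
    return [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]


def anomaly_score(row, baseline):
    """Largest absolute z-score across features (a stand-in detector)."""
    return max(abs(x - m) / s for x, (m, s) in zip(row, baseline))


def roc_auc(scores, labels):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: each day is a list of query embeddings; the construct is one vector.
construct = [1.0, 0.0]
baseline_days = [            # days more than 6 months before the attempt
    [[0.1, 1.0], [0.0, 1.0]],
    [[0.2, 1.0]],
    [[0.0, 1.0], [0.1, 0.9], [0.1, 1.0]],
    [[0.6, 0.8], [0.0, 1.0]],
]
eval_days = [                # days inside the evaluation window
    [[0.1, 1.0], [0.0, 1.0]],
    [[0.2, 1.0]],
    [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0]],
    [[1.0, 0.2], [0.1, 1.0]],
]
labels = [0, 0, 1, 1]        # 1 = day falls in the period preceding the attempt

baseline = fit_baseline([day_features(d, construct) for d in baseline_days])
scores = [anomaly_score(day_features(d, construct), baseline) for d in eval_days]
auc = roc_auc(scores, labels)
print(f"ROC AUC on toy data: {auc:.2f}")
```

Fitting the baseline on one participant's own earlier data corresponds to the individually trained setting. Pooling baseline days across participants would correspond to the generalized setting.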

Results:

The majority of analyzed suicide attempts were effectively distinguished from non-attempt periods. The best-performing anomaly detection model achieved a median ROC AUC score of 0.75 across attempts. Individually trained models outperformed generalized models trained on normative data from all participants. Semantic features that characterized search content’s relatedness to constructs contributed more substantially to the anomaly detection performance than behavioral features. Substantial differences in search feature importance were observed across participants, while consistent feature importance was found within the attempt histories of the same participant. Results from the ablation study demonstrated the utility of large language models in adjudicating the validity of construct-to-search relationships.

Conclusions:

This study showed that personal Google search logs can be a valuable resource for suicide risk detection. The proposed framework has the potential to support the development of automated suicide risk detection tools for more timely identification of individuals at elevated risk.


 Citation

Please cite as:

Ren X, Kerbrat AH, Smythe P, Kim EH, Bagge CL, Sandel-Fernandez D, Law KC, Sams N, Heagerty PJ, Areán PA, Comtois KA, Cohen TA

Applying Anomaly Detection Methods to Google Search Logs to Detect Periods Preceding Suicide Attempts

JMIR Preprints. 10/02/2026:93274

DOI: 10.2196/preprints.93274

URL: https://preprints.jmir.org/preprint/93274


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.