Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Sep 8, 2024
Date Accepted: Apr 11, 2025
Classifying Domestic Violence Survivors’ Information Needs in Online Health Communities Using Large Language Models: Prediction Model Development Study
ABSTRACT
Background:
Domestic violence (DV) is a significant public health concern that affects the physical and mental well-being of many women and imposes a substantial healthcare burden. However, women facing DV often encounter barriers to seeking in-person help due to stigma, shame, and embarrassment. As a result, many DV survivors turn to online health communities (OHCs) as a safe and anonymous space to share their experiences and seek support. Understanding the information needs of DV survivors in OHCs is therefore crucial for providing timely and appropriate support.
Objective:
The objective was to develop a fine-tuned large language model (LLM) that quickly and accurately predicts the information needs of DV survivors from their online posts, enabling healthcare professionals to offer timely and personalized assistance.
Methods:
We collected 294 posts from Reddit sub-communities focused on DV, shared by women aged 18 years or older who self-identified as experiencing intimate partner violence. We identified eight types of information needs: Shelters/DV centers/Agencies, Legal, Childbearing, Police, DV report procedure/Documentation, Safety planning, DV knowledge, and Communication. To expand the dataset to 2,216 samples, we used GPT-3.5 for data augmentation, generating 1,922 additional posts that imitated the existing data. We then adopted a progressive training strategy to fine-tune GPT-3.5 for multi-class text classification on 2,032 posts: the model was trained on one class at a time while performance was monitored closely, and when results were suboptimal, we generated additional samples resembling the misclassified posts to give those classes more attention. We reserved 184 posts for internal testing and 74 for external validation. Model performance was evaluated using accuracy, recall, precision, and F1 score, with confidence intervals reported for each metric.
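As a minimal sketch of how labeled posts could be prepared for GPT-3.5 fine-tuning (the paper does not publish its preprocessing code; the system prompt, example text, and JSONL layout below are assumptions, with the JSONL chat format being what the OpenAI fine-tuning endpoint expects):

```python
import json

# The eight information-need categories defined in the study.
LABELS = [
    "Shelters/DV centers/Agencies", "Legal", "Childbearing", "Police",
    "DV report procedure/Documentation", "Safety planning",
    "DV knowledge", "Communication",
]

def to_finetune_record(post_text: str, label: str) -> dict:
    """Format one labeled post as a chat-style fine-tuning example."""
    assert label in LABELS, f"unknown label: {label}"
    return {
        "messages": [
            {"role": "system",
             "content": "Classify the post into one of: " + "; ".join(LABELS)},
            {"role": "user", "content": post_text},
            {"role": "assistant", "content": label},
        ]
    }

# Write training records as JSONL (one JSON object per line).
# The post text here is a hypothetical placeholder, not study data.
records = [to_finetune_record("Where can I find a shelter near me?",
                              "Shelters/DV centers/Agencies")]
with open("train.jsonl", "w") as f:
    for r in records:
        f.write(json.dumps(r) + "\n")
```

In a progressive training setup, one such JSONL file per class (or per round) could be submitted sequentially, with misclassified examples augmented between rounds.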
Results:
On the internal test set of 40 real and 144 AI-generated posts, our model achieved an F1 score of 70.49% (95% CI 60.63%-80.35%) on real posts, outperforming the original GPT-3.5, GPT-4, fine-tuned Llama2-7B, fine-tuned Llama3-8B, and LSTM baselines. On AI-generated posts, it attained an F1 score of 84.58% (95% CI 80.38%-88.78%), again surpassing all baselines. On the external validation dataset (n=74), the model achieved an F1 score of 56.51% (95% CI 44.09%-66.18%), outperforming the other models. Statistical analysis showed that our model significantly outperformed the others in F1 score (P<.005 for real posts; P<.001 for external validation posts). Our model was also substantially faster, completing predictions in 19.108 seconds compared with 1,150 seconds for manual assessment.
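The abstract reports F1 scores with 95% confidence intervals but does not specify how the intervals were computed; a percentile bootstrap is one common approach, sketched below with a hand-rolled macro-averaged F1 (function names and the bootstrap choice are illustrative, not the paper's stated method):

```python
import random

def macro_f1(y_true, y_pred):
    """Macro-averaged F1 over the classes present in y_true."""
    f1s = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

def bootstrap_ci(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap 95% CI for macro F1 (resample with replacement)."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(macro_f1([y_true[i] for i in idx],
                               [y_pred[i] for i in idx]))
    scores.sort()
    return (scores[int(alpha / 2 * n_boot)],
            scores[int((1 - alpha / 2) * n_boot) - 1])
```

The same resampling scheme extends directly to accuracy, precision, and recall by swapping the scoring function.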
Conclusions:
Our fine-tuned large language model accurately and efficiently identifies DV-related information needs from online posts through multi-class classification. In addition, LLM-based data augmentation helped overcome the limitations of a relatively small and imbalanced dataset. By generating timely and accurate predictions, the model can empower healthcare professionals to provide rapid and suitable assistance to DV survivors.