Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Sep 8, 2024
Date Accepted: Apr 11, 2025
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Predicting the Information Needs of Domestic Violence Survivors Using a Fine-Tuned Large Language Model
ABSTRACT
Background:
Domestic violence (DV) is a significant public health concern that harms the physical and mental well-being of many women and imposes a substantial health care burden, making it imperative to address. However, women facing DV often encounter barriers to seeking in-person help due to stigma, shame, and embarrassment. As a result, many DV survivors turn to online health communities (OHCs) as a safe and anonymous space to share their experiences and seek support. Understanding the information needs of DV survivors in OHCs is crucial for providing timely and appropriate support.
Objective:
Our objective was to develop a fine-tuned large language model that provides fast and accurate predictions of the information needs of DV survivors from their online posts, enabling healthcare professionals to offer timely and personalized assistance.
Methods:
We collected 294 posts from Reddit sub-communities focused on DV, shared by women aged 18+ who self-identified as experiencing intimate partner violence (IPV) and sought advice. Based on the post content, we identified eight types of information needs: Shelters/DV centers/Agencies, Legal, Childbearing, Police, DV report procedure/Documentation, Safety planning, DV knowledge, and Communication. We employed data augmentation with GPT-3.5 to expand our dataset to 2,216 samples, prompting GPT-3.5 to imitate the existing data in each class and generate additional posts. We adopted a progressive training strategy to fine-tune GPT-3.5, LLaMA2-7B, and LLaMA3-8B for this multi-class text classification: each model was trained on one class at a time, moving sequentially to the next so that performance could be closely monitored. When suboptimal performance was observed on the classes trained so far, we applied data augmentation to generate additional samples resembling the misclassified ones. In this way, misclassified samples received more attention during training, while the relative weight of correctly classified samples decreased.
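The progressive, class-by-class training loop described above can be sketched as follows. This is an illustrative outline only: `fine_tune`, `evaluate_f1`, `augment_like`, and the F1 threshold are hypothetical stand-ins, since the paper does not specify the actual fine-tuning API calls, evaluation harness, or acceptance criterion.

```python
# Sketch of the progressive fine-tuning strategy with targeted augmentation.
# `fine_tune`, `evaluate_f1`, and `augment_like` are hypothetical stand-ins
# for the actual GPT-3.5 fine-tuning, evaluation, and GPT-3.5-based
# augmentation steps; they are passed in as callables.

F1_THRESHOLD = 0.70  # assumed acceptance threshold, not taken from the paper


def progressive_fine_tune(classes, data, fine_tune, evaluate_f1, augment_like):
    """Add one class at a time; augment misclassified samples when F1 dips."""
    active, model = [], None
    for cls in classes:
        active.append(cls)
        # Train only on samples whose labels belong to the classes seen so far.
        subset = [s for s in data if s["label"] in active]
        model = fine_tune(model, subset)
        f1, misclassified = evaluate_f1(model, subset)
        if f1 < F1_THRESHOLD and misclassified:
            # Generate extra samples resembling the hard cases, then retrain,
            # so misclassified samples receive proportionally more attention.
            data = data + augment_like(misclassified)
            subset = [s for s in data if s["label"] in active]
            model = fine_tune(model, subset)
    return model, data
```

The loop returns both the final model and the augmented dataset, so the class-by-class growth of the training set can be inspected after each stage.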
Results:
Using 13.6% of the real posts and 7.5% of the AI-generated posts as the test dataset, our fine-tuned GPT-3.5 model achieved an overall F1 score of 70.49% on the real posts, outperforming the original GPT-3.5 (37.96%), GPT-4 (46.54%), fine-tuned LLaMA2-7B (48.60%), and fine-tuned LLaMA3-8B (30.93%). On the AI-generated posts, our model attained an F1 score of 84.58%, surpassing the original GPT-3.5 (73.33%), GPT-4 (74.02%), and fine-tuned LLaMA3-8B (80.19%), and matching fine-tuned LLaMA2-7B. Furthermore, our model was significantly faster, taking 19.108 seconds for predictions compared to 1,150 seconds for manual assessment.
Conclusions:
Our fine-tuned large language model can accurately and efficiently identify information needs related to domestic violence from online posts. In addition, we employed LLM-based data augmentation to overcome the limitations of a relatively small and imbalanced dataset. By generating timely and accurate predictions, the model can help healthcare professionals provide rapid and appropriate assistance to DV survivors.