Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Sep 2, 2025
Open Peer Review Period: Sep 3, 2025 - Oct 29, 2025
Date Accepted: Jan 29, 2026

The final, peer-reviewed published version of this preprint can be found here:

Enhancing Detection of Message Intents in a Mobile Health Smoking-Cessation Intervention Using Large Language Model Fine-Tuning, Data Downsampling, and Error Correction: Algorithm Development and Validation

Rahman S, Pechmann C, Harris IG

J Med Internet Res 2026;28:e83437

DOI: 10.2196/83437

Enhancing Detection of Message Intents in a Mobile Health Smoking-Cessation Intervention Using Large Language Model Fine-Tuning, Data Downsampling and Error Correction: Algorithm Development and Validation

  • Shagoto Rahman; 
  • Cornelia (Connie) Pechmann; 
  • Ian G Harris

ABSTRACT

Background:

Although smoking cessation aids, including support groups and nicotine replacement therapy (NRT), can help people quit smoking, quit rates remain low. Emerging mobile health interventions such as online support groups can help overcome barriers related to aid accessibility and convenience. Combined with NRT, online support groups hold significant potential but demand continuous, labor-intensive effort to deliver timely responses that maintain participant engagement. Accurate user intent detection, the process of understanding the purpose behind a user’s message, can play a critical role by identifying individual needs so that timely and appropriate responses can be provided. Recent advances in large language models within natural language processing and artificial intelligence (AI) have shown promise for this task. However, these systems often struggle when faced with a large number of intent categories or the complex nature of human language. Uneven data across intent categories, with some rare and others dominant, makes it harder for the system to correctly recognize user intent and respond.

Objective:

The main goal of this study was to develop an AI tool, specifically a large language model, that could accurately recognize users’ message intents despite imbalances and complexities in the data. In our application, users’ message intents related to a smoking-cessation support-group intervention and to use of the free NRT provided as part of that intervention.

Methods:

We used the same state-of-the-art, publicly available large language model, Llama-3 8B from Meta, throughout. First, we used the model off-the-shelf. Second, we fine-tuned it on our annotated domain dataset of 25 intent categories. Third, we downsampled the predominant intent category to reduce bias and then fine-tuned the model. Finally, we combined downsampling with corrected annotations to create a cleaned dataset for another round of fine-tuning. This stepwise approach was designed to improve classification performance progressively by addressing the limitations of each prior step.
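The downsampling step described above can be illustrated with a minimal sketch. It assumes simple random downsampling of the single most frequent category to a fixed size; the function name `downsample_majority`, the `target_size` parameter, and the intent labels are hypothetical and not taken from the study, which does not specify its sampling procedure here.

```python
import random
from collections import Counter


def downsample_majority(examples, target_size, seed=0):
    """Randomly reduce the most frequent label to at most target_size examples.

    examples: list of (text, label) pairs. Examples of every other label
    are kept unchanged, and the original order is preserved.
    """
    counts = Counter(label for _, label in examples)
    majority_label, majority_count = counts.most_common(1)[0]
    if majority_count <= target_size:
        return list(examples)  # nothing to remove

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    majority_idx = [i for i, (_, label) in enumerate(examples)
                    if label == majority_label]
    keep = set(rng.sample(majority_idx, target_size))
    return [ex for i, ex in enumerate(examples)
            if ex[1] != majority_label or i in keep]


# Hypothetical toy data: one dominant category and two rare ones.
data = ([("msg", "general_chat")] * 10
        + [("msg", "nrt_question")] * 2
        + [("msg", "craving")] * 1)
balanced = downsample_majority(data, target_size=3)
```

Only the majority category shrinks, so rare categories keep all of their annotated examples for fine-tuning.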

Results:

Without fine-tuning, the large language model achieved an unweighted-average F1-score of 0.29 and a weighted-average F1-score of 0.37, where the unweighted average treats all categories equally and the weighted average emphasizes larger ones. Fine-tuning alone achieved unweighted and weighted-average F1-scores of 0.72 and 0.86, respectively. Downsampling plus fine-tuning achieved unweighted and weighted-average F1-scores of 0.80 and 0.85, respectively. Downsampling, fine-tuning, and human error correction together achieved an unweighted-average F1-score of 0.86 and a weighted-average F1-score of 0.90.
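To make the difference between the two averages concrete, the sketch below computes both from per-category F1-scores. It assumes the common definitions (unweighted, or macro, averaging gives each category equal weight; weighted averaging weights each category by its number of true examples); the function names and the toy labels are hypothetical, and the study's exact evaluation code is not shown in this abstract.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_and_weighted_f1(y_true, y_pred):
    """Unweighted (macro) and support-weighted averages of per-class F1."""
    scores = f1_per_class(y_true, y_pred)
    support = Counter(y_true)  # number of true examples per label
    macro = sum(scores.values()) / len(scores)
    weighted = sum(scores[l] * support[l] for l in scores) / len(y_true)
    return macro, weighted


# Toy example: a dominant category "a" and a rare category "b".
y_true = ["a"] * 8 + ["b"] * 2
y_pred = ["a"] * 8 + ["a", "b"]
macro, weighted = macro_and_weighted_f1(y_true, y_pred)
```

In this toy case the rare category is classified less accurately, so the unweighted average is lower than the weighted one, the same direction of gap reported for the fine-tuned models above.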

Conclusions:

On smoking cessation data, large language models performed poorly without fine-tuning, underscoring the need for domain-specific training. Even with domain-specific fine-tuning, however, performance suffered because of the highly imbalanced dataset. Downsampling the majority category before fine-tuning improved results moderately but left room for further enhancement and raised concerns about potential noise in the dataset. Carefully reviewing the misclassified samples helped identify annotation inconsistencies, and after these errors were corrected and the model was fine-tuned on the corrected dataset, the best performance was achieved.



© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.