Accepted for/Published in: JMIR AI

Date Submitted: Oct 9, 2024
Open Peer Review Period: Oct 15, 2024 - Dec 10, 2024
Date Accepted: Feb 23, 2025

The final, peer-reviewed published version of this preprint can be found here:

Generative Large Language Model—Powered Conversational AI App for Personalized Risk Assessment: Case Study in COVID-19

Roshani MA, Zhou X, Qiang Y, Suresh S, Hicks S, Sethuraman U, Zhu D

JMIR AI 2025;4:e67363

DOI: 10.2196/67363

PMID: 40146990

PMCID: 11986386

Generative LLM Powered Conversational AI Application for Personalized Risk Assessment: A Case Study in COVID-19

  • Mohammad Amin Roshani
  • Xiangyu Zhou
  • Yao Qiang
  • Srinivasan Suresh
  • Steve Hicks
  • Usha Sethuraman
  • Dongxiao Zhu

ABSTRACT

Background:

Large Language Models (LLMs) have demonstrated powerful capabilities in natural language tasks and are increasingly being integrated into healthcare for tasks like disease risk assessment. Traditional machine learning methods rely on structured data and coding, limiting their flexibility in dynamic clinical environments. This work presents a novel approach to disease risk assessment using generative LLMs via conversational AI, eliminating the need for programming.

Objective:

This study explores the use of pre-trained generative LLMs, including LLaMA2-7b and Flan-T5-xl, to assess COVID-19 severity in real time. The goal is to compare their performance with traditional classifiers, such as Logistic Regression, XGBoost, and Random Forest, which are trained on structured tabular data.

Methods:

We fine-tuned LLMs on few-shot natural language examples drawn from a dataset of 393 pediatric patients and developed a mobile application that integrates these models to provide real-time, no-code COVID-19 severity risk assessment through clinician-patient interaction. We compared the LLMs with traditional classifiers across different experimental settings, using area under the curve (AUC) as the primary evaluation metric. We also analyzed feature importance derived from LLM attention layers to enhance interpretability.
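As an illustration of what a "natural language example" built from structured data might look like, the following is a minimal sketch, not the authors' pipeline. It assumes each patient row is a simple name-to-value mapping; the feature names, values, question wording, and the helper names `serialize_record` and `to_text_example` are all hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical question wording; the study's actual prompt is not given here.
QUESTION = "Is this COVID-19 case severe? Answer yes or no."


def serialize_record(record: Dict[str, object]) -> str:
    """Describe one structured row as plain-English sentences."""
    return " ".join(f"The {name} is {value}." for name, value in record.items())


def to_text_example(
    record: Dict[str, object], label: Optional[int] = None
) -> Tuple[str, Optional[str]]:
    """Return (input_text, target_text); target is None for an unlabeled query."""
    input_text = f"{serialize_record(record)}\n{QUESTION}"
    if label is None:
        return input_text, None
    return input_text, "yes" if label == 1 else "no"


# Hypothetical patient features, for illustration only.
shots = [
    ({"age": "7 years", "fever": "present", "oxygen saturation": "91%"}, 1),
    ({"age": "12 years", "fever": "absent", "oxygen saturation": "99%"}, 0),
]
# A k-shot setting would use k such labeled pairs for fine-tuning.
train_pairs = [to_text_example(rec, y) for rec, y in shots]
# A zero-shot setting would use only the unlabeled query text.
query_text, _ = to_text_example(
    {"age": "4 years", "fever": "present", "oxygen saturation": "96%"}
)
```

In this sketch, the number of labeled pairs is the only thing that changes between zero-shot and k-shot settings; the serialization of each record stays the same.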

Results:

Generative LLMs consistently outperformed traditional machine learning models, particularly in low-data settings. In zero-shot scenarios, the T0-3b model achieved an AUC of 0.75, whereas traditional classifiers like Logistic Regression and XGBoost lagged behind, with AUCs of 0.57 and 0.50, respectively. LLMs maintained their lead even as the number of training examples increased, outperforming traditional models up to 32-shot settings. For instance, the Flan-T5-xl model achieved an AUC of 0.70 in 32-shot experiments, further highlighting the LLMs' effectiveness in few-shot learning scenarios. Moreover, the mobile application provided real-time COVID-19 severity assessments and personalized insights through attention-based feature importance, adding value to the clinical interpretation of the results.
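To help interpret the AUC values reported above, the sketch below computes AUC from its pairwise-ranking definition: the fraction of (severe, non-severe) pairs in which the severe case receives the higher score, with ties counted as half. It assumes the AUC here is the area under the ROC curve; the function name `roc_auc` and the toy labels and scores are illustrative and are not taken from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outranks a negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: three of the four positive-negative pairs are ranked correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75

# A model that gives every case the same score ranks no better than chance.
print(roc_auc([0, 1, 0, 1], [0.5, 0.5, 0.5, 0.5]))  # 0.5
```

Under this definition, an AUC of 0.50 corresponds to chance-level ranking, which is the baseline against which the reported scores can be read.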

Conclusions:

Generative LLMs provide a robust alternative to traditional classifiers, particularly in scenarios with limited labeled data. Their ability to handle unstructured inputs and deliver personalized, real-time assessments without coding makes them highly adaptable to clinical settings. This study underscores the potential of LLM-powered conversational AI in healthcare and encourages further exploration of its use for real-time disease risk assessment and decision-making support.



© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.