Das S, Ge Y, Guo Y, Rajwal S, Hairston J, Powell J, Walker D, Peddireddy S, Lakamana S, Bozkurt S, Reyna M, Sameni R, Xiao Y, Kim S, Chandler R, Hernandez N, Mowery D, Wightman R, Love J, Spadaro A, Perrone J, Sarker A
Two-Layer Retrieval-Augmented Generation Framework for Low-Resource Medical Question Answering Using Reddit Data: Proof-of-Concept Study
Sudeshna Das; Yao Ge; Yuting Guo; Swati Rajwal; JaMor Hairston; Jeanne Powell; Drew Walker; Snigdha Peddireddy; Sahithi Lakamana; Selen Bozkurt; Matthew Reyna; Reza Sameni; Yunyu Xiao; Sangmi Kim; Rasheeta Chandler; Natalie Hernandez; Danielle Mowery; Rachel Wightman; Jennifer Love; Anthony Spadaro; Jeanmarie Perrone; Abeed Sarker
ABSTRACT
Background:
The increasing use of social media to share lived and living experiences of substance use presents a unique opportunity to obtain information on side effects, usage patterns, and opinions on novel psychoactive substances (NPS). However, the large volume of data makes it challenging to obtain useful insights with natural language processing (NLP) technologies such as large language models (LLMs).
Objective:
To develop a retrieval-augmented generation (RAG) architecture that answers clinicians’ queries about emerging health-related issues using user-generated medical information from social media.
Methods:
We proposed a two-layer RAG framework for query-focused answer generation and evaluated a proof of concept of the framework on query-focused summary generation from social media forums, focusing on emerging drug-related information. We compared the performance of a quantized LLM, deployable in low-resource settings, with that of GPT-4.
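The abstract does not specify what the two layers are. As a minimal sketch, assuming one plausible reading (a first layer that retrieves relevant posts and summarizes them in small batches, and a second layer that merges those partial summaries into a single query-focused answer), the flow could look like the code below. Everything named here is hypothetical: the lexical `overlap_score` retriever, the `two_layer_rag` function, the `llm` callable, the prompt wording, and the toy posts are illustrative stand-ins, not the study's retriever, prompts, or models.

```python
import re
from collections import Counter
from typing import Callable, List

# Hypothetical stand-in for a real model client: takes a prompt, returns text.
LLM = Callable[[str], str]


def tokenize(text: str) -> List[str]:
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(query: str, doc: str) -> int:
    """Crude lexical relevance: total occurrences of query terms in the doc."""
    query_terms = set(tokenize(query))
    counts = Counter(tokenize(doc))
    return sum(counts[term] for term in query_terms)


def retrieve(query: str, docs: List[str], k: int) -> List[str]:
    """Return up to k docs with a nonzero score, best first (ties keep input order)."""
    scored = [(overlap_score(query, d), i, d) for i, d in enumerate(docs)]
    relevant = [s for s in scored if s[0] > 0]
    relevant.sort(key=lambda s: (-s[0], s[1]))
    return [d for _, _, d in relevant[:k]]


def two_layer_rag(query: str, docs: List[str], llm: LLM,
                  k: int = 4, batch_size: int = 2) -> str:
    # Layer 1 (assumed): retrieve posts and summarize them in small batches,
    # so each prompt stays short enough for a small model's context window.
    hits = retrieve(query, docs, k)
    if not hits:
        return "No relevant posts found."
    partials = []
    for start in range(0, len(hits), batch_size):
        batch = hits[start:start + batch_size]
        prompt = (
            f"Question: {query}\nPosts:\n"
            + "\n".join(f"- {p}" for p in batch)
            + "\nSummarize only what these posts say about the question."
        )
        partials.append(llm(prompt))
    # Layer 2 (assumed): merge the partial summaries into one answer.
    final_prompt = (
        f"Question: {query}\nPartial summaries:\n"
        + "\n".join(f"- {s}" for s in partials)
        + "\nCombine these into a single answer without adding new claims."
    )
    return llm(final_prompt)


# Toy demo: invented posts and a fake LLM that records each prompt it receives.
posts = [
    "drug x gave me bad nausea",
    "the weather is nice today",
    "side effects of drug x were mild",
    "cooking pasta tonight",
    "drug x withdrawal lasted days",
]
calls: List[str] = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"summary {len(calls)}"


answer = two_layer_rag("side effects of drug x", posts, fake_llm)
print(answer)  # summary 3 (two layer-1 calls, then one layer-2 call)
```

Batching the first layer keeps each prompt short, which is one reason a design along these lines could suit small quantized models with limited context windows; in practice `fake_llm` would be replaced by a real model client and the lexical scorer by a stronger retriever.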
Results:
With both GPT-4 and Nous-Hermes-2-7B-DPO, our framework achieves comparable median scores for relevance, length, hallucination, coverage, and coherence, evaluated over 20 queries with 52 samples.
Conclusions:
RAG with LLMs is useful for medical question answering in resource-constrained settings.