
Accepted for/Published in: JMIR Cancer

Date Submitted: Dec 17, 2024
Date Accepted: Jul 7, 2025

The final, peer-reviewed published version of this preprint can be found here:

Nishisako S, Higashi T, Wakao F

Reducing Hallucinations and Trade-Offs in Responses in Generative AI Chatbots for Cancer Information: Development and Evaluation Study

JMIR Cancer 2025;11:e70176

DOI: 10.2196/70176

PMID: 40934488

PMCID: 12425422

Reducing Hallucinations and Trade-Offs in Responses with Reliable Data: Development of AI Chatbots for Cancer Information

  • Sota Nishisako; 
  • Takahiro Higashi; 
  • Fumihiko Wakao

ABSTRACT

Background:

Generative artificial intelligence (AI) is increasingly used to search for information. Accurate information is essential to support patients with cancer and their families; however, generative AI sometimes returns incorrect information, a failure known as hallucination. Retrieval-augmented generation (RAG), which supplements large language model (LLM) outputs with relevant external sources, has the potential to reduce hallucinations. Although RAG has been proposed as a promising technique, its real-world performance in public health communication remains underexplored.

Objective:

We examined cancer information returned by generative AI chatbots using RAG with either cancer-specific information sources or general internet search results, to determine whether grounding RAG in reliable information sources reduces the hallucination rates of generative AI chatbots.

Methods:

We developed six chatbots by combining three patterns of reference information with two LLM versions: GPT-4 and GPT-3.5 chatbots that used Cancer Information Service (CIS) information, Google search information, or no reference information (conventional chatbots). A total of 62 cancer-related questions in Japanese were compiled from public sources. All responses were generated automatically and independently reviewed by two experienced clinicians, who assessed the presence of hallucinations, defined as medically harmful information or misinformation. We compared hallucination rates across chatbot types and calculated odds ratios using generalized linear mixed-effects models. Subgroup analyses were also performed based on whether questions were covered by CIS content.
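The abstract does not describe the study's implementation, but the core RAG behavior it reports, answering from a retrieved reference source and declining when the source lacks coverage, can be illustrated with a minimal sketch. All names and the overlap-based retriever below are hypothetical stand-ins; the toy documents stand in for a curated source such as CIS pages.

```python
def tokenize(text):
    """Lowercase word-set tokenizer (a deliberate simplification)."""
    return set(text.lower().split())

def retrieve(question, documents, min_overlap=2):
    """Return the best-matching document, or None if coverage is lacking."""
    q = tokenize(question)
    best_doc, best_score = None, 0
    for doc in documents:
        score = len(q & tokenize(doc))
        if score > best_score:
            best_doc, best_score = doc, score
    return best_doc if best_score >= min_overlap else None

def answer(question, documents):
    context = retrieve(question, documents)
    if context is None:
        # A RAG chatbot can admit a lack of information instead of
        # generating an unsupported (potentially hallucinated) reply.
        return "I do not have reliable information to answer that."
    # In a real RAG pipeline, the question and the retrieved context
    # would be passed together to the LLM as a grounded prompt.
    return f"Based on the source: {context}"
```

In this sketch, the decision to decline is what distinguishes the RAG-based chatbots from the conventional ones, which always generate a response.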

Results:

For the chatbots that used CIS information, hallucination rates were 0% for GPT-4 and 6% for GPT-3.5, whereas rates for chatbots that used Google information were 6% and 10% for GPT-4 and GPT-3.5, respectively. For questions on topics not covered by CIS, hallucination rates for the Google-based chatbots were 19% for GPT-4 and 35% for GPT-3.5. Hallucination rates for the conventional chatbots were approximately 40%. Using reference data from Google searches generated more hallucinations than using CIS data (odds ratio 9.4, 95% CI 1.2-17.5; P<.01); the odds ratio for the conventional chatbots was 16.1 (95% CI 3.7-50.0; P<.001). While the conventional chatbots responded to all questions, the RAG-based chatbots sometimes declined to answer when information was lacking, with response rates ranging from 36% to 81%. For questions on information not covered by CIS, the CIS chatbots did not respond, whereas the Google chatbots generated responses in 52% of cases for GPT-4 and 71% for GPT-3.5.
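For readers unfamiliar with the reported statistic, an odds ratio with a Wald 95% confidence interval can be computed from a 2x2 table of hallucination counts. The counts below are made up for illustration, not the study's data, and the study's actual estimates came from generalized linear mixed-effects models, which additionally account for repeated questions.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI from a 2x2 table:
    a/b = hallucinated/not for group 1, c/d = same for group 2."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1/a + 1/b + 1/c + 1/d)  # SE of log odds ratio
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

# Hypothetical counts: 10/52 hallucinations in one group vs 2/60 in another.
or_, lo, hi = odds_ratio_ci(10, 52, 2, 60)
```

An odds ratio above 1 with a confidence interval excluding 1 indicates a significantly higher hallucination risk in the first group.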

Conclusions:

Using RAG with reliable information sources significantly reduces the hallucination rate of generative AI chatbots and increases their ability to acknowledge a lack of information, making them more suitable for general use, where users must be provided with accurate information.




© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.