Accepted for/Published in: JMIR Cancer

Date Submitted: Dec 17, 2024
Date Accepted: Jul 7, 2025

The final, peer-reviewed published version of this preprint can be found here:

Reducing Hallucinations and Trade-Offs in Responses in Generative AI Chatbots for Cancer Information: Development and Evaluation Study

Nishisako S, Higashi T, Wakao F

JMIR Cancer 2025;11:e70176

DOI: 10.2196/70176

PMID: 40934488

PMCID: 12425422

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Development of AI Chatbots for Cancer Information: Reducing Hallucinations and Trade-Offs in Responses with Reliable Data

  • Sota Nishisako
  • Takahiro Higashi
  • Fumihiko Wakao

ABSTRACT

Background:

Generative artificial intelligence (AI) is increasingly used to find information. Accurate information is essential for supporting patients with cancer and their families; however, generative AI sometimes returns incorrect information, a phenomenon known as hallucination.

Objective:

We aimed to examine the cancer information returned by generative AI chatbots that use retrieval-augmented generation (RAG) with either cancer-specific information sources or general internet search.

Methods:

We compiled 62 cancer-related questions in Japanese. We developed RAG-equipped generative AI chatbots with two different reference information sources: a Cancer Information Service (CIS) chatbot, which used CIS as its reference information source, and a Google chatbot, which used Google search results. We compared the characteristics of their responses with those of a conventional chatbot without RAG, using GPT-4 and GPT-3.5 (-turbo-16K) in each case.
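The RAG design described in the Methods can be illustrated with a minimal sketch. This is not the authors' implementation: the passages, the token-overlap retriever, the `min_overlap` threshold, and the `generate` callback (a stand-in for a call to a language model) are all hypothetical, and the English-only tokenizer is a simplification given that the study used Japanese questions. The sketch only shows the general idea that a chatbot restricted to a reference source can decline to answer when nothing relevant is retrieved.

```python
import re

# Hypothetical stand-in for a curated reference source; not actual CIS content.
REFERENCE_PASSAGES = [
    "Colorectal cancer screening commonly uses a fecal occult blood test.",
    "Chemotherapy side effects can include nausea, fatigue, and hair loss.",
]

NO_ANSWER = "I could not find this in the reference source."


def tokenize(text):
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question, passages, min_overlap=2):
    """Return passages sharing at least min_overlap tokens with the question."""
    q = tokenize(question)
    scored = [(len(q & tokenize(p)), p) for p in passages]
    return [p for score, p in sorted(scored, reverse=True) if score >= min_overlap]


def build_prompt(question, passages):
    """Build a grounded prompt, or return None when nothing relevant is found."""
    hits = retrieve(question, passages)
    if not hits:
        return None
    context = "\n".join(f"- {p}" for p in hits)
    return (
        "Answer using only the reference passages below.\n"
        f"{context}\n"
        f"Question: {question}"
    )


def answer(question, passages, generate):
    """Call the (hypothetical) model only when reference material was retrieved."""
    prompt = build_prompt(question, passages)
    if prompt is None:
        return NO_ANSWER  # decline rather than let the model guess
    return generate(prompt)
```

A conventional chatbot without RAG corresponds to calling `generate` on the bare question, with no retrieval step and no opportunity to decline.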

Results:

For questions on information issued by CIS, the hallucination rates for the CIS chatbot were 0% with GPT-4 and 6% with GPT-3.5, whereas those for the Google chatbot were 6% and 10%, respectively. For questions on information not issued by CIS, the Google chatbot generated hallucinations in 19% of cases using GPT-4 and 35% using GPT-3.5. The conventional chatbot returned hallucinations in approximately 40% of its responses. Hallucinations were more likely with reference data from Google searches than with CIS, with an odds ratio of 9.4 (95% CI 1.2-17.5, P < .01), and the odds ratio for the conventional chatbot was 16.1 (95% CI 3.7-50.0, P < .001). The conventional chatbot responded to all questions, whereas the response rate was lower (36% to 81%) for the chatbots with RAG. For questions on information not covered by CIS, the CIS chatbot did not respond, while the Google chatbot generated responses in 52% of cases using GPT-4 and 71% using GPT-3.5.
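For readers unfamiliar with odds ratios, the following sketch shows how an unadjusted odds ratio and a Wald 95% confidence interval can be computed from a 2x2 table. The abstract does not state how the reported odds ratios were estimated (they may come from a regression model), so this is only an illustration of the quantity, and the counts in the example are hypothetical and not taken from the study.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: comparison group, hallucination     b: comparison group, no hallucination
    c: reference group, hallucination      d: reference group, no hallucination
    All four counts must be positive for this formula to apply.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only (not study data).
or_value, ci_low, ci_high = odds_ratio_ci(a=10, b=20, c=2, d=40)
print(f"OR = {or_value:.1f}, 95% CI {ci_low:.1f}-{ci_high:.1f}")
```

An odds ratio above 1 with a confidence interval that excludes 1 indicates higher odds of hallucination in the comparison group than in the reference group.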

Conclusions:

Using RAG with reliable information sources significantly reduced the hallucination rate of generative AI chatbots and increased their ability to admit a lack of information, making them more suitable for general use, where users need accurate information.


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.