Accepted for/Published in: JMIR AI

Date Submitted: Jun 2, 2025
Date Accepted: Oct 1, 2025

The final, peer-reviewed published version of this preprint can be found here:

Liu M, Okuhara T, Shirabe R, Nishiie Y, Xu Y, Okada H, Kiuchi T

Evaluating the Reliability and Accuracy of an AI-Powered Search Engine in Providing Responses on Dietary Supplements: Quantitative and Qualitative Evaluation

JMIR AI 2025;4:e78436

DOI: 10.2196/78436

PMID: 41160724

PMCID: 12571200

Can Artificial Intelligence Search Engine Provide High-quality Responses on Dietary Supplements in Japan? A Quantitative and Qualitative Evaluation

  • Mingxin Liu
  • Tsuyoshi Okuhara
  • Ritsuko Shirabe
  • Yuriko Nishiie
  • Yinghan Xu
  • Hiroko Okada
  • Takahiro Kiuchi

ABSTRACT

Background:

The widespread adoption of AI-powered search engines has transformed how people access health information. Microsoft Copilot, formerly Bing Chat, offers real-time web-sourced responses to user queries, raising concerns about the reliability of its health content. This is particularly critical in the domain of dietary supplements, where scientific consensus is limited and online misinformation is prevalent. Despite the popularity of supplements in Japan, little is known about the accuracy of AI-generated advice on their effectiveness for common diseases.

Objective:

This study aimed to evaluate the reliability and accuracy of Microsoft Copilot (formerly Bing Chat), an AI search engine, in responding to health-related queries about dietary supplements. Our findings can help consumers use LLMs more safely and wisely when seeking information on dietary supplements, and support developers in improving LLMs’ performance in this field.

Methods:

We simulated typical consumer behavior by posing 180 questions (6 per supplement × 30 supplements) in Japanese to Copilot’s three response modes (Creative, Balanced, Precise). These questions addressed the effectiveness of supplements in treating six common conditions. The AI’s answers were classified as “Effective,” “Uncertain,” or “Ineffective,” and evaluated for accuracy against evidence-based assessments conducted by licensed physicians. We also conducted a qualitative content analysis of the response text and reviewed the source types of all citations provided.
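The scoring step described above can be illustrated with a minimal Python sketch. It assumes that each AI answer and each physician assessment has already been reduced to one of the three labels, and that accuracy means exact label agreement; the function name `summarize_mode` and the toy data are hypothetical and are not taken from the study.

```python
from collections import Counter

# The three labels named in the Methods.
LABELS = ("Effective", "Uncertain", "Ineffective")


def summarize_mode(ai_labels, reference_labels):
    """Summarize one response mode (hypothetical helper, not the authors' code).

    ai_labels        -- labels assigned to the AI's answers
    reference_labels -- physician evidence-based labels for the same questions
    Returns the share of answers claiming effectiveness and the share of
    answers whose label matches the reference (assumed meaning of accuracy).
    """
    if len(ai_labels) != len(reference_labels):
        raise ValueError("each answer needs exactly one reference label")
    if not ai_labels:
        raise ValueError("no answers to score")
    for label in list(ai_labels) + list(reference_labels):
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")

    n = len(ai_labels)
    counts = Counter(ai_labels)
    correct = sum(a == r for a, r in zip(ai_labels, reference_labels))
    return {
        "n": n,
        "effective_rate": counts["Effective"] / n,
        "accuracy": correct / n,
    }


# Toy data for illustration only; these are not study results.
toy_ai = ["Effective", "Uncertain", "Ineffective", "Effective"]
toy_ref = ["Uncertain", "Uncertain", "Ineffective", "Effective"]
toy_summary = summarize_mode(toy_ai, toy_ref)
print(toy_summary)
```

Running the same function once per mode (Creative, Balanced, Precise) would give the two kinds of percentages reported in the Results, under the assumptions stated above.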

Results:

The proportion of Copilot responses claiming supplement effectiveness was 29.4% (Creative), 47.8% (Balanced), and 45.0% (Precise), while overall accuracy was low across all modes: 36.1% (Creative), 31.7% (Balanced), and 31.7% (Precise). Notably, 72.7% of citations came from unverified sources such as blogs, sales websites, and social media. In 10% of cases, Copilot hallucinated claims not supported by the cited references. Only 48.5% of responses included a recommendation to consult healthcare professionals. Among disease categories, the highest accuracy was found for cancer-related questions, likely due to lower misinformation prevalence.
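As a reading aid, the per-mode percentages can be converted back into approximate response counts. The sketch below assumes that each mode answered all 180 questions, which the Methods suggest but the abstract does not state outright; the counts it prints are back-calculated from the rounded percentages, not figures reported by the authors.

```python
# Assumption: every mode was scored on all 180 questions.
N_PER_MODE = 180

# Percentages copied from the Results paragraph.
reported = {
    "Creative": {"effective": 29.4, "accuracy": 36.1},
    "Balanced": {"effective": 47.8, "accuracy": 31.7},
    "Precise": {"effective": 45.0, "accuracy": 31.7},
}


def implied_count(percent, n):
    """Nearest whole number of responses matching a reported percentage."""
    return round(percent / 100 * n)


def round_trips(percent, n):
    """True if the implied count reproduces the percentage to one decimal."""
    k = implied_count(percent, n)
    return abs(round(100 * k / n, 1) - percent) < 1e-9


for mode, stats in reported.items():
    for metric, pct in stats.items():
        k = implied_count(pct, N_PER_MODE)
        ok = round_trips(pct, N_PER_MODE)
        print(f"{mode:8s} {metric:9s} {pct:5.1f}% ~ {k}/{N_PER_MODE} (consistent: {ok})")
```

If every line reports `consistent: True`, the reported percentages are compatible with a denominator of 180 per mode; that is a consistency check, not confirmation of the study's actual counts.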

Conclusions:

This is the first study to assess Copilot’s performance on dietary supplement information. Despite its authoritative appearance, Copilot frequently cites non-credible sources and provides ambiguous or inaccurate information. Its tendency to avoid definitive stances and align with perceived user expectations poses potential risks for health misinformation. These findings highlight the need for integrating health communication principles—such as transparency, audience empowerment, and informed choice—into the development and regulation of AI search engines to ensure safe public use.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.