Accepted for/Published in: JMIR AI

Date Submitted: Jun 27, 2025
Date Accepted: Oct 24, 2025

The final, peer-reviewed published version of this preprint can be found here:

Effectiveness of ChatGPT, Google Gemini, and Microsoft Copilot in Answering Thai Drug Information Queries: Cross-Sectional Study

Senngam M, Pornwattanakavee S, Leelakanok N, Todsarot T, Guinto GAT, Takun R, Sumativit A

JMIR AI 2025;4:e79751

DOI: 10.2196/79751

PMID: 41397693

PMCID: 12750067

The Effectiveness of ChatGPT, Google Gemini, and Microsoft Copilot in Answering Thai Drug Information Queries: a Cross-sectional Study

  • Marisa Senngam
  • Suphannika Pornwattanakavee
  • Nattawut Leelakanok
  • Teerarat Todsarot
  • Gabrielle Angele Tatta Guinto
  • Ratchanon Takun
  • Assadawut Sumativit

ABSTRACT

Background:

Artificial intelligence (AI) chatbots, including ChatGPT-4o, Google Gemini, and Microsoft Copilot, are increasingly utilized to deliver healthcare-related information. Their potential to assist in pharmaceutical care and drug information services is gaining attention globally. However, their ability to provide accurate, complete, and safe drug-related information in non-English contexts, particularly in Thai, remains underexplored.

Objective:

This study aimed to evaluate the performance of these AI systems in responding to drug-related questions written in Thai.

Methods:

An analytical cross-sectional study was conducted using 76 public drug-related questions compiled from medical databases and social media sources between November 1st, 2019, and December 31st, 2024. These questions were categorized into 18 distinct types along with one mixed-type category, with each category comprising four questions (n=19 categories × 4 questions=76). The responses generated by ChatGPT-4o, Google Gemini, and Microsoft Copilot were evaluated in terms of correctness, completeness, risk, and reproducibility. All AI models were queried using identical input text in Thai, and responses were independently assessed by clinical pharmacists using standardized evaluation criteria.

Results:

ChatGPT-4o produced a higher proportion of fully correct responses (50.0%) than Microsoft Copilot (35.5%) and Google Gemini (34.2%), although these differences did not reach statistical significance (P=.078). All three AI models provided generally complete responses, with no significant difference in completeness scores among them (P=.080). High-risk answers were observed across all systems, but overall risk levels did not differ significantly (P=.123). The category of drug-related question significantly influenced the correctness of AI responses (P=.002) but not their completeness (P=.230). ChatGPT-4o generally yielded the highest proportion of fully correct and complete answers across most categories; in the pharmacology category, however, Google Gemini and Microsoft Copilot outperformed ChatGPT-4o in correctness. Question type also significantly affected the risk level of the answers (P=.039); in particular, the pregnancy and lactation category showed the highest high-risk response rate (1.32% per system). Regarding reproducibility, all three AI models demonstrated consistent response patterns when the same questions were re-queried after 1, 7, and 14 days, with no significant deviation from the initial responses.
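As a reading aid, the reported percentages can be converted back to approximate response counts. The short Python sketch below assumes each system was scored on all 76 questions; the counts it produces are inferred from the rounded percentages and are not stated in the abstract, and the helper name `implied_count` is purely illustrative.

```python
# Total questions: 19 categories x 4 questions each (from the Methods).
N_CATEGORIES = 19
QUESTIONS_PER_CATEGORY = 4
N_QUESTIONS = N_CATEGORIES * QUESTIONS_PER_CATEGORY

# Fully correct response rates (%) as reported in the Results.
reported_pct = {
    "ChatGPT-4o": 50.0,
    "Microsoft Copilot": 35.5,
    "Google Gemini": 34.2,
}


def implied_count(pct, n):
    """Nearest whole number of responses consistent with a rounded percentage.

    Assumption: the percentage was computed over all n questions.
    """
    return round(pct * n / 100)


# Inferred counts of fully correct responses per system.
implied = {name: implied_count(p, N_QUESTIONS) for name, p in reported_pct.items()}

for name, count in implied.items():
    print(f"{name}: ~{count}/{N_QUESTIONS} = {100 * count / N_QUESTIONS:.1f}%")

# The 1.32% high-risk rate in the pregnancy and lactation category,
# read against the same assumed denominator of 76 responses per system.
high_risk_pregnancy = implied_count(1.32, N_QUESTIONS)
print(f"Pregnancy and lactation high-risk responses per system: ~{high_risk_pregnancy}")
```

Under that assumption, the 1.32% figure corresponds to roughly one high-risk response out of 76 per system, which helps put the small percentage in context.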

Conclusions:

The evaluated AI chatbots answered the queries with generally complete content; however, their accuracy was limited, and they made occasional high-risk errors when responding to drug-related questions in Thai. All models nonetheless exhibited good reproducibility, with consistent response patterns observed across multiple time points. Further improvements are necessary before these systems can provide safe, reliable, and language-specific pharmaceutical information.


