Accepted for/Published in: JMIR AI

Date Submitted: Dec 26, 2024
Date Accepted: Mar 18, 2025

The final, peer-reviewed published version of this preprint can be found here:

The Diagnostic Performance of Large Language Models and Oral Medicine Consultants for Identifying Oral Lesions in Text-Based Clinical Scenarios: Prospective Comparative Study

AlFarabi Ali S, AlDehlawi H, Jazzar A, Ashi H, Abuzinadah N, AlOtaibi M, Algarni A, Alqahtani H, Akeel S, AlMazrooa S

JMIR AI 2025;4:e70566

DOI: 10.2196/70566

PMID: 40605790

PMCID: 12223689

The diagnostic performance of large language models and oral medicine consultants for identifying oral lesions in text-based clinical scenarios: a prospective comparative study

  • Sarah AlFarabi Ali
  • Heba AlDehlawi
  • Ahoud Jazzar
  • Heba Ashi
  • Nihal Abuzinadah
  • Mohammad AlOtaibi
  • Abdulrahman Algarni
  • Hazzaa Alqahtani
  • Sara Akeel
  • Soulafa AlMazrooa

ABSTRACT

Background:

The use of artificial intelligence (AI), especially large language models (LLMs), is increasing in healthcare, including in dentistry. There has yet to be an assessment of the diagnostic performance of LLMs in oral medicine.

Objective:

To compare the diagnostic performance of ChatGPT (Chat Generative Pre-trained Transformer) and Microsoft Copilot (integrated within the Microsoft 365 suite) with that of oral medicine consultants in formulating accurate differential and final diagnoses for oral lesions from written clinical scenarios.

Methods:

Fifty comprehensive clinical case scenarios were given to three oral medicine consultants, who were asked to formulate a differential diagnosis and a final diagnosis for each case. Each scenario included patient age, presenting complaint, history of the presenting complaint, medical history, allergies, intra- and extra-oral findings, lesion description, and any additional information, such as laboratory investigations and specific clinical features. Specific prompts for the same fifty cases were designed and input into ChatGPT and Copilot to obtain both differential and final diagnoses. Diagnostic accuracy was then compared between the LLMs and the oral medicine consultants.

Results:

ChatGPT exhibited the highest accuracy, providing the correct differential diagnoses in 37 (74.0%) of the 50 cases. There were no significant differences between the AI models and the oral medicine consultants in the accuracy of the differential diagnoses. ChatGPT was as accurate as the consultants in making final diagnoses, but Copilot was significantly less accurate than ChatGPT (p=0.015) and one of the oral medicine consultants (p<0.001) in providing the correct final diagnosis.
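
As a rough illustration of the arithmetic behind the reported figure, the sketch below recomputes the 37 of 50 accuracy and attaches a 95% Wilson score interval to it. The abstract does not report confidence intervals or name the statistical tests used, so the interval method and the function names `accuracy` and `wilson_interval` are assumptions chosen for illustration, not the authors' analysis.

```python
import math


def accuracy(correct: int, total: int) -> float:
    """Proportion of cases answered correctly."""
    return correct / total


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    Assumption: z = 1.96 gives an approximate 95% interval. The abstract
    does not report intervals; this is illustrative only.
    """
    p = correct / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


# Figures taken from the Results paragraph: 37 correct differential
# diagnoses out of 50 scenarios for ChatGPT.
acc = accuracy(37, 50)
low, high = wilson_interval(37, 50)

print(f"accuracy = {acc:.1%}")
print(f"approx. 95% Wilson interval = ({low:.3f}, {high:.3f})")
```

With only 50 scenarios the interval is wide, which is one reason a difference between two raters can fail to reach significance even when their raw accuracies differ by several cases.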

Conclusions:

ChatGPT and Copilot show promising performance in diagnosing oral medicine pathology from clinical case scenarios and could assist dental practitioners. ChatGPT-4 and Copilot are still evolving, but even now they might provide a significant advantage in the clinical setting as tools to support dental practitioners in their daily practice.


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.