JMIR Preprints #70566: The diagnostic performance of large language models and oral medicine consultants for identifying oral lesions in text-based clinical scenarios: a prospective comparative study


The diagnostic performance of large language models and oral medicine consultants for identifying oral lesions in text-based clinical scenarios: a prospective comparative study

Sarah AlFarabi Ali;
Heba AlDehlawi;
Ahoud Jazzar;
Heba Ashi;
Nihal Abuzinadah;
Mohammad AlOtaibi;
Abdulrahman Algarni;
Hazzaa Alqahtani;
Sara Akeel;
Soulafa AlMazrooa

ABSTRACT

Background:

The use of artificial intelligence (AI), especially large language models (LLMs), is increasing in healthcare, including in dentistry. There has yet to be an assessment of the diagnostic performance of LLMs in oral medicine.

Objective:

To compare the performance of ChatGPT (Chat Generative Pre-trained Transformer) and Microsoft Copilot (integrated within the Microsoft 365 suite) with that of oral medicine consultants in formulating accurate differential and final diagnoses for oral lesions from written clinical scenarios.

Methods:

Fifty comprehensive clinical case scenarios were given to three oral medicine consultants, who were asked to formulate a differential diagnosis and a final diagnosis. Each scenario included patient age, presenting complaint, history of the presenting complaint, medical history, allergies, intra- and extra-oral findings, lesion description, and any additional information, including laboratory investigations and specific clinical features. Specific prompts for the same fifty cases were designed and input into ChatGPT and Copilot to obtain both differential and final diagnoses. Diagnostic accuracy was then compared between the LLMs and the oral medicine consultants.
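As an illustration of how such a comparison can be set up, the sketch below computes per-rater accuracy and a paired significance test on the same set of cases. The abstract does not state which statistical test the authors used; the exact McNemar test here is an assumption chosen because every rater scores the same fifty cases, and the function names and the example outcome vectors are hypothetical.

```python
from math import comb


def accuracy(outcomes):
    """Fraction of cases marked correct (outcomes is a list of booleans)."""
    return sum(outcomes) / len(outcomes)


def mcnemar_exact(rater_a, rater_b):
    """Two-sided exact McNemar test for two raters scored on the same cases.

    Assumption: this is one reasonable paired test, not necessarily the
    test used in the study. Only discordant cases (one rater correct,
    the other wrong) carry information about a difference in accuracy.
    """
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must be scored on the same cases")
    only_a = sum(1 for a, b in zip(rater_a, rater_b) if a and not b)
    only_b = sum(1 for a, b in zip(rater_a, rater_b) if b and not a)
    n = only_a + only_b
    if n == 0:
        return 1.0  # no discordant cases, so no evidence of a difference
    k = min(only_a, only_b)
    # Binomial(n, 0.5) probability of k or fewer, doubled for two sides.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # 37 correct out of 50 reproduces the 74.0% reported for ChatGPT.
    print(f"{accuracy([True] * 37 + [False] * 13):.1%}")

    # Hypothetical paired outcomes, for illustration only (not study data).
    model = [True, True, False, True, False, True, True, False]
    consultant = [True, True, True, True, True, True, False, True]
    print(mcnemar_exact(model, consultant))
```

A paired test is used in this sketch because comparing the two accuracy percentages as if they came from independent samples would ignore that easy and hard cases are shared across raters.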

Results:

ChatGPT exhibited the highest accuracy, providing the correct differential diagnoses in 37 (74.0%) cases. There were no significant differences between the AI models and the oral medicine consultants in the accuracy of differential diagnoses. ChatGPT was as accurate as the consultants in making final diagnoses, but Copilot was significantly less accurate than ChatGPT (p=0.015) and one of the oral medicine consultants (p<0.001) in providing the correct final diagnosis.

Conclusions:

ChatGPT and Copilot show promising performance for diagnosing oral medicine pathology in clinical case scenarios to assist dental practitioners. ChatGPT-4 and Copilot are still evolving, but even now might provide a significant advantage in the clinical setting as tools to help dental practitioners in their daily practice.

Citation

Please cite as:

AlFarabi Ali S, AlDehlawi H, Jazzar A, Ashi H, Abuzinadah N, AlOtaibi M, Algarni A, Alqahtani H, Akeel S, AlMazrooa S

The Diagnostic Performance of Large Language Models and Oral Medicine Consultants for Identifying Oral Lesions in Text-Based Clinical Scenarios: Prospective Comparative Study

JMIR AI 2025;4:e70566

DOI: 10.2196/70566

PMID: 40605790

PMCID: 12223689


Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR AI

Date Submitted: Dec 26, 2024

Date Accepted: Mar 18, 2025
