The diagnostic performance of large language models and oral medicine consultants for identifying oral lesions in text-based clinical scenarios: a prospective comparative study
ABSTRACT
Background:
The use of artificial intelligence (AI), especially large language models (LLMs), is increasing in healthcare, including in dentistry. There has yet to be an assessment of the diagnostic performance of LLMs in oral medicine.
Objective:
To compare the effectiveness of the Generative Pre-trained Transformer (ChatGPT) and Microsoft Copilot (integrated within the Microsoft 365 suite) with that of oral medicine consultants in formulating accurate differential and final diagnoses for oral lesions from written clinical scenarios.
Methods:
Fifty comprehensive clinical case scenarios were given to three oral medicine consultants, who were asked to formulate a differential diagnosis and a final diagnosis for each. Each scenario included patient age, presenting complaint, history of the presenting complaint, medical history, allergies, intra- and extra-oral findings, lesion description, and any additional information, including laboratory investigations and specific clinical features. Specific prompts for the same fifty cases were designed and input into ChatGPT and Copilot to formulate both differential and final diagnoses. Diagnostic accuracy was compared between the LLMs and the oral medicine consultants.
Results:
ChatGPT exhibited the highest accuracy, providing the correct differential diagnoses in 37 (74.0%) cases. There were no significant differences between the LLMs and the oral medicine consultants in the accuracy of the differential diagnoses. ChatGPT was as accurate as the consultants in making final diagnoses, but Copilot was significantly less accurate than ChatGPT (p=0.015) and one of the oral medicine consultants (p<0.001) in providing the correct final diagnosis.
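As a check on the arithmetic behind the reported figure, the sketch below recomputes the proportion for 37 correct differential diagnoses out of 50 cases. The Wilson score interval is included only as an illustration of the uncertainty around a proportion from 50 cases; it is an assumption of this sketch, not an analysis reported in the abstract, and the function names are hypothetical.

```python
import math


def accuracy(correct: int, total: int) -> float:
    """Proportion of cases with a correct diagnosis."""
    return correct / total


def wilson_interval(correct: int, total: int, z: float = 1.959963984540054):
    """Wilson score interval for a binomial proportion.

    z defaults to the two-sided 95% normal quantile. This interval is an
    illustrative choice and is not taken from the study.
    """
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half_width, centre + half_width


# Figures stated in the Results: 37 correct differential diagnoses in 50 cases.
acc = accuracy(37, 50)
low, high = wilson_interval(37, 50)

print(f"accuracy = {acc:.1%}")  # prints "accuracy = 74.0%"
print(f"95% Wilson interval = ({low:.3f}, {high:.3f})")
```

The width of the interval shows how much uncertainty a 50-case sample leaves around the point estimate.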
Conclusions:
ChatGPT and Copilot show promising performance in diagnosing oral medicine pathology from clinical case scenarios. ChatGPT-4 and Copilot are still evolving, but even now might provide a significant advantage in the clinical setting as tools to help dental practitioners in their daily practice.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.