Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Aug 4, 2023
Open Peer Review Period: Aug 4, 2023 - Sep 29, 2023
Date Accepted: Nov 20, 2023
How can the clinical aptitude of artificial intelligence assistants be assayed?
ABSTRACT
Large language models are exhibiting remarkable performance in clinical contexts, with exemplar results ranging from expert-level attainment in medical examinations to responses to patient queries on a social media website that were more accurate and relevant than those of real doctors. Deployment of large language models in conventional healthcare settings is yet to be reported, and there remains an open question as to what evidence should be required before such deployment is warranted. Early validation studies use unvalidated surrogate variables to represent clinical aptitude, and because potential pitfalls and pain points cannot be exhaustively predicted, it may be necessary to conduct prospective randomised controlled trials to justify use of a large language model for clinical advice or assistance. As large language models continue to revolutionise the field, there is an opportunity to improve the rigour of artificial intelligence research to reward innovation resulting in real benefit to real patients.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.