Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Aug 2, 2024
Date Accepted: Oct 19, 2024
Exploring the Potential of Claude 3 Opus in Renal Pathological Diagnosis: A Performance Evaluation
ABSTRACT
Background:
Artificial intelligence (AI) has shown great promise in assisting medical diagnosis, but its application in renal pathology remains limited.
Objective:
To evaluate the performance of an advanced AI language model, Claude 3 Opus, in generating diagnostic descriptions for renal pathological images.
Methods:
A dataset of 100 renal pathological images across 27 disease types was curated. Claude 3 Opus generated diagnostic descriptions for each image, which were scored by two pathologists on clinical relevance, accuracy, fluency, completeness, and overall value.
Results:
Claude 3 Opus achieved high scores in language fluency (mean=3.86) but lower scores in clinical relevance (1.75), accuracy (1.55), completeness (2.01), and overall value (1.75). Performance varied across disease types. Inter-rater agreement was substantial for relevance (κ=0.627) and overall value (κ=0.589), and moderate for accuracy (κ=0.485) and completeness (κ=0.458).
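The κ values above quantify how far the two pathologists agreed beyond what chance alone would produce. The abstract does not say whether a weighted or unweighted κ was used. The sketch below therefore assumes plain unweighted Cohen's κ, κ = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is the agreement expected from each rater's score distribution. The function name and the example scores are hypothetical and are not study data. For ordinal score scales, a weighted κ (for example, scikit-learn's `cohen_kappa_score` with `weights="linear"` or `"quadratic"`) is a common alternative.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items.

    Hypothetical helper for illustration; assumes unweighted kappa,
    which the abstract does not confirm.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same score by both raters
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the two raters' marginal frequencies per score
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_e = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_e == 1:
        # Both raters used one identical score for every item; kappa is undefined
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical scores from two raters on five descriptions (not study data)
pathologist_1 = [1, 2, 3, 2, 1]
pathologist_2 = [1, 2, 2, 2, 1]
print(round(cohens_kappa(pathologist_1, pathologist_2), 3))  # 0.667
```

Here the raters agree on 4 of 5 items (p_o = 0.8), chance agreement is p_e = 0.4, and κ = 0.4 / 0.6 ≈ 0.667.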
Conclusions:
Claude 3 Opus shows potential in generating fluent renal pathology descriptions but needs improvement in accuracy and clinical value. The model's performance varies across disease types. Further optimization and validation are needed before clinical application.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.