Accepted for/Published in: JMIR Cancer
Date Submitted: Jan 31, 2025
Open Peer Review Period: Jan 31, 2025 - Mar 28, 2025
Date Accepted: Sep 16, 2025
Cancer Diagnosis Categorization in Electronic Health Records Using Large Language Models and BioBERT: Model Performance Evaluation Study
ABSTRACT
Background:
Electronic Health Records (EHRs) contain inconsistently structured or free-text data, requiring efficient preprocessing to enable predictive healthcare models. While artificial intelligence-driven natural language processing tools show promise for automating diagnosis classification, their comparative performance and clinical reliability require systematic evaluation.
Objective:
To evaluate the performance of four large language models (GPT-3.5, GPT-4o, Llama 3.2, and Gemini 1.5) and BioBERT in classifying cancer diagnoses from structured and unstructured EHR data.
Methods:
We analyzed 762 unique diagnoses (326 ICD code descriptions, 436 free-text entries) from the records of 3,456 patients with cancer. Models were tested on their ability to categorize diagnoses into 14 predefined categories. Two oncology experts validated classifications.
Results:
BioBERT achieved the highest accuracy (90.7%) and weighted accuracy (94.6%) for ICD codes, but its accuracy dropped to 81.6% for free text. GPT-4o matched BioBERT’s ICD code accuracy and slightly outperformed it on free text (81.8% accuracy), while GPT-3.5, Gemini, and Llama showed lower overall performance. Common misclassification patterns included difficulty distinguishing metastatic cancers and interpreting ambiguous clinical terminology.
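As an illustration of how accuracy and weighted accuracy might be computed separately for ICD code descriptions and free-text entries, the following is a minimal sketch and not the authors' evaluation code. It assumes that weighted accuracy weights each unique diagnosis by how often it occurs in patient records; the abstract does not define the metric, so this definition is an assumption. The function name, field names, category labels, and toy records are all hypothetical.

```python
from collections import defaultdict


def accuracy_by_source(records):
    """Return plain and frequency-weighted accuracy per source type.

    Each record is a dict with hypothetical keys:
      source    - "icd" or "free_text"
      expert    - category assigned by the expert reviewers
      predicted - category assigned by the model
      count     - number of patient records using this diagnosis string
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    weighted_hits = defaultdict(int)
    weighted_totals = defaultdict(int)

    for r in records:
        correct = r["predicted"] == r["expert"]
        src = r["source"]
        totals[src] += 1
        weighted_totals[src] += r["count"]
        if correct:
            hits[src] += 1
            weighted_hits[src] += r["count"]

    return {
        src: {
            "accuracy": hits[src] / totals[src],
            # Assumed definition: each diagnosis counts once per occurrence.
            "weighted_accuracy": weighted_hits[src] / weighted_totals[src],
        }
        for src in totals
    }


# Toy data invented for illustration; not taken from the study.
toy_records = [
    {"source": "icd", "expert": "Breast", "predicted": "Breast", "count": 120},
    {"source": "icd", "expert": "Metastatic", "predicted": "Liver", "count": 10},
    {"source": "free_text", "expert": "Breast", "predicted": "Breast", "count": 5},
    {"source": "free_text", "expert": "Metastatic", "predicted": "Metastatic", "count": 3},
]

results = accuracy_by_source(toy_records)
for source, metrics in results.items():
    print(source, metrics)
```

Under this assumed definition, weighted accuracy can exceed plain accuracy when the most frequent diagnosis strings are classified correctly and the errors fall on rare ones, which would be consistent with the pattern reported for ICD codes.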
Conclusions:
While current accuracy levels are sufficient for administrative tasks, success in clinical applications depends on standardized documentation combined with appropriate human oversight for critical decisions.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.