Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Jan 11, 2025
Open Peer Review Period: Jan 11, 2025 - Mar 8, 2025
Date Accepted: May 15, 2025

The final, peer-reviewed published version of this preprint can be found here:

Language Models for Multilabel Document Classification of Surgical Concepts in Exploratory Laparotomy Operative Notes: Algorithm Development Study

Balch JA, Desarju SS, Nolan VJ, Vallanki D, Buchanan TR, Brinkley LM, Penev Y, Bilgi A, Patel A, Chatham CE, Vanderbilt DM, Uddin R, Bihorac A, Efron P, Loftus TJ, Rahman P, Shickel B

JMIR Med Inform 2025;13:e71176

DOI: 10.2196/71176

PMID: 40632815

PMCID: 12266303

Language Models for Multi-label Document Classification of Surgical Concepts in Exploratory Laparotomy Operative Notes: Algorithm Development Study

  • Jeremy A Balch
  • Sasank S Desarju
  • Victoria J Nolan
  • Divya Vallanki
  • Timothy R Buchanan
  • Lindsey M Brinkley
  • Yordan Penev
  • Ahmet Bilgi
  • Aashay Patel
  • Corinne E Chatham
  • David M Vanderbilt
  • Rayon Uddin
  • Azra Bihorac
  • Philip Efron
  • Tyler J Loftus
  • Protiva Rahman
  • Benjamin Shickel

ABSTRACT

Background:

Operative notes are mined for surgical concepts in patient care, research, performance improvement, and billing workflows, a task that can be framed as multi-label document classification.

Objective:

We developed and evaluated large language models (LLMs) to expedite data extraction from surgical notes.

Methods:

A total of 388 exploratory laparotomy notes from a single institution were annotated for 21 concepts related to intraoperative findings, intraoperative techniques, and closure techniques. Annotation consistency was measured using Cohen’s kappa statistic. We compared conventional natural language processing (NLP) approaches, namely bag-of-words (BoW) and term frequency-inverse document frequency (tf-idf) representations with linear classifiers, with encoder-only (Clinical-Longformer, CL) and decoder-only (Llama 3.1 70B) LLMs. Multi-label classification performance was evaluated using 5-fold cross-validation with the F1 score and Hamming loss (HL). LLM prompting strategies were modified based on error analysis.
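
As an illustration of the annotation-consistency measure named above, the following is a minimal sketch of Cohen's kappa for two annotators labelling one concept as present (1) or absent (0). The function name `cohens_kappa` and the example annotations are hypothetical and are not taken from the study; the abstract does not describe how the authors computed the statistic or report its values.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items on which both raters agree.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical annotations for one concept across 10 notes (not study data).
annotator_1 = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
annotator_2 = [1, 0, 0, 1, 0, 0, 0, 0, 1, 1]
kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

In this toy case the annotators agree on 8 of 10 notes, but because agreement expected by chance is 0.52, kappa is noticeably lower than the raw agreement rate.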

Results:

Prevalence of labels ranged from 0.05 (colostomy, ileostomy, active bleed from named vessel) to 0.50 (running fascial closure). Llama 3.1 70B was the overall best-performing model (micro-F1 0.86 [5-fold range: 0.85, 0.87], HL 0.14 [0.13, 0.15]). The BoW model (micro-F1 0.68 [0.64, 0.71], HL 0.14 [0.13, 0.16]) and Clinical-Longformer (micro-F1 0.73 [0.70, 0.74], HL 0.11 [0.10, 0.12]) had overall similar performance, with tf-idf models trailing (micro-F1 0.57 [0.55, 0.59], HL 0.27 [0.25, 0.29]). F1 scores varied across concepts in the Llama model, ranging from 0.21 [0.11, 0.30] for partial skin closure to 0.92 [0.88, 0.96] for bowel resection. Error analysis demonstrated semantic nuances and edge cases within operative notes.
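
For readers unfamiliar with the two summary metrics reported above, the sketch below computes micro-averaged F1 and Hamming loss from binary label matrices using their standard definitions. The function names and the small example matrices are hypothetical; the abstract does not specify the authors' exact implementation, so this is an illustration of the metrics rather than a reproduction of the study's evaluation code.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP/FP/FN over every (note, label) cell."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            tp += (t == 1 and p == 1)
            fp += (t == 0 and p == 1)
            fn += (t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def hamming_loss(y_true, y_pred):
    """Fraction of (note, label) cells where prediction and truth differ."""
    total = wrong = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            total += 1
            wrong += (t != p)
    return wrong / total


# Hypothetical example: 3 notes x 4 concepts (rows = notes, columns = labels).
y_true = [[1, 0, 0, 1], [0, 1, 0, 0], [1, 1, 0, 0]]
y_pred = [[1, 0, 1, 1], [0, 0, 0, 0], [1, 1, 0, 0]]

f1 = micro_f1(y_true, y_pred)
hl = hamming_loss(y_true, y_pred)
print(f"micro-F1 = {f1:.2f}, Hamming loss = {hl:.3f}")
```

Micro-F1 ignores true negatives, whereas Hamming loss counts every cell equally, so with low-prevalence labels the two metrics can rank models differently.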

Conclusions:

Off-the-shelf autoregressive LLMs outperformed fine-tuned, encoder-only transformers and traditional NLP techniques in classifying operative notes.

Clinical Trial: n/a



© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.