Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Nov 4, 2025
Date Accepted: Apr 13, 2026
Natural Language Processing for Automated Classification of Cleft and Craniofacial Procedures from Operative Notes: Feasibility Study
ABSTRACT
Background:
Accurate classification of operative notes by procedure type is essential for defining study cohorts in surgical outcomes research. However, operative notes are unstructured, making manual review labor-intensive and unsustainable. There is a critical need for scalable, automated solutions.
Objective:
To develop and evaluate a machine learning framework for automated classification of pediatric craniofacial operative notes.
Methods:
This single-institution, retrospective observational study used operative notes from pediatric patients who underwent cleft and craniofacial procedures at an academic medical center from 2016 to 2024. A classification framework was developed using natural language processing techniques to categorize procedures at three levels: primary procedure type (cleft lip repair, alveolar bone grafting, cleft palate repair, velopharyngeal insufficiency correction, oronasal fistula repair, orthognathic repositioning, and rhinoplasty), procedural subtype (primary vs. revision), and specific surgical technique.
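The three-level structure described above can be pictured as a routing step: a first-level model proposes procedure types, and any second- or third-level model registered for a type is then applied to the same note. The following is a minimal sketch under stated assumptions, not the authors' implementation. The names `ProcedureLabel` and `classify_note` are hypothetical, the classifiers are passed in as plain callables because the abstract does not specify the underlying models, and the sketch assumes a note may carry more than one procedure type.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# The seven primary procedure types named in the Methods.
PROCEDURE_TYPES = [
    "cleft lip repair",
    "alveolar bone grafting",
    "cleft palate repair",
    "velopharyngeal insufficiency correction",
    "oronasal fistula repair",
    "orthognathic repositioning",
    "rhinoplasty",
]


@dataclass
class ProcedureLabel:
    """Hypothetical container for one procedure found in a note."""
    procedure_type: str
    subtype: Optional[str] = None    # "primary" or "revision", when a level-2 model exists
    technique: Optional[str] = None  # level-3 output, when a level-3 model exists


def classify_note(
    note: str,
    primary: Callable[[str], List[str]],
    secondary: Dict[str, Callable[[str], str]],
    tertiary: Dict[str, Callable[[str], str]],
) -> List[ProcedureLabel]:
    """Route a note through the three levels (illustrative only).

    `primary` returns zero or more procedure types for the note.
    `secondary` and `tertiary` map a procedure type to a classifier for
    that type; types without an entry keep the corresponding field as None.
    """
    labels = []
    for ptype in primary(note):
        if ptype not in PROCEDURE_TYPES:
            raise ValueError(f"unknown procedure type: {ptype!r}")
        label = ProcedureLabel(procedure_type=ptype)
        if ptype in secondary:
            label.subtype = secondary[ptype](note)
        if ptype in tertiary:
            label.technique = tertiary[ptype](note)
        labels.append(label)
    return labels
```

Keeping the lower-level classifiers in per-type dictionaries means a subtype or technique model can be added for one procedure type without touching the others.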
Results:
The dataset comprised 630 operative notes from 311 pediatric patients, with a mean age of 3.75 years (range: 0-18.75 years). The primary classification model distinguished the seven procedure types with an AUC of 0.92, a micro-averaged F1 score of 0.76, a macro-averaged F1 score of 0.75, and a Hamming loss of 0.058. Secondary classifiers for procedural subtype achieved AUCs of 0.85 for cleft lip revision classification and 0.97 for alveolar bone grafting revision classification. Tertiary classifiers for surgical technique identification achieved AUCs ranging from 0.77 to 0.86 across procedure types.
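For readers less familiar with the label-based metrics reported above, the sketch below shows how micro-averaged F1, macro-averaged F1, and Hamming loss are conventionally computed from binary indicator matrices (one row per note, one column per procedure type). It assumes a multi-label setup, which is the usual context for Hamming loss. The function names and the toy matrices are illustrative and are not drawn from the study data. AUC is omitted because it requires predicted scores rather than hard labels.

```python
from typing import List, Tuple

Matrix = List[List[int]]  # rows = notes, columns = procedure types, entries 0 or 1


def _counts(y_true: Matrix, y_pred: Matrix, col: int) -> Tuple[int, int, int]:
    """True positives, false positives, false negatives for one label column."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        t, p = t_row[col], p_row[col]
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
    return tp, fp, fn


def _f1(tp: int, fp: int, fn: int) -> float:
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """Pool counts over all labels, then compute a single F1."""
    tp = fp = fn = 0
    for col in range(len(y_true[0])):
        a, b, c = _counts(y_true, y_pred, col)
        tp, fp, fn = tp + a, fp + b, fn + c
    return _f1(tp, fp, fn)


def macro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """Compute F1 per label, then take the unweighted mean."""
    n_labels = len(y_true[0])
    scores = [_f1(*_counts(y_true, y_pred, col)) for col in range(n_labels)]
    return sum(scores) / n_labels


def hamming_loss(y_true: Matrix, y_pred: Matrix) -> float:
    """Fraction of individual note-label assignments that are wrong."""
    wrong = sum(
        t != p
        for t_row, p_row in zip(y_true, y_pred)
        for t, p in zip(t_row, p_row)
    )
    return wrong / (len(y_true) * len(y_true[0]))


# Toy example: 4 notes, 3 labels (invented values for illustration).
y_true = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 1], [1, 0, 0], [0, 0, 1]]
print(micro_f1(y_true, y_pred))      # 0.8
print(macro_f1(y_true, y_pred))      # about 0.778
print(hamming_loss(y_true, y_pred))  # about 0.167
```

Micro-averaging weights every note-label decision equally, so frequent procedure types dominate; macro-averaging weights every procedure type equally. Similar micro and macro values, as reported here, suggest performance is not driven by one or two common classes.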
Conclusions:
Machine learning approaches can automate the classification of pediatric craniofacial operative notes with good discriminative performance across multiple levels of procedural detail. Implementation of such systems could substantially reduce administrative burden and facilitate large-scale quality improvement initiatives in pediatric craniofacial surgery.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.