JMIR Preprints #64705: Validation of a Natural Language Processing Pipeline to Generate an Orthopedic Research Outcome Database: Total Hip Arthroplasty in a High-Volume, Single-Surgeon Practice

Validation of a Natural Language Processing Pipeline to Generate an Orthopedic Research Outcome Database: Total Hip Arthroplasty in a High-Volume, Single-Surgeon Practice

Nicholas H Mast;
Clara Lillian Oeste;
Dries Hens

ABSTRACT

Background:

Processing data from electronic health records (EHRs) to build research-grade databases is a lengthy and expensive process. Modern arthroplasty practice commonly employs multiple sites of care, including clinics and ambulatory care centers. However, most private data systems prevent obtaining usable insights for clinical practice.

Objective:

This study aims to create an automated natural language processing (NLP) pipeline for extracting clinical concepts from EHRs related to orthopedic outpatient visits, hospitalizations, and surgeries in a multi-center, single-surgeon practice. The pipeline was also used to assess therapies and complications after total hip arthroplasty (THA).

Methods:

EHRs of 1290 patients who underwent primary THA between January 1, 2012, and December 31, 2019 (all operated on and followed by the same surgeon), were processed using artificial intelligence (AI)-based models (NLP and machine learning [ML]). Three independent medical reviewers generated a gold standard using 100 randomly selected EHRs. The algorithm processed the entire database from different EHR systems, generating an aggregated clinical data warehouse. An additional manual control arm was used for data quality control.
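As an illustration of how an extraction pipeline can be scored against a manually reviewed gold standard, the sketch below computes precision, recall, and F1 for each clinical concept. It is a minimal example under stated assumptions, not the authors' implementation: the function `score_concept`, the concept names, and the record IDs are all hypothetical, and the abstract does not describe how matches were counted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scores:
    precision: float
    recall: float
    f1: float


def score_concept(gold: set[str], extracted: set[str]) -> Scores:
    """Score one concept: record IDs flagged by reviewers vs. by the pipeline."""
    tp = len(gold & extracted)   # flagged by both
    fp = len(extracted - gold)   # flagged by the pipeline only
    fn = len(gold - extracted)   # flagged by the reviewers only
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Scores(precision, recall, f1)


# Hypothetical data: record IDs in which each concept was identified.
gold_standard = {
    "hematoma": {"r03", "r17", "r42"},
    "intraoperative_fracture": {"r08", "r51"},
}
pipeline_output = {
    "hematoma": {"r03", "r17", "r60"},
    "intraoperative_fracture": {"r08", "r51"},
}

per_concept = {
    concept: score_concept(gold_standard[concept],
                           pipeline_output.get(concept, set()))
    for concept in gold_standard
}

for concept, s in per_concept.items():
    print(f"{concept}: P={s.precision:.2f} R={s.recall:.2f} F1={s.f1:.2f}")
```

Scoring each concept separately makes it possible to see which concepts the pipeline handles poorly before averaging across the whole database.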

Results:

The algorithm was as accurate as human reviewers (0.95 vs 0.94, p<0.05), achieving a database-wide average F1-score of 0.92 (SD: 0.09; min-max: 0.67-0.99), validating its use as an automated data extraction tool. During the first year after direct anterior THA, 92.11% of our population had a complication-free recovery. Among the 7.98% of cases in which surgery or recovery was not uneventful, the most common complications were lateral femoral cutaneous nerve (LFCN) sensitivity (3.64%; n=47), intraoperative fractures (1.01%; n=13), and hematoma (0.70%; n=9).
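The reported complication percentages can be reproduced from the counts and the cohort size of 1290 patients given in the Methods. The sketch below does that arithmetic and also shows how a set of per-concept F1-scores could be summarized as mean, SD, and min-max. The F1 values in `hypothetical_f1` are placeholders rather than study data, and the use of the sample standard deviation is an assumption, since the abstract does not state which form was used.

```python
from statistics import mean, stdev

COHORT_SIZE = 1290  # patients, as stated in the Methods


def rate_percent(count: int, total: int = COHORT_SIZE) -> float:
    """Return count/total as a percentage rounded to two decimals."""
    return round(100 * count / total, 2)


# Counts as reported in the Results.
complication_counts = {
    "LFCN sensitivity": 47,
    "intraoperative fracture": 13,
    "hematoma": 9,
}
rates = {name: rate_percent(n) for name, n in complication_counts.items()}


def summarize(f1_scores: list[float]) -> dict[str, float]:
    """Summarize per-concept F1-scores (sample SD is an assumption)."""
    return {
        "mean": mean(f1_scores),
        "sd": stdev(f1_scores),
        "min": min(f1_scores),
        "max": max(f1_scores),
    }


# Placeholder values for illustration only, not taken from the study.
hypothetical_f1 = [0.67, 0.90, 0.95, 0.99]
summary = summarize(hypothetical_f1)

for name, pct in rates.items():
    print(f"{name}: {pct:.2f}%")
print({key: round(value, 2) for key, value in summary.items()})
```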

Conclusions:

The algorithm extracted key clinical information from this dataset accurately and swiftly compared with human reviewers. This technology may provide substantial value for future surgeon practice and patient counseling. Furthermore, the dataset, which included data from all treated patients in a multi-center practice, supported the low early complication rate of direct anterior THA in this surgeon's hands.

Citation

Please cite as:

Mast NH, Oeste CL, Hens D

Assessing Total Hip Arthroplasty Outcomes and Generating an Orthopedic Research Outcome Database via a Natural Language Processing Pipeline: Development and Validation Study

JMIR Med Inform 2025;13:e64705

DOI: 10.2196/64705

PMID: 40073425

PMCID: 11922490

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Jul 24, 2024

Date Accepted: Feb 16, 2025
