Accepted for/Published in: JMIR Formative Research
Date Submitted: Jul 19, 2024
Date Accepted: May 5, 2025
Early Diagnosis of Knee Osteoarthritis Using Clinician Notes: A Natural Language Processing-Driven Approach
ABSTRACT
Background:
Knee osteoarthritis (knee OA) is a common form of arthritis that can cause significant disability and threatens patients' quality of life. According to the World Health Organization (WHO), 528 million people worldwide were living with osteoarthritis in 2019, a 113% increase since 1990, and the knee is the most frequently affected joint. Although the condition is chronic and irreversible, early diagnosis and appropriate treatment can improve it and may even prevent it from worsening. Predicting knee OA is therefore an essential step toward effective diagnosis and toward preventing more severe osteoarthritis. Knee OA is usually diagnosed by physicians, mainly on the basis of patients' laboratory results and medical images such as X-ray and MRI scans. Diagnosis from such data is often time-consuming, and the results can vary from physician to physician depending on the expertise of the person making the diagnosis. Previous studies have accordingly focused on detecting knee OA automatically from these data (laboratory results and medical images), for example using artificial intelligence (AI). However, these studies did not incorporate doctors' notes, which are textual data, into the analysis, even though such records of reported symptoms and behavior are already available and are easier to collect and access than laboratory results and medical images.
Objective:
We propose a novel approach to diagnosing knee OA that applies natural language processing (NLP) to doctors' notes describing patients' reported symptoms in textual form, without using any medical images or statistical data (laboratory results).
Methods:
The doctors' notes are first preprocessed using NLP text-analysis algorithms. We then apply deep learning models, including a convolutional neural network (CNN), a bidirectional long short-term memory network (BiLSTM), and a gated recurrent unit (GRU) network. Finally, we incorporate a disease-specific standardized questionnaire, the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC), to improve the overall performance of the models.
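The abstract does not specify the preprocessing steps or how WOMAC is applied to the notes. The following minimal Python sketch is therefore only an illustration under stated assumptions. The stop-word list, the `WOMAC_TERMS` keyword sets, and the functions `preprocess` and `womac_flags` are all hypothetical and do not reproduce the authors' pipeline. The sketch shows one plausible way to clean a note and flag which of the three WOMAC domains (pain, stiffness, physical function) it mentions.

```python
import re

# Hypothetical, abbreviated stop-word list (a real pipeline would use a fuller one).
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "on", "with", "is", "was", "to", "when"}

# Hypothetical keyword sets for the three WOMAC domains.
# These are illustrative terms, NOT the official WOMAC items.
WOMAC_TERMS = {
    "pain": {"pain", "ache", "aching", "sore"},
    "stiffness": {"stiff", "stiffness"},
    "function": {"stairs", "walking", "bending", "standing"},
}


def preprocess(note: str) -> list[str]:
    """Lowercase the note, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", note.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def womac_flags(tokens: list[str]) -> dict[str, int]:
    """Return 1 for each WOMAC domain with at least one matching term, else 0."""
    token_set = set(tokens)
    return {domain: int(bool(terms & token_set)) for domain, terms in WOMAC_TERMS.items()}


if __name__ == "__main__":
    toks = preprocess("Patient reports knee pain when climbing stairs.")
    print(toks)               # cleaned tokens passed on to a text model
    print(womac_flags(toks))  # per-domain indicator features
```

In a setup like this, the token sequence would feed a CNN, BiLSTM, or GRU, and the domain flags could serve as auxiliary features. Simple keyword matching stands in here for whatever WOMAC-based processing the study actually used.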
Results:
Before applying our WOMAC-based processing approach, the best-performing benchmark model achieved an accuracy (ACC) of 88.69%, an area under the receiver operating characteristic curve (AUC) of 0.8773, a sensitivity (Sen) of 90.73%, a specificity (Spe) of 85.52%, and an F1-score of 90.73%. After applying the WOMAC-based processing approach, performance increased to ACC=90.54%, AUC=0.8978, Sen=93.84%, Spe=85.71%, and F1-score=92.17%.
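For reference, the reported metrics follow standard definitions for binary classification. The sketch below uses hypothetical function names and toy counts, not the study's data. ACC, Sen, Spe, and F1 come from the confusion-matrix counts (TP, FP, TN, FN). AUC is computed from predicted scores, shown here as the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as one half.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard confusion-matrix metrics for a binary classifier."""
    sensitivity = tp / (tp + fn)            # recall, true positive rate
    specificity = tn / (tn + fp)            # true negative rate
    precision = tp / (tp + fp)              # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"acc": accuracy, "sen": sensitivity, "spe": specificity, "f1": f1}


def roc_auc(labels: list[int], scores: list[float]) -> float:
    """Rank-based ROC AUC: fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy counts for illustration only.
    print(binary_metrics(tp=8, fp=2, tn=6, fn=2))
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 3 of 4 pairs ranked correctly
```

F1 is the harmonic mean of precision and sensitivity, so it equals sensitivity exactly when precision equals sensitivity. The toy counts above show this case (both are 0.8).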
Conclusions:
The proposed method predicted the occurrence of knee osteoarthritis and yielded significantly better prediction performance than conventional methods that use images and statistical laboratory data. The major finding of this study is that texts describing patient-reported symptoms can be used to predict knee osteoarthritis. Furthermore, this study demonstrates that symptom reports are a valuable data source for predicting whether a particular knee will show osteoarthritis progression.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.