Currently submitted to: JMIR Medical Informatics
Date Submitted: May 15, 2026
Open Peer Review Period: May 21, 2026 - Jul 16, 2026
(currently open for review)
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Estimating Longitudinal Functional Independence Measure Scores From Japanese Rehabilitation Notes Using a Large Language Model: Retrospective Study
ABSTRACT
Background:
The Functional Independence Measure (FIM) is a standard functional assessment tool in rehabilitation medicine; however, in routine clinical practice, formal assessments are often conducted only at limited intervals, making it difficult to capture patients’ functional recovery trajectories during hospitalization at a high temporal resolution. In contrast, free-text clinical notes routinely documented by physical therapists, occupational therapists, and speech-language pathologists contain rich observational information on patients’ functional status, but this information has not been systematically utilized for quantitative assessment.
Objective:
The aim of this study was to estimate FIM scores at multiple time points from Japanese free-text rehabilitation notes using a large language model (LLM), to evaluate estimation performance by disease and therapy type, and to analyze biases in the distribution of functional information inherent in clinical notes.
Methods:
We retrospectively analyzed free-text notes written by physical therapists, occupational therapists, and speech-language pathologists for patients hospitalized with cerebral infarction, hip fracture, or vertebral fracture at Saiseikai Moriyama Municipal Hospital between 2019 and 2024. Using zero-shot in-context learning, an LLM was employed to estimate scores for all 18 FIM items at each time point. The estimated scores were aligned with ground-truth FIM assessments recorded on the same day and evaluated using mean absolute error (MAE), root mean squared error (RMSE), and weighted kappa. In addition, stratified analyses based on admission FIM scores and therapy-specific FIM item coverage analyses were conducted.
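As a rough illustration of the evaluation step described above, the following minimal sketch computes MAE, RMSE, and a weighted kappa for paired item-level scores on the 1 to 7 FIM scale. It is not the authors' implementation: the function names, the example values, and the choice of quadratic weighting are assumptions made for illustration, since the abstract does not state which kappa weighting scheme was used.

```python
import math
from typing import Sequence

# Each FIM item is scored on an ordinal scale from 1 to 7.
FIM_LEVELS = range(1, 8)


def mae(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean absolute error between paired scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Root mean squared error between paired scores."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def weighted_kappa(y_true: Sequence[int], y_pred: Sequence[int],
                   levels=FIM_LEVELS, power: int = 2) -> float:
    """Cohen's weighted kappa; power=2 gives quadratic weights (an assumption),
    power=1 gives linear weights."""
    cats = list(levels)
    k = len(cats)
    index = {c: i for i, c in enumerate(cats)}
    n = len(y_true)

    # Observed confusion matrix: rows are true scores, columns are estimates.
    observed = [[0.0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        observed[index[t]][index[p]] += 1

    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (abs(i - j) / (k - 1)) ** power  # disagreement weight
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row_totals[i] * col_totals[j] / n

    if weighted_expected == 0:
        return float("nan")  # undefined when all ratings fall in one category
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical same-day pairs for one FIM item (not data from the study).
ground_truth = [1, 4, 7, 5, 3, 6]
llm_estimate = [2, 4, 6, 5, 3, 7]

print("MAE:", mae(ground_truth, llm_estimate))
print("RMSE:", rmse(ground_truth, llm_estimate))
print("Weighted kappa:", weighted_kappa(ground_truth, llm_estimate))
```

Quadratic weights penalize large disagreements more heavily than near misses, which suits an ordinal scale such as the FIM; passing `power=1` switches to linear weights.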
Results:
Overall, the results demonstrated the feasibility of estimating FIM scores from rehabilitation notes with an LLM. Estimations based on occupational therapy notes showed consistently lower MAE and higher weighted kappa across all disease groups. In contrast, speech-language pathology notes contained limited information on motor items, resulting in relatively larger estimation errors. Patients with lower admission FIM scores tended to exhibit larger estimation errors, suggesting that differences in observational context across therapy records influenced estimation performance.
Conclusions:
This study demonstrates that FIM scores can be estimated from free-text rehabilitation notes using an LLM. The proposed approach has the potential to complement sparsely observed FIM assessments without increasing the bedside evaluation burden, enabling higher-frequency monitoring of functional recovery trajectories. Time-series functional assessments estimated by LLMs may further support early prediction of discharge outcomes and the development of personalized rehabilitation plans.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.