Currently submitted to: Journal of Medical Internet Research
Date Submitted: Feb 26, 2026
Open Peer Review Period: Feb 27, 2026 - Apr 24, 2026
NOTE: This is an unreviewed Preprint
Artificial Intelligence for Predicting Patient Reported Outcome Measures (PROMs) Scores from Free Text: A Proof-of-Concept Study with the EuroQol-5D-3L and Transformer Models
ABSTRACT
Background:
Patient-reported outcome measures (PROMs) have become an important tool for measuring a patient's health status from their own perspective; however, they are typically collected with standardized questionnaires, which do not capture each patient's unique experience of health. Recent advances in natural language processing (NLP) offer new possibilities for deriving PROM scores from unstructured, free-text patient narratives; however, the feasibility of this task and the minimum data it requires remain uncertain.
Objective:
To assess the practicality of transformer-based models for predicting EuroQol EQ-5D-3L scores from patient narratives and to evaluate minimum data requirements, narrative length and data augmentation effects.
Methods:
This proof-of-concept study used synthetically generated patient narratives to evaluate methodological feasibility. Three transformer models (BERT, BioBERT, and DistilBERT) were fine-tuned for regression on patient narratives representing all 243 EQ-5D-3L health states. Model performance was compared across sample sizes (n=100–850), narrative lengths (100–1000 words), and data augmentation conditions. Performance was assessed with five-fold cross-validation and with additional validation on datasets generated by ChatGPT and DeepSeek.
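As an illustration of the design described above, the 243 health states follow from the five EQ-5D-3L dimensions each taking one of three levels (3^5 = 243), and five-fold cross-validation partitions the narratives into five held-out folds. The following is a minimal sketch using only the Python standard library; the function names, the random seed, and the use of a plain shuffled split (rather than whatever splitting strategy the authors used) are assumptions made for illustration.

```python
from itertools import product
import random

# The five EQ-5D-3L dimensions; each is rated at level 1, 2, or 3.
DIMENSIONS = (
    "mobility",
    "self_care",
    "usual_activities",
    "pain_discomfort",
    "anxiety_depression",
)
LEVELS = (1, 2, 3)


def all_health_states():
    """Return every EQ-5D-3L profile as a 5-digit string, e.g. '11111'."""
    return [
        "".join(str(level) for level in combo)
        for combo in product(LEVELS, repeat=len(DIMENSIONS))
    ]


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for a shuffled k-fold split.

    Hypothetical helper: the splitting procedure is an assumption,
    not taken from the study.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of (nearly) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


states = all_health_states()
splits = list(k_fold_indices(850, k=5))
print(len(states), len(splits))
```

With the largest sample size studied (n=850), each fold in this sketch holds out 170 narratives and trains on the remaining 680.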
Results:
All three models were able to predict EQ-5D-3L scores under every data configuration tested (n=100–850 patients; 100–1000-word narratives). The best results were obtained when the models were trained on 100-word narratives from the largest sample (n=850): mean squared error=0.03 (95% CI: 0.02-0.04), mean absolute error=0.13 (95% CI: 0.13-0.15), explained variance=0.77 (95% CI: 0.64-0.77), and intraclass correlation coefficient=0.85 (95% CI: 0.81-0.87). Shorter narratives (100 words) yielded better performance than longer narratives (100–1000 words), and data augmentation improved predictive performance.
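For readers unfamiliar with the four metrics reported above, the sketch below shows one way they can be computed from paired true and predicted scores. The abstract does not state which form of the intraclass correlation coefficient was used; the two-way, absolute-agreement, single-measurement form, ICC(2,1), is assumed here purely for illustration, and all function names are hypothetical.

```python
from statistics import fmean


def mse(y_true, y_pred):
    """Mean squared error."""
    return fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])


def mae(y_true, y_pred):
    """Mean absolute error."""
    return fmean([abs(t - p) for t, p in zip(y_true, y_pred)])


def _pvar(xs):
    """Population variance."""
    m = fmean(xs)
    return fmean([(x - m) ** 2 for x in xs])


def explained_variance(y_true, y_pred):
    """1 - Var(residuals) / Var(y_true); undefined if y_true is constant."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1.0 - _pvar(residuals) / _pvar(y_true)


def icc_2_1(y_true, y_pred):
    """ICC(2,1), treating true and predicted scores as two 'raters'.

    Assumption: the ICC form is not specified in the source.
    """
    n, k = len(y_true), 2
    rows = list(zip(y_true, y_pred))
    grand = fmean([x for row in rows for x in row])
    row_means = [fmean(row) for row in rows]
    col_means = [fmean(y_true), fmean(y_pred)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in rows for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)             # between-subject mean square
    ms_cols = ss_cols / (k - 1)             # between-rater mean square
    ms_err = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Toy values only, not data from the study: predictions offset by +0.1.
y_true = [0.2, 0.5, 0.8]
y_pred = [0.3, 0.6, 0.9]
print(mse(y_true, y_pred), mae(y_true, y_pred))
print(explained_variance(y_true, y_pred), icc_2_1(y_true, y_pred))
```

The toy example shows why the metrics complement one another: a constant offset leaves explained variance at 1 but still produces non-zero error and an absolute-agreement ICC below 1.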
Conclusions:
Transformer models show promise for predicting EQ-5D-3L PROM scores from synthetic patient narratives; reliable performance required a minimum of 250 patients, each providing a narrative of around 100 words. The work provides both a methodological basis and empirical standards for AI-based PROM systems. However, validation on real patient-authored narratives is required before clinical adoption. If validated, this approach could provide evidence to support incorporating patients' narrative accounts of their experience into standardized outcome measures and could support patient-centred healthcare evaluations.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.