Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Dec 17, 2024
Open Peer Review Period: Dec 17, 2024 - Feb 11, 2025
Date Accepted: Apr 17, 2025

The final, peer-reviewed published version of this preprint can be found here:

Extracting Multifaceted Characteristics of Patients With Chronic Disease Comorbidity: Framework Development Using Large Language Models

Zhang J, Zhou J, Zhou L, Ba Z

JMIR Med Inform 2025;13:e70096

DOI: 10.2196/70096

PMID: 40373298

PMCID: 12123238

Extracting multifaceted characteristics of patients with chronic disease comorbidity: a framework based on Large Language Models

  • Junyan Zhang; 
  • Junchen Zhou; 
  • Liqin Zhou; 
  • Zhichao Ba

ABSTRACT

Background:

Chronic multimorbidity research has become increasingly important as the population ages. Many related studies require patient characteristic information. However, current methods for extracting patient characteristics are complex, time-consuming, and error-prone. Quickly and accurately extracting patient characteristic information has therefore become a common challenge in research on chronic disease comorbidities.

Objective:

Our objective is to establish a comprehensive framework for extracting the demographic and disease characteristics of patients with multimorbidity. The framework uses large language models to extract feature information from the unstructured and semi-structured electronic health records (EHRs) of these patients. We evaluated the model's ability to extract feature information across seven dimensions: basic information, disease details, lifestyle habits, family medical history, symptom history, medication recommendations, and dietary advice. We also examined the strengths and limitations of the framework.
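To make the seven-dimension output concrete, the following Python sketch shows one way a model reply could be parsed and checked during postprocessing. It assumes the model is prompted to return a JSON object with one key per dimension; the key names, DIMENSIONS, parse_llm_output, and missing_dimensions are hypothetical, because this abstract does not describe the framework's actual prompt or output schema.

```python
import json

# Hypothetical key names for the seven dimensions in the Objective;
# the framework's real output schema is not given in the abstract.
DIMENSIONS = (
    "basic_information",
    "disease_details",
    "lifestyle_habits",
    "family_medical_history",
    "symptom_history",
    "medication_recommendations",
    "dietary_advice",
)


def parse_llm_output(raw: str) -> dict:
    """Parse the JSON object in a model reply and keep only the expected keys.

    Text before the first "{" or after the last "}" (for example, a short
    preamble from the model) is ignored. Missing dimensions are set to None
    and unexpected keys are dropped, so downstream code sees a fixed schema.
    """
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(raw[start:end + 1])  # JSONDecodeError is a ValueError
    return {key: data.get(key) for key in DIMENSIONS}


def missing_dimensions(parsed: dict) -> list:
    """List dimensions the model left empty, e.g. to flag for manual review."""
    return [key for key in DIMENSIONS if parsed.get(key) in (None, "", [], {})]
```

Forcing every reply into the same fixed set of keys is one simple way to address the format-consistency problem the Results identify, though it does not normalize the values inside each dimension.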

Methods:

The study uses data from grassroots community health service centers in China. We developed a multifaceted feature extraction framework for patients with multimorbidity, consisting of the following components: feasibility testing, preprocessing, determination of the features to extract, prompt modeling based on large language models, postprocessing, and mid-term evaluation. Within this framework, seven types of information were extracted as straightforward features (height, weight, gender, date of birth, lifestyle habits, medical history, and family medical history), and three types were treated as intricate features (symptom history, medication recommendations, and dietary advice). From the straightforward features, we calculated each patient's age, BMI, and 12 disease risk factors. Rigorous manual verification experiments were conducted 100 times for straightforward features and 200 times for intricate features, followed by comprehensive quantitative and qualitative assessments of the results.
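Age and BMI follow directly from the extracted date of birth, height, and weight. The Python sketch below shows the standard calculations (BMI = weight in kilograms divided by the square of height in metres). It assumes height is recorded in centimetres and weight in kilograms, and that age is measured against a chosen reference date such as the record date; the function names are hypothetical, and the 12 disease risk factors are not sketched because the abstract does not define them.

```python
from datetime import date


def age_on(birth: date, ref: date) -> int:
    """Age in whole years on a reference date (e.g. the record date)."""
    # Subtract one year if the birthday has not yet occurred in the reference year.
    had_birthday = (ref.month, ref.day) >= (birth.month, birth.day)
    return ref.year - birth.year - (0 if had_birthday else 1)


def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    height_m = height_cm / 100.0  # assumes height was recorded in centimetres
    return weight_kg / (height_m ** 2)
```

For example, a patient weighing 70 kg with a height of 175 cm has a BMI of about 22.9.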

Results:

The framework achieved an overall F1 score of 99.57% for the seven straightforward features, with the highest F1 score (100%) for basic information and the lowest (99%) for lifestyle habits. For the three intricate features, it achieved an overall F1 score of 94.36%, with the highest F1 score (98.96%) for drug name extraction and the lowest (84.98%) for drug dosage extraction. Our analysis shows that accurate extraction of information content is a key strength of the framework, whereas keeping the format of the extracted information consistent remains a challenge.
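The F1 scores above combine precision and recall. The Python sketch below computes F1 from true-positive, false-positive, and false-negative counts, using the identity 2PR/(P+R) = 2TP/(2TP+FP+FN). The abstract does not say whether its overall scores are micro- or macro-averaged across feature types, so both are shown; the function names and any counts passed to them are hypothetical, not the study's data.

```python
from typing import Iterable, Tuple

Counts = Tuple[int, int, int]  # (true positives, false positives, false negatives)


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2PR / (P + R), computed equivalently as 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(per_feature: Iterable[Counts]) -> float:
    """Pool the counts over all feature types, then compute a single F1."""
    tp = fp = fn = 0
    for t, p, n in per_feature:
        tp, fp, fn = tp + t, fp + p, fn + n
    return f1(tp, fp, fn)


def macro_f1(per_feature: Iterable[Counts]) -> float:
    """Unweighted mean of the per-feature F1 scores."""
    scores = [f1(*c) for c in per_feature]
    return sum(scores) / len(scores) if scores else 0.0
```

Micro-averaging weights each feature type by its number of instances, whereas macro-averaging weights every type equally, so a weak but rare feature such as drug dosage pulls a macro average down more than a micro average.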

Conclusions:

The framework incorporates electronic health record information from 1220 patients with multimorbidity covering 41 chronic diseases, and it can readily be extended to additional diseases. This demonstrates its scalability and adaptability as a method for extracting patient-specific characteristics and helps address the challenges of information retrieval in multimorbidity research.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.