Currently submitted to: JMIR Medical Informatics

Date Submitted: May 18, 2026
Open Peer Review Period: Jun 2, 2026 - Jul 28, 2026
(currently open for review)

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Natural Language Processing–Based Severity Phenotyping of Metabolic Dysfunction–Associated Steatotic Liver Disease in Health Check-Up Records: Cross-Sectional Study

  • Yong Fang
  • Liqun Shi
  • Jie Huang
  • Chunxing Liu
  • Zhiyue Xu

ABSTRACT

Background:

Hepatic steatosis in MASLD-oriented health check-up settings is commonly linked to cardiometabolic risk. Routine ultrasound reports often describe steatosis severity, but this information is usually embedded in free text. Many datasets therefore use a simple yes/no definition, which can miss stage-related differences across metabolic measures.

Objective:

We aimed to (1) extract ultrasound-reported hepatic steatosis severity stages from routine ultrasound narratives using natural language processing (NLP), (2) describe severity-associated metabolic patterns across stages, (3) examine whether these patterns differ by sex, and (4) test whether routine clinical indicators can identify moderate-to-severe ultrasound-reported steatosis.

Methods:

We conducted a cross-sectional analysis of 107,120 health check-up records from Shanghai Health and Medical Center (Oct 2024–Oct 2025). A rule-based NLP pipeline classified ultrasound narratives into five steatosis-severity stages (Normal, Trend, Mild, Moderate, Severe) and was validated against 450 physician-annotated narratives. We summarized metabolic indicators by stage and compared adjacent stages using bootstrap-based nonparametric methods. Analyses were repeated by sex, and women were further stratified by age (<50 vs ≥50 years). We also built a multivariable logistic regression model to identify moderate-to-severe ultrasound-reported steatosis and evaluated it by stratified 10-fold cross-validation using aggregated out-of-fold predictions.
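As a rough illustration of what a rule-based staging step can look like, the following minimal Python sketch maps a narrative to one of the five stages. It is not the study's pipeline: the English keyword patterns, the negation rule, the fallback for findings with no stated grade, and the function name `classify_report` are all assumptions made for this example, and the actual rules and report language are not described in the abstract.

```python
import re

# Hypothetical finding terms and severity keywords (assumptions for illustration only).
FINDING_TERMS = r"(?:fatty liver|steatosis|fatty infiltration)"
FINDING = re.compile(rf"\b{FINDING_TERMS}\b", re.I)
# A sentence such as "No evidence of fatty liver" is treated as a negated finding.
NEGATION = re.compile(rf"\b(?:no|without)\b[^.;]*\b{FINDING_TERMS}\b", re.I)

# Checked from most to least severe so "moderate-to-severe" resolves to Severe.
SEVERITY_RULES = [
    ("Severe", re.compile(r"\bsevere\b", re.I)),
    ("Moderate", re.compile(r"\bmoderate\b", re.I)),
    ("Mild", re.compile(r"\b(?:mild|slight)\b", re.I)),
    ("Trend", re.compile(r"\b(?:tendency|trend|suspected|possible)\b", re.I)),
]


def classify_report(text: str) -> str:
    """Return one of Normal, Trend, Mild, Moderate, Severe for a narrative."""
    for sentence in re.split(r"[.;\n]+", text):
        # Skip sentences that do not mention steatosis or that negate it.
        if not FINDING.search(sentence) or NEGATION.search(sentence):
            continue
        for stage, pattern in SEVERITY_RULES:
            if pattern.search(sentence):
                return stage
        # Assumption: an affirmed finding with no stated grade is counted as Mild.
        return "Mild"
    return "Normal"


if __name__ == "__main__":
    examples = [
        "Liver size normal. No evidence of fatty liver.",
        "Tendency toward fatty liver.",
        "Moderate fatty liver; gallbladder normal.",
    ]
    for report in examples:
        print(classify_report(report), "<-", report)
```

A classifier of this kind would then be compared against physician labels, as the 450 annotated narratives were used in the study, before being applied to the full record set.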

Results:

Metabolic burden increased across NLP-defined stages. Adjacent-stage bootstrap comparisons showed larger Mild-to-Moderate increases for body mass index (BMI) and alanine aminotransferase (ALT) than the preceding Trend-to-Mild increases. In contrast, fasting blood glucose (FBG), systolic blood pressure (SBP), and triglycerides (TG) changed more gradually across stages. Uric acid (UA) showed a similar direction but without statistical support for an inflection. Men had higher absolute levels, but stage-associated patterns were broadly similar between sexes and did not suggest a sex-by-severity interaction. In women, age-stratified analyses showed marker-specific severity-by-age heterogeneity; at the Moderate stage, interaction coefficients were mostly negative, indicating that the Moderate-versus-Normal contrast was not larger in women aged ≥50 years than in women aged <50 years. The prediction model for moderate-to-severe ultrasound-reported steatosis showed stable internal performance (area under the receiver operating characteristic curve [AUC] 0.898±0.008; average precision [AP] 0.239; Brier score 0.025).
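One way to test for such an inflection is to bootstrap the difference between two adjacent-stage increments: the increase into the Moderate stage minus the increase into the Mild stage. The sketch below shows this under stated assumptions. It uses stage means, a percentile interval, and synthetic BMI-like values generated only for the demonstration. The abstract does not specify the authors' exact statistic, and the function names `increment_contrast` and `bootstrap_contrast` are hypothetical.

```python
import random
from statistics import mean


def increment_contrast(trend, mild, moderate):
    """(Moderate - Mild) minus (Mild - Trend), computed on stage means."""
    return (mean(moderate) - mean(mild)) - (mean(mild) - mean(trend))


def bootstrap_contrast(trend, mild, moderate, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate and approximate percentile bootstrap interval."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_boot):
        # Resample each stage independently, with replacement, at its own size.
        resampled = [rng.choices(g, k=len(g)) for g in (trend, mild, moderate)]
        draws.append(increment_contrast(*resampled))
    draws.sort()
    lower = draws[int((alpha / 2) * n_boot)]
    upper = draws[int((1 - alpha / 2) * n_boot) - 1]
    return increment_contrast(trend, mild, moderate), lower, upper


def synthetic_stage(center, n, rng):
    """Synthetic values for illustration only; not study data."""
    return [rng.gauss(center, 2.0) for _ in range(n)]


data_rng = random.Random(42)
trend_bmi = synthetic_stage(24.0, 300, data_rng)
mild_bmi = synthetic_stage(25.0, 300, data_rng)
moderate_bmi = synthetic_stage(29.0, 300, data_rng)

estimate, lower, upper = bootstrap_contrast(trend_bmi, mild_bmi, moderate_bmi)
print(f"contrast={estimate:.2f}, 95% interval=({lower:.2f}, {upper:.2f})")
```

An interval that excludes zero would indicate that the later increment is larger than the earlier one. Medians could be substituted for means if the marker is strongly skewed.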

Conclusions:

Severity staging derived from ultrasound narratives can be recovered at scale using NLP and supports severity-graded risk stratification of ultrasound-reported steatosis in MASLD-oriented screening data. This approach moves beyond binary classification and highlights the Mild-to-Moderate transition for selected markers, especially BMI and ALT, using routinely collected measures without specialized imaging or invasive testing.


 Citation

Please cite as:

Fang Y, Shi L, Huang J, Liu C, Xu Z

Natural Language Processing–Based Severity Phenotyping of Metabolic Dysfunction–Associated Steatotic Liver Disease in Health Check-Up Records: Cross-Sectional Study

JMIR Preprints. 18/05/2026:101344

DOI: 10.2196/preprints.101344

URL: https://preprints.jmir.org/preprint/101344


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.