JMIR Preprints #26398: Current and Next Visit Prediction for Fatty Liver Disease with a Large-Scale Dataset


Current and Next Visit Prediction for Fatty Liver Disease with a Large-Scale Dataset

ChengTse Wu;
Ta-Wei Chu;
Jyh-Shing Roger Jang

ABSTRACT

Background:

Fatty liver disease (FLD) arises from the accumulation of fat in the liver and may cause liver inflammation. Past research shows that, if not well controlled, this inflammation may develop into liver fibrosis, cirrhosis, or even hepatocellular carcinoma.

Objective:

We describe the construction of machine-learning models for current-visit prediction (CVP), which can help physicians obtain more information for accurate diagnosis, and for next-visit prediction (NVP), which can help physicians give potentially high-risk patients advice to effectively prevent or delay health deterioration.

Methods:

The large-scale, high-dimensional dataset used in this study comes from the MJ Health Research Foundation in Taipei. The models we created use sequential forward selection (SFS) and one-pass ranking (OPR) for feature selection. For CVP, we explored multiple models, including AdaBoost, support vector machine (SVM), logistic regression (LR), random forest (RF), Gaussian naïve Bayes (GNB), C4.5 decision trees (C4.5), and classification and regression trees (CART). For NVP, we used long short-term memory (LSTM) as a sequence classifier that takes various input sets for prediction. Model performance was evaluated on two criteria: the accuracy on the test set, and the intersection over union (IoU) and coverage between the features selected by OPR or SFS and those selected by domain experts.
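The feature selection and feature-overlap criteria named in the Methods can be illustrated with a minimal sketch. The abstract does not define these procedures in detail, so the following assumes that SFS greedily adds the feature that most improves a subset score, that OPR ranks features once by their individual scores and keeps the top k, and that coverage is the fraction of expert-selected features recovered by the automatic selection. The feature names, weights, and `toy_score` function are hypothetical stand-ins for a real cross-validated classifier accuracy and are not taken from the MJ dataset.

```python
from typing import Callable, FrozenSet, Iterable, List, Sequence

Score = Callable[[FrozenSet[str]], float]


def sfs(features: Sequence[str], score: Score, k: int) -> List[str]:
    """Greedy forward selection: repeatedly add the feature whose
    inclusion gives the highest score for the enlarged subset."""
    selected: List[str] = []
    remaining = list(features)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score(frozenset(selected + [f])))
        selected.append(best)
        remaining.remove(best)
    return selected


def opr(features: Sequence[str], score: Score, k: int) -> List[str]:
    """One-pass ranking (assumed form): score each feature alone,
    sort once, and keep the top k."""
    ranked = sorted(features, key=lambda f: score(frozenset([f])), reverse=True)
    return ranked[:k]


def iou(a: Iterable[str], b: Iterable[str]) -> float:
    """Intersection over union of two feature sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def coverage(selected: Iterable[str], expert: Iterable[str]) -> float:
    """Fraction of expert features found in the selection (assumed definition)."""
    expert = set(expert)
    return len(set(selected) & expert) / len(expert) if expert else 0.0


# Hypothetical per-feature usefulness; "bmi" and "waist" are made redundant
# so that the two selection schemes disagree.
WEIGHTS = {"bmi": 0.30, "waist": 0.28, "alt": 0.20, "triglycerides": 0.15, "age": 0.05}


def toy_score(subset: FrozenSet[str]) -> float:
    total = sum(WEIGHTS[f] for f in subset)
    if {"bmi", "waist"} <= subset:
        total -= 0.25  # penalty for redundant information
    return total


expert_features = {"bmi", "alt", "triglycerides", "ast"}  # hypothetical expert list
sfs_features = sfs(list(WEIGHTS), toy_score, k=3)
opr_features = opr(list(WEIGHTS), toy_score, k=3)

print("SFS:", sfs_features, iou(sfs_features, expert_features), coverage(sfs_features, expert_features))
print("OPR:", opr_features, iou(opr_features, expert_features), coverage(opr_features, expert_features))
```

In this toy setting OPR keeps both redundant features because it never evaluates features jointly, while SFS skips the second one. That trade-off, a single cheap pass versus repeated subset evaluation, is the usual reason for comparing the two schemes.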

Results:

The dataset includes 34,856 and 31,394 unique visits by male and female patients, respectively, from 2009 to 2016. The CVP test accuracies for AdaBoost, SVM, LR, RF, GNB, C4.5, and CART were 84.28%, 83.84%, 82.22%, 82.21%, 76.03%, 75.78%, and 75.53%, respectively. The NVP test accuracies of LSTM with fixed and variable intervals were 78.20% and 76.79%, respectively. The two proposed LSTM paradigms achieved 39.29% and 41.21% error reduction, respectively, compared with a baseline model based on simple induction.
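The relationship between the reported accuracies and the error reduction figures can be sketched as follows. The abstract does not state how error reduction was computed; this sketch assumes the common definition of relative reduction in test error rate, (baseline error − model error) / baseline error. Under that assumption, the baseline error rates are back-calculated from the reported numbers; they are implied values, not figures given in the abstract.

```python
def relative_error_reduction(baseline_error: float, model_error: float) -> float:
    """Relative error reduction, assuming the definition
    (baseline_error - model_error) / baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - model_error) / baseline_error


def implied_baseline_error(model_error: float, reduction: float) -> float:
    """Invert the formula above to recover the baseline error rate."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    return model_error / (1.0 - reduction)


# Reported NVP test accuracies (percent) converted to error rates.
fixed_model_error = 100.0 - 78.20      # LSTM, fixed intervals
variable_model_error = 100.0 - 76.79   # LSTM, variable intervals

# Baseline error rates implied by the reported 39.29% and 41.21% reductions
# (derived under the assumed definition, not reported in the abstract).
fixed_baseline = implied_baseline_error(fixed_model_error, 0.3929)
variable_baseline = implied_baseline_error(variable_model_error, 0.4121)

print(f"implied baseline error, fixed intervals:    {fixed_baseline:.2f}%")
print(f"implied baseline error, variable intervals: {variable_baseline:.2f}%")
```

If the authors used a different definition (for example, an absolute difference in accuracy), the implied baselines would differ, so these values should be read only as a consistency check on the reported percentages.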

Conclusions:

This study explores a large fatty liver disease (FLD) dataset with high dimensionality. We developed prediction models that can be used for CVP and NVP of FLD. We also implemented efficient feature selection schemes for CVP and NVP and compared the automatically selected features with expert-selected features.

Citation

Please cite as:

Wu C, Chu TW, Jang JSR

Current-Visit and Next-Visit Prediction for Fatty Liver Disease With a Large-Scale Dataset: Model Development and Performance Comparison

JMIR Med Inform 2021;9(8):e26398

DOI: 10.2196/26398

PMID: 34387552

PMCID: 8391752


Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Dec 10, 2020

Date Accepted: Jun 3, 2021

Date Submitted to PubMed: Aug 13, 2021
