Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Jun 7, 2024
Date Accepted: Feb 13, 2025
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Opportunities for Automated Liver Disease Risk Prediction in the Finnish Healthcare Environment
ABSTRACT
Background:
Chronic liver disease incidence and mortality have been rising worldwide. In many cases, liver disease is detected late, in the symptomatic stage, whereas earlier detection would be crucial for the timely initiation of preventive actions. The Chronic Liver Disease (CLivD) score is a risk detection model developed with Finnish healthcare data; it predicts a person's risk of developing the disease in future years.
Objective:
We had two main objectives: 1) to evaluate the feasibility of implementing an automatic CLivD score on the current Kanta platform, and 2) to identify and suggest improvements to Kanta that would enable accurate automatic risk detection.
Methods:
In this study, a real-world data repository (Kanta) was used as the data source for the CLivD score risk calculation model. Our dataset consisted of the whole Kanta medical history of 96 200 individuals. To make use of real-world data, we designed a process for handling missing inputs in the risk calculation.
Results:
We found that Kanta currently lacks many CLivD risk model input parameters in the structured format required to calculate precise risk scores. However, the risk scores can be improved by using the unstructured text in patient reports and by approximating variables from other health data, such as diagnosis information. Using structured data alone, we were able to assign only 33 of 51 275 persons to the “low risk” category and under 1% to the “moderate risk” category. Adding the diagnosis-based approximation and free-text data allowed us to assign 37% of persons to the “low risk” category and 4% to the “moderate risk” category. In both cases, we were unable to assign any persons to the “high risk” category because the waist-hip ratio measurement was missing. We evaluated three scenarios for improving the coverage of waist-hip ratio data in Kanta, and these yielded the most substantial improvement in prediction accuracy.
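The tiered use of data sources described above (structured data first, then diagnosis-based approximation, then free text) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `PatientRecord` type, the `resolve_input` function, the variable names, the diagnosis-code prefixes, and the English keywords are all hypothetical and chosen only for illustration. Real Kanta free text would be in Finnish and would need proper language processing.

```python
from dataclasses import dataclass, field
from typing import Any, Optional, Tuple


@dataclass
class PatientRecord:
    """Hypothetical container for one person's data; not a Kanta schema."""
    structured: dict = field(default_factory=dict)      # e.g. {"smoking": False}
    diagnosis_codes: list = field(default_factory=list)  # e.g. ["F10.2"]
    free_text: str = ""                                  # concatenated report text


# Illustrative assumption: ICD-10 code prefixes used as proxies for a risk factor.
DIAGNOSIS_PROXIES = {
    "smoking": ("F17",),
    "alcohol_use": ("F10",),
    "obesity": ("E66",),
}

# Illustrative assumption: naive keyword lists (no negation handling).
TEXT_KEYWORDS = {
    "smoking": ("smoker", "smokes"),
    "alcohol_use": ("heavy drinking",),
}


def resolve_input(record: PatientRecord, variable: str) -> Tuple[Optional[Any], str]:
    """Return (value, source) for one model input, trying sources in order."""
    # 1) Prefer a structured value when one exists.
    if variable in record.structured:
        return record.structured[variable], "structured"

    # 2) Otherwise approximate the variable from diagnosis codes.
    prefixes = DIAGNOSIS_PROXIES.get(variable, ())
    if any(code.startswith(prefixes) for code in record.diagnosis_codes):
        return True, "diagnosis"

    # 3) Otherwise look for keywords in free text.
    text = record.free_text.lower()
    if any(keyword in text for keyword in TEXT_KEYWORDS.get(variable, ())):
        return True, "free_text"

    # 4) Nothing found: report the input as missing rather than guessing.
    return None, "missing"
```

Returning the source alongside the value keeps it visible which inputs are measured and which are approximated. An input with no source at all, as with the waist-hip ratio here, is reported as missing instead of being defaulted.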
Conclusions:
We conclude that the current structured Kanta data are not sufficient for precise risk calculation for CLivD or for other diseases in which obesity, smoking, and alcohol use are important risk factors. Our simulations show up to a 14% improvement in risk detection when additional data sources are considered. Kanta shows potential for implementing nationwide automated risk detection models that could improve disease prevention and public health.
Clinical Trial:
This study did not have a trial registration. All data used in this study were retrieved through the Finnish authorities Findata and Kela with all required permissions.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.