JMIR Preprints #60204: A Proposal for Using AI to Assess Clinical Data Integrity and Generate Metadata: Algorithm Development and Validation

Caroline Bönisch;
Christian Schmidt;
Dorothea Kesztyüs;
Hans A. Kestler;
Tibor Kesztyüs

ABSTRACT

Background:

Evidence-based medicine combines scientific research, clinical expertise, and patient preferences to enhance patient outcomes and improve healthcare quality. Clinical data is crucial in aligning medical decisions with evidence-based practices, whether derived from systematic research or real-world data sources. Quality assurance of clinical data, mainly through predictive quality algorithms and machine learning, is essential to mitigate risks such as misdiagnosis, inappropriate treatment, bias, and compromised patient safety.

Objective:

This study aims to demonstrate the varying quality of medical data in clinical primary source systems and provide researchers with insights into data reliability through predictive quality algorithms utilizing machine learning techniques.

Methods:

A literature review was conducted to evaluate existing approaches to automated quality prediction. Additionally, metadata relevant to clinical data was stored in a relational database, taking into account factors such as data granularity and quality metrics. A predictive quality algorithm was developed using machine learning, focusing on preprocessing the dataset, training machine learning algorithms on echocardiographic, laboratory, and medication data, and assessing various prediction models to identify the most effective algorithms for quality classification.
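The metadata storage step can be illustrated with a minimal sketch using Python's built-in `sqlite3` module. The table name `data_element_metadata`, its columns, the granularity labels, and the use of completeness (the share of non-missing values) as the quality metric are all assumptions for illustration. The abstract does not specify the authors' schema or their quality metrics.

```python
import sqlite3

# Hypothetical schema (an assumption, not the authors' design): one row per
# data element, tagged with its source system, granularity, and a quality metric.
SCHEMA = """
CREATE TABLE IF NOT EXISTS data_element_metadata (
    element_id   TEXT PRIMARY KEY,
    source       TEXT NOT NULL,   -- e.g. 'echocardiography', 'laboratory', 'medication'
    granularity  TEXT NOT NULL,   -- e.g. 'per measurement', 'per sample'
    completeness REAL NOT NULL    -- fraction of non-missing values, 0 to 1
)
"""


def completeness(values):
    """Fraction of values that are not None (a simple, assumed quality metric)."""
    if not values:
        return 0.0
    return sum(v is not None for v in values) / len(values)


def store_metadata(conn, element_id, source, granularity, values):
    """Compute the completeness of `values` and insert or replace one metadata row."""
    conn.execute(
        "INSERT OR REPLACE INTO data_element_metadata VALUES (?, ?, ?, ?)",
        (element_id, source, granularity, completeness(values)),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
# Invented example values; one echocardiographic measurement is missing.
store_metadata(conn, "lvef", "echocardiography", "per measurement", [55, None, 60, 62])
store_metadata(conn, "creatinine", "laboratory", "per sample", [0.9, 1.1, 1.0, 1.2])

rows = conn.execute(
    "SELECT element_id, completeness FROM data_element_metadata ORDER BY element_id"
).fetchall()
print(rows)  # [('creatinine', 1.0), ('lvef', 0.75)]
```

Keeping the quality metric in the same table as the granularity and source makes it queryable alongside the data it describes, which is the kind of metadata a researcher could consult before using a data element.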

Results:

Several classifiers were trained to predict the quality of medical data, and their performance was assessed using accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic curve (AUC ROC). Extreme Gradient Boosting (XGB) performed best, with an accuracy of 84.7%, an AUC ROC of 84.6%, an F1-score of 84.0%, and a precision of 83.9%.
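The evaluation metrics named in the Results can be computed from a binary confusion matrix, as in the following sketch. It assumes that label 1 marks a record of insufficient quality (the abstract does not state the label encoding). It computes ROC AUC as the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as one half. The labels and scores are invented and unrelated to the study's data.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 (harmonic mean of precision and recall)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly, ties = 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy data: true labels and a classifier's predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

print(classification_metrics(y_true, y_pred))
# {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75}
print(roc_auc(y_true, scores))  # 0.9375
```

On this toy input the four threshold-based metrics all equal 0.75, while the AUC is 15/16 = 0.9375. AUC is computed from the raw scores and does not depend on the decision threshold, whereas accuracy, precision, recall, and F1 do.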

Conclusions:

This proposal presents a template for predicting data quality and incorporating the resulting quality information into the metadata of a data integration center, a concept not previously implemented. The model was deployed for data inspection using a hybrid approach that combines the trained model with conventional inspection methods.
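One way to picture the hybrid inspection is to flag a record when either a conventional rule-based check fails or the trained model's predicted probability of poor quality reaches a threshold. In the sketch below, the rule names, the plausibility range for `lvef`, the 0.5 threshold, the function name `hybrid_inspect`, and the `model_score` input (standing in for the output of a classifier such as XGB) are all hypothetical. The abstract does not describe how the authors combine the two components.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InspectionResult:
    record_id: str
    failed_rules: List[str]
    model_score: float
    flagged: bool


# Hypothetical conventional checks: each returns True if the record passes.
RULES = {
    "lvef_in_range": lambda r: r.get("lvef") is None or 5 <= r["lvef"] <= 90,
    "dose_present": lambda r: r.get("dose") is not None,
}


def hybrid_inspect(record, model_score, threshold=0.5):
    """Flag a record if any rule fails OR the model score reaches the threshold.

    `model_score` is assumed to be a classifier's predicted probability that
    the record is of insufficient quality; here it is passed in directly.
    """
    failed = [name for name, check in RULES.items() if not check(record)]
    flagged = bool(failed) or model_score >= threshold
    return InspectionResult(record["id"], failed, model_score, flagged)


# Invented records paired with invented model scores.
records = [
    ({"id": "a", "lvef": 60, "dose": 5.0}, 0.2),   # passes rules, low score
    ({"id": "b", "lvef": 120, "dose": 5.0}, 0.1),  # fails the range rule
    ({"id": "c", "lvef": 55, "dose": 10.0}, 0.8),  # passes rules, high score
]
results = [hybrid_inspect(rec, score) for rec, score in records]
for r in results:
    print(r.record_id, r.flagged, r.failed_rules)
```

Combining the two with a logical OR means each component can catch what the other misses: explicit rules catch known implausible values regardless of the model, while the model can flag records that pass every rule but still resemble low-quality data.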

Citation

Please cite as:

Bönisch C, Schmidt C, Kesztyüs D, Kestler HA, Kesztyüs T

Proposal for Using AI to Assess Clinical Data Integrity and Generate Metadata: Algorithm Development and Validation

JMIR Med Inform 2025;13:e60204

DOI: 10.2196/60204

PMID: 40587839

PMCID: 12234397

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Medical Informatics

Date Submitted: May 4, 2024

Date Accepted: Apr 1, 2025