
Accepted for/Published in: JMIR Medical Informatics

Date Submitted: May 4, 2024
Date Accepted: Apr 1, 2025

The final, peer-reviewed published version of this preprint can be found here:

Proposal for Using AI to Assess Clinical Data Integrity and Generate Metadata: Algorithm Development and Validation

Bönisch C, Schmidt C, Kesztyüs D, Kestler HA, Kesztyüs T


JMIR Med Inform 2025;13:e60204

DOI: 10.2196/60204

PMID: 40587839

PMCID: 12234397

A Proposal for Using AI to Assess Clinical Data Integrity and Generate Metadata: Algorithm Development and Validation

  • Caroline Bönisch
  • Christian Schmidt
  • Dorothea Kesztyüs
  • Hans A. Kestler
  • Tibor Kesztyüs

ABSTRACT

Background:

Evidence-based medicine combines scientific research, clinical expertise, and patient preferences to enhance patient outcomes and improve healthcare quality. Clinical data is crucial in aligning medical decisions with evidence-based practices, whether derived from systematic research or real-world data sources. Quality assurance of clinical data, mainly through predictive quality algorithms and machine learning, is essential to mitigate risks such as misdiagnosis, inappropriate treatment, bias, and compromised patient safety.

Objective:

This study aims to demonstrate the varying quality of medical data in clinical primary source systems and provide researchers with insights into data reliability through predictive quality algorithms utilizing machine learning techniques.

Methods:

A literature review was conducted to evaluate existing approaches to automated quality prediction. Additionally, metadata relevant to clinical data was stored in a relational database, taking into account factors such as data granularity and quality metrics. A predictive quality algorithm was developed using machine learning, focusing on preprocessing the dataset, training machine learning algorithms on echocardiographic, laboratory, and medication data, and assessing various prediction models to identify the most effective algorithms for quality classification.
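The abstract does not describe the database schema. The following is a minimal sketch, assuming SQLite and two hypothetical tables (`data_source` and `quality_metric`), of how per-source granularity and quality metrics could be stored and queried relationally. The table names, column names, and helper functions are illustrative assumptions, not the authors' metadata model.

```python
import sqlite3
from typing import Optional

# Hypothetical schema: names are illustrative assumptions, not the authors' model.
SCHEMA = """
CREATE TABLE IF NOT EXISTS data_source (
    source_id   INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE,  -- e.g. 'laboratory', 'echocardiography'
    granularity TEXT NOT NULL          -- e.g. 'per measurement', 'per encounter'
);
CREATE TABLE IF NOT EXISTS quality_metric (
    metric_id   INTEGER PRIMARY KEY,
    source_id   INTEGER NOT NULL REFERENCES data_source(source_id),
    metric      TEXT NOT NULL,         -- e.g. 'completeness', 'plausibility'
    value       REAL NOT NULL CHECK (value BETWEEN 0 AND 1)
);
"""


def open_catalog(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a metadata catalog using the hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def add_source(conn: sqlite3.Connection, name: str, granularity: str) -> int:
    """Register a primary source system and return its id."""
    cur = conn.execute(
        "INSERT INTO data_source (name, granularity) VALUES (?, ?)",
        (name, granularity),
    )
    return cur.lastrowid


def record_metric(conn: sqlite3.Connection, source_id: int, metric: str, value: float) -> None:
    """Store one quality metric (0..1) for a source; out-of-range values are rejected."""
    conn.execute(
        "INSERT INTO quality_metric (source_id, metric, value) VALUES (?, ?, ?)",
        (source_id, metric, value),
    )


def mean_quality(conn: sqlite3.Connection, source_name: str) -> Optional[float]:
    """Average of all stored metric values for one source, or None if there are none."""
    row = conn.execute(
        """SELECT AVG(q.value)
           FROM quality_metric AS q
           JOIN data_source AS s ON s.source_id = q.source_id
           WHERE s.name = ?""",
        (source_name,),
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    conn = open_catalog()
    lab = add_source(conn, "laboratory", "per measurement")
    record_metric(conn, lab, "completeness", 1.0)
    record_metric(conn, lab, "plausibility", 0.5)
    print(mean_quality(conn, "laboratory"))  # 0.75
```

The `CHECK` constraint keeps metric values in a normalized 0 to 1 range, and the foreign key ties every metric to a registered source. Both choices are assumptions made for this sketch.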

Results:

Classifiers were used to predict the quality of medical data, and their performance was assessed using accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic curve (AUC ROC). Extreme Gradient Boosting (XGB) performed best, with an accuracy of 84.7%, an AUC ROC of 84.6%, an F1-score of 84.0%, and a precision of 83.9%.
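The metrics named above can be computed from a classifier's outputs as in the following minimal sketch. It assumes a binary quality label (hypothetically, 1 = acceptable quality, 0 = not) and a hypothetical decision threshold of 0.5. If the study used more than two quality classes, per-class values would have to be averaged (for example, macro or weighted averaging). AUC is computed here by its pairwise-ranking (Mann-Whitney) definition rather than by integrating an explicit ROC curve.

```python
from typing import Dict, List, Sequence, Tuple


def predict_labels(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn predicted probabilities into 0/1 labels (threshold is an assumption)."""
    return [1 if s >= threshold else 0 for s in scores]


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 (harmonic mean of precision and recall)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. This is O(P * N), which is fine for a sketch.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and scores, invented for illustration only.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
    print(classification_metrics(y_true, predict_labels(scores)))
    print(roc_auc(y_true, scores))  # 0.9375
```

Note that accuracy, precision, recall, and F1 depend on the chosen threshold, while AUC is threshold-free. This is why the two kinds of figures can differ for the same model.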

Conclusions:

This proposal presents a template for predicting data quality and incorporating the resulting quality information into the metadata of a data integration center, a concept not previously implemented. The model was deployed for data inspection using a hybrid approach that combines the trained model with conventional inspection methods.
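The abstract states only that the trained model was combined with conventional inspection methods. The following minimal sketch shows one way such a hybrid check could be arranged, under explicit assumptions. Deterministic rules (a missing-value check and a hypothetical plausibility range) run first, and a hard rule failure overrides the model. Only records that pass the rules are scored by `model_score`, a hypothetical callable standing in for a trained classifier's probability output. The names, the range limits, and the 0.5 threshold are all illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class LabRecord:
    analyte: str
    value: Optional[float]
    unit: str


# Hypothetical plausibility limits for illustration only; real limits would
# have to come from domain experts or laboratory reference data.
PLAUSIBLE_RANGES: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("potassium", "mmol/L"): (1.0, 10.0),
}


def rule_violations(rec: LabRecord) -> List[str]:
    """Conventional, deterministic checks applied before the model."""
    if rec.value is None:
        return ["missing value"]
    limits = PLAUSIBLE_RANGES.get((rec.analyte, rec.unit))
    if limits is not None and not limits[0] <= rec.value <= limits[1]:
        return ["implausible value"]
    return []


def inspect_record(
    rec: LabRecord,
    model_score: Callable[[LabRecord], float],
    threshold: float = 0.5,
) -> Tuple[str, List[str]]:
    """Return a quality label plus the reasons that produced it."""
    violations = rule_violations(rec)
    if violations:
        # A hard rule failure overrides the model; the model is not called.
        return "suspect", violations
    score = model_score(rec)
    label = "acceptable" if score >= threshold else "suspect"
    return label, [f"model score {score:.2f}"]


if __name__ == "__main__":
    fake_model = lambda rec: 0.9  # stand-in for a trained classifier
    print(inspect_record(LabRecord("potassium", 4.2, "mmol/L"), fake_model))
    print(inspect_record(LabRecord("potassium", 42.0, "mmol/L"), fake_model))
```

Running the rules first keeps clear-cut errors explainable and cheap to detect, and leaves the model to judge only the records that the rules cannot decide. The reverse ordering, or a weighted combination, would be equally plausible designs.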




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.