Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Dec 29, 2025
Date Accepted: May 15, 2026

The final, peer-reviewed published version of this preprint can be found here:

Measuring the Quality of Datasets: Development of the IDEFIM Indicator Set for Empirical Health Research

Harkener S, Bott OJ, Draeger C, Hartz T, Jenetzky E, Löbe M, March S, Schubert C, Stausberg J

J Med Internet Res 2026;28:e90482

DOI: 10.2196/90482

PMID: 42308504

Measuring the Quality of Datasets: Development of the IDEFIM Indicator Set for Empirical Health Research

  • Sonja Harkener
  • Oliver J Bott
  • Christian Draeger
  • Tobias Hartz
  • Ekkehart Jenetzky
  • Matthias Löbe
  • Stefanie March
  • Chris Schubert
  • Jürgen Stausberg

ABSTRACT

Background:

To be beneficial for empirical health research, a dataset must be fit for use. The quality of a dataset can be influenced only during data collection, yet it is evaluated repeatedly during analysis or secondary use by applying quality indicators.

Objective:

To establish an up-to-date set of indicators for measuring the quality of datasets in empirical health research.

Methods:

Three pillars were combined. First, the 51 indicators of a 2014 German guideline on the management of data quality were revised. Second, a literature review was performed to identify evidence sources since 2013 that describe, propose, or apply dataset quality indicators. Third, indicators were supplemented by hand search and other sources. The quality indicators were then integrated into the IDEFIM framework, which distinguishes between the categories data, metadata, context, and openness quality. In this work, only the categories data quality and metadata quality, with their 14 dimensions, were considered.

Results:

In total, 69 indicators qualified for the IDEFIM indicator set, 53 related to the category data quality and 16 related to the category metadata quality. Thirty indicators each originated from the German guideline and from the literature review. Three indicators were added to cover aspects of diversity, equity, and inclusion (DEI), and a further 6 were added to cover specifics of data and metadata quality not addressed so far. Most indicators were found in the dimensions accuracy (data) with 12 measures, completeness (data) with 12 measures, and consistency (data) with 19 measures. Ranked by the number of supporting evidence sources, the most important quality indicators were missing values in data elements (48 evidence sources), contradictions (31), and currentness (26). Metadata quality was addressed considerably less frequently.
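
To make the two most frequently supported indicators concrete, the following is a minimal sketch of how a missing-values rate and a contradiction rate might be computed for a small dataset. It is an illustration only and not the definition used in the IDEFIM indicator set: the function names, the treatment of None and empty strings as missing, the example rule, and the sample records are all assumptions introduced here.

```python
from typing import Any, Callable, Mapping, Sequence

Record = Mapping[str, Any]


def missing_value_rate(records: Sequence[Record], element: str) -> float:
    """Share of records in which `element` is absent, None, or an empty string.

    Assumption: these three cases count as "missing"; a real indicator
    definition may also treat codes such as "unknown" as missing.
    """
    if not records:
        return 0.0
    missing = sum(1 for r in records if r.get(element) in (None, ""))
    return missing / len(records)


def contradiction_rate(
    records: Sequence[Record], rule: Callable[[Record], bool]
) -> float:
    """Share of records for which `rule` flags a contradiction (returns True)."""
    if not records:
        return 0.0
    return sum(1 for r in records if rule(r)) / len(records)


def male_and_pregnant(record: Record) -> bool:
    """Hypothetical plausibility rule: flags records with sex male and pregnant True."""
    return record.get("sex") == "male" and record.get("pregnant") is True


# Hypothetical sample records, invented for this sketch.
records = [
    {"sex": "male", "pregnant": False, "birth_year": 1980},
    {"sex": "male", "pregnant": True, "birth_year": None},
    {"sex": "female", "pregnant": True, "birth_year": 1992},
    {"sex": "female", "pregnant": None, "birth_year": ""},
]

birth_year_missing = missing_value_rate(records, "birth_year")
contradictions = contradiction_rate(records, male_and_pregnant)
```

In this sketch both indicators are expressed as proportions of all records, so that values are comparable across datasets of different sizes; whether a given rate is acceptable depends on the intended use of the dataset.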

Conclusions:

The presented IDEFIM indicator set can be used for the management of a data collection as well as for verifying a dataset’s quality for an intended use. The indicator set should also be considered in the design of studies in empirical health research and in the development of software tools that support the visualization of issues related to the quality of a dataset.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.