Currently submitted to: JMIR Medical Informatics

Date Submitted: Dec 8, 2025

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Metadata Needs Assessment for Data Reuse: Inventory and Concept Mapping Based on a Real-World Case Study

  • Joris E. Lieverse
  • Otto R. Maarsingh
  • Joanna E. Klopotowska
  • Ron M.C. Herings
  • Ronald Cornet

ABSTRACT

Background:

Longitudinal observational databases collect real-world data (RWD) registered by healthcare providers and make these data available to researchers. Metadata, that is, data describing other data, are crucial for meaningful interpretation of such RWD. The FAIR Principles hold that data and metadata should be richly described with accurate, relevant attributes. Yet data reuse is impeded by low-quality or absent metadata. Despite the existence of metadata frameworks that support data annotation, there is little insight into the actual metadata needed to interpret third-party data.

Objective:

The aim of the study was to gain such insight by exploring the metadata needs in a real-world study that reused RWD from two organizations that collected general practitioner patient data. We started our real-world study with a specific research goal (ie, chronic kidney disease cohort identification) and identified which metadata were needed to reach this goal.

Methods:

The metadata elicitation process involved inventorying all metadata documentation available to the researchers, covering metadata fragments (eg, data dictionaries) and records of interactions (eg, email exchanges, meeting minutes). We compiled both the metadata required to understand the data and the related inquiries. After deduplicating and merging these items, we applied the stages of concept mapping to identify categories of metadata by creating cluster maps and to assess perceived importance.

Results:

A diverse group of 23 participants took part in the concept mapping. We identified 84 metadata items within 9 distinct clusters, including data collection, data processing, data quality, and data modelling. The variety of items and clusters illustrates the challenge of establishing a predefined metadata set. Most items (70/84) were rated on average as moderately important (3) to important (4) on a 5-point Likert scale. Categories concerning features that enable data interpretation were rated as more important (3.638 [SD 0.836] to 3.739 [SD 0.841]) than those focused on technical details (2.876 [SD 0.954] to 3.261 [SD 0.832]). Most items (79/84) and all categories are not domain-specific to the descriptive study. While existing frameworks offer relevant high-level metadata, they do not accommodate the detailed insights uncovered through our practice-based metadata elicitation.

Conclusions:

Our study shows that a practical use case requires an extensive set of metadata items that is unlikely to be available upfront. However, because most of the required items are generic, they can be specified and made available on demand, resulting in an increasingly rich set of metadata. These results guide the further development of more in-depth metadata frameworks and of procedures for incremental specification of metadata.


 Citation

Please cite as:

Lieverse JE, Maarsingh OR, Klopotowska JE, Herings RM, Cornet R

Metadata Needs Assessment for Data Reuse: Inventory and Concept Mapping Based on a Real-World Case Study

JMIR Preprints. 08/12/2025:89134

DOI: 10.2196/preprints.89134

URL: https://preprints.jmir.org/preprint/89134

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.