Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Mar 7, 2025
Date Accepted: Aug 4, 2025
Multi-source coherence analysis of the first European multi-centre cohort study for cancer prevention in people experiencing homelessness: a data quality study
ABSTRACT
Background:
Coherence across sites in multi-centre datasets is one significant data quality dimension for reliable health data reuse, as unexpected heterogeneity in data can lead to biases in data analyses and suboptimal generalisation of results.
Objective:
This work aims to characterise and label the data coherence across sites in the first European multi-centre dataset for cancer prevention in People Experiencing Homelessness (PEH), created in the CANCERLESS EU project. The dataset was created to enable research addressing the disparities in health and healthcare access that PEH face due to barriers such as unstable housing, limited resources, and social stigma.
Methods:
The dataset comprises 652 cases: 142 from Austria, 158 from Greece, 197 from Spain, and 155 from the United Kingdom. All participants fit classifications from the European Typology of Homelessness and Housing Exclusion. This longitudinal study collected questionnaires at baseline, after four weeks, and at the end of the intervention. The 180-question survey covered socio-demographic data, overall health, mental health, empowerment, and interpersonal communication. Data variability was assessed using information theory and geometric methods to analyse discrepancies in distributions and completeness across the dataset.
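As an illustration only, the sketch below shows one information-theoretic way to compare answer distributions between sites: the Jensen-Shannon distance between two categorical distributions, with missing answers treated as their own category. The abstract does not state which metric the authors used, so the choice of metric, the function names, and the toy answers are all assumptions made for this example and are not taken from the study.

```python
import math
from collections import Counter


def distribution(answers, categories):
    """Relative frequency of each category in a list of answers."""
    counts = Counter(answers)
    total = len(answers)
    return [counts[c] / total for c in categories]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_distance(p, q):
    """Jensen-Shannon distance (square root of the divergence), in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(jsd, 0.0))  # guard against tiny negative rounding


# Hypothetical toy answers to a single question at three sites (not study data).
categories = ["yes", "no", "missing"]
site_a = distribution(["yes"] * 6 + ["no"] * 4, categories)
site_b = distribution(["yes"] * 5 + ["no"] * 5, categories)
site_c = distribution(["missing"] * 10, categories)

d_ab = js_distance(site_a, site_b)  # similar answer patterns: small distance
d_ac = js_distance(site_a, site_c)  # no shared categories: maximal distance
```

Under this sketch, a site whose answers are mostly missing lies far from the others, which is the kind of discrepancy in both distribution and completeness that a multi-site coherence analysis is meant to surface.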
Results:
Significant variability was observed among the four pilot countries, both in the overall analysis and within specific domains, with the exception of the health domain. In particular, measures of Healthcare Empowerment, Quality of Life, and Interpersonal Communication showed the greatest discrepancies among pilot sites. Notably, Spain exhibited the most pronounced differences, characterised by a high number of missing values related to Interpersonal Communication and the use of healthcare services.
Conclusions:
Health data may be comparable across the four countries; however, significant differences were observed in the other questionnaires, requiring independent, country-specific analyses. This study underscores the heterogeneity among PEH and the critical need for data quality assessments to inform future research and policymaking in this field.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.