Accepted for/Published in: JMIR Infodemiology

Date Submitted: May 9, 2025
Date Accepted: Jan 14, 2026
Date Submitted to PubMed: Jan 16, 2026

The final, peer-reviewed published version of this preprint can be found here:

Health Data for Linguistic Minority Group Research in Canada: Proof-of-Concept Centralized Health Care Metadata Repository Development and Usability Study

Martin-Schreiber V, Peixoto C, Batista R, Belanger C, Tanuseputro P, Hsu AT, Bjerre LM

JMIR Infodemiology 2026;6:e77242

DOI: 10.2196/77242

PMID: 41543876

PMCID: 12930145

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Health data for linguistic minority research – Developing a proof-of-concept centralized healthcare metadata repository for Canada

  • Vincent Martin-Schreiber
  • Cayden Peixoto
  • Ricardo Batista
  • Christopher Belanger
  • Peter Tanuseputro
  • Amy T Hsu
  • Lise M Bjerre

ABSTRACT

Background:

Language barriers between Canadian patients and providers are associated with poorer health outcomes, including decreased patient safety and quality of care, misdiagnosis and longer treatment initiation times, and increased mortality. However, research exploring language as a social determinant of health is limited, as Canada’s health data is scattered across many jurisdictions, each with its own policies and procedures, making it difficult for researchers to identify, locate, and use existing data. This paper presents the results of a pilot study that attempts to address this gap by creating a metadata repository (MDR) to act as a central source of information about which data is available at which data holdings across Canada.

Objective:

This project had several objectives: 1) To create a proof-of-concept metadata repository for Canadian health data at the variable level. 2) To identify and label language-related variables existing within our MDR of Canadian health data. 3) To develop an interactive public-facing web application to let users browse and search the MDR.

Methods:

Metadata were collected from five Canadian health data sources, namely four provincial data holdings and one national survey, and pooled to create a metadata repository. We then performed bottom-up labelling of language-related variables within the pooled metadata, first by applying a search string algorithm across all variable labels, names, and definitions, and then by screening the flagged variables by consensus against a derived, standardized definition of language/linguistic variables. Using the Shiny web framework in R, we then developed an openly accessible web application (healthdatadictionary.ca) to allow users to search the proof-of-concept MDR.
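
The search string step described above can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the authors' implementation. The search terms, the `VariableRecord` fields, and the source names are all hypothetical, since the abstract does not list the actual search string or metadata schema. The sketch only shows the general idea of flagging candidate variables for later consensus screening.

```python
import re
from dataclasses import dataclass

# Hypothetical search terms; the study's actual search string is not given in the abstract.
LANGUAGE_TERMS = ["language", "linguistic", "mother tongue", "interpreter", "francophone"]

# Anchored at the start of a word, so "languages" matches but "slanguage" does not.
PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in LANGUAGE_TERMS) + r")",
    re.IGNORECASE,
)


@dataclass
class VariableRecord:
    """One variable-level metadata entry (assumed fields, for illustration)."""
    source: str      # data holding the variable comes from
    name: str        # variable name as stored in the data set
    label: str       # short human-readable label
    definition: str  # longer description, may be empty


def is_candidate(rec: VariableRecord) -> bool:
    """Return True if any search term appears in the name, label, or definition."""
    return any(PATTERN.search(text) for text in (rec.name, rec.label, rec.definition))


def flag_candidates(records):
    """Keep only records that match the search terms, preserving input order."""
    return [rec for rec in records if is_candidate(rec)]


# Toy pooled metadata from three made-up sources.
records = [
    VariableRecord("SourceA", "PREF_LANG", "Preferred language", "Language the patient prefers for care"),
    VariableRecord("SourceB", "DOB", "Date of birth", "Patient date of birth"),
    VariableRecord("SourceC", "INTERP", "Interpreter needed", "Whether an interpreter was requested"),
]
candidates = flag_candidates(records)
```

In this sketch the flagged records are only candidates. In the workflow described above, such candidates would still go through consensus screening against the standardized definition before being labelled as language-related.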

Results:

A total of n=850,343 variables were collected and included in the repository, with most coming from the ICES (n=717,032; 83.7%) and MCHP (n=97,051; 11.4%) provincial data holdings. Among all variables in the repository, n=219,198 (25.8%) were confirmed to be language-related variables.

Conclusions:

Developing a national metadata repository would be a transformative opportunity for Canadian researchers to leverage the full scope of Canadian health administrative data. While a top-down approach with consistent engagement of and collaboration between provincial data holdings and federal data agencies is ideal to develop a national MDR, the present study demonstrates the feasibility of a bottom-up approach in contributing to this overarching goal.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.