Accepted for/Published in: JMIR Medical Informatics

Date Submitted: May 9, 2022
Open Peer Review Period: May 3, 2022 - May 31, 2022
Date Accepted: Jul 26, 2022
Date Submitted to PubMed: Aug 2, 2022

The final, peer-reviewed published version of this preprint can be found here:

Issues With Variability in Electronic Health Record Data About Race and Ethnicity: Descriptive Analysis of the National COVID Cohort Collaborative Data Enclave

Cook L, Espinoza J, Weiskopf NG, Mathews N, Dorr DA, Gonzales KL, Wilcox A, Madlock-Brown C, on behalf of the N3C Consortium

JMIR Med Inform 2022;10(9):e39235

DOI: 10.2196/39235

PMID: 35917481

PMCID: 9490543

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Variability in EHR Data About Race and Ethnicity As Observed in the National COVID Cohort Collaborative Data Enclave

  • Lily Cook
  • Juan Espinoza
  • Nicole G. Weiskopf
  • Nisha Mathews
  • David A. Dorr
  • Kelly L. Gonzales
  • Adam Wilcox
  • Charisse Madlock-Brown
  • on behalf of the N3C Consortium

ABSTRACT

Background:

A significant technical challenge in integrating race and ethnicity data across electronic health record (EHR) systems is the lack of consistency in how healthcare organizations collect and structure these data.

Objective:

To evaluate and describe variations in how healthcare systems collect and report information about the race and ethnicity of their patients, and how these data are integrated when they are aggregated into a large clinical database.

Methods:

At the time of our analysis, the National COVID Cohort Collaborative (N3C) Data Enclave contained records from 6.5 million patients contributed by 56 healthcare institutions. We assessed the quality of race and ethnicity data by analyzing their conformance to federal standards and then examined the non-conforming data in more detail.
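The conformance check described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes that the federal standard is the five minimum race categories of the 1997 OMB standards, that missing data appear as empty or null values, and that the function names and toy values are hypothetical.

```python
from collections import Counter

# Assumption: the "federal standard" is the 1997 OMB minimum race categories.
OMB_RACE_CATEGORIES = {
    "American Indian or Alaska Native",
    "Asian",
    "Black or African American",
    "Native Hawaiian or Other Pacific Islander",
    "White",
}


def classify_race_value(value):
    """Label one raw race value (hypothetical helper, not from the paper)."""
    if value is None or value.strip() == "":
        return "missing"
    if value.strip() in OMB_RACE_CATEGORIES:
        return "conforming"
    return "non_conforming"


def summarize(values):
    """Return the share of values that falls under each label."""
    counts = Counter(classify_race_value(v) for v in values)
    total = len(values)
    return {label: count / total for label, count in counts.items()}


# Toy values for illustration only; these are not N3C data.
toy_values = [
    "White",
    "Asian",
    None,
    "",
    "Hispanic",
    "Other",
    "Black or African American",
    "Unknown",
]

shares = summarize(toy_values)
for label, share in sorted(shares.items()):
    print(f"{label}: {share:.1%}")
```

In a sketch like this, everything outside the "conforming" label is what a follow-up analysis would break down further, for example by source value or by contributing site.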

Results:

“No matching category” was the second largest harmonized racial group in the N3C. Overall, 20.7% of the race data did not conform to the federal standard; the largest category of non-conforming data was missing data. Hispanic or Latino patients were over-represented in the non-conforming racial data, and data from American Indian or Alaska Native patients were obscured. Although only a small proportion of the source data (0.6%) had not been mapped to the correct concepts, Black or African American and Hispanic or Latino patients were over-represented in this category.

Conclusions:

The impact of data quality issues was not equal across all races and ethnicities, which has the potential to introduce bias in analyses and conclusions drawn from these data. The adverse impact of COVID-19 on marginalized and under-resourced communities of color has highlighted the need for accurate, comprehensive race and ethnicity data. Differences in how healthcare institutions conceptualize and encode race and ethnicity data can affect the quality of the data in aggregated clinical databases. Transparency about how data have been transformed can help users make accurate analyses and inferences, and eventually better guide clinical care and public policy.


© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.