Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Oct 15, 2024
Open Peer Review Period: Oct 15, 2024 - Dec 10, 2024
Date Accepted: Jan 12, 2025

The final, peer-reviewed published version of this preprint can be found here:

Acitores Cortina JM, Fatapour Y, Brown KL, Gisladottir U, Zietz M, Bear Don't Walk OJ IV, Peter D, Berkowitz JS, Friedrich NA, Kivelson S, Kuchi A, Liu H, Srinivasan A, Tsang KK, Tatonetti NP

Biases in Race and Ethnicity Introduced by Filtering Electronic Health Records for “Complete Data”: Observational Clinical Data Analysis

JMIR Med Inform 2025;13:e67591

DOI: 10.2196/67591

PMID: 40146917

PMCID: 11967746

Biases in Race and Ethnicity Introduced by Filtering Electronic Health Records for 'Complete Data'

  • Jose Miguel Acitores Cortina
  • Yasaman Fatapour
  • Kathleen Larow Brown
  • Undina Gisladottir
  • Michael Zietz
  • Oliver John Bear Don't Walk IV
  • Danner Peter
  • Jacob S. Berkowitz
  • Nadine A. Friedrich
  • Sophia Kivelson
  • Aditi Kuchi
  • Hongyu Liu
  • Apoorva Srinivasan
  • Kevin K Tsang
  • Nicholas P. Tatonetti

ABSTRACT

Background:

Integrated clinical databases from national biobanks have advanced the capacity for disease research. Data quality and completeness filters are used when building clinical cohorts to address limitations of data missingness. However, these filters may unintentionally introduce systemic biases when they are correlated with race and ethnicity.

Objective:

In this study, we examined the race and ethnicity biases introduced by applying common filters to four clinical records databases.

Methods:

We applied 19 filters commonly used in electronic health records research, covering the availability of demographics, medication records, visit details, observation periods, and other data types. We evaluated the effect of each filter on data availability by self-reported race and ethnicity. This assessment was performed across four databases comprising approximately 12 million patients.
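The kind of comparison described above can be illustrated with a minimal sketch: apply one completeness filter to a cohort and compare the fraction of each group that remains. Everything in the example is an assumption made for illustration, including the toy rows, the placeholder group labels "A" and "B", the 365-day threshold, and the function names; it is not the authors' code, filter definitions, or data.

```python
from collections import Counter

# Hypothetical toy cohort, invented for illustration only:
# (patient_id, self-reported group label, observation days, has medication record)
PATIENTS = [
    (1, "A", 400, True),
    (2, "A", 30, True),
    (3, "A", 800, False),
    (4, "B", 20, True),
    (5, "B", 500, True),
    (6, "B", 10, False),
    (7, "B", 15, True),
]


def observation_filter(row, min_days=365):
    """Hypothetical completeness filter: keep patients observed for at least min_days."""
    return row[2] >= min_days


def retention_by_group(rows, keep):
    """Return the fraction of each group that passes the filter function `keep`."""
    before = Counter(row[1] for row in rows)
    after = Counter(row[1] for row in rows if keep(row))
    # Counter returns 0 for groups that were filtered out entirely.
    return {group: after[group] / total for group, total in before.items()}


rates = retention_by_group(PATIENTS, observation_filter)
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.2f} retained")
```

Unequal retention rates between groups under the same filter are the signal of interest: they suggest that the filter is correlated with group membership and may shift the composition of the resulting cohort.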

Results:

Applying the observation period filter led to a substantial reduction in data availability across all races and ethnicities in all four datasets. However, data availability in the White group remained consistently higher than in other racial groups after each filter was applied. Conversely, the Black/African American group was the most affected by each filter in three of the datasets: the Cedars-Sinai dataset, the UK Biobank, and the Columbia University dataset.

Conclusions:

Our findings underscore the importance of using only necessary filters, as they might disproportionately affect data availability for minoritized racial and ethnic populations. Researchers must consider these unintentional biases when performing data-driven research and explore techniques to minimize the impact of these filters, such as probabilistic methods or the use of machine learning and artificial intelligence.




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.