Acitores Cortina JM, Fatapour Y, Brown KL, Gisladottir U, Zietz M, Bear Don't Walk OJ IV, Peter D, Berkowitz JS, Friedrich NA, Kivelson S, Kuchi A, Liu H, Srinivasan A, Tsang KK, Tatonetti NP
Biases in Race and Ethnicity Introduced by Filtering Electronic Health Records for “Complete Data”: Observational Clinical Data Analysis
Biases in Race and Ethnicity Introduced by Filtering Electronic Health Records for 'Complete Data'
Jose Miguel Acitores Cortina;
Yasaman Fatapour;
Kathleen Larow Brown;
Undina Gisladottir;
Michael Zietz;
Oliver John Bear Don't Walk IV;
Danner Peter;
Jacob S. Berkowitz;
Nadine A. Friedrich;
Sophia Kivelson;
Aditi Kuchi;
Hongyu Liu;
Apoorva Srinivasan;
Kevin K Tsang;
Nicholas P. Tatonetti
ABSTRACT
Background:
Integrated clinical databases from national biobanks have advanced the capacity for disease research. Data quality and completeness filters are used when building clinical cohorts to address limitations of data missingness. However, these filters may unintentionally introduce systemic biases when they are correlated with race and ethnicity.
Objective:
In this study, we examined the race/ethnicity biases introduced by applying common filters to four clinical records databases.
Methods:
We applied 19 filters commonly used in electronic health records research, covering the availability of demographics, medication records, visit details, observation periods, and other data types. We evaluated the effect of applying these filters on the representation of self-reported racial and ethnic groups. This assessment was performed across four databases comprising approximately 12 million patients.
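The kind of comparison described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the field names (`race_ethnicity`, `observation_days`, `has_medication`), the 365-day threshold, the group labels, and the toy records are all hypothetical, chosen only to show how the share of patients retained by a filter might be compared across self-reported groups.

```python
from collections import Counter


def retention_by_group(patients, keep, group_key="race_ethnicity"):
    """Return, for each group, the fraction of its patients that pass `keep`."""
    total = Counter(p[group_key] for p in patients)
    kept = Counter(p[group_key] for p in patients if keep(p))
    return {group: kept[group] / n for group, n in total.items()}


# Hypothetical toy records; these are not data from the study.
patients = [
    {"race_ethnicity": "Group A", "observation_days": 900, "has_medication": True},
    {"race_ethnicity": "Group A", "observation_days": 400, "has_medication": True},
    {"race_ethnicity": "Group A", "observation_days": 500, "has_medication": True},
    {"race_ethnicity": "Group A", "observation_days": 30, "has_medication": False},
    {"race_ethnicity": "Group B", "observation_days": 800, "has_medication": True},
    {"race_ethnicity": "Group B", "observation_days": 100, "has_medication": True},
    {"race_ethnicity": "Group B", "observation_days": 50, "has_medication": False},
    {"race_ethnicity": "Group B", "observation_days": 20, "has_medication": False},
]

# Two illustrative completeness filters (assumed definitions, not the paper's).
filters = {
    "observation period >= 365 days": lambda p: p["observation_days"] >= 365,
    "has medication record": lambda p: p["has_medication"],
}

for name, keep in filters.items():
    print(name, retention_by_group(patients, keep))
```

A filter that retains a much smaller fraction of one group than another changes the demographic makeup of the resulting cohort, which is the effect the study measures.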
Results:
Applying the observation period filter led to a substantial reduction in data availability across all races and ethnicities in all four datasets. Among the groups examined, data availability in the white group remained consistently higher than in other racial groups after each filter was applied. Conversely, the Black/African American group was the most affected by each filter in three of the datasets: the Cedars-Sinai dataset, UK Biobank, and the Columbia University dataset.
Conclusions:
Our findings underscore the importance of applying only those filters that are necessary, because filters may disproportionately reduce data availability for minoritized racial and ethnic populations. Researchers must consider these unintentional biases when performing data-driven research and explore techniques to minimize the impact of these filters, such as probabilistic methods or the use of machine learning and artificial intelligence.