JMIR Preprints #56734: Assessing the Effect of Electronic Health Record Data Quality on Identifying Patients with Type 2 Diabetes

Current Preprint Settings

(as selected by the authors)

1. When the manuscript is submitted, allow peer review from:

(a) Anybody (open community peer review)
(b) Editor-selected reviewers (closed peer review)

2. When the manuscript is submitted, display the preprint PDF to:

(a) Anybody, anytime
(b) Logged-in users only
(c) Anybody, anytime (title and abstract only)
(d) No one

3. When the manuscript is accepted, display the accepted manuscript PDF to:

(a) Anybody, anytime
(b) Logged-in users only
(c) Anybody, anytime (title and abstract only)
(d) No one

Assessing the Effect of Electronic Health Record Data Quality on Identifying Patients with Type 2 Diabetes

Priyanka Dua Sood;
Star Liu;
Harold Lehmann;
Hadi Kharrazi

ABSTRACT

Background:

Increasing and substantial reliance on Electronic health records (EHR) and data types (i.e., diagnosis (Dx), medication (Rx), laboratory (Lx)) demands assessment of its data quality (DQ) as a fundamental approach; especially since there is need to identify appropriate denominator population with chronic conditions, such as Type-2 Diabetes (T2D), using commonly available computable phenotype definitions (phenotype).

Objective:

To bridge this gap, our study aims to assess how issues of EHR DQ, and variations and robustness (or lack thereof) in phenotypes may have potential impact in identifying denominator population.

Methods:

Approximately 208k patients with T2D were included in our study using retrospective EHR data of Johns Hopkins Medical Institution (JHMI) during 2017-2019. Our assessment included 4 published phenotypes, and 1 definition from a panel of experts at Hopkins. We conducted descriptive analyses of demographics (i.e., age, sex, race, ethnicity), healthcare utilization (inpatient and emergency room visits), and average Charlson Comorbidity score of each phenotype. We then used different methods to induce/simulate DQ issues of completeness, accuracy and timeliness separately across each phenotype. For induced data incompleteness, our model randomly dropped Dx, Rx, and Lx codes independently at increments of 10%; for induced data inaccuracy, our model randomly replaced a Dx or Rx code with another code of the same data type and induced 2% incremental change from -10% to +10% in Lx result values; and lastly, for timeliness, data was modeled for induced incremental shift of date records by 30 days up to a year.

Results:

Less than a quarter (23%) of population overlapped across all phenotypes using EHR. The population identified by each phenotype varied across all combination of data types. Induced incompleteness identified fewer patients with each increment, for e.g., at 100% diagnostic incompleteness, Chronic Conditions Data Warehouse (CCW) phenotype identified zero patients as its phenotypic characteristics included only Dx codes. Induced inaccuracy and timeliness similarly demonstrated variations in performance of each phenotype and therefore, resulting in fewer patients being identified with each incremental change.

Conclusions:

We utilized EHR data with Dx, Rx, and Lx data types from a large tertiary hospital system to understand the T2D phenotypic differences and performance. We learned how issues of DQ, using induced DQ methods, may impact identification of the denominator populations upon which clinical (e.g., clinical research and trials, population health evaluations) and financial/operational decisions are made. The novel results from our study may inform in shaping a common T2D computable phenotype definition that can be applicable to clinical informatics, managing chronic conditions, and additional healthcare industry-wide efforts. Clinical Trial: Not applicable

Citation

Please cite as:

Sood PD, Liu S, Lehmann H, Kharrazi H

Assessing the Effect of Electronic Health Record Data Quality on Identifying Patients With Type 2 Diabetes: Cross-Sectional Study

JMIR Med Inform 2024;12:e56734

DOI: 10.2196/56734

PMID: 38850555

PMCID: 11370182

Download PDF

Request queued. Please wait while the file is being generated. It may take some time.

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on it's website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressively prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Jan 24, 2024

Open Peer Review Period: Jan 29, 2024 - Mar 25, 2024

Date Accepted: Jun 8, 2024

Date Submitted to PubMed: Jun 8, 2024

(closed for review but you can still tweet)

Assessing the Effect of Electronic Health Record Data Quality on Identifying Patients with Type 2 Diabetes

ABSTRACT

Citation

Copyright