Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Jan 24, 2024
Open Peer Review Period: Jan 29, 2024 - Mar 25, 2024
Date Accepted: Jun 8, 2024
Date Submitted to PubMed: Jun 8, 2024

The final, peer-reviewed published version of this preprint can be found here:

Assessing the Effect of Electronic Health Record Data Quality on Identifying Patients With Type 2 Diabetes: Cross-Sectional Study

Sood PD, Liu S, Lehmann H, Kharrazi H

JMIR Med Inform 2024;12:e56734

DOI: 10.2196/56734

PMID: 38850555

PMCID: 11370182

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Assessing the Effect of Electronic Health Record Data Quality on Identifying Patients with Type 2 Diabetes

  • Priyanka Dua Sood
  • Star Liu
  • Harold Lehmann
  • Hadi Kharrazi

ABSTRACT

Background:

Increasing and substantial reliance on electronic health records (EHRs) and their data types (i.e., diagnosis [Dx], medication [Rx], and laboratory [Lx]) demands assessment of their data quality (DQ) as a fundamental step, especially since there is a need to identify the appropriate denominator population with chronic conditions, such as type 2 diabetes (T2D), using commonly available computable phenotype definitions (phenotypes).

Objective:

Our study aims to assess how EHR DQ issues, and the variation and robustness (or lack thereof) of phenotypes, may affect the identification of denominator populations.

Methods:

Approximately 208,000 patients with T2D were included in our study, using retrospective EHR data from the Johns Hopkins Medical Institution (JHMI) for 2017-2019. Our assessment included 4 published phenotypes and 1 definition from a panel of experts at Hopkins. We conducted descriptive analyses of demographics (i.e., age, sex, race, and ethnicity), health care utilization (inpatient and emergency room visits), and the average Charlson Comorbidity score for each phenotype. We then used different methods to induce (simulate) DQ issues of completeness, accuracy, and timeliness separately for each phenotype. For induced data incompleteness, our model randomly dropped Dx, Rx, and Lx codes independently at increments of 10%. For induced data inaccuracy, our model randomly replaced a Dx or Rx code with another code of the same data type and induced 2% incremental changes from -10% to +10% in Lx result values. Lastly, for timeliness, the dates of records were shifted incrementally by 30 days, up to a year.
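The three induced DQ perturbations described above can be illustrated with a minimal Python sketch. It is not the authors' model: the record layout, the function names, the toy Dx-only phenotype, the study window, the direction of the date shift, and all data values are assumptions made for illustration only.

```python
import random
from datetime import date, timedelta

# Toy Dx records: (patient_id, code, record_date). All values are made up.
DX = [
    (1, "E11.9", date(2018, 3, 1)),
    (2, "E11.9", date(2019, 11, 20)),
    (3, "I10", date(2018, 6, 15)),
    (4, "E11.9", date(2017, 1, 10)),
]
# Toy Lx records: (patient_id, test_name, value, record_date).
LX = [(1, "HbA1c", 7.0, date(2018, 3, 2)), (3, "HbA1c", 6.4, date(2018, 6, 16))]


def drop_records(records, fraction, rng):
    """Induced incompleteness: randomly drop a fraction of the records."""
    n_drop = round(len(records) * fraction)
    dropped = set(rng.sample(range(len(records)), n_drop))
    return [r for i, r in enumerate(records) if i not in dropped]


def replace_codes(records, fraction, code_pool, rng):
    """Induced inaccuracy (Dx/Rx): swap a fraction of codes for a different
    code drawn from a pool of the same data type."""
    out = list(records)
    n_swap = round(len(out) * fraction)
    for i in rng.sample(range(len(out)), n_swap):
        pid, code, when = out[i]
        out[i] = (pid, rng.choice([c for c in code_pool if c != code]), when)
    return out


def scale_lab_values(labs, pct_change):
    """Induced inaccuracy (Lx): change every result value by pct_change percent."""
    factor = 1 + pct_change / 100
    return [(pid, test, value * factor, when) for pid, test, value, when in labs]


def shift_dates(records, days):
    """Induced timeliness issue: shift every record date by a number of days
    (the sign of the shift is an assumption)."""
    return [(pid, code, when + timedelta(days=days)) for pid, code, when in records]


def toy_dx_phenotype(dx_records, start=date(2017, 1, 1), end=date(2019, 12, 31)):
    """Hypothetical Dx-only phenotype: any E11.9 code inside the study window."""
    return {pid for pid, code, when in dx_records
            if code == "E11.9" and start <= when <= end}


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    for pct in range(0, 101, 10):  # incompleteness in 10% increments
        kept = drop_records(DX, pct / 100, rng)
        print(f"{pct}% Dx dropped: {len(toy_dx_phenotype(kept))} patients")
    for pct in range(-10, 11, 2):  # Lx values changed from -10% to +10%
        print(pct, [round(v, 2) for _, _, v, _ in scale_lab_values(LX, pct)])
    for days in range(30, 361, 30):  # 30-day shifts, up to about a year
        print(f"{days}-day shift: {len(toy_dx_phenotype(shift_dates(DX, days)))} patients")
```

Keeping each perturbation as a separate function mirrors the study design, in which completeness, accuracy, and timeliness were induced separately rather than in combination.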

Results:

Less than a quarter (23%) of the population overlapped across all phenotypes using EHR data. The population identified by each phenotype varied across all combinations of data types. Induced incompleteness identified fewer patients with each increment; for example, at 100% diagnostic incompleteness, the Chronic Conditions Data Warehouse (CCW) phenotype identified zero patients, as its phenotypic characteristics included only Dx codes. Induced inaccuracy and timeliness similarly produced variations in the performance of each phenotype, resulting in fewer patients being identified with each incremental change.
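An overlap figure of this kind can be computed from per-phenotype patient sets, as in the short sketch below. The abstract does not state the denominator, so this sketch assumes it is the set of patients identified by at least one phenotype; the function name, phenotype labels, and patient IDs are hypothetical.

```python
def overlap_share(cohorts):
    """Share of patients identified by every phenotype, out of those
    identified by at least one phenotype (assumed denominator)."""
    sets = list(cohorts.values())
    if not sets:
        return 0.0
    union = set().union(*sets)
    common = set.intersection(*sets)
    return len(common) / len(union) if union else 0.0


# Toy cohorts keyed by a made-up phenotype label; values are patient IDs.
toy_cohorts = {
    "phenotype_A": {1, 2, 3},
    "phenotype_B": {2, 3, 4},
    "phenotype_C": {3, 5},
}
print(f"{overlap_share(toy_cohorts):.0%} of patients are identified by all phenotypes")
```

Choosing a different denominator, such as the cohort of a single reference phenotype, would change the percentage, so the choice should be stated alongside any reported overlap.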

Conclusions:

We used EHR data with Dx, Rx, and Lx data types from a large tertiary hospital system to understand differences between T2D phenotypes and their performance. Using induced DQ methods, we learned how DQ issues may affect the identification of the denominator populations upon which clinical (e.g., clinical research and trials, population health evaluations) and financial or operational decisions are made. The novel results from our study may inform the shaping of a common T2D computable phenotype definition that can be applied to clinical informatics, the management of chronic conditions, and additional health care industry-wide efforts.

Clinical Trial:

Not applicable




© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a cc-by license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.