Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Oct 29, 2024
Open Peer Review Period: Oct 29, 2024 - Dec 24, 2024
Date Accepted: May 15, 2025
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Identifying people living with or those at risk for HIV in a nationally sampled electronic health record repository called the National Clinical Cohort Collaborative (N3C): A cohort study
ABSTRACT
Background:
Electronic health records (EHRs) provide valuable data for addressing clinical and epidemiological research questions concerning HIV, including the disproportionate impact of the COVID-19 pandemic on this population. To identify people living with HIV (PLWH), most studies using EHR or claims databases start with diagnostic codes, which can result in misclassification unless they are refined with drug or laboratory data. Furthermore, because some antiretrovirals now have indications for both HIV and COVID-19 (e.g., ritonavir in nirmatrelvir/ritonavir), new phenotyping methods are needed to capture PLWH accurately. We therefore created a generalizable method that uses granular clinical data from after the emergence of COVID-19 to robustly identify PLWH, pre-exposure prophylaxis (PrEP) users, post-exposure prophylaxis (PEP) users, and people not living with HIV (PNLWH).
Objective:
The primary aim of this work was to use computational phenotyping in EHR data to identify PLWH (cohort 1), people prescribed PrEP (cohort 2), people prescribed PEP (cohort 3), and people in none of these groups (PNLWH, cohort 4), and to describe COVID-19-related characteristics among these cohorts.
Methods:
We used diagnostic, laboratory measurement, and drug concepts within the National Clinical Cohort Collaborative (N3C) to create a computational phenotype for four cohorts with confidence levels. For robustness, we conducted a randomly sampled, blinded clinician annotation to assess precision. We calculated the distribution of demographics, comorbidities, and COVID-19 variables among our four cohorts.
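The precision assessment mentioned above can be illustrated with a minimal sketch. It assumes each randomly sampled record carries both the cohort label assigned by the phenotype and the label assigned by the blinded clinician; the `AnnotatedRecord` structure, its field names, and the toy records are hypothetical and are not N3C data or the authors' code. Precision for a cohort is the share of phenotype-assigned patients that the annotator confirms, TP / (TP + FP).

```python
from dataclasses import dataclass


@dataclass
class AnnotatedRecord:
    # Hypothetical structure for one randomly sampled patient
    phenotype_cohort: str  # label assigned by the computational phenotype
    clinician_cohort: str  # label assigned by the blinded clinician reviewer


def precision(records: list[AnnotatedRecord], cohort: str) -> float:
    """Precision = TP / (TP + FP) among records the phenotype placed in `cohort`."""
    predicted = [r for r in records if r.phenotype_cohort == cohort]
    if not predicted:
        raise ValueError(f"no sampled records assigned to {cohort!r}")
    # A true positive is a phenotype assignment the clinician agrees with
    true_positives = sum(r.clinician_cohort == cohort for r in predicted)
    return true_positives / len(predicted)


# Toy, made-up sample for illustration only
sample = [
    AnnotatedRecord("PLWH", "PLWH"),
    AnnotatedRecord("PLWH", "PLWH"),
    AnnotatedRecord("PLWH", "PNLWH"),
    AnnotatedRecord("PrEP", "PrEP"),
]
print(precision(sample, "PLWH"))  # 2 of 3 PLWH assignments confirmed
```

Because only phenotype-positive records enter the denominator, this measures precision (positive predictive value), not recall; estimating recall would require also reviewing records the phenotype did not assign to the cohort.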
Results:
We identified 132,664 PLWH with a high level of confidence, 36,088 PrEP users, 4,120 PEP users, and 20,639,675 PNLWH. Most PLWH were identified by a combination of conditions, laboratory measurements, and drug exposures (74,809, 56.4%), followed by laboratory measurements and drugs (15,241, 11.5%), and then conditions and drugs (14,595, 11.0%). A higher proportion of PLWH than of the other cohorts experienced COVID-19-related hospitalization (4,650, 3.51%), COVID-19-related mortality (828, 0.62%), and all-cause mortality (2,083, 1.57%).
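The percentages above are shares of the 132,664 high-confidence PLWH. A short check that uses only the counts reported in this abstract reproduces them; the `share` helper is an illustrative name, not part of the study's pipeline.

```python
# Counts reported in the Results for the PLWH cohort
PLWH_TOTAL = 132_664

counts = {
    "conditions + labs + drugs": 74_809,
    "labs + drugs": 15_241,
    "conditions + drugs": 14_595,
    "COVID-19-related hospitalization": 4_650,
    "COVID-19-related mortality": 828,
    "all-cause mortality": 2_083,
}


def share(n: int, total: int = PLWH_TOTAL) -> float:
    """Return n as a percentage of the PLWH cohort."""
    return 100 * n / total


for label, n in counts.items():
    print(f"{label}: {n:,} ({share(n):.2f}%)")

# The three itemized evidence combinations do not cover the whole cohort
itemized = sum(counts[k] for k in ("conditions + labs + drugs", "labs + drugs", "conditions + drugs"))
print(f"itemized combinations: {itemized:,} ({share(itemized):.1f}%)")
```

The three itemized combinations account for 104,645 PLWH (about 78.9%); the remaining patients were identified through evidence combinations that the abstract does not list.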
Conclusions:
Using an extensive phenotyping algorithm leveraging granular data in an EHR repository, we have identified PLWH, PNLWH, PrEP and PEP users, and offer transferable lessons to optimize future EHR phenotyping for these cohorts.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.