Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: May 7, 2025
Date Accepted: Jun 23, 2025
Using the ATra Black Box Technology to Improve Public Health Data Linkages and Analytics in the DC Cohort Longitudinal HIV Study
ABSTRACT
Background:
The DC Cohort, led by George Washington University (GW), is a longitudinal HIV cohort study of people with HIV (PWH) receiving care at 14 clinical sites in Washington, DC. Data are routinely linked to DC Department of Health (DC Health) HIV surveillance databases to increase data completeness and accuracy and to help identify PWH enrolled at multiple sites. The ATra Black Box (Black Box) is a novel privacy technology developed by Georgetown University (GU) that is currently deployed in 40 public health jurisdictions. It provides a secure mechanism for linking private health information across data systems.
Objective:
The Black Box was modified to link DC Cohort data to DC Health surveillance data and to increase the ease, feasibility, accuracy, and timeliness of future linkages. These modifications included providing de-identified data to GW and developing analytic code that compares DC Cohort and DC Health data and reports discrepancies between them. This paper reports the results of the initial linkage using the Black Box.
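The discrepancy reporting described in the Objective can be illustrated with a minimal Python sketch. It is not the study's SAS code: the compared field names (`dob`, `hiv_dx_date`, `sex`), the record layout as plain dictionaries, and the function `compare_linked_record` are all hypothetical. For one linked pair of records, it flags each field that is missing in one source but present in the other (a candidate for filling in) or present in both with conflicting values.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical fields compared between the two sources; the abstract does not
# list the study's actual variables.
COMPARED_FIELDS = ("dob", "hiv_dx_date", "sex")


@dataclass
class Discrepancy:
    participant_id: str
    field: str
    kind: str  # "missing_in_cohort", "missing_in_surveillance", or "conflict"
    cohort_value: Optional[str]
    surveillance_value: Optional[str]


def compare_linked_record(participant_id: str,
                          cohort: dict,
                          surveillance: dict) -> list:
    """Compare one linked Cohort/surveillance record pair field by field."""
    found = []
    for field in COMPARED_FIELDS:
        c, s = cohort.get(field), surveillance.get(field)
        if c is None and s is None:
            continue  # missing in both sources: nothing to reconcile
        if c is None:
            kind = "missing_in_cohort"
        elif s is None:
            kind = "missing_in_surveillance"
        elif c != s:
            kind = "conflict"
        else:
            continue  # values agree
        found.append(Discrepancy(participant_id, field, kind, c, s))
    return found


report = compare_linked_record(
    "P001",
    {"dob": "1970-01-01", "hiv_dx_date": None, "sex": "F"},
    {"dob": "1970-01-01", "hiv_dx_date": "2005-06-01", "sex": "M"},
)
for d in report:
    print(d.field, d.kind)
```

Separating "missing" from "conflict" mirrors the two kinds of output the abstract describes: missing values can often be filled from the other source, whereas conflicts usually need review.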
Methods:
DC Cohort data on all consented participants from January 2011 through September 2022 were submitted to the Black Box. At the same time, all DC Health HIV surveillance data were submitted to the Black Box. The data were matched using a predetermined algorithm, match level scores were assigned, and matches were manually verified. The new Black Box graphical user interface allowed users to check files for errors and easily track Black Box processes, and it provided analytic plugins for running SAS code.
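The abstract does not describe the Black Box's predetermined matching algorithm, so the following is only a minimal sketch of one common approach: a weighted field-agreement score on a 0 to 100 scale, with candidate pairs ranked for manual review. The field names, the weights in `FIELD_WEIGHTS`, the `min_score` floor, and the functions `match_score` and `review_candidates` are all hypothetical and are not the Black Box's actual method.

```python
# Hypothetical field weights; they sum to 100, so agreement on every field
# yields a score of 100.
FIELD_WEIGHTS = {
    "last_name": 25,
    "first_name": 15,
    "dob": 30,
    "sex": 5,
    "ssn_last4": 25,
}


def normalize(value):
    """Case- and whitespace-insensitive comparison key; None stays None."""
    if value is None:
        return None
    return str(value).strip().upper()


def match_score(cohort: dict, surveillance: dict) -> int:
    """Sum the weights of fields that are present in both records and agree."""
    score = 0
    for field, weight in FIELD_WEIGHTS.items():
        a = normalize(cohort.get(field))
        b = normalize(surveillance.get(field))
        if a is not None and a == b:
            score += weight
    return score


def review_candidates(cohort_rec: dict, surveillance_recs: list, min_score: int = 20) -> list:
    """Score every surveillance record against one Cohort record and return
    (score, record) pairs at or above a hypothetical floor, highest first,
    so a reviewer can verify the top candidates manually."""
    scored = [(match_score(cohort_rec, s), s) for s in surveillance_recs]
    kept = [pair for pair in scored if pair[0] >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept
```

Production linkage systems often use probabilistic (Fellegi-Sunter style) weights rather than fixed ones. Fixed weights are used here only to keep the score easy to read.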
Results:
Records for 9,744 DC Cohort participants were submitted to the Black Box for matching against DC Health data; of these, 9,060 participants (93%) matched to surveillance data and were validated through manual review. Match level scores ranged from 20 to 100, and validation found that scores of 61 and above were true matches. The SAS output files identified missing or conflicting data, including laboratory results, date of HIV diagnosis, and key demographic variables. The linkage added 48,970 CD4 counts, 33,413 viral load results, and 767 previously unrecognized deaths. Overall, 470 Cohort participants were enrolled at more than one site, of whom 17 were enrolled at more than two sites.
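Counting participants enrolled at multiple sites can be sketched in Python as grouping validated links by a shared surveillance identifier. This is a minimal sketch under stated assumptions: each site enrollment is assumed to produce its own Cohort record, a hypothetical `surveillance_id` is assumed to tie those records to one person, and the score of 61 from the Results is applied as an automatic cutoff purely for illustration (in the study, matches were manually verified). The tuple layout and the functions `sites_per_person` and `multi_site_counts` are hypothetical.

```python
from collections import defaultdict

# From the Results: validated match level scores of 61 and above were true matches.
TRUE_MATCH_THRESHOLD = 61


def sites_per_person(links, threshold=TRUE_MATCH_THRESHOLD):
    """links: iterable of (cohort_id, site, surveillance_id, score) tuples.
    Returns {surveillance_id: set of sites} using only links at or above threshold."""
    sites = defaultdict(set)
    for _cohort_id, site, surveillance_id, score in links:
        if score >= threshold:
            sites[surveillance_id].add(site)
    return dict(sites)


def multi_site_counts(sites_by_person):
    """Return (people at more than one site, people at more than two sites)."""
    more_than_one = sum(1 for s in sites_by_person.values() if len(s) > 1)
    more_than_two = sum(1 for s in sites_by_person.values() if len(s) > 2)
    return more_than_one, more_than_two


links = [
    ("c1", "A", "s1", 95), ("c2", "B", "s1", 80), ("c3", "C", "s1", 70),
    ("c4", "A", "s2", 100), ("c5", "B", "s2", 40),  # score 40: below threshold
    ("c6", "D", "s3", 61), ("c7", "E", "s3", 61),
]
print(multi_site_counts(sites_per_person(links)))
```

Because anyone enrolled at more than two sites is also enrolled at more than one, the second count is always a subset of the first, matching the reported 17 of 470.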
Conclusions:
Implementing the Black Box to share DC Cohort and DC Health data improved the capture of HIV laboratory results and vital status information and enhanced the characterization of care patterns among PWH enrolled in the Cohort. Future linkages will include DC Health data on sexually transmitted infection, hepatitis, and tuberculosis diagnoses.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.