Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Jun 6, 2023
Date Accepted: Mar 7, 2024
A Scalable Pseudonymization Tool for Rapid Deployment in Large Biomedical Research Networks: Development and Evaluation Study
ABSTRACT
Background:
The SARS-CoV-2 pandemic has demonstrated once again that rapid collaborative research is essential for the future of biomedicine. Large research networks are needed to collect, share and reuse data and biosamples to generate collaborative evidence. However, setting up such networks is often complex and time-consuming, as common tools and policies are needed to ensure interoperability and the required flows of data and samples, especially in the context of handling personal data and the associated data protection issues. In biomedical research, pseudonymization detaches directly identifying details from biomedical data as well as biosamples and connects them using secure identifiers, the so-called pseudonyms. This protects privacy by design but allows necessary linkage and re-identification.
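The pseudonymization principle described above can be illustrated with a minimal sketch. It is not the implementation of the tool presented in this paper; the function names (`pseudonymize`, `reidentify`), the field names and the choice of random hexadecimal pseudonyms are assumptions made purely for illustration.

```python
import secrets


def pseudonymize(records, id_fields=("name", "birth_date")):
    """Split records into an identity table and pseudonymized research data.

    Hypothetical helper: `id_fields` names the directly identifying fields.
    """
    identity_table = {}  # pseudonym -> identifying details (stays at the institution)
    research_data = []   # shareable records: pseudonym plus non-identifying fields
    for record in records:
        # Random pseudonym that carries no information about the subject.
        pseudonym = secrets.token_hex(8)
        while pseudonym in identity_table:  # regenerate in the unlikely case of a collision
            pseudonym = secrets.token_hex(8)
        identity_table[pseudonym] = {k: record[k] for k in id_fields}
        shared = {k: v for k, v in record.items() if k not in id_fields}
        shared["pseudonym"] = pseudonym
        research_data.append(shared)
    return identity_table, research_data


def reidentify(identity_table, pseudonym):
    """Resolve a pseudonym back to the identifying details (authorized use only)."""
    return identity_table[pseudonym]
```

Only the holder of the identity table can link a pseudonym back to a person, while the research data can be shared and linked across datasets via the pseudonym alone.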
Objective:
Although pseudonymization is used in almost every biomedical study, there are currently no pseudonymization tools that can be rapidly deployed across many institutions, and using centralized services is often not possible, for example, when data is reused and consent for this type of data processing is lacking. In this paper, we present the ORCHESTRA Pseudonymization Tool (OPT), developed under the umbrella of the ORCHESTRA consortium, which faced exactly these challenges when it came to rapidly establishing a large-scale research network in the context of rapid pandemic response in Europe.
Methods:
To overcome challenges caused by the heterogeneity of information technology (IT) infrastructures across institutions, the OPT was developed based on programmable runtime environments available at virtually every institution: office suites. The software is highly configurable and provides many features, from subject and biosample registration to record linkage and the printing of machine-readable codes for labeling biosample tubes. Special care has been taken to ensure that the implemented algorithms are efficient so that the OPT can be used to pseudonymize large datasets, which we demonstrate through a comprehensive evaluation.
Results:
The OPT is available for Microsoft Office and LibreOffice, so it can be deployed on Windows, Linux and macOS. It provides multi-user support and is configurable to meet the needs of different types of research projects. Within ORCHESTRA, the OPT has been successfully deployed at 13 institutions in 11 countries in Europe and beyond. To date, the software manages data from more than 30,000 subjects and 15,000 biosamples. Over 10,000 labels have been printed. The results of our experimental evaluation show that the OPT offers practical response times for all major functionalities, even when the identities of hundreds of thousands of study subjects or biosamples are managed by a single OPT instance.
Conclusions:
Innovative solutions are needed to make the process of establishing large research networks more efficient. The OPT, which leverages the runtime environment of common office suites, can be used to rapidly deploy pseudonymization and biosample management capabilities across research networks. The tool is highly configurable and available as open-source software.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.