Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Aug 25, 2024
Open Peer Review Period: Sep 9, 2024 - Nov 4, 2024
Date Accepted: Jan 31, 2025
Integrating healthcare data in an i2b2 model persisted through Elasticsearch
ABSTRACT
Background:
The volume of digital data in healthcare is continually growing. Beyond their use in care delivery, the health data collected can also serve secondary purposes, such as research. In this context, Clinical Data Warehouses (CDWs) provide the infrastructure and organization needed to improve the secondary use of health data. Various data models have been proposed for organizing data in a CDW, including the i2b2 model, whose persistence relies on a relational database that can present performance problems when executing queries on very large volumes of data.
Objective:
The objective of this work is to describe the required transformation and the implementation of data persistence for part of the i2b2 model in a NoSQL Elasticsearch database.
Methods:
A comparison is made with data persistence in a standard relational database in terms of query response, execution performance, and hardware resource requirements. A description of the data loading and updating processes is also provided.
Results:
We propose adaptations of the i2b2 model to take into account the specific features of Elasticsearch, in particular the impossibility of performing joins between different indexes. The implementation has been tested and evaluated within the CDW of the Bordeaux University Hospital, which includes data on 2.5 million patients and more than 3 billion observations. Overall, Elasticsearch query execution times are shorter than those of a relational database. The performance gain is particularly significant for queries involving free-text searches. Compared with an indexed relational database (including a full-text index), Elasticsearch requires less disk space for storage.
Conclusions:
We demonstrate that an Elasticsearch implementation is feasible, with a significant improvement in query performance and a reduction in the disk space used for storage. This implementation is currently used in production at Bordeaux University Hospital.
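The join limitation mentioned in the Results can be illustrated with a minimal sketch. It assumes, purely for illustration, that each observation is stored as one self-contained document carrying a copy of the patient and concept attributes it would otherwise be joined to. The column names follow the standard i2b2 star schema (observation_fact, patient_dimension, concept_dimension), but the document layout, the helper names `denormalize` and `build_query`, and the sample values are hypothetical and are not taken from the authors' implementation.

```python
from typing import Any

# Hypothetical sample rows using standard i2b2 column names.
patients = {
    1: {"patient_num": 1, "sex_cd": "F", "birth_date": "1970-03-14"},
    2: {"patient_num": 2, "sex_cd": "M", "birth_date": "1985-11-02"},
}
concepts = {
    "ICD10:J18": {"concept_cd": "ICD10:J18", "name_char": "Pneumonia"},
}
observations = [
    {"patient_num": 1, "concept_cd": "ICD10:J18", "start_date": "2023-01-05",
     "tval_char": "suspected pneumonia on chest X-ray"},
    {"patient_num": 2, "concept_cd": "ICD10:J18", "start_date": "2023-02-10",
     "tval_char": "pneumonia confirmed"},
]


def denormalize(
    observations: list[dict[str, Any]],
    patients: dict[int, dict[str, Any]],
    concepts: dict[str, dict[str, Any]],
) -> list[dict[str, Any]]:
    """Embed patient and concept attributes in each observation document.

    Because the dimensions are copied into every document, a query no
    longer needs a join between indexes (assumed layout, for illustration).
    """
    docs = []
    for obs in observations:
        doc = dict(obs)  # copy so the input rows are left untouched
        doc["patient"] = dict(patients[obs["patient_num"]])
        doc["concept"] = dict(concepts[obs["concept_cd"]])
        docs.append(doc)
    return docs


def build_query(text: str, sex_cd: str) -> dict[str, Any]:
    """Build an Elasticsearch Query DSL body: a free-text match on the
    observation text combined with an exact filter on the embedded
    patient sex. Assumes patient.sex_cd is mapped as a keyword field."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"tval_char": text}}],
                "filter": [{"term": {"patient.sex_cd": sex_cd}}],
            }
        }
    }


docs = denormalize(observations, patients, concepts)
query = build_query("pneumonia", "F")
```

The trade-off in such a layout is that patient and concept attributes are duplicated across documents, so a change to a dimension row has to be propagated to every observation that embeds it.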
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.