Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Nov 10, 2020
Date Accepted: Jan 31, 2021
A Framework for Criteria-based Selection and Processing of FHIR Data for Statistical Analysis: Design and Implementation
ABSTRACT
Background:
The harmonization and standardization of the wealth of digital medical information have been pursued for many years. Data repositories built specifically for research typically require substantial effort to harmonize and transform the clinical data. The FHIR format, on the other hand, was designed to document clinical processes; it is therefore closer to the clinical data model and more widely available across modern EHRs. However, none of the common standardized data formats lends itself directly to statistical analysis, so the data must be preprocessed first.
Objective:
This study aims to show how this preprocessing can be achieved on the FHIR format directly.
Methods:
We propose that the binary JSON (jsonb) format of the open source PostgreSQL database is suitable not only for storing FHIR data, but also as a foundation for preprocessing and filtering services that directly transform data stored in FHIR into prepared subsets for statistical analysis. We specified an interface for such a preprocessor, implemented it, deployed it at the University Hospital Erlangen-Nürnberg, and created three example datasets and analyses on the available data.
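The kind of criteria-based selection described here can be illustrated with a minimal sketch. It is not the authors' implementation: the table name `fhir_resources`, the column name `resource`, the function `select_conditions`, and the sample data are all hypothetical. The SQL string shows how standard PostgreSQL jsonb operators (`->`, `->>`) could express a filter on Condition resources, and the Python function mirrors the same idea in memory so that the example runs without a database.

```python
# Hypothetical query text; table and column names are assumptions,
# not taken from the paper. It inspects only the first coding entry.
EXAMPLE_SQL = """
SELECT resource->'subject'->>'reference'      AS patient_ref,
       resource->'code'->'coding'->0->>'code' AS icd10_code
FROM fhir_resources
WHERE resource->>'resourceType' = 'Condition'
  AND resource->'code'->'coding'->0->>'code' LIKE 'E11%'
"""


def select_conditions(resources, icd_prefix):
    """Flatten FHIR Condition resources whose code starts with icd_prefix.

    Unlike EXAMPLE_SQL, this checks every coding entry, not just the first.
    Returns one flat row (dict) per matching coding, ready for tabular use.
    """
    rows = []
    for res in resources:
        if res.get("resourceType") != "Condition":
            continue
        patient_ref = res.get("subject", {}).get("reference")
        for coding in res.get("code", {}).get("coding", []):
            code = coding.get("code", "")
            if code.startswith(icd_prefix):
                rows.append({"patient_ref": patient_ref, "icd10_code": code})
    return rows


# Made-up sample resources, for illustration only.
SAMPLE = [
    {"resourceType": "Patient", "id": "1"},
    {
        "resourceType": "Condition",
        "subject": {"reference": "Patient/1"},
        "code": {"coding": [{"system": "icd-10", "code": "E11.9"}]},
    },
    {
        "resourceType": "Condition",
        "subject": {"reference": "Patient/2"},
        "code": {"coding": [{"system": "icd-10", "code": "I10"}]},
    },
]

if __name__ == "__main__":
    for row in select_conditions(SAMPLE, "E11"):
        print(row)
```

The flat rows returned by the function are the sort of prepared subset that a statistical tool can consume directly, which is the step the preprocessor is meant to automate.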
Results:
We loaded patient data from 2016 to 2018 into a standard PostgreSQL database containing around 35.5 million FHIR resources, including Patient, Encounter, Condition (ICD-10), Procedure (OPS), and Observation (laboratory results). We then integrated our preprocessing service with the PostgreSQL database and the locally installed web-based KETOS analysis platform. We created three exemplary subsets and analyses to demonstrate the feasibility of the preprocessor.
Conclusions:
The study demonstrates that a standard open source tool like PostgreSQL can be used to store FHIR data, and that it is feasible to build further preprocessing on top of it to enable advanced filtering and the creation of prepared datasets for statistical analysis. The web-based preprocessor could be deployed locally at a particular site, which protects patients' privacy, and could integrate well with existing open source data analysis tools currently being developed across Germany.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.