Accepted for/Published in: JMIR Medical Informatics
Date Submitted: Nov 23, 2022
Open Peer Review Period: Nov 23, 2022 - Jan 18, 2023
Date Accepted: Jan 5, 2023
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
An ontology-based approach for consolidating patient data standardized with EN/ISO 13606 into joint OMOP repositories: description of a methodology
ABSTRACT
Background:
Despite the growth of big data technologies and the use of artificial intelligence, new knowledge can be discovered from data only if the data are correct and in a consistent format. EN/ISO 13606 is a health information standard that seeks to define a rigorous and stable architecture for communicating the health records of a single patient while preserving their original clinical meaning, whereas the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) allows for the systematic analysis of disparate observational databases by defining a common format. OntoCR, a clinical repository developed at Hospital Clínic de Barcelona, uses ontologies to represent clinical knowledge and to map locally defined variables to health information standards and common data models.
Objective:
To design and implement a scalable methodology based on the dual model paradigm and the use of ontologies to consolidate clinical data from different organizations in a standardized repository for research purposes without loss of meaning.
Methods:
First, the relevant clinical variables must be defined and the corresponding EN/ISO 13606 archetypes created. Then, data sources are identified and an extract, transform, load (ETL) process is carried out. Once the final dataset is obtained, data are transformed to create EN/ISO 13606-normalized electronic health record (EHR) extracts. Afterwards, ontologies that represent archetyped concepts and map them to the EN/ISO 13606 and OMOP CDM standards are created and uploaded to OntoCR. Data stored in the aforementioned extracts are inserted into their corresponding place in the ontology by means of an application, thus obtaining instantiated patient data in the ontology-based repository. Finally, data can be extracted via SPARQL queries as OMOP CDM-compliant tables.
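The mapping step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: in the methodology the mappings live in ontologies and are read out with SPARQL, whereas this sketch stands in a plain in-memory table for them. The archetype paths, the local variable names, and the function `extract_to_omop` are hypothetical; only the OMOP table and field names (`PERSON`, `CONDITION_OCCURRENCE`, `gender_source_value`, `year_of_birth`, `condition_source_value`, `condition_start_date`) come from the OMOP CDM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptMapping:
    """One archetyped concept and its target in the OMOP CDM."""
    local_variable: str   # hypothetical locally defined variable name
    archetype_path: str   # hypothetical EN/ISO 13606 archetype path
    omop_table: str       # OMOP CDM table name
    omop_field: str       # OMOP CDM field name


# Assumed mappings, for illustration only; real ones would be modeled
# in the ontology rather than hard-coded.
MAPPINGS = [
    ConceptMapping("sex", "DEMOGRAPHICS/sex", "PERSON", "gender_source_value"),
    ConceptMapping("birth_year", "DEMOGRAPHICS/birth_year", "PERSON", "year_of_birth"),
    ConceptMapping("dx_code", "DIAGNOSIS/code", "CONDITION_OCCURRENCE", "condition_source_value"),
    ConceptMapping("dx_date", "DIAGNOSIS/date", "CONDITION_OCCURRENCE", "condition_start_date"),
]


def extract_to_omop(person_id, extract):
    """Group the values of one flat extract into one row per OMOP table.

    `extract` maps archetype paths to values. Paths without a mapping
    are skipped rather than guessed at.
    """
    by_path = {m.archetype_path: m for m in MAPPINGS}
    rows = {}
    for path, value in extract.items():
        mapping = by_path.get(path)
        if mapping is None:
            continue  # unmapped concept: leave it out of the OMOP rows
        row = rows.setdefault(mapping.omop_table, {"person_id": person_id})
        row[mapping.omop_field] = value
    return rows


# Usage with a made-up extract (not data from the study).
example_extract = {
    "DEMOGRAPHICS/sex": "F",
    "DEMOGRAPHICS/birth_year": 1950,
    "DIAGNOSIS/code": "I10",
    "DIAGNOSIS/date": "2020-03-01",
    "UNMAPPED/note": "ignored",
}
example_rows = extract_to_omop(1, example_extract)
```

Keeping the mapping as data rather than as code mirrors the idea in the methodology that the correspondence between archetypes and OMOP CDM is declared once and then reused for every extract.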
Results:
Using this methodology, we created EN/ISO 13606-standardized archetypes that allow the reuse of clinical information, and we extended the knowledge representation of our clinical repository by modeling and mapping ontologies. Furthermore, we created EN/ISO 13606-compliant EHR extracts of patients (6,803), episodes (13,938), diagnosis (190,878), accumulated and administered medication (222,225), prescribed medication (351,247), movement (47,817), clinical observation (6,736,745), laboratory observation (3,392,873), limitation of life-sustaining treatment (1,298) and procedure (19,861). Since the creation of the application that inserts data from extracts into the ontologies is not yet finished, we tested the queries and validated the methodology by importing data from a random subset of patients into the ontologies using a locally-developed Protégé plugin (“OntoLoad”). This way, we successfully created and populated ten OMOP CDM-compliant tables (CONDITION_OCCURRENCE, 864 records; DEATH, 110; DEVICE_EXPOSURE, 56; DRUG_EXPOSURE, 5,609; MEASUREMENT, 2,091; OBSERVATION, 195; OBSERVATION_PERIOD, 897; PERSON, 922; VISIT_DETAIL, 772; VISIT_OCCURRENCE, 971).
Conclusions:
This study proposes a methodology for standardizing clinical data, thus allowing their reuse without any change in the meaning of the modeled concepts. Although the focus of this paper is health research, our methodology suggests that the data first be standardized according to EN/ISO 13606 to obtain EHR extracts with a high level of granularity that can be used for any purpose. Afterwards, their transformation to OMOP CDM-compliant tables allows their consolidation in joint repositories for research purposes. Ontologies constitute a valuable approach for knowledge representation and standardization of health information in a standard-agnostic manner. With the proposed methodology, institutions can go from local raw data to standardized, semantically interoperable EN/ISO 13606 and OMOP repositories.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.