Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Dec 10, 2019
Date Accepted: Jul 28, 2020

The final, peer-reviewed published version of this preprint can be found here:

Implementation of a Cohort Retrieval System for Clinical Data Repositories Using the Observational Medical Outcomes Partnership Common Data Model: Proof-of-Concept System Validation

Liu S, Wang Y, Wen A, Wang L, Hong N, Shen F, Bedrick S, Hersh W, Liu H

JMIR Med Inform 2020;8(10):e17376

DOI: 10.2196/17376

PMID: 33021486

PMCID: 7576539

On Cohort Retrieval System from Clinical Data Repositories using OMOP Common Data Model: A Proof-of-Concept Implementation

  • Sijia Liu
  • Yanshan Wang
  • Andrew Wen
  • Liwei Wang
  • Na Hong
  • Feichen Shen
  • Steven Bedrick
  • William Hersh
  • Hongfang Liu

ABSTRACT

Background:

Widespread adoption of electronic health records (EHRs) has enabled secondary use of EHR data for clinical research and healthcare delivery. Natural language processing (NLP) techniques have shown promise in extracting information embedded in unstructured clinical data, and information retrieval (IR) techniques provide flexible and scalable solutions that can augment NLP systems for retrieving and ranking relevant records.

Objective:

In this paper, we present the implementation of Cohort Retrieval Enhanced by Analysis of Text from EHRs (CREATE), a cohort retrieval system that can execute textual cohort selection queries on both structured and unstructured EHR data.

Methods:

CREATE is a proof-of-concept system that combines structured queries with IR techniques applied to NLP results to improve cohort retrieval performance, and it adopts the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) to enhance model portability. The NLP component, powered by cTAKES, extracts CDM concepts from textual queries. We designed a hierarchical index in Elasticsearch to support CDM concept search using IR techniques and frameworks.
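As a rough illustration of combining a structured concept filter with a ranked text match, the sketch below builds an Elasticsearch query body as a plain Python dictionary. It is not taken from CREATE: the field names `condition_concept_id` and `note_text`, the function `build_cohort_query`, and the concept id in the usage note are hypothetical, and the sketch uses a flat document rather than the hierarchical index described above.

```python
from typing import Any, Dict, List


def build_cohort_query(concept_ids: List[int], free_text: str, size: int = 5) -> Dict[str, Any]:
    """Build a query body that filters on structured codes and ranks on text.

    Assumptions (not from the paper): each indexed document carries a
    `condition_concept_id` field with OMOP concept ids and a `note_text`
    field with the clinical note content.
    """
    return {
        "size": size,  # number of top-ranked hits to return
        "query": {
            "bool": {
                # Structured criterion: exact match on concept ids, not scored.
                "filter": [{"terms": {"condition_concept_id": concept_ids}}],
                # Unstructured criterion: full-text match, contributes to the ranking score.
                "must": [{"match": {"note_text": free_text}}],
            }
        },
    }
```

For example, `build_cohort_query([201826], "insulin therapy")` returns a body that could be passed to an Elasticsearch search request; the concept id here is only a placeholder value.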

Results:

In a case study of 5 cohort identification queries, evaluated with the IR metric precision at 5 (P@5) at both the patient level and the document level, CREATE achieved an average P@5 of 0.90, outperforming systems that use only structured data (average P@5 of 0.54) or only unstructured data (average P@5 of 0.74).
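For readers unfamiliar with the metric, P@5 is the fraction of the 5 top-ranked results that are judged relevant. The following is a minimal sketch assuming binary relevance judgments; the function names and the sample identifiers are illustrative and do not come from the study.

```python
from typing import Sequence, Set


def precision_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 5) -> float:
    """Fraction of the top-k ranked items that appear in the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    hits = sum(1 for item in top_k if item in relevant_ids)
    # Dividing by k (not len(top_k)) penalizes result lists shorter than k.
    return hits / k


def mean_precision_at_k(per_query_scores: Sequence[float]) -> float:
    """Average P@k over a set of queries."""
    if not per_query_scores:
        raise ValueError("need at least one query score")
    return sum(per_query_scores) / len(per_query_scores)
```

With a ranked list `["p1", "p2", "p3", "p4", "p5"]` and relevant set `{"p1", "p3", "p5"}`, three of the top five items are relevant, so `precision_at_k` returns 0.6.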

Conclusions:

The implementation and evaluation on the Mayo Clinic Biobank demonstrated that CREATE outperforms cohort retrieval systems that use only structured or only unstructured data for complex textual cohort queries. The source code is available at https://github.com/OHNLPIR/OMOP_CDM_IO.

