Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Jan 11, 2023
Open Peer Review Period: Jan 11, 2023 - Mar 8, 2023
Date Accepted: Apr 5, 2023
The final, peer-reviewed published version of this preprint can be found here:

Generate Analysis-Ready Data for Real-world Evidence: Tutorial for Harnessing Electronic Health Records With Advanced Informatic Technologies

Hou J, Zhao R, Gronsbell J, Lin Y, Bonzel CL, Zeng Q, Zhang S, Beaulieu-Jones BK, Webber G, Jemielita T, Wan S, Hong C, Cai T, Wen J, Panickan VA, Liaw KL, Liao KP, Cai T

J Med Internet Res 2023;25:e45662

DOI: 10.2196/45662

PMID: 37227772

PMCID: 10251230

RWE-Ready: Pipeline to harness electronic health records for real-world evidence

  • Jue Hou
  • Rachel Zhao
  • Jessica Gronsbell
  • Yucong Lin
  • Clara-Lea Bonzel
  • Qingyi Zeng
  • Sinian Zhang
  • Brett K Beaulieu-Jones
  • Griffin Webber
  • Thomas Jemielita
  • Shuyan Wan
  • Chuan Hong
  • Tianrun Cai
  • Jun Wen
  • Vidul A Panickan
  • Kai-Li Liaw
  • Katherine P. Liao
  • Tianxi Cai

ABSTRACT

While randomized controlled trials (RCTs) are the gold standard for establishing the efficacy and safety of a medical treatment, real-world evidence (RWE) generated from real-world data (RWD) has been vital in post-approval monitoring and is being promoted for the regulatory process of experimental therapies. An emerging source of RWD is electronic health records (EHRs), which contain detailed information on patient care in both structured (e.g., diagnosis codes) and unstructured (e.g., clinical notes and images) forms. Despite the granularity of the data available in EHRs, the critical variables required to reliably assess the relationship between a treatment and a clinical outcome can be challenging to extract. We provide an integrated data curation and modeling pipeline that leverages recent advances in natural language processing, computational phenotyping, and modeling techniques for noisy data to address this fundamental challenge and accelerate the reliable use of EHRs for RWE, as well as the creation of digital twins. The proposed pipeline is highly automated and includes guidance for deployment. Examples are drawn from the existing literature on EHR emulation of RCTs and are accompanied by our own studies with Mass General Brigham (MGB) EHR data.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.