Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Mar 4, 2020
Date Accepted: Nov 11, 2020

The final, peer-reviewed published version of this preprint can be found here:

Transformation of Pathology Reports Into the Common Data Model With Oncology Module: Use Case for Colon Cancer

Ryu B, Yoon E, Kim S, Lee S, Baek H, Yi S, Na HY, Kim JW, Baek RM, Hwang H, Yoo S

J Med Internet Res 2020;22(12):e18526

DOI: 10.2196/18526

PMID: 33295294

PMCID: 7758167

Transformation of Pathology Reports into the Common Data Model with Oncology Module: Use Case for Colon Cancer

  • Borim Ryu
  • Eunsil Yoon
  • Seok Kim
  • Sejoon Lee
  • Hyunyoung Baek
  • Soyoung Yi
  • Hee Young Na
  • Ji-Won Kim
  • Rong-Min Baek
  • Hee Hwang
  • Sooyoung Yoo

ABSTRACT

Background:

Common data models (CDMs) help standardize electronic health record data and facilitate outcome analysis for observational and longitudinal research. An analysis of pathology reports is required to establish a fundamental information infrastructure for data-driven colon cancer research. The Observational Medical Outcomes Partnership (OMOP) CDM is used in distributed research networks for clinical data; however, free-text–based pathology reports must first be converted into the CDM's format. There are few use cases of representing cancer data in the CDM.

Objective:

In this study, we aimed to construct a CDM database of colon-cancer–related pathology using natural language processing (NLP) for a research platform that can utilize both clinical and omics data. The essential text entities from the pathology reports were extracted, standardized, and converted to the OMOP CDM format so that the pathology data can be used in cancer research.

Methods:

We extracted clinical text entities, mapped them to the standard concepts in the Observational Health Data Sciences and Informatics (OHDSI) vocabularies, built databases, and defined relations for the CDM tables. Major clinical entities were extracted through NLP from pathology reports of surgical specimens, immunohistochemical studies, and molecular studies of patients with colon cancer at a tertiary general hospital in South Korea. Items were extracted from each report using regular expressions in Python. Unstructured data, such as text without a consistent pattern, were handled by adding regular expression rules based on expert advice. We used our own dictionary for normalization and standardization of biomarker names, gene names, and other ungrammatical expressions. The extracted clinical and genetic information was mapped to the Logical Observation Identifiers Names and Codes (LOINC) and Systematized Nomenclature of Medicine (SNOMED) standard terminologies recommended by the OMOP CDM. The database–table relationships were newly defined using SNOMED standard terminology concepts. The standardized data were inserted into the CDM tables. For evaluation, 100 reports were randomly selected and independently annotated by a medical informatics expert and a nurse.
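
The regular expression extraction and dictionary-based normalization described above might look roughly like the following Python sketch. It is only an illustration under stated assumptions: the report line format, the pattern, the `GENE_SYNONYMS` entries, and the function name `extract_mutations` are all hypothetical and are not taken from the study, whose rules and dictionary are not given in this abstract.

```python
import re

# Hypothetical normalization dictionary: spelling variants -> normalized name.
# These entries are illustrative only, not the study's actual dictionary.
GENE_SYNONYMS = {
    "k-ras": "KRAS",
    "kras": "KRAS",
    "n-ras": "NRAS",
    "nras": "NRAS",
    "braf": "BRAF",
}

# Hypothetical rule for report lines such as "K-ras mutation: Not detected".
MUTATION_PATTERN = re.compile(
    r"(?P<gene>[A-Za-z][A-Za-z0-9-]*)\s+mutation\s*:\s*(?P<result>[^\n]+)",
    re.IGNORECASE,
)


def extract_mutations(report_text):
    """Return one dict per matched line, with the gene name normalized."""
    rows = []
    for match in MUTATION_PATTERN.finditer(report_text):
        raw_gene = match.group("gene")
        # Fall back to the upper-cased raw token when no synonym is known.
        gene = GENE_SYNONYMS.get(raw_gene.lower(), raw_gene.upper())
        rows.append({"gene": gene, "result": match.group("result").strip()})
    return rows


sample_report = "K-ras mutation: Not detected\nBRAF  mutation : Detected (V600E)\n"
extracted = extract_mutations(sample_report)
```

In a pipeline of this kind, each normalized name would then be looked up in the OHDSI vocabularies to obtain a standard concept before insertion into the CDM tables; that lookup is omitted here because it depends on the locally loaded vocabulary.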

Results:

We examined and standardized 1,848 immunohistochemical study reports, 3,890 molecular study reports, and 12,352 pathology reports of surgical specimens (from 2017 to 2018). The constructed and updated database stored the extracted colorectal entities in the following CDM tables: 1) NOTE_NLP, 2) MEASUREMENT, 3) CONDITION_OCCURRENCE, 4) SPECIMEN, and 5) FACT_RELATIONSHIP, which links each specimen to its conditions and measurements.
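
As a rough illustration of how FACT_RELATIONSHIP rows can tie a specimen to its conditions and measurements, the Python sketch below builds such rows in memory. The five field names follow the OMOP CDM FACT_RELATIONSHIP table; everything else is an assumption: the numeric IDs are placeholders rather than real OHDSI concept IDs, the function name `link_specimen` is hypothetical, and writing each link in both directions is one possible convention, not a confirmed detail of the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FactRelationship:
    """One row of the OMOP CDM FACT_RELATIONSHIP table."""
    domain_concept_id_1: int
    fact_id_1: int
    domain_concept_id_2: int
    fact_id_2: int
    relationship_concept_id: int


# Placeholder values only; real concept IDs must come from the OHDSI vocabularies.
SPECIMEN_DOMAIN = 1
CONDITION_DOMAIN = 2
MEASUREMENT_DOMAIN = 3
SPECIMEN_TO_FACT = 10  # hypothetical relationship concept
FACT_TO_SPECIMEN = 11  # hypothetical inverse relationship concept


def link_specimen(specimen_id, condition_ids, measurement_ids):
    """Build rows linking one specimen to its condition and measurement facts."""
    targets = [(CONDITION_DOMAIN, cid) for cid in condition_ids]
    targets += [(MEASUREMENT_DOMAIN, mid) for mid in measurement_ids]
    rows = []
    for domain, fact_id in targets:
        # Forward link: specimen -> fact.
        rows.append(FactRelationship(
            SPECIMEN_DOMAIN, specimen_id, domain, fact_id, SPECIMEN_TO_FACT))
        # Inverse link: fact -> specimen.
        rows.append(FactRelationship(
            domain, fact_id, SPECIMEN_DOMAIN, specimen_id, FACT_TO_SPECIMEN))
    return rows


links = link_specimen(100, condition_ids=[200], measurement_ids=[300, 301])
```

Here `fact_id_1` and `fact_id_2` would hold primary keys from SPECIMEN, CONDITION_OCCURRENCE, or MEASUREMENT, so a query can start from a specimen and reach every finding derived from it.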

Conclusions:

This study aimed to prepare CDM data for a research platform that takes advantage of all omics, clinical, and patient data at Seoul National University Bundang Hospital (SNUBH) for colon cancer pathology. A more sophisticated preparation of the pathology data is needed for further research on cancer genomics, and various types of text narratives are the next target for additional research on the use of data in the CDM.



© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.