Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Mar 4, 2020
Date Accepted: Nov 11, 2020

The final, peer-reviewed published version of this preprint can be found here:

Transformation of Pathology Reports Into the Common Data Model With Oncology Module: Use Case for Colon Cancer

Ryu B, Yoon E, Kim S, Lee S, Baek H, Yi S, Na HY, Kim JW, Baek RM, Yoo S

J Med Internet Res 2020;22(12):e18526

DOI: 10.2196/18526

PMID: 33295294

PMCID: 7758167

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Transformation of Pathology Reports into the Common Data Model with Oncology Module: Use Case for Colon Cancer

  • Borim Ryu
  • Eunsil Yoon
  • Seok Kim
  • Sejoon Lee
  • Hyunyoung Baek
  • Soyoung Yi
  • Hee Young Na
  • Ji-Won Kim
  • Rong-Min Baek
  • Sooyoung Yoo

ABSTRACT

Background:

A common data model (CDM) helps to standardize electronic health record (EHR) data and eases the analysis of outcomes for observational and longitudinal research. An analysis of pathology reports is required to establish fundamental information infrastructure for data-driven colon cancer research. The Observational Medical Outcomes Partnership (OMOP) CDM is used in distributed research networks for clinical data; however, free-text–based pathology reports must first be converted into the CDM. There are few use cases of representing cancer data in the CDM.

Objective:

In this study, we aimed to construct a colon-cancer–related pathological CDM database with natural language processing (NLP) for a research platform that could utilize both clinical and omics data. The essential text entities from the pathology reports were extracted, standardized, and converted to the OMOP CDM to utilize the pathology data in cancer research.

Methods:

We extracted clinical text entities, mapped them to the standard concepts in the Observational Health Data Sciences and Informatics (OHDSI) vocabularies, built a database, and defined relations for the CDM tables. Major clinical entities were extracted through NLP on immunochemistry tests, molecular genetic tests, and surgical pathology reports of colon-cancer patients at a tertiary general hospital in South Korea. Items were extracted from each report using Python-based regular expressions. Unstructured data (text without a particular pattern) was handled by adding regular expression rules based on expert advice. Our own dictionary was used for normalization and standardization to deal with biomarker and gene names and other ungrammatical expressions. The extracted clinical and genetic information was mapped to the Logical Observation Identifiers Names and Codes (LOINC) and Systematized Nomenclature of Medicine (SNOMED) standard terminologies recommended by the OMOP CDM. The database-table relationships were newly defined through SNOMED standard terminology concepts. The standardized data were inserted into the CDM tables. For evaluation, 100 reports were randomly selected and independently annotated by a medical informatics expert and a nurse.
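The extraction and normalization steps described above might look roughly like the following minimal Python sketch. The report text, the "label: value" line layout, the regular expression, and the synonym dictionary are all hypothetical illustrations; the abstract does not give the authors' actual rules or dictionary entries.

```python
import re

# Hypothetical report snippet; the real report layout is not given in the abstract.
REPORT = """\
Histologic type: Adenocarcinoma, moderately differentiated
Tumor size: 4.5 x 3.2 cm
K-ras mutation: Not detected
c-erbB-2: Negative
"""

# Hypothetical rule: one "label: value" item per line.
ITEM_PATTERN = re.compile(
    r"^[ \t]*(?P<label>[^:\n]+?)[ \t]*:[ \t]*(?P<value>\S.*?)[ \t]*$",
    re.MULTILINE,
)

# Hypothetical normalization dictionary for gene and biomarker name variants.
SYNONYMS = {
    "k-ras": "KRAS",
    "kras": "KRAS",
    "c-erbb-2": "ERBB2",
    "her2": "ERBB2",
}


def normalize_label(label):
    """Replace a known name variant in the first word of the label."""
    words = label.split()
    key = words[0].lower()
    if key in SYNONYMS:
        words[0] = SYNONYMS[key]
    return " ".join(words)


def extract_items(text):
    """Return (normalized label, raw value) pairs found in the text."""
    return [
        (normalize_label(m.group("label")), m.group("value"))
        for m in ITEM_PATTERN.finditer(text)
    ]


for label, value in extract_items(REPORT):
    print(f"{label} -> {value}")
```

In a sketch like this, text that does not fit the pattern is simply skipped; the handling of such cases by adding further rules with expert input is not modeled here.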

Results:

We examined and standardized 1,848 immunochemistry test reports, 3,890 molecular genetic test reports, and 12,352 surgical pathology reports (from 2017 to 2018). The constructed and updated database contained the following extracted colorectal entities: 1) NOTE_NLP, 2) MEASUREMENT, 3) CONDITION_OCCURRENCE, 4) SPECIMEN, and 5) FACT_RELATIONSHIP of specimen with condition and measurement.
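To illustrate how a FACT_RELATIONSHIP row can tie a specimen to a condition and a measurement, here is a minimal sketch using an in-memory SQLite database. The tables carry only a small subset of the OMOP CDM columns, the concept IDs are placeholders rather than real OHDSI vocabulary IDs, and the sample rows are invented; none of this is taken from the authors' database.

```python
import sqlite3

# Placeholder concept IDs for illustration only, NOT real OHDSI vocabulary IDs.
DOMAIN_SPECIMEN = 1
DOMAIN_CONDITION = 2
DOMAIN_MEASUREMENT = 3
REL_SPECIMEN_TO_FACT = 100

conn = sqlite3.connect(":memory:")
# Reduced subset of OMOP CDM columns, enough to show the linkage.
conn.executescript("""
CREATE TABLE specimen (
    specimen_id INTEGER PRIMARY KEY, person_id INTEGER,
    specimen_source_value TEXT);
CREATE TABLE condition_occurrence (
    condition_occurrence_id INTEGER PRIMARY KEY, person_id INTEGER,
    condition_source_value TEXT);
CREATE TABLE measurement (
    measurement_id INTEGER PRIMARY KEY, person_id INTEGER,
    measurement_source_value TEXT, value_source_value TEXT);
CREATE TABLE fact_relationship (
    domain_concept_id_1 INTEGER, fact_id_1 INTEGER,
    domain_concept_id_2 INTEGER, fact_id_2 INTEGER,
    relationship_concept_id INTEGER);
""")


def link(conn, specimen_id, domain_concept_id, fact_id):
    """Record that a fact in another domain was derived from a specimen."""
    conn.execute(
        "INSERT INTO fact_relationship VALUES (?, ?, ?, ?, ?)",
        (DOMAIN_SPECIMEN, specimen_id, domain_concept_id, fact_id,
         REL_SPECIMEN_TO_FACT),
    )


def linked_facts(conn, specimen_id):
    """Return (domain concept ID, fact ID) pairs linked to a specimen."""
    return conn.execute(
        "SELECT domain_concept_id_2, fact_id_2 FROM fact_relationship "
        "WHERE domain_concept_id_1 = ? AND fact_id_1 = ? "
        "ORDER BY domain_concept_id_2, fact_id_2",
        (DOMAIN_SPECIMEN, specimen_id),
    ).fetchall()


# Invented sample rows for one person and one specimen.
conn.execute("INSERT INTO specimen VALUES (1, 42, 'colon, resection')")
conn.execute("INSERT INTO condition_occurrence VALUES (1, 42, 'adenocarcinoma')")
conn.execute("INSERT INTO measurement VALUES (1, 42, 'KRAS mutation', 'Not detected')")
link(conn, 1, DOMAIN_CONDITION, 1)
link(conn, 1, DOMAIN_MEASUREMENT, 1)

print(linked_facts(conn, 1))
```

The point of the design is that FACT_RELATIONSHIP identifies each side of a link by a domain concept plus a row ID, so one specimen can be connected to rows in several different tables without adding foreign-key columns to those tables.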

Conclusions:

This study aimed to prepare CDM data for a research platform that takes advantage of all the omics, clinical, and patient data at Seoul National University Bundang Hospital (SNUBH) for colon-cancer pathology. A more sophisticated preparation of the pathology data is needed for further research on cancer genomics, and various types of text narratives are the next target for additional research on the use of data in the CDM.



© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.