JMIR Preprints #14083: SALT-C: Standardization Algorithm for Categorical Laboratory Tests for Clinical Big Data Research


SALT-C: Standardization Algorithm for Categorical Laboratory Tests for Clinical Big Data Research

Mina Kim;
Soo-Yong Shin;
Mira Kang;
Byoung-Kee Yi;
Dong Kyung Chang

ABSTRACT

Background:

Data standardization is essential in electronic health records (EHRs) for both clinical practice and retrospective research. However, EHR data remain difficult to standardize because of nonidentical duplicates, typographical errors, and inconsistencies. To address this problem, standardization efforts have targeted both the collection of data in a standardized format and the curation of data already stored in EHRs. For clinical big data research, stored EHR data should be standardized, starting with laboratory results given their importance. However, most previous efforts have relied on labor-intensive manual methods.

Objective:

We aimed to develop an automatic standardization method that removes noise from categorical laboratory data, groups the cleaned data, and maps them to standard terminology.

Methods:

We developed a method called Standardization Algorithm for Laboratory Test–Categorical result (SALT-C) that can process categorical laboratory data, such as “pos +,” “250 4+ (urinalysis results),” and “reddish (urinalysis color results).” SALT-C consists of five steps. First, it applies data cleaning rules to categorical laboratory data. Second, it categorizes the cleaned data into five predefined groups (urine color, urine dipstick, blood type, presence finding, and pathogenesis tests). Third, all data in each group are vectorized. Fourth, similarity is calculated between the vectors of data and those of each value in the predefined value sets. Finally, the value closest to the data is assigned. The source code of SALT-C can be downloaded via https://github.com/rpmina/SALT_C.
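
The five steps above can be illustrated with a minimal Python sketch. This is not the SALT-C implementation (see the repository linked above for that); the abstract does not specify the cleaning rules, the vectorization scheme, or the similarity measure, so the sketch assumes simple regex cleaning, character bigram counts, and cosine similarity. The value sets, the function names, and the choice to return `None` when nothing matches are all hypothetical, and the group (step 2) is assumed to be supplied by the caller.

```python
import math
import re
from collections import Counter

# Hypothetical reference value sets; the real SALT-C value sets are not
# given in the abstract.
VALUE_SETS = {
    "presence finding": ["positive", "negative", "trace"],
    "urine color": ["yellow", "red", "brown", "colorless"],
}


def clean(raw):
    """Step 1 (illustrative rules only): lowercase, drop stray symbols,
    and collapse whitespace."""
    text = raw.lower()
    text = re.sub(r"[^a-z0-9+\- ]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def vectorize(text, n=2):
    """Step 3 (assumed scheme): character n-gram counts over the padded text."""
    padded = f" {text} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Step 4 (assumed measure): cosine similarity between two count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def standardize(raw, group):
    """Steps 1 and 3-5 for one raw value. Step 2 (grouping) is assumed to
    have been done already, so the group name is passed in."""
    vec = vectorize(clean(raw))
    scored = [(cosine(vec, vectorize(ref)), ref) for ref in VALUE_SETS[group]]
    best_score, best_ref = max(scored)
    # Assumption: a raw value that shares nothing with any reference value
    # is treated as uninterpretable and left unmapped.
    return best_ref if best_score > 0 else None


print(standardize("Pos +", "presence finding"))
print(standardize("reddish", "urine color"))
print(standardize("**", "presence finding"))
```

In this sketch a misspelling such as "Postive" still lands on "positive" because most of its bigrams overlap with the reference value, which is the general idea behind assigning the closest value rather than requiring an exact match.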

Results:

The performance of SALT-C was validated using 59,213,696 data points (167,938 unique values) generated over 23 years at a tertiary hospital. Excluding data whose original meaning could not be interpreted (e.g., “**” and “_^”), SALT-C mapped unique raw data to the correct reference value for each group with an accuracy of 97.62% (urine color tests), 97.54% (urine dipstick tests), 94.64% (blood type tests), 99.68% (presence finding tests), and 99.61% (pathogenesis tests).

Conclusions:

The proposed SALT-C successfully standardized the categorical laboratory test results with high reliability. SALT-C can be beneficial for clinical big data research by reducing laborious manual standardization efforts.

Citation

Please cite as:

Kim M, Shin SY, Kang M, Yi BK, Chang DK

Developing a Standardization Algorithm for Categorical Laboratory Tests for Clinical Big Data Research: Retrospective Study

JMIR Med Inform 2019;7(3):e14083

DOI: 10.2196/14083

PMID: 31469075

PMCID: 6740165


Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Mar 24, 2019

Open Peer Review Period: Mar 27, 2019 - May 22, 2019

Date Accepted: Jul 19, 2019
