JMIR Preprints #16042: Clinical Annotation Research Kit (CLARK): A Computable Phenotyping Tool Using Machine Learning

Current Preprint Settings

(as selected by the authors)

1. When the manuscript is submitted, allow peer review from:

(a) Anybody (open community peer review)
(b) Editor-selected reviewers (closed peer review)

2. When the manuscript is submitted, display the preprint PDF to:

(a) Anybody, anytime
(b) Logged-in users only
(c) Anybody, anytime (title and abstract only)
(d) No one

3. When the manuscript is accepted, display the accepted manuscript PDF to:

(a) Anybody, anytime
(b) Logged-in users only
(c) Anybody, anytime (title and abstract only)
(d) No one

Clinical Annotation Research Kit (CLARK): A Computable Phenotyping Tool Using Machine Learning

Emily R Pfaff;
Miles Crosskey;
Kenneth Morton;
Ashok Krishnamurthy

ABSTRACT

Introduction: Computable phenotypes are algorithms that translate clinical features into code that can be run against electronic health record (EHR) data to define patient cohorts. However, computable phenotypes that only make use of structured EHR data do not capture the full richness of a patient’s medical record. While natural language processing (NLP) methods have shown success in extracting clinical features from text, the use of such tools has generally been limited to research groups with substantial NLP expertise. Our goal was to develop open-source phenotyping software, Clinical Annotation Research Kit (CLARK), that enables clinical and translational researchers to use machine learning-based NLP for computable phenotyping without requiring deep informatics expertise.

Methods:

CLARK enables non-expert users to mine text using machine learning classifiers by specifying features for the software to match in clinical notes. Once the features are defined, the user-friendly CLARK interface allows the user to choose a standard machine learning algorithm (linear support vector machine, Gaussian Naïve Bayes, decision tree, or random forest), a cross-validation method, and the number of folds (cross-validation splits) used to evaluate the classifier.
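The workflow described above (match user-defined features in notes, pick a classifier, evaluate with k-fold cross-validation) can be illustrated with a minimal sketch. This is not CLARK's code: the feature patterns, the toy notes and labels, and every function name below are hypothetical, and only one of the listed algorithms (Gaussian Naïve Bayes) is shown, written against the Python standard library so the example is self-contained. In practice a machine learning library would normally supply the classifiers and the cross-validation splitter.

```python
import math
import re
from collections import defaultdict

# Hypothetical user-defined features (name -> regular expression).
FEATURES = {
    "insulin": re.compile(r"\binsulin\b", re.IGNORECASE),
    "a1c": re.compile(r"\b(?:hba1c|a1c)\b", re.IGNORECASE),
    "metformin": re.compile(r"\bmetformin\b", re.IGNORECASE),
}

def featurize(note):
    """Count how often each feature pattern matches in one note."""
    return [len(pattern.findall(note)) for pattern in FEATURES.values()]

def k_fold_indices(n, k):
    """Yield (train, test) index lists for k interleaved folds."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

def fit_gnb(X, y, smoothing=1e-3):
    """Fit per-class log prior, feature means, and smoothed variances."""
    rows_by_label = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_label[label].append(row)
    model = {}
    for label, rows in rows_by_label.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [
            sum((v - m) ** 2 for v in col) / n + smoothing
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (math.log(n / len(X)), means, variances)
    return model

def predict_gnb(model, row):
    """Return the label with the highest Gaussian log posterior."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(row, means, variances)
        )
    return max(model, key=log_posterior)

def cross_validate(X, y, k):
    """Mean accuracy over k folds."""
    scores = []
    for train, test in k_fold_indices(len(X), k):
        model = fit_gnb([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict_gnb(model, X[i]) == y[i] for i in test)
        scores.append(hits / len(test))
    return sum(scores) / len(scores)

# Made-up notes and labels (1 = has the phenotype), for illustration only.
NOTES = [
    "Patient on insulin; HbA1c 9.2%.",
    "Well child visit, no concerns.",
    "Insulin pump adjusted. A1c trending down, insulin needs stable.",
    "Ankle sprain after soccer.",
    "Started metformin; A1c 7.8.",
    "Seasonal allergies, prescribed cetirizine.",
]
LABELS = [1, 0, 1, 0, 1, 0]

X = [featurize(note) for note in NOTES]
accuracy = cross_validate(X, LABELS, k=3)
print(f"mean accuracy over 3 folds: {accuracy:.2f}")
```

The interleaved fold assignment is a simplification chosen so that each toy fold contains both classes; a real evaluation would typically use a stratified or shuffled split on far more notes.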

Results:

Example phenotypes where CLARK has been applied include pediatric diabetes (0.91/0.98 sensitivity/specificity), symptomatic uterine fibroids (0.81/0.54 PPV/NPV), nonalcoholic fatty liver disease (0.90/0.94 sensitivity/specificity), and primary ciliary dyskinesia (0.88/1.0 sensitivity/specificity).

Discussion:

In each of these use cases, CLARK allowed investigators to incorporate variables into their phenotype algorithm that would not be available as structured data. Moreover, the fact that non-expert users can get started with machine learning-based NLP with limited informatics involvement is a significant improvement over the status quo. Our hope is to disseminate CLARK to other organizations that may not have NLP or machine learning specialists available, enabling wider use of these methods.
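The four metrics reported in the Results (sensitivity, specificity, PPV, and NPV) all derive from the same confusion matrix of classifier labels against a gold standard. The sketch below shows the standard definitions; the function names and the example labels are hypothetical and are not taken from the CLARK studies.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return tp, fp, tn, fn

def phenotype_metrics(y_true, y_pred):
    """Compute the four metrics; assumes every denominator is nonzero."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn),  # true cases the classifier found
        "specificity": tn / (tn + fp),  # non-cases correctly ruled out
        "ppv": tp / (tp + fp),          # predicted cases that are real
        "npv": tn / (tn + fn),          # predicted non-cases that are real
    }

# Made-up gold-standard and predicted labels, for illustration only.
gold = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = phenotype_metrics(gold, pred)
```

Sensitivity and specificity are conditioned on the true status, while PPV and NPV are conditioned on the prediction and therefore shift with the prevalence of the phenotype in the evaluated cohort.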

Citation

Please cite as:

Pfaff ER, Crosskey M, Morton K, Krishnamurthy A

Clinical Annotation Research Kit (CLARK): Computable Phenotyping Using Machine Learning

JMIR Med Inform 2020;8(1):e16042

DOI: 10.2196/16042

PMID: 32012059

PMCID: 7007592


Copyright

© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

JMIR Publications

JMIR Preprints

Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Aug 28, 2019

Date Accepted: Dec 16, 2019
