Accepted for/Published in: JMIR Medical Informatics

Date Submitted: Aug 28, 2019
Date Accepted: Dec 16, 2019

The final, peer-reviewed published version of this preprint can be found here:

Clinical Annotation Research Kit (CLARK): Computable Phenotyping Using Machine Learning

Pfaff ER, Crosskey M, Morton K, Krishnamurthy A

JMIR Med Inform 2020;8(1):e16042

DOI: 10.2196/16042

PMID: 32012059

PMCID: 7007592

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Clinical Annotation Research Kit (CLARK): A Computable Phenotyping Tool Using Machine Learning

  • Emily R Pfaff
  • Miles Crosskey
  • Kenneth Morton
  • Ashok Krishnamurthy

ABSTRACT

Introduction:

Computable phenotypes are algorithms that translate clinical features into code that can be run against electronic health record (EHR) data to define patient cohorts. However, computable phenotypes that use only structured EHR data do not capture the full richness of a patient's medical record. While natural language processing (NLP) methods have shown success in extracting clinical features from text, the use of such tools has generally been limited to research groups with substantial NLP expertise. Our goal was to develop open-source phenotyping software, Clinical Annotation Research Kit (CLARK), that enables clinical and translational researchers to use machine learning-based NLP for computable phenotyping without requiring deep informatics expertise.

Methods:

CLARK enables non-expert users to mine text using machine learning classifiers by specifying features for the software to match in clinical notes. Once the features are defined, the user-friendly CLARK interface allows the user to choose from a variety of standard machine learning algorithms (linear support vector machine, Gaussian Naïve Bayes, decision tree, and random forest), cross-validation methods, and the number of folds (cross-validation splits) to be used in evaluation of the classifier.
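
The two preparatory steps described above, matching user-specified features in notes and splitting the data into cross-validation folds, can be sketched in plain Python. This is an illustrative sketch only, not CLARK's implementation: it assumes features are written as regular expressions (the abstract says only that features are matched in clinical notes), and the feature names, patterns, sample notes, and the helper functions `featurize` and `k_fold_indices` are all hypothetical. Training the classifier itself is omitted.

```python
import re

# Hypothetical feature definitions (name -> regular expression).
# These are illustrative assumptions, not features shipped with CLARK.
FEATURES = {
    "insulin": r"\binsulin\b",
    "metformin": r"\bmetformin\b",
    "a1c": r"\b(?:hba1c|a1c)\b",
}


def featurize(note, features=FEATURES):
    """Count case-insensitive matches of each feature pattern in one note."""
    return [
        len(re.findall(pattern, note, flags=re.IGNORECASE))
        for pattern in features.values()
    ]


def k_fold_indices(n_samples, n_folds):
    """Yield (train, test) index lists for plain, unshuffled k-fold splits."""
    if not 2 <= n_folds <= n_samples:
        raise ValueError("n_folds must be between 2 and n_samples")
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds absorb the remainder, one sample each.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


# Made-up notes, for illustration only.
notes = [
    "Started insulin; HbA1c 9.2. Insulin dose adjusted.",
    "No diabetes history. Seen for ankle sprain.",
    "Continues metformin, A1c stable.",
    "Well-child visit, no medications.",
]

vectors = [featurize(note) for note in notes]
folds = list(k_fold_indices(len(notes), n_folds=2))
```

Each note becomes a vector of match counts, one entry per feature, and each fold holds out a different subset of notes for evaluation. In practice a classifier such as one of the four algorithms listed above would be trained on the `train` rows and scored on the `test` rows of every fold.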

Results:

Example phenotypes where CLARK has been applied include pediatric diabetes (0.91/0.98 sensitivity/specificity), symptomatic uterine fibroids (0.81/0.54 PPV/NPV), nonalcoholic fatty liver disease (0.90/0.94 sensitivity/specificity), and primary ciliary dyskinesia (0.88/1.0 sensitivity/specificity).

Discussion:

In each of these use cases, CLARK allowed investigators to incorporate variables into their phenotype algorithm that would not be available as structured data. Moreover, the fact that non-expert users can get started with machine learning-based NLP with limited informatics involvement is a significant improvement over the status quo. Our hope is to disseminate CLARK to other organizations that may not have NLP or machine learning specialists available, enabling wider use of these methods.
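
For readers less familiar with the validation metrics quoted in the Results (sensitivity, specificity, PPV, and NPV), the sketch below shows how all four follow from a confusion matrix. The function name `diagnostic_metrics` and the counts used are hypothetical; the counts are made up for illustration and are not taken from any CLARK study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, PPV, and NPV from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives, and
    false negatives. A metric with a zero denominator is returned as NaN.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # share of true cases that were flagged
        "specificity": ratio(tn, tn + fp),  # share of non-cases that were cleared
        "ppv": ratio(tp, tp + fp),          # share of flagged patients who are true cases
        "npv": ratio(tn, tn + fn),          # share of cleared patients who are non-cases
    }


# Made-up counts for illustration; not data from the paper.
metrics = diagnostic_metrics(tp=45, fp=5, tn=40, fn=10)
```

Sensitivity and specificity are conditioned on the true label, whereas PPV and NPV are conditioned on the classifier's output and therefore depend on how common the phenotype is in the evaluated cohort.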


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.