Accepted for/Published in: JMIR Formative Research

Date Submitted: Jul 30, 2021
Open Peer Review Period: Jul 30, 2021 - Sep 24, 2021
Date Accepted: Jun 15, 2022

The final, peer-reviewed published version of this preprint can be found here:

Ferrell BJ, Raskin SE, Zimmerman EB, Timberline DH, McInnes BT, Krist AH

Attention-Based Models for Classifying Small Data Sets Using Community-Engaged Research Protocols: Classification System Development and Validation Pilot Study

JMIR Form Res 2022;6(9):e32460

DOI: 10.2196/32460

PMID: 36066925

PMCID: 9490525

Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.

Applying attention-based models to classify small datasets: An example using community-engaged research protocols

  • Brian J. Ferrell; 
  • Sarah E. Raskin; 
  • Emily B. Zimmerman; 
  • David H. Timberline; 
  • Bridget T. McInnes; 
  • Alex H. Krist

ABSTRACT

Background:

Community-Engaged Research (CEnR) is a research approach in which scholars partner with community organizations or individuals with whom they share an interest in the study topic, typically with the goal of supporting that community’s wellbeing. CEnR is well established in numerous disciplines, including the clinical and social sciences. However, universities experience challenges in reporting comprehensive CEnR metrics, which limits the development of appropriate CEnR infrastructure and the advancement of relationships with communities, funders, and stakeholders.

Objective:

n/a

Methods:

We propose a novel approach to identifying and categorizing community-engaged studies by applying attention-based deep learning models to human subjects protocols submitted to the university’s Institutional Review Board (IRB). We manually classified a sample of these protocols using a 3-level and a 6-level CEnR heuristic. We then trained an attention-based bidirectional long short-term memory (Bi-LSTM) model on the classified protocols and compared it with transformer models such as BERT, Bio+ClinicalBERT, and XLM-RoBERTa. We applied the best-performing models to the full sample of unlabeled IRB protocols submitted in the years 2013-2019 (n > 6000).
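As a rough illustration of what "attention-based" can mean for a recurrent classifier, the sketch below pools a sequence of per-token hidden states into one vector by using softmax attention weights. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the dot-product scoring, and the `query` vector (standing in for a learned parameter) are all hypothetical, and the abstract does not specify which attention variant was used.

```python
import math


def softmax(scores):
    """Convert raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Pool T hidden vectors into one vector with dot-product attention.

    hidden_states: list of T equal-length vectors (for example, the
        per-token outputs of a Bi-LSTM); hypothetical input format.
    query: vector of the same length, standing in for a learned parameter.
    Returns (pooled_vector, attention_weights).
    """
    # Score each time step by its dot product with the query.
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # Weighted sum of the hidden states, one dimension at a time.
    pooled = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    return pooled, weights


if __name__ == "__main__":
    states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    pooled, weights = attention_pool(states, query=[0.5, 2.0])
    print(pooled, weights)
```

In a full classifier, the pooled vector would typically be passed to a dense layer that outputs one score per CEnR class; the weights indicate which tokens contributed most to the pooled representation.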

Results:

Transfer learning appears to be superior: all transformer models implemented achieved a testing F1 score of 0.9952, outperforming the attention-based Bi-LSTM model. This finding is consistent across several methodological adjustments: an augmented dataset with and without cross-validation, an unaugmented dataset with and without cross-validation, a 6-class CEnR spectrum, and a 3-class one. BERT and the other transformer models showed an understanding of our data, unlike the attention-based model, which is promising for real-world application.
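For readers less familiar with the metric, the following sketch shows one way a multiclass F1 score can be computed from true and predicted labels. The macro averaging used here is an assumption, since the abstract does not state which averaging scheme produced the reported score, and the function names and example labels are hypothetical.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return a dict mapping each label to its F1 score."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this class, but it was wrong
            fn[truth] += 1  # missed an instance of the true class
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred)
    if not scores:
        return 0.0
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    # Hypothetical labels for a 3-class spectrum; not data from the study.
    truth = ["none", "none", "some", "high", "high", "some"]
    preds = ["none", "some", "some", "high", "high", "some"]
    print(per_class_f1(truth, preds))
    print(macro_f1(truth, preds))
```

Macro averaging weights every class equally, which matters for small, imbalanced label sets; a support-weighted or micro average would generally give a different number on the same predictions.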

Conclusions:

Transfer learning is a viable method for differentiating small datasets characterized by the idiosyncrasies and errors of CEnR descriptions used by principal investigators in research protocols.


© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.