JMIR Preprints #32460: Attention-Based Models for Classifying Small Datasets using Community-Engaged Research Protocols: Classification System Development and Validation Pilot Study

Attention-Based Models for Classifying Small Datasets using Community-Engaged Research Protocols: Classification System Development and Validation Pilot Study

Brian J. Ferrell;
Sarah E. Raskin;
Emily B. Zimmerman;
David H. Timberline;
Bridget T. McInnes;
Alex H. Krist

ABSTRACT

Background:

Community-Engaged Research (CEnR) is a research approach in which scholars partner with community organizations or individuals with whom they share an interest in the study topic, typically with the goal of supporting that community’s wellbeing. CEnR is well-established in numerous disciplines including the clinical and social sciences. However, universities experience challenges reporting comprehensive CEnR metrics, limiting development of appropriate CEnR infrastructure and advancement of relationships with communities, funders, and stakeholders.

Objective:

n/a

Methods:

We propose a novel approach to identifying and categorizing community-engaged studies by applying attention-based deep learning models to human subjects protocols that have been submitted to the university’s Institutional Review Board (IRB). We manually classified a sample of protocols submitted to the IRB using a 3-level and a 6-level CEnR heuristic. We then trained an attention-based bidirectional long short-term memory (Bi-LSTM) model on the classified protocols and compared it to transformer models such as BERT, Bio+ClinicalBERT, and XLM-RoBERTa. We applied the best-performing models to the full sample of unlabeled IRB protocols submitted in the years 2013-2019 (n > 6000).
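As a rough illustration of what "attention-based" means for a Bi-LSTM classifier, the sketch below pools per-token vectors into a single document vector using softmax attention weights. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `query` vector, and the toy numbers are hypothetical, and a real model would learn these values during training.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden_states, query):
    """Collapse per-token vectors into one document vector.

    hidden_states: list of per-token vectors (stand-ins for Bi-LSTM outputs).
    query: hypothetical learned vector that scores each token's relevance.
    """
    # Dot-product score for each token against the query vector.
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # Weighted sum of token vectors, one output value per dimension.
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return pooled, weights

# Toy values (assumed) standing in for Bi-LSTM outputs of a three-token snippet.
states = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.3]]
query = [1.0, 1.0]
pooled, weights = attention_pool(states, query)
```

In this toy case the second token receives the largest weight because its score against `query` is highest; the pooled vector would then feed a classification layer over the 3-level or 6-level labels.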

Results:

Transfer learning appears to be superior: the transformer models implemented reached a testing F1 score of 0.9952, outperforming the attention-based Bi-LSTM model. This finding is consistent across several methodological adjustments: an augmented dataset with and without cross-validation, an unaugmented dataset with and without cross-validation, a 6-class CEnR spectrum, and a 3-class one. Unlike the attention-based Bi-LSTM model, BERT and the other transformer models showed an understanding of our data, which is promising for real-world application.
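For readers unfamiliar with the metric, the following sketch shows one common way to compute an F1 score for a multi-class problem. It assumes macro averaging and integer labels 0 to 2 for a 3-class setup; the abstract does not state which averaging scheme was used, and the label values and example predictions here are invented for illustration only.

```python
def f1_per_class(y_true, y_pred, label):
    """F1 for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    labels = sorted(set(y_true))
    return sum(f1_per_class(y_true, y_pred, lab) for lab in labels) / len(labels)

# Invented example: six protocols, three hypothetical CEnR levels (0, 1, 2).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
score = macro_f1(y_true, y_pred)
```

Macro averaging weights every class equally, which matters on small or imbalanced datasets; a weighted or micro average could give a different number on the same predictions.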

Conclusions:

Transfer learning is a viable method for differentiating small datasets characterized by the idiosyncrasies and errors of CEnR descriptions used by principal investigators in research protocols.

Citation

Please cite as:

Ferrell BJ, Raskin SE, Zimmerman EB, Timberline DH, McInnes BT, Krist AH

Attention-Based Models for Classifying Small Data Sets Using Community-Engaged Research Protocols: Classification System Development and Validation Pilot Study

JMIR Form Res 2022;6(9):e32460

DOI: 10.2196/32460

PMID: 36066925

PMCID: 9490525

Copyright

© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.

Accepted for/Published in: JMIR Formative Research

Date Submitted: Jul 30, 2021

Open Peer Review Period: Jul 30, 2021 - Sep 24, 2021

Date Accepted: Jun 15, 2022
