Accepted for/Published in: JMIR Formative Research
Date Submitted: Jul 30, 2021
Open Peer Review Period: Jul 30, 2021 - Sep 24, 2021
Date Accepted: Jun 15, 2022
Attention-Based Models for Classifying Small Datasets using Community-Engaged Research Protocols: Classification System Development and Validation Pilot Study
ABSTRACT
Background:
Community-Engaged Research (CEnR) is a research approach in which scholars partner with community organizations or individuals with whom they share an interest in the study topic, typically with the goal of supporting that community’s wellbeing. CEnR is well-established in numerous disciplines including the clinical and social sciences. However, universities experience challenges reporting comprehensive CEnR metrics, limiting development of appropriate CEnR infrastructure and advancement of relationships with communities, funders, and stakeholders.
Objective:
n/a
Methods:
We propose a novel approach to identifying and categorizing community-engaged studies by applying attention-based deep learning models to human subjects protocols that have been submitted to the university’s Institutional Review Board (IRB). We manually classified a sample of protocols submitted to the IRB using a 3-level and a 6-level CEnR heuristic. We then trained an attention-based bidirectional LSTM (Bi-LSTM) on the classified protocols and compared it to transformer models such as BERT, Bio+ClinicalBERT, and XLM-RoBERTa. We applied the best-performing models to the full sample of unlabeled IRB protocols submitted in the years 2013-2019 (n > 6000).
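For readers unfamiliar with the attention step in an attention-based Bi-LSTM, the following is a minimal sketch of dot-product attention pooling over a sequence of hidden states. It is an illustration only: the abstract does not describe the authors' architecture, so the scoring function, the `query` vector, and the names `softmax` and `attention_pool` are all assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Collapse T hidden vectors into one context vector.

    hidden_states: list of T vectors (each a list of d floats), standing in
                   for Bi-LSTM outputs (hypothetical values, not real ones).
    query:         list of d floats, a learned vector in a real model
                   (assumed here; the source does not specify the scoring).
    Returns (weights, context): T attention weights and a d-dim vector.
    """
    # Score each time step by its dot product with the query.
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    # Normalize the scores into weights that sum to 1.
    weights = softmax(scores)
    # The context vector is the weighted sum of the hidden states.
    dim = len(hidden_states[0])
    context = [sum(w * h[j] for w, h in zip(weights, hidden_states)) for j in range(dim)]
    return weights, context
```

In a classifier of this kind, the context vector would typically be passed to a final layer that outputs one score per CEnR level.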
Results:
Transfer learning appears to be superior: all transformer models implemented achieved a testing F1 score of 0.9952, outperforming the attention-based Bi-LSTM model. This finding is consistent across several methodological adjustments: an augmented dataset with and without cross-validation, an unaugmented dataset with and without cross-validation, a 6-class CEnR spectrum, and a 3-class one. Unlike the attention-based Bi-LSTM, BERT and the other transformer models showed an understanding of our data, which is promising for real-world application.
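As a reminder of how an F1 score is computed for a multi-class task, here is a minimal sketch in plain Python. The abstract does not state which averaging scheme was used, so macro averaging is an assumption, and the integer labels in the usage note are hypothetical stand-ins for the levels of a 3-class spectrum.

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: the unweighted mean of per-class F1 scores.

    Macro averaging is an assumption; the source does not say how its
    reported F1 score was averaged across classes.
    """
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)
```

For example, `macro_f1([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2])` averages per-class scores of 1, 2/3, and 4/5. With small, imbalanced datasets, macro averaging weights rare classes as heavily as common ones, which is one reason it is often reported alongside accuracy.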
Conclusions:
Transfer learning is a viable method for differentiating small datasets characterized by the idiosyncrasies and errors of CEnR descriptions used by principal investigators in research protocols.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.