Accepted for/Published in: JMIR Formative Research
Date Submitted: May 4, 2024
Date Accepted: Dec 17, 2024
Date Submitted to PubMed: Jan 22, 2025
Customizing CAT Stopping Rules for Clinical Settings: A Simulation Study Using the NIH Toolbox Emotion Battery Negative Affect Subdomain
ABSTRACT
Background:
Patient-reported outcome measures (PROMs) are crucial for informed medical decisions and evaluating treatments. However, they can be burdensome for patients and sometimes lack the reliability clinicians need for clear clinical interpretations.
Objective:
To assess the extent to which alternative stopping rules can increase reliability, for clinical use, while minimizing burden in computerized adaptive tests (CATs).
Methods:
CAT simulations were conducted on the three NIH Toolbox for Assessment of Neurological and Behavioral Function® (NIH Toolbox®) Emotion Battery adult item banks in the Negative Affect subdomain (i.e., Anger Affect, Fear Affect, and Sadness) that contain at least eight items. Under the original NIH Toolbox CAT stopping rules, the CAT stopped once the score standard error (SE) fell below 0.3 or 12 items had been administered. We first contrasted these rules with an SE-change rule in a planned simulation analysis. We then contrasted the original rules with fixed-length CATs (4-12 items), a reduction of the maximum test length to eight items, and other modifications in post hoc analyses. Burden was measured by the number of items administered per simulation; precision, by the percentage of assessments reaching reliability cutoffs (0.85, 0.90, and 0.95); and score recovery, by the correlation and root mean squared error (RMSE) between the generating theta and the CAT-estimated expected a posteriori (EAP) theta.
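The stopping rules contrasted above can be sketched in a few lines. This is an illustrative sketch only, not the study's simulation code: the SE threshold (0.3) and maximum length (12 items) come from the original NIH Toolbox rules described here, while the SE-change delta of 0.01 and the example SE trajectory are hypothetical values chosen for demonstration.

```python
# Illustrative sketch of two CAT stopping rules (not the study's actual code).
# Thresholds 0.3 and 12 follow the original NIH Toolbox rules; the
# SE-change delta (0.01) and the SE trajectory below are hypothetical.

def stop_original(se_history, se_threshold=0.3, max_items=12):
    """Original rule: stop once SE < 0.3 or the maximum length is reached."""
    return se_history[-1] < se_threshold or len(se_history) >= max_items

def stop_se_change(se_history, min_change=0.01, max_items=12):
    """SE-change rule: stop when the SE improves by less than min_change
    between successive items, or when the maximum length is reached."""
    if len(se_history) >= max_items:
        return True
    if len(se_history) < 2:
        return False
    return (se_history[-2] - se_history[-1]) < min_change

def reliability(se):
    """On a standardized theta metric, reliability = 1 - SE**2,
    so the SE < 0.3 criterion corresponds to reliability > 0.91."""
    return 1.0 - se ** 2

# Hypothetical SE trajectory over six administered items.
ses = [0.62, 0.45, 0.36, 0.31, 0.295, 0.29]
print(stop_original(ses[:4]))   # SE 0.31 is not yet below 0.30
print(stop_original(ses[:5]))   # SE 0.295 < 0.30, so the CAT stops
print(stop_se_change(ses[:6]))  # SE gain 0.005 < 0.01, so the CAT stops
```

Note how the two rules can stop at different points on the same trajectory, which is why they trade off burden and precision differently.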
Results:
In general, relative to the original rules, the alternative stopping rules slightly decreased burden while increasing the proportion of assessments achieving high reliability for the adult banks; however, the SE-change rule and fixed-length CATs of eight or fewer items also notably increased the proportion of assessments yielding reliability < 0.85. Among the alternatives explored, the reduced-maximum stopping rule best balanced precision and parsimony, offering another option beyond the original rules.
Conclusions:
Our findings demonstrate the challenges in attempting to reduce test burden while also achieving score precision for clinical use. Stopping rules should be modified in accordance with the context of the study population and the purpose of the study.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.