Accepted for/Published in: JMIR Serious Games
Date Submitted: Jun 29, 2023
Date Accepted: Jan 31, 2024
Measuring the Reliability of a Gamified Stroop Task: A Quantitative Experiment
ABSTRACT
Background:
Few gamified cognitive tasks have been subjected to rigorous examination of their psychometric properties, despite their use in experimental and clinical settings. Even small manipulations of cognitive tasks require extensive research to understand their effects.
Objective:
With this study, we investigate how game elements can affect the reliability of scores on a Stroop task. We specifically examine performance consistency within and across sessions.
Methods:
We created two versions of the Stroop task, with and without game elements, and tested each version with participants at two time points. In this paper, we report on the reliability of the gamified Stroop task, in terms of internal consistency and test-retest reliability, compared with the control task. We also compare the reliability of scores on a trial-by-trial basis, taking the different trial types into account.
Results:
In session 1, the Stroop effect was reduced in the game version, indicating improved performance. Further, the game version yielded higher internal consistency in both sessions for error rates as well as reaction times, indicating a more consistent response pattern. Test-retest reliability analysis revealed a distinctive performance pattern depending on trial type, which may reflect motivational differences between the task versions. Notably, in the incongruent trials, where cognitive conflict occurs, performance in the game version reached peak consistency after 100 trials, whereas consistency in the basic version dropped after 50 trials and caught up with the game version only after 250 trials.
Conclusions:
Even subtle gamification can affect task performance beyond a direct difference in scores between conditions. Participants playing the game reached peak performance sooner, and their performance was more consistent within and across sessions. We advocate a closer look at the impact of game elements on performance. Further, given the increased reliability of game-like tasks, they may be especially suitable for assessing populations that cannot perform a task for an extended period.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft other than for review purposes.