Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Dec 16, 2019
Date Accepted: Feb 29, 2020
Inter-rater reliability of medical mobile application classification using the NICE evidence standards framework for digital health technologies
ABSTRACT
Background:
Clinical governance of medical mobile applications is challenging, and there is currently no standard method for assessing the quality of such apps. In 2018, the National Institute for Health and Care Excellence (NICE) developed a framework for assessing the required level of evidence for digital health technologies (DHTs), as determined by their clinical function. The framework could potentially be used to assess mobile apps, which are a subset of DHTs. To be used reliably in this context, the framework must allow unambiguous classification of an app’s clinical function.
Objective:
The objective of this study was to determine whether mobile health apps could be reliably classified using the NICE evidence standards framework for digital health technologies.
Methods:
We manually extracted app titles, screenshots, and content descriptions for all apps listed on the NHS Apps Library website on July 12, 2019; none of the apps were downloaded. Using this information, two mHealth researchers independently classified each app into one of the four functional tiers (1, 2, 3a, and 3b) described in the NICE evidence standards framework. Coders also answered contextual questions from the framework to identify whether apps were considered higher risk. Classification agreement was assessed using Cohen’s kappa.
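For readers unfamiliar with the statistic, Cohen’s kappa compares the observed proportion of agreement between two coders with the agreement expected by chance from each coder's label frequencies. The following is a minimal sketch only: the function name `cohens_kappa` and the tier labels are hypothetical illustrations, not the study's code or data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of items.")

    n = len(rater_a)

    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each category, multiply the two raters' marginal
    # proportions, then sum over categories.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical category for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical tier assignments for eight apps (illustrative only, not study data).
coder_1 = ["2", "2", "3a", "3a", "3b", "2", "3a", "1"]
coder_2 = ["2", "3a", "3a", "3a", "3b", "2", "2", "2"]

print(round(cohens_kappa(coder_1, coder_2), 2))  # 0.43
```

In this toy example the coders agree on 5 of 8 apps (62.5%), but because about 34% agreement would be expected by chance, kappa is only about 0.43.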
Results:
In total, we assessed 76 apps from the NHS Apps Library. The reviewers agreed in 42 (55%) cases. Of these, no apps were in Tier 1, 24 were in Tier 2, 15 were in Tier 3a, and 3 were in Tier 3b. The coders disagreed in 34 (45%) cases, and inter-rater agreement was poor (Cohen’s κ = 0.32, 95% CI 0.16-0.47). Further investigation of the disagreements highlighted five main explanatory themes: apps that did not correspond to any tier; apps that corresponded to multiple tiers; ambiguous tier descriptions; ambiguous app descriptions; and coder error.
Conclusions:
The current iteration of the NICE evidence standards framework for digital health technologies did not allow mHealth researchers to consistently and unambiguously classify digital health mobile apps listed on the NHS Apps Library according to their functional tier.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.