
Accepted for/Published in: JMIR Formative Research

Date Submitted: Jun 22, 2023
Date Accepted: Apr 24, 2024

The final, peer-reviewed published version of this preprint can be found here:

Controlling Inputter Variability in Vignette Studies Assessing Web-Based Symptom Checkers: Evaluation of Current Practice and Recommendations for Isolated Accuracy Metrics

Meczner A, Cohen N, Qureshi A, Reza M, Sutaria S, Blount E, Bagyura Z, Malak T

Controlling Inputter Variability in Vignette Studies Assessing Web-Based Symptom Checkers: Evaluation of Current Practice and Recommendations for Isolated Accuracy Metrics

JMIR Form Res 2024;8:e49907

DOI: 10.2196/49907

PMID: 38820578

PMCID: 11179013

Controlling inputter variability in vignette studies assessing online symptom checkers: Evaluation of current practice and recommendations for isolated accuracy metrics

  • Andras Meczner; 
  • Nathan Cohen; 
  • Aleem Qureshi; 
  • Maria Reza; 
  • Shailen Sutaria; 
  • Emily Blount; 
  • Zsolt Bagyura; 
  • Tamer Malak

ABSTRACT

Background:

The rapid growth of online symptom checkers (OSCs) has not been matched by advances in quality assurance; there are currently no widely accepted criteria for assessing OSC performance. Vignette studies, which measure accuracy of outcome, are widely used to evaluate OSCs. Accuracy behaves as a composite metric, as it is affected by a number of OSC-dependent and tester-dependent factors. In contrast to clinical studies, vignette studies involve only a small number of testers. Hence, measuring accuracy alone in vignette studies may not provide a reliable assessment of performance, owing to tester variability.

Objective:

(1) To investigate the impact of tester variability on the outcome accuracy of OSCs assessed with clinical vignettes, and (2) to investigate the feasibility of measuring isolated aspects of OSC performance.

Methods:

Healthily’s OSC was assessed using 114 vignettes by three groups of three testers, each group processing vignettes under different instructions: (a) free interpretation of vignettes (free testers); (b) specified chief complaint(s) (partially free testers); and (c) specified chief complaint(s) with strict instructions for answering additional symptom questions (restricted testers). Kappa statistics were calculated to assess agreement on the top outcome condition and recommended triage. Crude and adjusted accuracy were measured against a gold standard. Adjusted accuracy was calculated using only the results of consultations identical to the vignette, following a review and selection process. A feasibility study of assessing the symptom comprehension of OSCs was performed using different variations of 51 chief complaints across three OSCs.
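As an illustration of the agreement statistic used here, the sketch below computes Cohen’s kappa for one pair of testers’ outcome labels. This is an assumption for demonstration only: with three testers per group, the study may have used pairwise Cohen’s kappa or a multi-rater variant such as Fleiss’ kappa, and the condition labels shown are invented.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical labels:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Proportion of vignettes where the two testers agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b.get(c, 0) for c in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical top-condition outputs for the same four vignettes.
tester_1 = ["flu", "cold", "flu", "migraine"]
tester_2 = ["flu", "flu", "flu", "migraine"]
print(round(cohens_kappa(tester_1, tester_2), 2))  # → 0.56
```

Kappa corrects raw percentage agreement for the agreement expected by chance, which is why values such as 0.49 versus 0.72 indicate a meaningful difference in inter-tester consistency even when raw agreement looks similar.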

Results:

Inter-tester agreement on the most likely condition and triage, respectively, was (a) 0.49 and 0.51 for the free tester group; (b) 0.66 and 0.66 for the partially free group; and (c) 0.72 and 0.71 for the restricted group. For the restricted group, accuracy for individual testers ranged from 43.9% to 57%, averaging 50.6%; adjusted accuracy was 56.1%. Assessing symptom comprehension was feasible for all three OSCs, with comprehension scores ranging from 52.9% to 68%.

Conclusions:

We demonstrated that improving the standardisation of the vignette testing process significantly improves agreement of outcomes between testers. However, significant variability remained due to uncontrollable tester-dependent factors, reflected in varying outcome accuracy. Tester-dependent factors, combined with a small number of testers, limit the reliability and generalisability of outcome accuracy when it is used as a composite measure in vignette studies. Measuring and reporting different aspects of OSC performance in isolation provides a more reliable assessment. We developed an adjusted accuracy measure, using a review and selection process, to assess data/algorithm quality. Additionally, we demonstrated that symptom comprehension with different input methods can feasibly be compared. Future studies reporting accuracy should apply vignette testing standardisation and isolated metrics.




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.