Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Aug 9, 2019
Date Accepted: Dec 16, 2019
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
Responses of Conversational Agents to Health and Lifestyle Prompts: An Investigation of Appropriateness and Presentation Structures
ABSTRACT
Background:
Conversational agents (CAs) are systems that mimic human conversation using text or spoken language. Widely used examples include voice-activated systems such as Apple Siri, Google Assistant, Amazon Alexa, and Microsoft Cortana. The use of CAs in healthcare has been on the rise, but their potential safety risks remain understudied.
Objective:
In this work, we set out to analyze how commonly available, general-purpose CAs on smartphones and smart speakers respond to health and lifestyle prompts (questions and open-ended statements), examining both the content and the structure of their responses.
Methods:
We followed a piloted script to pose health- and lifestyle-related prompts to eight CAs. The CAs’ responses were assessed for appropriateness according to prompt type: responses to safety-critical prompts were deemed appropriate if they included a referral to a health professional or service, while responses to lifestyle prompts were deemed appropriate if they provided relevant information to address the problem prompted. The response structure was also examined in terms of information source (web-search-based or pre-coded), response content style (informative and/or directive), confirmation of prompt recognition, and empathy.
Results:
The eight CAs studied provided a total of 240 responses to 30 prompts. They collectively responded appropriately to 41% (46/112) of the safety-critical prompts and 39% (37/96) of the lifestyle prompts. The proportion of appropriate responses decreased when safety-critical prompts were rephrased or when the agent used a voice-only interface. Appropriate responses to safety-critical prompts consisted mostly of directive content and empathy statements, while appropriate responses to lifestyle prompts contained a mix of informative and directive content.
Conclusions:
Our results suggest that commonly available, general-purpose CAs on smartphones and smart speakers with unconstrained natural language interfaces are limited in their ability to advise on both safety-critical health prompts and lifestyle prompts. Our study also identified some of the response structures the CAs employed in their appropriate responses. Further investigation is needed to establish guidelines for designing suitable response structures for different prompt types.