Assessing the Capability of Large Language Models for Navigation of the Australian Healthcare System: A Comparative Study
ABSTRACT
Background:
Australians in rural and regional areas face significant challenges in navigating the healthcare system, including limited services, fewer treatment options, and difficulty understanding their entitlements. Generative search tools, powered by large language models (LLMs), show promise for improving health information retrieval because they generate direct answers. However, concerns remain about their accuracy and reliability in a healthcare context compared with traditional search engines.
Objective:
This study aimed to compare the effectiveness of a generative AI search tool (Microsoft Copilot) with that of a conventional search engine (Google Web Search) for navigating healthcare information.
Methods:
A total of 97 adults in Queensland participated in an online survey, answering scenario-based healthcare navigation questions using either Microsoft Copilot or Google Web Search. Accuracy was assessed using binary correct/incorrect ratings, graded correctness (incorrect, partially correct, correct), and numerical scores (0–2 for service identification, 0–6 for criteria). Participants also completed a Technology Rating Questionnaire (TRQ) to evaluate their experience with their assigned tool.
Results:
Participants assigned to Microsoft Copilot outperformed the Google Web Search group on two healthcare navigation tasks (identifying aged care application services and listing mobility allowance eligibility criteria), with no clear evidence of a difference on the remaining six tasks. On the TRQ, participants rated Google Web Search higher in willingness to adopt and perceived impact on quality of life, and lower in effort needed to learn. Both tools received similar ratings for perceived value, confidence, help required to use, and concerns about privacy.
Conclusions:
Generative AI tools can achieve accuracy comparable to that of traditional search engines for healthcare navigation tasks, although this did not translate into an improved user experience. Further evaluation is needed as AI technology improves and users become more familiar with it.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.