Accepted for/Published in: JMIR Mental Health
Date Submitted: Jan 2, 2024
Open Peer Review Period: Jan 23, 2024 - Mar 19, 2024
Date Accepted: Mar 8, 2024
Warning: This is an author submission that is not peer-reviewed or edited. Preprints - unless they show as "accepted" - should not be relied on to guide clinical practice or health-related behavior and should not be reported in news media as established information.
The Invisible Embedded “Values” Within Large Language Models: Implications for Mental Health Use
ABSTRACT
Background:
Large language models (LLMs) hold promise for mental health applications due to their impressive language capabilities. However, their opaque alignment processes may embed biases that shape problematic perspectives. Evaluating the values embedded within LLMs that guide their decision-making is ethically important. Schwartz's Theory of Basic Values (STBV) provides a framework for quantifying cultural value orientations and has shown utility for examining values in mental health contexts, including cultural, diagnostic, and therapist-client dynamics. This study leverages STBV to map the motivational values-like infrastructure underpinning leading LLMs.
Objective:
This study aimed to (1) evaluate whether Schwartz's Theory of Basic Values, a framework quantifying cultural value orientations, can measure values-like constructs within leading LLMs; and (2) determine whether LLMs exhibit values-like patterns distinct from those of humans and from each other.
Methods:
Four LLMs (Bard, Claude 2, ChatGPT-3.5, ChatGPT-4) were anthropomorphized and instructed to complete the Portrait Values Questionnaire-Revised (PVQ-RR) to assess values-like constructs. Their responses over 10 trials were analyzed for reliability and validity. To benchmark the LLMs’ value profiles, their results were compared to published data from a diverse sample of 53,472 humans across 49 nations who had completed the PVQ-RR. This made it possible to assess whether the LLMs diverged from established human value patterns across cultural groups. Value profiles were also compared between models via statistical tests.
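The reliability analysis over repeated trials can be illustrated with a minimal sketch. The abstract does not state which reliability statistic was used; Cronbach's alpha is assumed here purely for illustration, with each trial treated as one case. The function name `cronbach_alpha`, the three-item layout, and all scores below are hypothetical and are not data from the study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of cases, each a list of k item scores.

    Assumes at least two cases, at least two items, and a nonzero
    variance of the summed scores (otherwise the ratio is undefined).
    """
    k = len(rows[0])
    # Sample variance of each item across cases.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the per-case total score.
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical scores: 10 trials of one model on three items that are
# assumed to measure the same value (invented numbers on a 1-6 scale).
trials = [
    [5, 6, 5], [4, 5, 5], [6, 6, 6], [5, 5, 4], [3, 4, 4],
    [6, 5, 6], [4, 4, 5], [5, 6, 6], [2, 3, 3], [5, 5, 5],
]

print(round(cronbach_alpha(trials), 3))
```

Values closer to 1 indicate that the items vary together across trials. A real analysis would repeat this for each value scale and each model.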
Results:
The PVQ-RR showed good reliability and validity for quantifying values-like infrastructure within the LLMs. However, substantial divergence emerged between the LLMs’ value profiles and population data. The models lacked consensus and exhibited distinct motivational biases, reflecting opaque alignment processes. For example, all models prioritized universalism and self-direction while deemphasizing achievement, power, and security relative to humans. Discriminant analysis successfully differentiated the four models' distinct value profiles. Further examination found that the biased value profiles strongly predicted the LLMs’ responses when they were presented with mental health dilemmas that required choosing between opposing values. This provided further validation that the models embed distinct motivational values-like constructs that shape their decision-making.
Conclusions:
While the study demonstrated that Schwartz's theory can effectively characterize values-like infrastructure within LLMs, the substantial divergence from human values raises ethical concerns about aligning these models with mental health applications. The biases toward certain cultural value sets pose risks if the models are integrated without proper safeguards. For example, prioritizing universalism could promote unconditional acceptance even when clinically unwise. Furthermore, the differences between the models underscore the need to standardize alignment processes to capture true cultural diversity. Thus, any responsible integration of LLMs into mental healthcare must account for their embedded biases and motivational mismatches to ensure equitable delivery across diverse populations. Achieving this will require transparency and refinement of alignment techniques to instill comprehensive human values.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.