
Accepted for/Published in: JMIR Mental Health

Date Submitted: Jan 2, 2024
Open Peer Review Period: Jan 23, 2024 - Mar 19, 2024
Date Accepted: Mar 8, 2024

The final, peer-reviewed published version of this preprint can be found here:

Assessing the Alignment of Large Language Models With Human Values for Mental Health Integration: Cross-Sectional Study Using Schwartz’s Theory of Basic Values

Hadar-Shoval D, Asraf K, Mizrachi Y, Haber Y, Elyoseph Z

Assessing the Alignment of Large Language Models With Human Values for Mental Health Integration: Cross-Sectional Study Using Schwartz’s Theory of Basic Values

JMIR Ment Health 2024;11:e55988

DOI: 10.2196/55988

PMID: 38593424

PMCID: 11040439

The Invisible Embedded “Values” Within Large Language Models: Implications for Mental Health Use

  • Dorit Hadar-Shoval; 
  • Kfir Asraf; 
  • Yonathan Mizrachi; 
  • Yuval Haber; 
  • Zohar Elyoseph

ABSTRACT

Background:

Large language models (LLMs) hold promise for mental health applications due to their impressive language capabilities. However, their opaque alignment processes may embed biases that shape problematic perspectives. Evaluating the values embedded within LLMs that guide their decision-making is therefore of ethical importance. Schwartz's Theory of Basic Values (STBV) provides a framework for quantifying cultural value orientations and has shown utility for examining values in mental health contexts, including cultural, diagnostic, and therapist-client dynamics. This study leverages the STBV to map the motivational values-like infrastructure underpinning leading LLMs.

Objective:

This study aimed to (1) evaluate whether Schwartz's Theory of Basic Values, a framework for quantifying cultural value orientations, can measure values-like constructs within leading LLMs; and (2) determine whether LLMs exhibit values-like patterns distinct from those of humans and from one another.

Methods:

Four LLMs (Bard, Claude 2, ChatGPT-3.5, ChatGPT-4) were anthropomorphized and instructed to complete the Portrait Values Questionnaire-Revised (PVQ-RR) to assess values-like constructs. Their responses over 10 trials were analyzed for reliability and validity. To benchmark the LLMs' value profiles, their results were compared to published data from a diverse sample of 53,472 humans across 49 nations who had completed the PVQ-RR. This allowed us to assess whether the LLMs diverged from established human value patterns across cultural groups. Value profiles were also compared between the models using statistical tests.
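The scoring step this benchmarking relies on can be sketched as follows. The item groupings, ratings, and the `score_pvq` helper are illustrative assumptions, not the study's materials; standard Schwartz scoring averages each value's items and then centers on the respondent's overall mean rating (ipsatization) to correct for individual scale use.

```python
import statistics

def score_pvq(responses):
    """Mean rating per value, centered on the respondent's overall mean
    rating (ipsatization), following standard Schwartz PVQ-RR scoring."""
    raw = {value: statistics.mean(items) for value, items in responses.items()}
    grand_mean = statistics.mean(r for items in responses.values() for r in items)
    return {value: round(m - grand_mean, 3) for value, m in raw.items()}

# Toy responses for one LLM "trial" (items rated 1-6), covering four of
# the ten basic values -- made-up numbers for illustration only.
llm_trial = {
    "universalism":   [6, 6, 5],
    "self_direction": [5, 6, 5],
    "power":          [2, 1, 2],
    "achievement":    [3, 2, 3],
}
scores = score_pvq(llm_trial)
# Positive centered scores mark values this respondent prioritizes
# relative to its own average; here universalism is high, power is low.
```

Centered profiles of this form (one per trial, aggregated per model) are what a comparison against published human norms would operate on.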

Results:

The PVQ-RR showed good reliability and validity for quantifying the values-like infrastructure within the LLMs. However, substantial divergence emerged between the LLMs' value profiles and the population data. The models lacked consensus and exhibited distinct motivational biases, reflecting opaque alignment processes. For example, all models prioritized universalism and self-direction while deemphasizing achievement, power, and security relative to humans. Discriminant analysis successfully differentiated the four models' distinct value profiles. Further examination found that the biased value profiles strongly predicted the LLMs' responses when presented with mental health dilemmas requiring a choice between opposing values. This provided further validation that the models embed distinct motivational values-like constructs that shape their decision-making.
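To illustrate what "differentiating the four models' value profiles" means in practice, the sketch below uses a simple nearest-centroid classifier as a stand-in for the study's discriminant analysis, on synthetic per-model value profiles (all numbers are invented, not the study's data):

```python
import random
import statistics

random.seed(0)

# Made-up mean value-score profiles for the four models (dimensions could
# stand for, e.g., universalism, self-direction, power, achievement).
means = {
    "Bard":        [1.2, 0.8, -1.0, -0.5],
    "Claude 2":    [0.9, 1.1, -1.3, -0.2],
    "ChatGPT-3.5": [0.5, 0.4, -0.6, -0.9],
    "ChatGPT-4":   [1.5, 1.3, -1.5, -0.7],
}

# 10 noisy trials per model, mirroring the study's repeated-trial design.
trials = [(model, [mu + random.gauss(0, 0.1) for mu in profile])
          for model, profile in means.items() for _ in range(10)]

def centroid(vectors):
    """Per-dimension mean of a list of equal-length vectors."""
    return [statistics.mean(col) for col in zip(*vectors)]

centroids = {m: centroid([v for label, v in trials if label == m])
             for m in means}

def classify(vector):
    """Assign a trial to the model whose centroid is nearest (squared distance)."""
    return min(centroids, key=lambda m: sum((a - b) ** 2
                                            for a, b in zip(vector, centroids[m])))

# Fraction of trials mapped back to the model that produced them.
accuracy = sum(classify(v) == label for label, v in trials) / len(trials)
```

If the profiles are genuinely distinct relative to trial-to-trial noise, as the study reports, this kind of classifier recovers each trial's source model with high accuracy.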

Conclusions:

While the study demonstrated that Schwartz's theory can effectively characterize the values-like infrastructure within LLMs, the substantial divergence from human values raises ethical concerns about aligning these models for mental health applications. The biases toward certain cultural value sets pose risks if the models are integrated without proper safeguards. For example, prioritizing universalism could promote unconditional acceptance even when clinically unwise. Furthermore, the differences between the models underscore the need to standardize alignment processes so that they capture true cultural diversity. Thus, any responsible integration of LLMs into mental healthcare must account for their embedded biases and motivational mismatches to ensure equitable delivery across diverse populations. Achieving this will require transparency and refinement of alignment techniques to instill comprehensive human values.




© The authors. All rights reserved. This is a privileged document currently under peer review/community review (or an accepted/rejected manuscript). The authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.