Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Apr 1, 2025
Open Peer Review Period: Apr 3, 2025 - Jun 3, 2025
Date Accepted: May 21, 2025
Do Large Language Models Really Understand Personality?
ABSTRACT
Background:
Recent advancements in Large Language Models (LLMs) have generated significant interest in their potential for assessing psychological constructs, particularly personality traits. While prior research has explored LLMs’ capabilities in zero-shot or few-shot personality inference, few studies have systematically evaluated LLM embeddings within a psychometric validity framework or examined their correlations with linguistic and emotional markers. Additionally, the comparative efficacy of LLM embeddings against traditional feature engineering methods remains underexplored, leaving gaps in understanding their scalability and interpretability for computational personality assessment.
Objective:
This study evaluates LLM embeddings for personality trait prediction through four key analyses: (1) performance comparison with zero-shot methods on PANDORA Reddit data, (2) psychometric validation and correlation with LIWC and emotion features, (3) benchmarking against traditional feature engineering approaches, and (4) assessment of model size effects (OpenAI vs. BERT vs. RoBERTa). We aim to establish LLM embeddings as a psychometrically valid and efficient alternative for personality assessment.
Methods:
We conducted a multi-stage analysis using 1 million Reddit posts from the PANDORA Big Five Personality dataset. First, we generated text embeddings using three LLM architectures (RoBERTa, BERT, and OpenAI) and trained a custom BiLSTM model for personality prediction. We compared this approach against zero-shot inference using prompt-based methods. Second, we extracted psycholinguistic features (LIWC categories and NRC emotions) and performed feature engineering to evaluate potential performance enhancements. Third, we assessed the psychometric validity of LLM embeddings through: (1) reliability testing using Cronbach's alpha, and (2) convergent validity analysis by examining correlations between embeddings and established linguistic markers. For the latter, we applied Lasso regression for feature selection followed by Pearson correlation analysis between significant linguistic features and embedding dimensions for each personality trait.
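The psychometric validation step described above (reliability via Cronbach's alpha, then Lasso-based feature selection followed by Pearson correlation) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the alpha penalty value, and the synthetic inputs are assumptions for demonstration only.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Lasso


def cronbach_alpha(items):
    """Reliability of a set of items (here, embedding dimensions).

    items: array of shape (n_samples, n_items).
    """
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)


def select_and_correlate(linguistic_features, embedding_dim, penalty=0.05):
    """Convergent-validity check for one embedding dimension.

    Lasso regression selects linguistic features (e.g., LIWC or NRC
    scores) predictive of the embedding dimension; Pearson r then
    quantifies each surviving feature's association with it.
    Returns {feature_index: correlation} for selected features.
    """
    lasso = Lasso(alpha=penalty).fit(linguistic_features, embedding_dim)
    selected = np.flatnonzero(lasso.coef_)
    return {
        j: pearsonr(linguistic_features[:, j], embedding_dim)[0]
        for j in selected
    }
```

In the study's pipeline, this pair of checks would be repeated per personality trait, with the trait-relevant embedding dimensions as "items" and the LIWC/NRC categories as candidate features.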
Results:
LLM embeddings trained with simple deep learning techniques significantly outperform zero-shot approaches across all personality traits. Psychometric validation tests indicate moderate reliability, with an average Cronbach's alpha of 0.63. Correlation analyses reveal strong associations between LLM embeddings and linguistic/emotional markers: Openness correlates highly with Social (0.53), Conscientiousness with Linguistic (0.46), Extraversion with Social (0.41), Agreeableness with Pronoun usage (0.40), and Neuroticism with Politics-related text (0.63). Advanced feature engineering did not improve model performance, suggesting that LLM embeddings inherently capture key linguistic features. Additionally, model size affects efficacy, with OpenAI-based models outperforming RoBERTa. These findings highlight the potential of LLM embeddings for personality trait analysis.
Conclusions:
Our findings demonstrate that LLM embeddings offer a robust alternative to zero-shot methods in personality trait analysis, capturing key linguistic patterns without requiring extensive feature engineering. The correlation with established psycholinguistic markers and the influence of model size underscore their potential for scalable and efficient personality assessment. Further research should explore fine-tuning strategies to enhance psychometric validity.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.