Accepted for/Published in: Journal of Medical Internet Research

Date Submitted: Apr 1, 2025
Open Peer Review Period: Apr 3, 2025 - Jun 3, 2025
Date Accepted: May 21, 2025

The final, peer-reviewed published version of this preprint can be found here:

Psychometric Evaluation of Large Language Model Embeddings for Personality Trait Prediction

Maharjan J, Jin R, Zhu J, Kenne D

Psychometric Evaluation of Large Language Model Embeddings for Personality Trait Prediction

J Med Internet Res 2025;27:e75347

DOI: 10.2196/75347

PMID: 40627556

PMCID: 12262148

Do Large Language Models Really Understand Personality?

  • Julina Maharjan; 
  • Ruoming Jin; 
  • Jianfeng Zhu; 
  • Deric Kenne

ABSTRACT

Background:

Recent advancements in Large Language Models (LLMs) have generated significant interest in their potential for assessing psychological constructs, particularly personality traits. While prior research has explored LLMs’ capabilities in zero-shot or few-shot personality inference, few studies have systematically evaluated LLM embeddings within a psychometric validity framework or examined their correlations with linguistic and emotional markers. Additionally, the comparative efficacy of LLM embeddings against traditional feature engineering methods remains underexplored, leaving gaps in understanding their scalability and interpretability for computational personality assessment.

Objective:

This study evaluates LLM embeddings for personality trait prediction through four key analyses: (1) performance comparison with zero-shot methods on PANDORA Reddit data, (2) psychometric validation and correlation with LIWC and emotion features, (3) benchmarking against traditional feature engineering approaches, and (4) assessment of model size effects (OpenAI vs. BERT vs. RoBERTa). We aim to establish LLM embeddings as a psychometrically valid and efficient alternative for personality assessment.

Methods:

We conducted a multi-stage analysis using 1 million Reddit posts from the PANDORA Big Five Personality dataset. First, we generated text embeddings using three LLM architectures (RoBERTa, BERT, and OpenAI) and trained a custom BiLSTM model for personality prediction. We compared this approach against zero-shot inference using prompt-based methods. Second, we extracted psycholinguistic features (LIWC categories and NRC emotions) and performed feature engineering to evaluate potential performance enhancements. Third, we assessed the psychometric validity of LLM embeddings through: (1) reliability testing using Cronbach's alpha, and (2) convergent validity analysis by examining correlations between embeddings and established linguistic markers. For the latter, we applied Lasso regression for feature selection followed by Pearson correlation analysis between significant linguistic features and embedding dimensions for each personality trait.
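The reliability and convergent-validity steps described above can be sketched as follows. This is a minimal illustration on synthetic data, not the PANDORA corpus: the grouping of embedding dimensions into "items" for Cronbach's alpha, the Lasso penalty value, and all variable names are assumptions for demonstration only.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Lasso

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_samples, n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

rng = np.random.default_rng(0)

# Toy stand-in for a block of embedding dimensions treated as "items":
# 8 noisy measurements of one latent trait signal.
latent = rng.normal(size=(200, 1))
embedding_block = latent + 0.5 * rng.normal(size=(200, 8))
alpha = cronbach_alpha(embedding_block)

# Convergent validity sketch: Lasso selects linguistic features that
# predict a trait score, then Pearson r quantifies each surviving
# feature's association with that trait.
liwc_features = rng.normal(size=(200, 30))        # stand-in for LIWC matrix
trait_score = 0.8 * liwc_features[:, 3] + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.05).fit(liwc_features, trait_score)
selected = np.flatnonzero(lasso.coef_)
correlations = {j: pearsonr(liwc_features[:, j], trait_score)[0]
                for j in selected}
```

In the study's actual pipeline, `embedding_block` would be LLM embedding dimensions, `liwc_features` the LIWC/NRC feature matrix, and the Lasso penalty would presumably be tuned rather than fixed.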

Results:

LLM embeddings trained with simple deep learning techniques significantly outperform zero-shot approaches across all personality traits. Psychometric validation tests indicate moderate reliability, with an average Cronbach’s alpha of 0.63. Correlation analyses reveal strong associations between LLM embeddings and linguistic/emotional markers: Openness correlates highly with Social (0.53), Conscientiousness with Linguistic (0.46), Extraversion with Social (0.41), Agreeableness with Pronoun usage (0.40), and Neuroticism with Politics-related text (0.63). Advanced feature engineering did not improve model performance, suggesting that LLM embeddings inherently capture key linguistic features. Additionally, model size affects efficacy, with OpenAI-based models outperforming RoBERTa. These findings highlight the potential of LLM embeddings for personality trait analysis.

Conclusions:

Our findings demonstrate that LLM embeddings offer a robust alternative to zero-shot methods in personality trait analysis, capturing key linguistic patterns without requiring extensive feature engineering. The correlation with established psycholinguistic markers and the influence of model size underscore their potential for scalable and efficient personality assessment. Further research should explore fine-tuning strategies to enhance psychometric validity.



© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage the authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.