Accepted for/Published in: Journal of Medical Internet Research
Date Submitted: Apr 6, 2020
Open Peer Review Period: Apr 4, 2020 - Apr 14, 2020
Date Accepted: Aug 11, 2020
Date Submitted to PubMed: Aug 31, 2020
Social Reminiscence in Older Adults’ Everyday Conversations: Automated Detection Using Natural Language Processing and Machine Learning
ABSTRACT
Background:
Reminiscence is the act of thinking or talking about personal experiences that occurred in the past. It is a central task of old age that serves multiple functions, such as supporting decision-making and introspection, facilitating the transmission of life lessons, and bonding with others. Studying social reminiscence behavior in everyday life can generate data for predicting reminiscence from general conversations.
Objective:
The aims of this original paper are to 1) preprocess coded transcripts of older adults’ conversations in German with natural language processing (NLP), and 2) develop learning strategies using different NLP pipelines and machine learning algorithms to predict reminiscence in a corpus of transcripts.
Methods:
The methods in this study comprise 1) collection and coding of written transcripts of older adults’ conversations in German; 2) preprocessing of the transcripts with NLP methodologies (bag-of-words models, part-of-speech tagging, and pre-trained German word embeddings); and 3) training of machine learning models to predict reminiscence using random forests, support vector machines, adaptive boosting, and extreme gradient boosting algorithms. The dataset comprises 2214 transcripts, of which 109 contain reminiscence. Because of this class imbalance, we introduce three learning strategies: 1) class-weighted learning, 2) a meta-classifier consisting of a voting ensemble, and 3) data augmentation with the Synthetic Minority Oversampling Technique (SMOTE) algorithm. For each learning strategy, we performed cross-validation on a random sample of the training dataset of transcripts. We used the area under the curve (AUC) to evaluate performance during cross-validation, and we computed the AUC and the average precision (AP) on test data for all combinations of NLP pipelines, algorithms, and learning strategies.
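To make the imbalance concrete, the following is a minimal sketch of two of the strategies named above, using only the Python standard library. It is not the authors' implementation: the weighting formula (weights inversely proportional to class frequency) is an assumed, commonly used heuristic, and `balanced_class_weights`, `smote_like`, and the toy points are hypothetical names and data introduced for illustration.

```python
import random


def balanced_class_weights(n_total, n_positive):
    """Assumed heuristic: weight = n_total / (n_classes * n_in_class)."""
    n_negative = n_total - n_positive
    return {0: n_total / (2 * n_negative), 1: n_total / (2 * n_positive)}


def smote_like(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (the SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [m for m in minority if m is not x]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from x to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


# Counts taken from the abstract: 2214 transcripts, 109 with reminiscence.
weights = balanced_class_weights(n_total=2214, n_positive=109)

# Toy 2-D feature vectors standing in for minority-class embeddings.
toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_like(toy_minority, n_new=6, k=2)
```

Under this heuristic a reminiscence transcript receives roughly 19 times the weight of a non-reminiscence transcript, so both classes contribute equally to the training loss. The interpolation step only ever produces points on segments between existing minority samples, which is why it is applied to numeric features such as embeddings rather than to raw text.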
Results:
Support vector machines on pre-trained word embeddings outperform all other classifiers and NLP pipelines for both the class-weighted (AUC=0.91, AP=0.61) and data augmentation (AUC=0.89, AP=0.54) learning strategies. For the meta-classifier learning strategy, the voting ensemble comprising N=50 extreme gradient boosting algorithms trained on pre-trained word embeddings outperforms all other classifiers and NLP pipelines (AUC=0.93, AP=0.58). Overall, extreme gradient boosting and support vector machines outperform random forests and adaptive boosting, while word embeddings outperform the bag-of-words and POS-tagging NLP pipelines.
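For readers less familiar with the two reported measures, the sketch below shows one common way to compute them from classifier scores in plain Python. It is an illustration only: `roc_auc`, `average_precision`, and the toy labels and scores are hypothetical, the AP version assumes no tied scores, and both functions assume each class occurs at least once.

```python
def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs in which the positive has the
    higher score; tied pairs count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive,
    after sorting by descending score (assumes no tied scores)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Toy example: 2 positives among 8 items (label 1 = reminiscence).
toy_labels = [0, 0, 1, 0, 1, 0, 0, 0]
toy_scores = [0.10, 0.40, 0.35, 0.20, 0.80, 0.30, 0.05, 0.50]

auc = roc_auc(toy_labels, toy_scores)            # 10 of 12 pairs ordered correctly
ap = average_precision(toy_labels, toy_scores)   # (1/1 + 2/4) / 2
```

AP is the more demanding of the two measures on imbalanced data: a classifier that ranks at random is expected to reach an AP close to the share of positives, which would be about 0.05 if the test data had the same proportion as the full dataset (109 of 2214), whereas a random AUC is 0.5 regardless of class balance.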
Conclusions:
This study provides evidence for developing NLP pipelines for the automated prediction of reminiscence in older adults’ everyday conversations in German. The methods and findings of this study could be relevant for the design of unobtrusive computer systems for the real-time detection of social reminiscence in the everyday life of older adults, as well as for the classification of its functions. We will deploy these systems in health interventions aimed at improving older adults’ well-being by promoting self-reflection and suggesting coping strategies to be used in cases of dysfunctional reminiscence, which has a negative effect on physical and mental health.
Copyright
© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.