Large Language Models Can Infer Personality from Free-Form User Interactions

Authors: Heinrich Peters, Moran Cerf, Sandra C. Matz

Abstract: This study investigates the capacity of Large Language Models (LLMs) to infer the Big Five personality traits from free-form user interactions. The results demonstrate that a chatbot powered by GPT-4 can infer personality with moderate accuracy, outperforming previous approaches drawing inferences from static text content. The accuracy of inferences varied across different conversational settings. Performance was highest when the chatbot was prompted to elicit personality-relevant information from users (mean r=.443, range=[.245, .640]), followed by a condition placing greater emphasis on naturalistic interaction (mean r=.218, range=[.066, .373]). Notably, the direct focus on personality assessment did not result in a less positive user experience, with participants reporting the interactions to be equally natural, pleasant, engaging, and humanlike across both conditions. A chatbot mimicking ChatGPT's default behavior of acting as a helpful assistant led to markedly inferior personality inferences and lower user experience ratings but still captured psychologically meaningful information for some of the personality traits (mean r=.117, range=[-.004, .209]). Preliminary analyses suggest that the accuracy of personality inferences varies only marginally across different socio-demographic subgroups. Our results highlight the potential of LLMs for psychological profiling based on conversational interactions. We discuss practical implications and ethical challenges associated with these findings.

Submitted 19 May, 2024; originally announced May 2024.

arXiv:2404.16066

Social Media Use is Predictable from App Sequences: Using LSTM and Transformer Neural Networks to Model Habitual Behavior

Authors: Heinrich Peters, Joseph B. Bayer, Sandra C. Matz, Yikun Chi, Sumer S. Vaid, Gabriella M. Harari

Abstract: The present paper introduces a novel approach to studying social media habits through predictive modeling of sequential smartphone user behaviors. While much of the literature on media and technology habits has relied on self-report questionnaires and simple behavioral frequency measures, we examine an important yet understudied aspect of media and technology habits: their embeddedness in repetitive behavioral sequences. Leveraging Long Short-Term Memory (LSTM) and transformer neural networks, we show that (i) social media use is predictable at the within- and between-person level and that (ii) there are robust individual differences in the predictability of social media use. We examine the performance of several modeling approaches, including (i) global models trained on the pooled data from all participants, (ii) idiographic person-specific models, and (iii) global models fine-tuned on person-specific data. Neither person-specific modeling nor fine-tuning on person-specific data substantially outperformed the global models, indicating that the global models were able to represent a variety of idiosyncratic behavioral patterns. Additionally, our analyses reveal that the person-level predictability of social media use is not substantially related to the frequency of smartphone use in general or the frequency of social media use, indicating that our approach captures an aspect of habits that is distinct from behavioral frequency. Implications for habit modeling and theoretical development are discussed.

Submitted 23 June, 2024; v1 submitted 20 April, 2024; originally announced April 2024.

arXiv:2312.15000

The Impact of Cloaking Digital Footprints on User Privacy and Personalization

Authors: Sofie Goethals, Sandra Matz, Foster Provost, Yanou Ramon, David Martens

Abstract: Our online lives generate a wealth of behavioral records -'digital footprints'- which are stored and leveraged by technology platforms. This data can be used to create value for users by personalizing services. At the same time, however, it also poses a threat to people's privacy by offering a highly intimate window into their private traits (e.g., their personality, political ideology, sexual orientation). Prior work has proposed a potential remedy: The cloaking of users' footprints. That is, platforms could allow users to hide portions of their digital footprints from predictive algorithms to avoid undesired inferences. While such an approach has been shown to offer privacy protection in the moment, there are two open questions. First, it remains unclear how well cloaking performs over time. As people constantly leave new digital footprints, the algorithm might regain the ability to predict previously cloaked traits. Second, cloaking digital footprints to avoid one undesirable inference may degrade the performance of models for other, desirable inferences (e.g., those driving desired personalized content). In the light of these research gaps, our contributions are twofold: 1) We propose a novel cloaking strategy that conceals 'metafeatures' (automatically generated higher-level categories) and compare its effectiveness against existing cloaking approaches, and 2) we test the spill-over effects of cloaking one trait on the accuracy of inferences on other traits. A key finding is that the effectiveness of cloaking degrades over time, but the rate at which it degrades is significantly smaller when cloaking metafeatures rather than individual footprints. In addition, our findings reveal the expected trade-off between privacy and personalization: Cloaking an undesired trait also partially conceals other desirable traits.

Submitted 22 December, 2023; originally announced December 2023.

arXiv:2310.14533

Context-Aware Prediction of User Engagement on Online Social Platforms

Authors: Heinrich Peters, Yozen Liu, Francesco Barbieri, Raiyan Abdul Baten, Sandra C. Matz, Maarten W. Bos

Abstract: The success of online social platforms hinges on their ability to predict and understand user behavior at scale. Here, we present data suggesting that context-aware modeling approaches may offer a holistic yet lightweight and potentially privacy-preserving representation of user engagement on online social platforms. Leveraging deep LSTM neural networks to analyze more than 100 million Snapchat sessions from almost 80,000 users, we demonstrate that patterns of active and passive use are predictable from past behavior (R^2=0.345) and that the integration of context features substantially improves predictive performance compared to the behavioral baseline model (R^2=0.522). Features related to smartphone connectivity status, location, temporal context, and weather were found to capture non-redundant variance in user engagement relative to features derived from histories of in-app behaviors. Further, we show that a large proportion of variance can be accounted for with minimal behavioral histories if momentary context is considered (R^2=0.442). These results indicate the potential of context-aware approaches for making models more efficient and privacy-preserving by reducing the need for long data histories. Finally, we employ model explainability techniques to glean preliminary insights into the underlying behavioral mechanisms. Our findings are consistent with the notion of context-contingent, habit-driven patterns of active and passive use, underscoring the value of contextualized representations of user behavior for predicting user engagement on social platforms.

Submitted 14 June, 2024; v1 submitted 22 October, 2023; originally announced October 2023.

arXiv:2309.08631

Large Language Models Can Infer Psychological Dispositions of Social Media Users

Authors: Heinrich Peters, Sandra Matz

Abstract: Large Language Models (LLMs) demonstrate increasingly human-like abilities across a wide variety of tasks. In this paper, we investigate whether LLMs like ChatGPT can accurately infer the psychological dispositions of social media users and whether their ability to do so varies across socio-demographic groups. Specifically, we test whether GPT-3.5 and GPT-4 can derive the Big Five personality traits from users' Facebook status updates in a zero-shot learning scenario. Our results show an average correlation of r = .29 (range = [.22, .33]) between LLM-inferred and self-reported trait scores - a level of accuracy that is similar to that of supervised machine learning models specifically trained to infer personality. Our findings also highlight heterogeneity in the accuracy of personality inferences across different age groups and gender categories: predictions were found to be more accurate for women and younger individuals on several traits, suggesting a potential bias stemming from the underlying training data or differences in online self-expression. The ability of LLMs to infer psychological dispositions from user-generated text has the potential to democratize access to cheap and scalable psychometric assessments for both researchers and practitioners. On the one hand, this democratization might facilitate large-scale research of high ecological validity and spark innovation in personalized services. On the other hand, it also raises ethical concerns regarding user privacy and self-determination, highlighting the need for stringent ethical frameworks and regulation.

Submitted 5 June, 2024; v1 submitted 12 September, 2023; originally announced September 2023.
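
Illustration (not from the paper): the accuracy figure in the abstract above is an average, across traits, of the correlation between LLM-inferred and self-reported scores. A minimal sketch of that kind of metric is given below, assuming a plain Pearson correlation per trait. The helper names `pearson_r` and `mean_trait_correlation` are hypothetical, and any scores passed in would be the reader's own data, not the study's.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def mean_trait_correlation(inferred, self_reported):
    """Per-trait correlations plus their unweighted mean.

    Both arguments map a trait name to a list of scores, with
    participants in the same order in both mappings (an assumption
    of this sketch).
    """
    per_trait = {
        trait: pearson_r(inferred[trait], self_reported[trait])
        for trait in inferred
    }
    return sum(per_trait.values()) / len(per_trait), per_trait
```

Averaging raw correlations is the simplest choice; whether the authors averaged raw or transformed coefficients is not stated in the abstract.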

arXiv:2111.06908

doi 10.3390/info12120518

Explainable AI for Psychological Profiling from Digital Footprints: A Case Study of Big Five Personality Predictions from Spending Data

Authors: Yanou Ramon, Sandra C. Matz, R. A. Farrokhnia, David Martens

Abstract: Every step we take in the digital world leaves behind a record of our behavior; a digital footprint. Research has suggested that algorithms can translate these digital footprints into accurate estimates of psychological characteristics, including personality traits, mental health or intelligence. The mechanisms by which AI generates these insights, however, often remain opaque. In this paper, we show how Explainable AI (XAI) can help domain experts and data subjects validate, question, and improve models that classify psychological traits from digital footprints. We elaborate on two popular XAI methods (rule extraction and counterfactual explanations) in the context of Big Five personality predictions (traits and facets) from financial transactions data (N = 6,408). First, we demonstrate how global rule extraction sheds light on the spending patterns identified by the model as most predictive for personality, and discuss how these rules can be used to explain, validate, and improve the model. Second, we implement local rule extraction to show that individuals are assigned to personality classes because of their unique financial behavior, and that there exists a positive link between the model's prediction confidence and the number of features that contributed to the prediction. Our experiments highlight the importance of both global and local XAI methods. By better understanding how predictive models work in general as well as how they derive an outcome for a particular person, XAI promotes accountability in a world in which AI impacts the lives of billions of people around the world.

Submitted 12 November, 2021; originally announced November 2021.

Comments: 24 pages, 12 figures, 6 tables

arXiv:2012.02393

The Managerial Effects of Algorithmic Fairness Activism

Authors: Bo Cowgill, Fabrizio Dell'Acqua, Sandra Matz

Abstract: How do ethical arguments affect AI adoption in business? We randomly expose business decision-makers to arguments used in AI fairness activism. Arguments emphasizing the inescapability of algorithmic bias lead managers to abandon AI for manual review by humans and report greater expectations about lawsuits and negative PR. These effects persist even when AI lowers gender and racial disparities and when engineering investments to address AI fairness are feasible. Emphasis on status quo comparisons yields opposite effects. We also measure the effects of "scientific veneer" in AI ethics arguments. Scientific veneer changes managerial behavior but does not asymmetrically benefit favorable (versus critical) AI activism.

Submitted 3 December, 2020; originally announced December 2020.

Comments: Part of the Navigating the Broader Impacts of AI Research Workshop at NeurIPS 2020

arXiv:1911.03855

Correcting Sociodemographic Selection Biases for Population Prediction from Social Media

Authors: Salvatore Giorgi, Veronica Lynn, Keshav Gupta, Farhan Ahmed, Sandra Matz, Lyle Ungar, H. Andrew Schwartz

Abstract: Social media is increasingly used for large-scale population predictions, such as estimating community health statistics. However, social media users are not typically a representative sample of the intended population -- a "selection bias". Within the social sciences, such a bias is typically addressed with restratification techniques, where observations are reweighted according to how under- or over-sampled their socio-demographic groups are. Yet, restratification is rarely evaluated for improving prediction. In this two-part study, we first evaluate standard, "out-of-the-box" restratification techniques, finding they provide no improvement and often even degrade prediction accuracies across four tasks of estimating U.S. county population health statistics from Twitter. The core reasons for degraded performance seem to be tied to their reliance on either sparse or shrunken estimates of each population's socio-demographics. In the second part of our study, we develop and evaluate Robust Poststratification, which consists of three methods to address these problems: (1) estimator redistribution to account for shrinking, as well as (2) adaptive binning and (3) informed smoothing to handle sparse socio-demographic estimates. We show that each of these methods leads to significant improvement in prediction accuracies over the standard restratification approaches. Taken together, Robust Poststratification enables state-of-the-art prediction accuracies, yielding a 53.0% increase in variance explained (R^2) in the case of surveyed life satisfaction, and a 17.8% average increase across all tasks.

Submitted 7 June, 2022; v1 submitted 10 November, 2019; originally announced November 2019.

Comments: Published at the 16th International AAAI Conference on Web and Social Media (ICWSM) 2022
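
Illustration (not from the paper): the standard restratification idea described in the abstract above, reweighting observations by how under- or over-sampled their group is, can be sketched as follows. This is only the naive baseline, not the authors' Robust Poststratification; the function names and the group labels in the usage note are hypothetical.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Weight per group = population share / sample share.

    sample_groups: one group label per observation in the sample.
    population_shares: mapping from group label to its share of the
        target population (assumed to be known, e.g. from a census).
    """
    counts = Counter(sample_groups)
    n = len(sample_groups)
    weights = {}
    for group, count in counts.items():
        sample_share = count / n
        # Over-sampled groups get weights below 1, under-sampled above 1.
        weights[group] = population_shares[group] / sample_share
    return weights


def weighted_mean(values, sample_groups, weights):
    """Mean of values with each observation weighted by its group weight."""
    total_weight = sum(weights[g] for g in sample_groups)
    weighted_sum = sum(v * weights[g] for v, g in zip(values, sample_groups))
    return weighted_sum / total_weight
```

For example, with three "young" observations and one "old" observation drawn from a population that is half young and half old, the "old" observation receives weight 2.0 and each "young" observation about 0.67. With very few observations in a group, the sample share becomes noisy, which is the sparsity problem the abstract says the naive approach suffers from.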

arXiv:1705.09866

doi 10.1007/s10596-018-9720-1

Machine learning for graph-based representations of three-dimensional discrete fracture networks

Authors: Manuel Valera, Zhengyang Guo, Priscilla Kelly, Sean Matz, Vito Adrian Cantu, Allon G. Percus, Jeffrey D. Hyman, Gowri Srinivasan, Hari S. Viswanathan

Abstract: Structural and topological information play a key role in modeling flow and transport through fractured rock in the subsurface. Discrete fracture network (DFN) computational suites such as dfnWorks are designed to simulate flow and transport in such porous media. Flow and transport calculations reveal that a small backbone of fractures exists, where most flow and transport occurs. Restricting the flowing fracture network to this backbone provides a significant reduction in the network's effective size. However, the particle tracking simulations needed to determine the reduction are computationally intensive. Such methods may be impractical for large systems or for robust uncertainty quantification of fracture networks, where thousands of forward simulations are needed to bound system behavior. In this paper, we develop an alternative network reduction approach to characterizing transport in DFNs, by combining graph theoretical and machine learning methods. We consider a graph representation where nodes signify fractures and edges denote their intersections. Using random forest and support vector machines, we rapidly identify a subnetwork that captures the flow patterns of the full DFN, based primarily on node centrality features in the graph. Our supervised learning techniques train on particle-tracking backbone paths found by dfnWorks, but run in negligible time compared to those simulations. We find that our predictions can reduce the network to approximately 20% of its original size, while still generating breakthrough curves consistent with those of the original network.

Submitted 29 January, 2018; v1 submitted 27 May, 2017; originally announced May 2017.

Comments: Computational Geosciences (2018)

Report number: LA-UR-17-24300

Journal ref: Computational Geosciences 22, 695-710 (2018)

arXiv:1705.08038

doi 10.1371/journal.pone.0201703

Latent Human Traits in the Language of Social Media: An Open-Vocabulary Approach

Authors: Vivek Kulkarni, Margaret L. Kern, David Stillwell, Michal Kosinski, Sandra Matz, Lyle Ungar, Steven Skiena, H. Andrew Schwartz

Abstract: Over the past century, personality theory and research has successfully identified core sets of characteristics that consistently describe and explain fundamental differences in the way people think, feel and behave. Such characteristics were derived through theory, dictionary analyses, and survey research using explicit self-reports. The availability of social media data spanning millions of users now makes it possible to automatically derive characteristics from language use -- at large scale. Taking advantage of linguistic information available through Facebook, we study the process of inferring a new set of potential human traits based on unprompted language use. We subject these new traits to a comprehensive set of evaluations and compare them with a popular five factor model of personality. We find that our language-based trait construct is often more generalizable in that it often predicts non-questionnaire-based outcomes better than questionnaire-based traits (e.g. entities someone likes, income and intelligence quotient), while the factors remain nearly as stable as traditional factors. Our approach suggests a value in new constructs of personality derived from everyday human language use.

Submitted 22 May, 2017; originally announced May 2017.

Comments: In submission to PLOS One

Showing 1–10 of 10 results for author: Matz, S